The label Big Data is applied to almost anything that has to do with storage, yet many people still do not understand what the term technically refers to, or what it has to do with analytics. To make this clear, it helps to divide Big Data archives into two distinct types: Big Data for analytics and Big Unstructured Data. This distinction makes the Management of Unstructured Data easier to reason about. Using large data stores to perform statistical analysis has become a popular paradigm over the last decade, thanks in part to the many research projects that have driven innovation in the Management of Unstructured Data. This technology allows researchers in any field of study to capture data in large quantities and with an ease that was not possible before. Think of any setting where installed sensors constantly capture information about a task and send it to a platform, where it is analyzed and queried to extract useful information. Agriculture, administration, medicine: all of these disciplines have benefited greatly from these innovations. This is why the Management of Unstructured Data matters.
Management of Unstructured Data: Basics
From here, however, a problem arises: traditional systems for storing large amounts of data had to evolve toward database technologies other than the relational one, which looked more obsolete every day for this kind of workload. Systems like Hadoop were then created as an alternative to the relational model, and they made it possible to store this massive amount of data as large files. This gave birth to the phenomenon of Big Data for semi-structured data.
Today we are witnessing a similar trend for unstructured data. Studies show that data storage requirements will grow 30-fold over the next decade. About 80 percent of this data consists of large files: office documents, movies, music, pictures. Just as databases have evolved over the past decade, file systems must now evolve, because they are no longer the best way to store this data. They cannot scale sufficiently and are becoming obsolete.
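To put those figures in perspective, here is a rough back-of-the-envelope calculation. It assumes the 30-fold growth is spread evenly over ten years and that the 80 percent share of large files stays constant; the 100 TB starting point is purely illustrative and not taken from any study.

```python
# Figures quoted in the text above.
growth_factor = 30.0      # total growth over the period
years = 10                # "the next decade"
large_file_share = 0.80   # share of data made up of large files

# Assumption: growth compounds at a constant annual rate.
annual_rate = growth_factor ** (1 / years) - 1

# Hypothetical starting capacity, chosen only for illustration.
storage_today_tb = 100.0
storage_future_tb = storage_today_tb * growth_factor
large_files_future_tb = storage_future_tb * large_file_share

print(f"Implied annual growth: {annual_rate:.1%}")
print(f"Storage after {years} years: {storage_future_tb:.0f} TB")
print(f"Of which large files: {large_files_future_tb:.0f} TB")
```

Under these assumptions, storage has to grow by roughly 40 percent every year, which is why a file system that cannot scale out quickly becomes a bottleneck.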
More on Management of Unstructured Data
Picasa offers an illustrative example. Until a short time ago, we kept our images organized as best we could in a file system (possibly with a good backup): one folder per year, per month, per holiday, and per party. Today we simply drop all the images into a single folder, and Picasa orders them by date, location, face recognition, or other metadata. With a smart query we can find the images we are looking for very quickly, much faster than by browsing a file system. And we no longer have to worry about keeping a backup of that data ourselves, because copies can be stored in the cloud automatically.
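The idea of querying by metadata instead of browsing folders can be sketched in a few lines of Python. This is a minimal illustration and not how Picasa is actually implemented: the `Photo` record, the `find_photos` function, and the sample file names are all hypothetical, and a real tool would extract the metadata from the images themselves.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class Photo:
    # Hypothetical record: a real tool would fill these fields from
    # EXIF data, GPS tags, or face recognition.
    name: str
    taken_on: date
    location: str
    people: frozenset = field(default_factory=frozenset)


def find_photos(photos, year=None, location=None, person=None):
    """Return the photos matching every criterion that is not None."""
    results = []
    for photo in photos:
        if year is not None and photo.taken_on.year != year:
            continue
        if location is not None and photo.location != location:
            continue
        if person is not None and person not in photo.people:
            continue
        results.append(photo)
    # Present the matches in order of capture date.
    return sorted(results, key=lambda p: p.taken_on)


# One flat collection with no folder hierarchy at all.
library = [
    Photo("IMG_0003.jpg", date(2012, 8, 14), "Rome", frozenset({"Anna"})),
    Photo("IMG_0001.jpg", date(2011, 12, 25), "Berlin", frozenset({"Anna", "Marco"})),
    Photo("IMG_0002.jpg", date(2012, 8, 12), "Rome", frozenset({"Marco"})),
]

rome_2012 = find_photos(library, year=2012, location="Rome")
print([p.name for p in rome_2012])
```

The same flat collection answers questions that would each need their own folder layout in a file system, such as "all photos of Anna" or "everything from Rome in 2012".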