What is Data Lake in Big Data?


By Abhishek Ghosh March 30, 2017 8:54 am Updated on March 30, 2017


In a previous guide, we showed a step-by-step tutorial on how to create a data lake on a server and covered the basics of data lakes. A data lake comprises multiple repositories that provide data to an organisation for analytical processing, including analytics and reporting. In another guide, we talked about medical prediction using a data lake. It was James Dixon who coined the term data lake. He compared a data mart to a store of bottled water, “cleansed, packaged and structured for easy consumption”, while the data lake is more like a large body of water in its natural state. Data flows from the streams into the lake, and users have access to the lake to do the work they want. In other words, just as electricity is often compared to water, big data can be compared to water, and the data lake is the source of that data. Some companies have stretched the terminology and the comparison into odd shapes, and Gartner has a good article on this:

http://www.gartner.com/newsroom/id/2809117


What is Data Lake in Big Data?

Formally, a data lake is a method of storing data within a system in its natural format, which makes it possible to keep data of various structural forms in one place. The data is usually stored as object blobs or files. That raw data needs to be transformed before it can be used for tasks such as various types of analytics, reporting, visualization, machine learning and so on. A data lake may include structured data from relational databases, semi-structured data such as CSV, logs, XML and JSON, unstructured data such as emails and PDFs, and binary data such as images, audio and video.
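The idea of keeping data in its natural format and transforming it only when it is used can be sketched in a few lines of Python. This is only a minimal illustration, not a real data lake implementation. It assumes a local folder stands in for the lake's storage, and the names `ingest_raw` and `read_records` are hypothetical helpers made up for this example.

```python
import csv
import io
import json
from pathlib import Path


def ingest_raw(lake_dir: Path, name: str, payload: bytes) -> Path:
    """Store the payload byte for byte; nothing is parsed or cleaned at ingest."""
    lake_dir.mkdir(parents=True, exist_ok=True)
    target = lake_dir / name
    target.write_bytes(payload)
    return target


def read_records(path: Path) -> list:
    """Transform a raw file into a list of dicts only when it is consumed."""
    text = path.read_bytes().decode("utf-8")
    if path.suffix == ".json":
        data = json.loads(text)
        # A single JSON object is wrapped so callers always get a list.
        return data if isinstance(data, list) else [data]
    if path.suffix == ".csv":
        # The first CSV row is treated as the header; values stay strings.
        return list(csv.DictReader(io.StringIO(text)))
    # Binary or unknown formats stay in the lake untouched until
    # someone writes a transform for them.
    raise ValueError(f"no transform defined for {path.suffix!r} files")
```

Here a CSV export, a JSON document and an image could all be passed to `ingest_raw` and would sit side by side in the same folder. Only the consumer that needs tabular records calls `read_records`, which is where the transformation step described above happens.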

There is another term, data warehouse. A data lake is conceptually close to a data warehouse, but in real-life usage they are shaped for different purposes, much like a swimming pool and a lake. Matters such as storage, agility, security and processing are not the same for a data lake and a data warehouse. There is really nothing complex about the term data lake.

