What is a Data Lake?

It will not have escaped anyone's notice that data is becoming central to all strategic thinking: in a constantly changing business context, what can be extrapolated from it attracts a great deal of attention. Data Warehouse, Big Data and now Data Lake: storing data and "making it talk" has become an activity in its own right.

Does the Data Lake, a newcomer to these discussions of data strategy, concern all companies? How can it be implemented without turning into an overly complex system whose results are risky to interpret? At a time when, according to a KPMG-Forrester study from the end of 2016, fewer than 25% of managers have confidence in the effectiveness of the data used by their company, how can we think about the subject effectively?

The Data Lake: for whom?

When we realize that companies create nearly 2.5 billion GB of data every day, or 2.5 exabytes, it is hard to see whom the concept of the Data Lake would not concern.

This data structure is well suited to companies that decide to keep the history of their data without knowing, for the moment, what to do with it. Marketing analysis techniques are constantly evolving, so it can make sense to keep the data "while waiting" for a use to emerge.

On the other hand, for recurrent use of data (requiring systematic, structured calculations), the Data Lake concept is not at all suitable. The same holds true for companies that do not handle large volumes of data, if any such companies remain.
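
Keeping data before knowing what it will be used for is commonly described as applying a schema when the data is read rather than when it is written. The sketch below illustrates that idea only; the sample events, the field names and the `read_with_schema` helper are hypothetical and are not taken from any particular Data Lake product.

```python
import json

# Hypothetical raw events, stored exactly as received (no schema enforced on write).
RAW_EVENTS = [
    '{"user": "a1", "action": "view", "ts": "2021-03-01T10:00:00"}',
    '{"user": "b2", "action": "purchase", "amount": 19.9, "ts": "2021-03-01T10:05:00"}',
    '{"user": "a1", "campaign": "spring", "ts": "2021-03-02T08:30:00"}',
]


def read_with_schema(raw_lines, fields):
    """Project each raw record onto the fields a later analysis needs.

    Fields absent from a record become None instead of rejecting the record.
    """
    for line in raw_lines:
        record = json.loads(line)
        yield {name: record.get(name) for name in fields}


# A question decided long after ingestion: who purchased, and for how much?
purchases = [
    row
    for row in read_with_schema(RAW_EVENTS, ["user", "action", "amount"])
    if row["action"] == "purchase"
]
print(purchases)
```

The same raw lines could later be read with a different list of fields (for example `["user", "campaign"]`) without re-ingesting anything, which is the flexibility described above. The cost is that every reader has to cope with missing or irregular fields.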

Data Warehouse vs Data Lake

Figure: Data Warehouse vs Data Lake (panels: Data Warehouse, Data Lake)


The following table is another illustration of the difference between Data Lake and Data Warehouse:

Table: Difference between Data Warehouse and Data Lake


Challenges of Data Lake 

Besides its numerous benefits, using a Data Lake also has its downsides and challenges. Because the data volume in a Data Lake is larger, its management must rely more on programmatic administration. In addition, sparse, incomplete, or volatile data is difficult to work with. Finally, a broader variety of datasets and sources requires more data governance and support.
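
As one illustration of what programmatic administration of sparse or incomplete data might look like, the sketch below measures how often each field is actually filled in across a set of records. The records, the `field_completeness` helper and the 0.5 threshold are assumptions chosen for the example, not a recommended governance rule.

```python
def field_completeness(records):
    """Return, for every field seen in any record, the share of records
    in which that field holds a non-empty value."""
    total = len(records)
    if total == 0:
        return {}
    filled = {}
    for record in records:
        for name, value in record.items():
            filled.setdefault(name, 0)  # remember the field even if it is empty
            if value is not None and value != "":
                filled[name] += 1
    return {name: count / total for name, count in filled.items()}


# Hypothetical records as they might arrive from heterogeneous sources.
records = [
    {"user": "a1", "email": "a1@example.com", "country": "FR"},
    {"user": "b2", "email": "", "country": None},
    {"user": "c3"},
    {"user": "d4", "email": "d4@example.com"},
]

report = field_completeness(records)

# Flag fields filled in fewer than half of the records (threshold is an assumption).
sparse = sorted(name for name, share in report.items() if share < 0.5)
print(report, sparse)
```

A report of this kind can be run automatically on each new dataset, so that sparse fields are flagged for governance review rather than discovered during analysis.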

By way of conclusion, organizations are looking for an easier, better-tested way to adopt a data lake and deliver its advantages. What is required is a strategy that can exploit all of a company's data without needing to rip and replace current investments or retrain the workforce. It should harness new ML technologies and infuse AI capabilities that lead to smarter market outcomes. It should also be a solution that can be plugged in and be up and running in hours or days, as opposed to weeks and months.