Database, Data Warehouse, Datamart, Data Lake: What Are the Major Differences?


With the boom of Big Data, storage techniques have evolved considerably. From simple data sheets to data lakes, the way data is stored and processed has a crucial impact on a company's IT.

In the 1980s, information systems were business-oriented and built around ERPs, and data was generally stored in centralized data warehouses.

Data warehouses: centralized and structured systems

Data warehouses are centralized systems that store structured data. They are large relational databases that consolidate all of a company's data. Their ultimate purpose is Business Intelligence: dashboards, OLAP analyses and report generation. Analysts query the warehouse to explore the data, drilling up to aggregated views and drilling down to the details.
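To make drilling up and down concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for a warehouse. The `sales` fact table and its columns (`year`, `quarter`, `region`, `amount`) are hypothetical and chosen only for illustration; a real warehouse would run on a dedicated engine with a much richer schema.

```python
import sqlite3

# In-memory database standing in for a data warehouse (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (year INTEGER, quarter INTEGER, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        (2023, 1, "North", 100.0),
        (2023, 1, "South", 80.0),
        (2023, 2, "North", 120.0),
        (2024, 1, "North", 150.0),
        (2024, 2, "South", 90.0),
    ],
)

def drill_up():
    """Roll up to the coarsest level: total sales per year."""
    return conn.execute(
        "SELECT year, SUM(amount) FROM sales GROUP BY year ORDER BY year"
    ).fetchall()

def drill_down(year):
    """Drill down into one year: sales per quarter and region."""
    return conn.execute(
        "SELECT quarter, region, SUM(amount) FROM sales "
        "WHERE year = ? GROUP BY quarter, region ORDER BY quarter, region",
        (year,),
    ).fetchall()

print(drill_up())        # [(2023, 300.0), (2024, 240.0)]
print(drill_down(2023))  # [(1, 'North', 100.0), (1, 'South', 80.0), (2, 'North', 120.0)]
```

Both functions read the same detailed fact table; only the level of aggregation in the `GROUP BY` changes, which is what moving up or down a dimension hierarchy amounts to.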

Data warehouses are fed from multiple sources, such as ERP systems, web servers and other applications. ETL (extract, transform, load) programs extract the data from these sources, transform and clean it so that it conforms to the codes defined in the warehouse metadata, and then load it into the warehouse.
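The three ETL steps can be sketched in plain Python as follows. Everything here is hypothetical: the two sources (`ERP_ROWS` and the `WEB_CSV` text), the `COUNTRY_CODES` mapping that plays the role of the warehouse metadata, and the `sales` target table. Production pipelines would normally rely on a dedicated ETL tool rather than hand-written functions.

```python
import csv
import io
import sqlite3

# Hypothetical "metadata": the country codes the warehouse expects.
COUNTRY_CODES = {"germany": "DE", "deutschland": "DE", "de": "DE", "france": "FR", "fr": "FR"}

# Hypothetical source 1: rows exported from an ERP system.
ERP_ROWS = [
    {"customer": " Alice ", "country": "Germany", "amount": "120.50"},
    {"customer": "Bob", "country": "fr", "amount": "80"},
]

# Hypothetical source 2: a CSV export from a web server.
WEB_CSV = "customer,country,amount\nCarol,Deutschland,42\nDave,Atlantis,10\n"

def extract():
    """Extract: collect raw records from every source."""
    rows = list(ERP_ROWS)
    rows.extend(csv.DictReader(io.StringIO(WEB_CSV)))
    return rows

def transform(rows):
    """Transform: clean values and map them to the codes defined in the metadata."""
    clean = []
    for row in rows:
        code = COUNTRY_CODES.get(row["country"].strip().lower())
        if code is None:
            continue  # reject rows whose country has no code in the metadata
        clean.append((row["customer"].strip(), code, float(row["amount"])))
    return clean

def load(rows, conn):
    """Load: insert the cleaned rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, country TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract()), conn)
print(conn.execute("SELECT * FROM sales ORDER BY rowid").fetchall())
# [('Alice', 'DE', 120.5), ('Bob', 'FR', 80.0), ('Carol', 'DE', 42.0)]
```

The row from the fictional country "Atlantis" is dropped during the transform step because it has no code in the metadata; a real pipeline might instead route such rows to an error table for review.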

A datamart: smaller in size and more practical

A datamart is a subset of a data warehouse that contains aggregated, frequently used data for a specific department, for example a marketing, HR or logistics datamart.
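As a rough illustration, a marketing datamart can be derived from a warehouse table with a single aggregating query. The sketch assumes a hypothetical `orders` fact table and a hypothetical `marketing_mart` table; SQLite again stands in for the real warehouse engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical warehouse fact table holding detailed, company-wide data.
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, channel TEXT, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "email", "North", 50.0),
        (2, "ads", "North", 70.0),
        (3, "email", "South", 30.0),
        (4, "ads", "South", 20.0),
        (5, "social", "North", 40.0),
    ],
)

# Marketing datamart: a smaller, pre-aggregated subset of the warehouse
# holding only what the marketing team queries frequently.
conn.execute(
    """
    CREATE TABLE marketing_mart AS
    SELECT channel, COUNT(*) AS order_count, SUM(amount) AS revenue
    FROM orders
    GROUP BY channel
    """
)

mart = conn.execute(
    "SELECT channel, order_count, revenue FROM marketing_mart ORDER BY revenue DESC"
).fetchall()
print(mart)  # [('ads', 2, 90.0), ('email', 2, 80.0), ('social', 1, 40.0)]
```

Materializing the aggregate as its own table keeps the department's frequent queries small and fast; a view over the warehouse would be an alternative when freshness matters more than query speed.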

Today, with the data deluge, data warehouses can no longer cope with the variety of formats and the volumes of data being collected. A new storage technique has emerged: the data lake.

Data lakes: centralized and hybrid systems

Data lakes are also centralized, but they store structured data (ERP), semi-structured data (HTML, XML …) and unstructured data (audio, video, PDF …). Processing power is also multiplied to cope with the masses of data collected in real time from different sources. Hadoop is the best-known technology in this field.
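Unlike a warehouse, which requires data to fit a schema before it is loaded, a data lake keeps data in its raw form and applies a structure only when it is read (often called schema-on-read). The following sketch illustrates that idea with Python's standard library. The temporary directory stands in for a distributed store such as Hadoop's HDFS, and the file names and contents are hypothetical.

```python
import csv
import json
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# A local temporary directory stands in for the lake's storage layer.
lake = Path(tempfile.mkdtemp())

def ingest(name, payload):
    """Store data as-is, without imposing any schema on write."""
    path = lake / name
    if isinstance(payload, bytes):
        path.write_bytes(payload)
    else:
        path.write_text(payload, encoding="utf-8")
    return path

# Structured, semi-structured and unstructured data side by side.
ingest("erp_orders.csv", "order_id,amount\n1,120.5\n2,80\n")
ingest("web_events.json", json.dumps([{"page": "/home", "ms": 120}, {"page": "/cart"}]))
ingest("catalog.xml", '<catalog><item sku="A1"/><item sku="B2" color="red"/></catalog>')
ingest("call_recording.wav", b"RIFF\x00\x00\x00\x00WAVE")

def read_records(path):
    """Apply a structure only at read time, depending on the file format."""
    if path.suffix == ".csv":
        with path.open(newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f))
    if path.suffix == ".json":
        return json.loads(path.read_text(encoding="utf-8"))
    if path.suffix == ".xml":
        root = ET.fromstring(path.read_text(encoding="utf-8"))
        return [dict(item.attrib) for item in root]
    # Unstructured data: keep the raw bytes; interpretation is left to specialised tools.
    return [{"bytes": len(path.read_bytes())}]

for path in sorted(lake.iterdir()):
    print(path.name, read_records(path))
```

Note that the two JSON events do not share the same fields: nothing forces them into a common schema at ingestion time, which is exactly the flexibility (and the risk) of a data lake.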

Handling Big Data would not be possible without data lakes. Their ultimate goal is to detect trends and patterns through powerful algorithmic analyses.