OLAP data warehouses play a central role in today's big-data analytics era.
Compared with the last decade, however, the volume of data used in analytics keeps growing: AI and ML workloads are data-hungry, consuming terabytes of data through complex queries.
Modern BI tools such as Qlik Sense, Power BI, and Tableau can query very large data warehouses. But achieving low latency over such large data volumes requires a change in the architecture.
Change in Architecture
The first-generation platform architecture works for ordinary big-data analytics.
In this architecture, as shown, BI tools talk directly to the data warehouse for any analysis.
But to implement AI and ML on top of it, we also have to think about unstructured data.
The two-tier architecture uses both a data lake and a data warehouse. This solves the unstructured-data problem for AI and ML workloads.
The first-generation architecture has no way to handle unstructured data, whereas a data lake can store both structured and unstructured data.
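As a minimal sketch of the two-tier flow described above (the paths, file names, and use of SQLite as a stand-in warehouse are all assumptions for illustration): raw data lands in the lake as-is, and a separate batch ETL step extracts the structured portion into the warehouse.

```python
import json
import sqlite3
from pathlib import Path

# Hypothetical lake landing zone for this sketch.
LAKE = Path("lake/raw")
LAKE.mkdir(parents=True, exist_ok=True)

# Tier 1: land raw data in the lake exactly as it arrives,
# structured or not.
(LAKE / "events.json").write_text(json.dumps(
    [{"user": "a", "clicks": 3}, {"user": "b", "clicks": 5}]))
(LAKE / "notes.txt").write_text("free-form log text")  # unstructured

# Tier 2: a batch ETL step loads the structured part into the
# warehouse (an in-memory SQLite database stands in for it here).
wh = sqlite3.connect(":memory:")
wh.execute("CREATE TABLE clicks (user TEXT, clicks INTEGER)")
events = json.loads((LAKE / "events.json").read_text())
wh.executemany("INSERT INTO clicks VALUES (?, ?)",
               [(e["user"], e["clicks"]) for e in events])
total = wh.execute("SELECT SUM(clicks) FROM clicks").fetchone()[0]
print(total)  # 8
```

Note that the unstructured file stays in the lake untouched; only the structured records are promoted to the warehouse, which is exactly why the lake is needed for AI and ML work on raw data.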
Still, the two-tier architecture does not solve the problem of very large datasets.
The Lakehouse architecture solves both the unstructured-data problem and the very-large-dataset problem.
In the Lakehouse architecture the ETL process is continuous. Because the data lake sits inside the architecture, the raw data remains accessible at any point in time. And since ETL runs continuously, BI, AI, and ML applications receive near-real-time or real-time data in their storage.
These architectural changes reduce the cost of data storage, lower the latency of data aggregation, and increase the productivity of data analysis.