From the course: Azure Solutions Architect Expert (AZ-305) Cert Prep: Design Data Storage Solutions

Considerations for data integration

- Traditionally, when thinking about the analysis of data, one or more data sources were batch processed using extract, transform, load, or ETL, into an analytical data store, which was often a data warehouse. Reports could then be generated across the data in the analytical data store. This ETL method had a very large drawback. If in the future the questions being asked of the data changed, it was not possible to go back to the original raw data.

This was traditionally rectified by first extracting and loading any data that might require further analysis in the future into a data store, and then batch processing and transforming that data into an analytical store for reporting, so that further transformations could be performed on the data at a later date. This solved the problem of being able to go back and perform further transforms on the data. But with the explosion of real-time data from sources such as social data, the Internet of Things, and transactional data, the traditional processing path introduced too much latency between data source and reporting, meaning that the answers to questions being asked of the data were often hours, days, or weeks out of date.

The new real-time requirements of data and real-time views of data call for new paths for the data, with real-time data ingestion and stream processing supplementing the reporting views. These paths are known as the cold path, for views of the batch-processed data, and the hot path, for real-time data. The combination of hot path and cold path also enables training of machine learning models to aid the accuracy of predictions and reporting on the real-time data. This is a Lambda architecture.

To summarize, the hot path is for latency-sensitive data where a real-time view is required. The cold path is for data batch processed at hourly or daily intervals, producing a batch view of the data. And there is also a warm path, where data is aggregated at the edge by IoT devices, pushed to warm storage, and consumed by reporting and analytics clients.
As stated previously, this architecture is often known as a Lambda architecture. I will be referring to this architecture throughout this chapter.
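
To make the hot path and cold path a little more concrete, here is a minimal in-memory sketch in Python. It is an illustration only and does not use any Azure service: the names Event, LambdaPipeline, ingest, run_batch, and query are hypothetical, and a plain list stands in for the raw data store. Every event is kept in its raw form, the hot path updates a real-time view as each event arrives, the cold path periodically recomputes a batch view from the raw data, and a query combines the two views.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical reading from a device; timestamp is an integer tick."""
    device_id: str
    timestamp: int
    value: float


class LambdaPipeline:
    """Toy Lambda architecture: raw store + batch view + real-time view."""

    def __init__(self):
        self.raw_store = []                    # every raw event is retained
        self.batch_view = {}                   # cold path result (per-device totals)
        self.batch_watermark = -1              # latest timestamp the batch view covers
        self.speed_view = defaultdict(float)   # hot path totals not yet in the batch view

    def ingest(self, event):
        """Hot path: store the raw event and update the real-time view at once."""
        self.raw_store.append(event)
        self.speed_view[event.device_id] += event.value

    def run_batch(self, up_to):
        """Cold path: recompute the batch view from the raw data up to `up_to`."""
        batch = defaultdict(float)
        speed = defaultdict(float)
        for e in self.raw_store:
            if e.timestamp <= up_to:
                batch[e.device_id] += e.value
            else:
                # Events newer than the watermark stay in the real-time view.
                speed[e.device_id] += e.value
        self.batch_view = dict(batch)
        self.speed_view = speed
        self.batch_watermark = up_to

    def query(self, device_id):
        """Serving layer: combine the batch view with the real-time view."""
        return (self.batch_view.get(device_id, 0.0)
                + self.speed_view.get(device_id, 0.0))
```

Because the raw events are never discarded, run_batch can be changed later to answer a different question and rerun over the full history, which is the drawback of the original ETL approach that this architecture addresses. In a real deployment the list, the batch job, and the real-time view would each be a separate storage or processing service.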
