From the course: Azure Solutions Architect Expert (AZ-305) Cert Prep: Design Data Storage Solutions

Understanding types of data

- [Instructor] Data is everywhere. It is present in most of our everyday interactions, hidden from view by the systems we interact with. As architects, we need to classify data based on its characteristics and usage so that we can design data interfaces and storage systems that meet a given use case. You may have heard the terms structured, semi-structured, and unstructured when data is discussed in relation to a system or process, but what is the difference between them, and how do you classify data into each? Let's look at the data behind a sales system, and how it has evolved, to find out.
Historically, a sales system would have been a collection of flat files or tables listing entities such as products and orders. These followed predefined data models, organized or linked in a predefined way. This is known as structured data: the structure is defined up front. This type of predefined, static structure is stored in a relational data store, where each entity is linked by one or more relationships. The entities in this example are order and product, and an order is linked to one or more products by the product's ID. These data stores run on relational database management systems, or RDBMSs. To transact with the data store, multiple entities must be read or written.
With the advent of the internet, customer expectations changed. Product catalogs were now required to include items such as images of the product, downloadable manuals, and videos covering maintenance and usage. This is an example of unstructured data. Each photo, document, and video could be in any number of formats and may bear no structural relation to the others; their only link is to the product.
Further to this, manufacturers were moving faster than before, adding new product lines with unrelated features. Take the original product definition of ID, SKU, size, and color, and compare it to a new part. ID and SKU still exist, but diameter and cat ID do not fit into the original structure.
This is an example of semi-structured data. There are some key parts that are structurally similar but not identical, and the structure can change rapidly. To meet customer demand, the relational data structure must reflect each of these changes: additional columns are added for the new product features, and further entities are added for the manuals, photos, and videos. Now, this is a small example. Imagine this iterative process over five or 10 years. After changes like these, keeping product searches fast and flexible would be a struggle, with an ever-increasing number of columns on the product and gaps in the data. The huge increase in data from documents, photos, and videos also brought its own performance problems for relational data stores. In addition, the data layer in a monolithic web app was often tightly coupled, so adding a column or entity in one area could affect other areas of the application, slowing development.
Advancements in relational data stores helped, such as keeping files out of the entities with pointers to files stored elsewhere, and flexible column types to deal with unknown features. Even so, issues around new requirements still appeared. How does this architecture cope with videos now being streamed by multiple users worldwide instead of being downloaded? The answer is it doesn't, not without issue. This is why architects need to understand the different categories of data and the data stores suited to those categories.
To recap, structured data has a predefined organization through a data model that is known up front. Examples of structured data are order transactions or financial data. Semi-structured data has some structure, but it does not fit easily into the static rows and columns of a relational format because its structure is often changing. A product catalog is an example of semi-structured data.
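The sales example can be sketched in a few lines of Python to make the contrast concrete. This is only an illustration under stated assumptions, not how any particular sales system was built: the table names (`product`, `sales_order`, `order_item`), the sample SKUs, and the `diameter_mm` column are all hypothetical, and `cat_id` is kept as an opaque field because the transcript does not say what it stands for. The first half shows the structured model and what happens when it is widened for a new part; the second half shows the same catalog as semi-structured documents.

```python
import json
import sqlite3


def build_catalog() -> sqlite3.Connection:
    """Create the original structured model, then widen it for a new part."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE product (
            id INTEGER PRIMARY KEY, sku TEXT NOT NULL, size TEXT, color TEXT
        );
        CREATE TABLE sales_order (id INTEGER PRIMARY KEY, placed_on TEXT NOT NULL);
        CREATE TABLE order_item (
            order_id INTEGER NOT NULL REFERENCES sales_order(id),
            product_id INTEGER NOT NULL REFERENCES product(id),
            quantity INTEGER NOT NULL
        );
        INSERT INTO product (id, sku, size, color)
            VALUES (1, 'SHIRT-001', 'M', 'blue');
        INSERT INTO sales_order (id, placed_on) VALUES (100, '2024-01-15');
        INSERT INTO order_item (order_id, product_id, quantity) VALUES (100, 1, 2);

        -- The new part does not fit, so the shared table gains two columns.
        ALTER TABLE product ADD COLUMN diameter_mm REAL;
        ALTER TABLE product ADD COLUMN cat_id TEXT;
        INSERT INTO product (id, sku, diameter_mm, cat_id)
            VALUES (2, 'WASHER-010', 12.5, 'C-7');
        """
    )
    return conn


def order_lines(conn: sqlite3.Connection, order_id: int) -> list:
    """Reading one order touches three entities, joined on their IDs."""
    return conn.execute(
        """
        SELECT p.sku, i.quantity
        FROM sales_order o
        JOIN order_item i ON i.order_id = o.id
        JOIN product p ON p.id = i.product_id
        WHERE o.id = ?
        """,
        (order_id,),
    ).fetchall()


def count_gaps(conn: sqlite3.Connection) -> int:
    """Count the NULL cells in the widened product table."""
    rows = conn.execute("SELECT * FROM product").fetchall()
    return sum(value is None for row in rows for value in row)


# The same two products as semi-structured documents: each one carries
# only the fields that apply to it, so there are no empty columns.
CATALOG_DOCUMENTS = [
    {"id": 1, "sku": "SHIRT-001", "size": "M", "color": "blue"},
    {"id": 2, "sku": "WASHER-010", "diameter_mm": 12.5, "cat_id": "C-7"},
]


def shared_fields(documents: list) -> set:
    """Return the fields every document has in common."""
    return set.intersection(*(set(doc) for doc in documents))


if __name__ == "__main__":
    conn = build_catalog()
    print("order 100:", order_lines(conn, 100))
    print("empty cells in product table:", count_gaps(conn))
    print("fields shared by all documents:", sorted(shared_fields(CATALOG_DOCUMENTS)))
    print(json.dumps(CATALOG_DOCUMENTS, indent=2))
```

With only two products, the widened table already has as many empty cells as it has rows times new columns. The documents share just `id` and `sku`, which matches the "structurally similar but not identical" description.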
Unstructured data has an internal structure that defines the type of video, photo, or document it is, but it has no predefined data model governing the data that is stored. Determining which category the data in your solution fits into, and which data store you should use, requires further understanding of the data's usage and characteristics. Let's dig into this further through the rest of the videos in this chapter.
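As a small illustration of the point that unstructured data has an internal structure but no data model, the sketch below identifies a file's type from its leading bytes and then records only a pointer to it against a product. It rests on assumptions: the `ProductAsset` record and the example location string are hypothetical, and only three well-known file signatures (PNG, JPEG, PDF) are checked, where a real system would need many more.

```python
from dataclasses import dataclass

# Leading byte sequences that these formats use to identify themselves.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"\xff\xd8\xff": "image/jpeg",
    b"%PDF-": "application/pdf",
}


def sniff_type(payload: bytes) -> str:
    """Guess a media type from the file's own internal structure."""
    for magic, media_type in SIGNATURES.items():
        if payload.startswith(magic):
            return media_type
    return "application/octet-stream"  # unknown: treat as opaque bytes


@dataclass(frozen=True)
class ProductAsset:
    """Hypothetical record: the asset's only link is to the product."""

    product_id: int
    location: str  # pointer to where the bytes live, not the bytes themselves
    media_type: str


def register_asset(product_id: int, location: str, payload: bytes) -> ProductAsset:
    """Describe an asset without imposing any model on its contents."""
    return ProductAsset(product_id, location, sniff_type(payload))


if __name__ == "__main__":
    manual = register_asset(1, "assets/1/manual.pdf", b"%PDF-1.7\n...")
    print(manual)
```

The record says what kind of file it is and which product it belongs to, but nothing about what is inside the manual, photo, or video. That is the sense in which the data has no predefined model.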
