Microsoft Azure Data Fundamentals: Core Data Concepts

Core Data Concepts

Section Overview: This section explores the core data concepts: how data is defined and stored, how the main types of data workloads differ, and how batch data differs from streaming data.

Types of Data

  • Data is a collection of facts used in decision making.

  • Structured data is tabular data represented by rows and columns in a database.

  • Semi-structured data doesn't reside in a relational database but still has some structure to it.

  • Unstructured data has no predefined structure. Examples include audio files, video files, and binary data files.
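
The three categories above can be illustrated with a minimal Python sketch. The customer table, the JSON documents, and the helper field_names are hypothetical examples made up for illustration; they are not taken from the course material.

```python
import json

# Structured: every row has the same columns (hypothetical customer table).
columns = ("id", "name", "city")
rows = [
    (1, "Ana", "Lisbon"),
    (2, "Ben", "Leeds"),
]

# Semi-structured: each record carries its own fields (JSON documents).
documents = json.loads("""
[
  {"id": 1, "name": "Ana", "phones": ["555-0100", "555-0101"]},
  {"id": 2, "name": "Ben", "address": {"city": "Leeds"}}
]
""")

# Unstructured: raw bytes with no schema the program can rely on.
blob = bytes([0x52, 0x49, 0x46, 0x46])


def field_names(record):
    """Return the sorted field names of one semi-structured record."""
    return sorted(record.keys())


for doc in documents:
    print(field_names(doc))
```

Every tuple in rows matches the fixed column list, while the two JSON documents share only some of their fields.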

Types of Semi-Structured Data

  • Key-value stores are similar to relational tables except that each row can have any number of columns.

  • Graph databases store and query information about complex relationships using nodes (information about objects) and edges (information about the relationship between objects).
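
As a rough sketch of both models, a key-value store can be approximated with a dictionary and a graph with a list of edges. The keys, node names, relationship labels, and the neighbors helper below are all assumptions chosen for illustration; a real key-value or graph database exposes its own API.

```python
# Key-value store: each key maps to a value, and values need not share fields.
kv_store = {
    "customer:1": {"name": "Ana", "email": "ana@example.com"},
    "customer:2": {"name": "Ben", "phone": "555-0100", "tier": "gold"},
}

# Graph: nodes hold information about objects,
# edges hold information about the relationships between them.
nodes = {
    "ana": {"type": "employee"},
    "ben": {"type": "employee"},
    "sales": {"type": "department"},
}
edges = [
    ("ana", "reports_to", "ben"),
    ("ana", "works_in", "sales"),
    ("ben", "works_in", "sales"),
]


def neighbors(node, relationship):
    """Follow edges of one relationship type that start at the given node."""
    return [dst for src, rel, dst in edges if src == node and rel == relationship]


print(neighbors("ana", "reports_to"))
```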

Types of Data Workloads

  • Transactional systems record transactions, such as financial movements or payments for goods and services from customers. They are often high volume, handling many millions of transactions in a single day.

  • Analytical systems capture raw data to generate insights for business decisions.
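
The contrast between the two workloads can be sketched with Python's built-in sqlite3 module. This is a toy illustration under stated assumptions: the payments table, its columns, and the record_payment helper are hypothetical, and a production system would typically keep transactional and analytical data in separate stores.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (customer TEXT, amount REAL)")


def record_payment(customer, amount):
    """Transactional write: one small unit of work, committed atomically."""
    # 'with conn' commits on success and rolls back if an exception is raised.
    with conn:
        conn.execute("INSERT INTO payments VALUES (?, ?)", (customer, amount))


for customer, amount in [("ana", 20.0), ("ben", 35.5), ("ana", 4.5)]:
    record_payment(customer, amount)

# Analytical read: scan many rows and aggregate them into an insight.
totals = dict(
    conn.execute("SELECT customer, SUM(amount) FROM payments GROUP BY customer")
)
print(totals)
```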

Batch vs Streaming Data

  • Batch processing involves processing large amounts of historical or static data at once.

  • Streaming processing involves processing real-time or near-real-time continuous streams of incoming data.

Introduction to Data Processing

Section Overview: This section introduces the concept of data processing and explains the importance of storing, cleaning, and visualizing data. It also discusses the differences between batch and streaming data processing.

Storing Data

  • Environmental readings, data from point-of-sale devices, financial data, and weather data are examples of data that need to be stored.

  • The repository for storing data can be a file store, document database or relational database. Azure SQL Database is one such storage platform.

Cleaning Data

  • Raw data may not be in a suitable format for querying. Cleaning operations may include filtering out anomalies or transforming the data.

  • Invalid or questionable data should be removed before visualization.
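
A minimal cleaning pass might look like the following sketch. The sensor readings, the temp_c field, and the plausible temperature range are assumptions invented for this example; real cleaning rules depend on the data source.

```python
# Hypothetical raw sensor readings; values arrive as text.
raw_readings = [
    {"sensor": "a", "temp_c": "21.5"},
    {"sensor": "a", "temp_c": "n/a"},    # not a number
    {"sensor": "b", "temp_c": "-999"},   # sentinel value, treated as an anomaly
    {"sensor": "b", "temp_c": "19.0"},
]

# Assumed plausible range in degrees Celsius.
VALID_RANGE = (-50.0, 60.0)


def clean(readings):
    """Transform text to numbers and filter out invalid or anomalous readings."""
    cleaned = []
    for reading in readings:
        try:
            temp = float(reading["temp_c"])  # transformation step
        except ValueError:
            continue  # drop values that cannot be parsed
        if VALID_RANGE[0] <= temp <= VALID_RANGE[1]:  # filtering step
            cleaned.append({"sensor": reading["sensor"], "temp_c": temp})
    return cleaned


print(clean(raw_readings))
```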

Visualizing Data

  • Tables or documents aren't always intuitive representations of data. Visualization tools like Power BI can provide rich graphical representation.

  • Bar charts, line charts, pie charts, and results plotted on geographical maps can summarize data or illustrate how it changes over time.

Batch vs Streaming Processing

  • Batch processing involves collecting newly arriving elements into a group which is then processed at a future time as a batch.

  • Streaming processing handles each new piece of incoming data as it arrives, in real time.

  • Credit card billing is an example of batch processing, while tracking stock market changes in real time is an example of streaming processing.
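
The difference in shape between the two approaches can be sketched as follows. The function names process_batch and process_stream and the sample numbers are hypothetical; the point is only that the batch function sees the whole collected group at once, while the streaming function reacts to one item at a time.

```python
def process_batch(charges):
    """Batch: run once over a collected group, e.g. a month of card charges."""
    return sum(charges)


def process_stream(prices):
    """Streaming: emit a result for each item as soon as it arrives."""
    previous = None
    for price in prices:
        if previous is not None:
            yield price - previous  # change since the last tick
        previous = price


# Charges are collected first and billed later as one batch.
monthly_charges = [12.0, 30.0, 8.0]
bill = process_batch(monthly_charges)

# Prices are handled one by one; iter() stands in for a live feed.
changes = list(process_stream(iter([100, 101, 99])))

print(bill, changes)
```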

Differences Between Batch and Streaming Processing

Data Scope

  • Batch processing can process the entire available dataset, while streaming processing typically only has access to the most recent data received, within a rolling time window.

Data Size

  • Batch processing is suitable for handling large datasets, while streaming processing is intended for individual records or micro-batches consisting of a few records.

Stream Processing

Section Overview: In this section, the speaker explains how stream processing is used for simple response functions, aggregates or calculations such as rolling averages.

Stream Processing

  • Stream processing is used for simple response functions, aggregates or calculations such as rolling averages.
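
A rolling average over a stream can be sketched with a fixed-size window. This example assumes a count-based window (the last window_size values) rather than a time-based one, and the function name and sample values are illustrative only.

```python
from collections import deque


def rolling_average(stream, window_size):
    """Yield the average of the most recent window_size values after each arrival."""
    window = deque(maxlen=window_size)  # the oldest value drops out automatically
    for value in stream:
        window.append(value)
        yield sum(window) / len(window)


averages = list(rolling_average([10, 20, 30, 40], window_size=3))
print(averages)
```

Because the deque keeps only the most recent values, the calculation never needs the full history of the stream.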

Reference

For more details follow the Microsoft Official Documentation: https://learn.microsoft.com/en-us/training/paths/azure-data-fundamentals-explore-core-data-concepts/