Non-relational data is data that does not follow a predefined schema or a fixed table structure. It is typically semi-structured (such as JSON or XML documents) or unstructured (such as binary files or free text). Non-relational data can be stored in various formats, including JSON, XML, CSV, binary, or plain-text files. It can also be hierarchical, nested, or graph-based, which lets it capture complex relationships and hierarchies among entities.
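As a minimal illustration of nested, hierarchical data, the Python sketch below parses a hypothetical customer document (every field name is invented for this example) with the standard `json` module. It then reads child values directly, without the joins that a normalized relational design would need.

```python
import json

# A hypothetical customer document: the nested object and array hold data
# that a relational design would typically split into separate tables.
raw = """
{
  "id": "c-1001",
  "name": "Contoso Ltd",
  "address": {"city": "Seattle", "country": "USA"},
  "orders": [
    {"orderId": "o-1", "total": 120.5},
    {"orderId": "o-2", "total": 75.0}
  ]
}
"""

customer = json.loads(raw)

# Navigate the hierarchy directly; no join is needed to reach child data.
city = customer["address"]["city"]
order_count = len(customer["orders"])
order_total = sum(order["total"] for order in customer["orders"])

print(city, order_count, order_total)  # Seattle 2 195.5
```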
Non-relational data has some advantages over relational data, such as:
It can handle large volumes and varieties of data that may not fit in a relational database.
It can scale horizontally across multiple nodes or partitions, which improves performance and availability.
It can support flexible and dynamic schemas that can evolve with changing business requirements.
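It can enable faster development and deployment of applications, because data can often be stored in the same shape the application uses, without first designing and migrating a relational schema.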
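The flexible-schema idea can be sketched in Python with a hypothetical in-memory "products" collection whose documents do not all share the same fields. The application supplies defaults when a field is missing. This illustrates the concept only; it is not how any particular database stores documents.

```python
# A hypothetical "products" collection: each document carries its own set of
# fields, so adding an attribute such as "batteryLife" to one document does
# not require altering a table-wide schema.
products = [
    {"id": "p1", "name": "Desk", "price": 150.0},
    {"id": "p2", "name": "Laptop", "price": 999.0, "batteryLife": "10h"},
    {"id": "p3", "name": "Gift card", "price": 25.0, "tags": ["digital"]},
]


def describe(product: dict) -> str:
    # The reading code interprets each document, supplying defaults for
    # fields that this particular document does not contain.
    battery = product.get("batteryLife", "n/a")
    tags = ", ".join(product.get("tags", [])) or "none"
    return f"{product['name']}: battery={battery}, tags={tags}"


for p in products:
    print(describe(p))
```

The design choice here is to move schema interpretation into the code that reads the data; the trade-off is that nothing in the collection rejects a document with a misspelled or missing field.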
However, non-relational data also has some challenges and trade-offs, such as:
It may offer limited or no support for ACID transactions (atomicity, consistency, isolation, durability), which relational databases use to guarantee data integrity and consistency; where transactions are supported, they are often scoped to a single document or partition.
It may not support complex queries or joins across multiple collections or documents, so that logic may have to be implemented in the application layer.
It may not support referential integrity or constraints, which ensure that data is valid and consistent across tables in relational databases.
It may require different skills and tools to design, model, query, and manage non-relational data.
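When a store cannot join collections, the join moves into application code. The sketch below uses hypothetical `customers` and `orders` collections held as Python lists. It builds a dictionary lookup (in effect, a hand-written hash join) and shows how an order that references a nonexistent customer slips through when no referential integrity is enforced.

```python
# Two hypothetical collections stored separately, with no enforced link.
customers = [
    {"id": "c1", "name": "Contoso"},
    {"id": "c2", "name": "Fabrikam"},
]
orders = [
    {"orderId": "o1", "customerId": "c1", "total": 40.0},
    {"orderId": "o2", "customerId": "c2", "total": 15.0},
    {"orderId": "o3", "customerId": "c1", "total": 10.0},
    {"orderId": "o4", "customerId": "c9", "total": 5.0},  # dangling reference
]


def join_orders_to_customers(orders, customers):
    # Index one side by its key so each order is matched in O(1) time.
    by_id = {c["id"]: c for c in customers}
    joined = []
    for order in orders:
        customer = by_id.get(order["customerId"])
        # Without referential integrity, the store accepted an order whose
        # customer does not exist, so the application must decide what to do.
        name = customer["name"] if customer else None
        joined.append(
            {"orderId": order["orderId"], "customer": name, "total": order["total"]}
        )
    return joined


for row in join_orders_to_customers(orders, customers):
    print(row)
```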
To work with non-relational data in Azure, you need to understand the different types of non-relational data stores and their use cases. Azure offers a variety of non-relational data services, such as:
Azure Cosmos DB: A fully managed, multi-model database service that supports key-value, document, graph, and column-family data models. Azure Cosmos DB provides global distribution, high availability, low latency, automatic scaling, and five consistency levels (strong, bounded staleness, session, consistent prefix, and eventual).
Azure Blob Storage: A scalable and durable object storage service that stores unstructured binary or text data as blobs (binary large objects). Azure Blob Storage offers access tiers such as hot, cool, and archive to balance performance against cost.
Azure Data Lake Storage: A scalable data lake service that stores large volumes of structured, semi-structured, and unstructured data in their native formats. Azure Data Lake Storage Gen2 is built on Azure Blob Storage, adds a hierarchical namespace, and is compatible with the Hadoop Distributed File System (HDFS), so it integrates with big data processing and analytics services such as Azure HDInsight, Azure Databricks, and Azure Synapse Analytics.
Azure Table Storage: A key-value store that stores structured or semi-structured data as entities in tables, where each entity is identified by a PartitionKey and a RowKey. Azure Table Storage is well suited to storing large amounts of simple data that do not require complex queries or relationships.
Azure Queue Storage: A message queue service that enables asynchronous communication between applications or microservices. Azure Queue Storage allows sending and receiving messages up to 64 KB in size.
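Azure Table Storage identifies each entity by a PartitionKey and a RowKey. The Python class below is a hypothetical in-memory stand-in (it does not use or mimic the Azure SDK) that shows why this key design matters. A point lookup needs both keys, a query for one partition returns all of that partition's entities, and the remaining properties can vary from entity to entity.

```python
from collections import defaultdict


class InMemoryTable:
    """Hypothetical stand-in for a key-value table; not the Azure SDK."""

    def __init__(self):
        # partition key -> {row key -> entity}
        self._partitions = defaultdict(dict)

    def upsert(self, entity: dict) -> None:
        # Every entity must carry both keys; other properties are free-form.
        pk, rk = entity["PartitionKey"], entity["RowKey"]
        self._partitions[pk][rk] = dict(entity)

    def get(self, pk: str, rk: str):
        # Point lookup: the two keys together identify at most one entity.
        return self._partitions.get(pk, {}).get(rk)

    def query_partition(self, pk: str):
        # Partition query: all entities sharing a partition key, ordered by row key.
        rows = self._partitions.get(pk, {})
        return [rows[rk] for rk in sorted(rows)]


table = InMemoryTable()
table.upsert({"PartitionKey": "Seattle", "RowKey": "002", "name": "Fabrikam"})
table.upsert({"PartitionKey": "Seattle", "RowKey": "001", "name": "Contoso"})
table.upsert({"PartitionKey": "London", "RowKey": "001", "name": "Northwind", "vip": True})

print(table.get("London", "001")["name"])  # Northwind
print([e["name"] for e in table.query_partition("Seattle")])  # ['Contoso', 'Fabrikam']
```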
To prepare for the DP-900 exam, you should be familiar with the concepts and terminology related to non-relational data, such as:
Partitioning: The process of dividing a large dataset into smaller subsets based on a partition key, for example by applying a hash function to the key or by assigning ranges of key values. Partitioning improves scalability and performance by distributing the workload across multiple nodes or servers.
Sharding: A type of horizontal partitioning that splits a dataset into subsets (shards), assigning each record to a shard by its shard key, for example by ranges of key values. Sharding enables scaling out a database by adding more shards as the data grows.
Replication: The process of copying data from one location to another for redundancy or availability purposes. Replication can be synchronous or asynchronous depending on the consistency level required.
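Consistency: The degree to which the data in a distributed system is synchronized and up to date across all nodes or replicas. Consistency can be strong (every read returns the most recent committed write, so all nodes present the same view of the data), eventual (if no new updates are made, all replicas eventually converge to the same data, but reads in the meantime may return stale values), or causal (writes that are causally related are seen by every node in the same order, while unrelated concurrent writes may be seen in different orders).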
Indexing: The process of creating and maintaining an index structure that maps keys or attributes to values or locations in a dataset. Indexing improves query performance by reducing the number of scans or lookups required to find the desired data.
Schema: The definition or structure of the data in a dataset. A schema can be explicit (defined up front by the user and enforced when data is written) or implicit (inferred from the data, often when it is read), depending on the type of database. A schema can also be rigid (fixed and enforced) or flexible (dynamic and adaptable), depending on the data store.
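The partitioning and sharding definitions above can be made concrete with a small Python sketch of two common ways to assign a record to a partition. One hashes the key and takes the result modulo the partition count; the other compares the key against sorted range boundaries. The partition count, boundaries, and sample keys are arbitrary assumptions, and managed services such as Azure Cosmos DB handle partition placement themselves.

```python
import bisect
import hashlib


def hash_partition(key: str, partition_count: int) -> int:
    # Use a stable hash (Python's built-in hash() is salted per process)
    # so the same key always maps to the same partition.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


# Range sharding with hypothetical boundaries:
# shard 0: key < "H"; shard 1: "H" <= key < "P"; shard 2: key >= "P".
SHARD_UPPER_BOUNDS = ["H", "P"]


def range_shard(key: str) -> int:
    # bisect_right returns how many boundaries are <= key, which is the shard index.
    return bisect.bisect_right(SHARD_UPPER_BOUNDS, key)


keys = ["Adams", "Garcia", "Hughes", "Nguyen", "Patel", "Zhang"]
for k in keys:
    print(k, hash_partition(k, 4), range_shard(k))
```

Hash placement spreads keys evenly but scatters neighboring keys, so a range query may touch every partition. Range placement keeps neighboring keys together but can concentrate traffic on one shard. With plain modulo hashing, changing the partition count remaps most keys, which is one reason production systems use more elaborate schemes.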