Running a globally distributed application means grappling with latency, consistency, and operational complexity all at once. Azure Cosmos DB was built specifically for this challenge, offering single-digit millisecond reads and writes at any scale, across any region. Unlike traditional databases that bolt on replication as an afterthought, Cosmos DB treats global distribution as a first-class primitive — you add regions with a single click and your data follows. Before you commit to it, though, you need to understand the five consistency levels, partition key selection, and the Request Unit billing model, because getting any of these wrong has real consequences at scale.
- Cosmos DB offers five tunable consistency levels ranging from Strong to Eventual — pick the weakest level your application can tolerate to minimize RU cost and latency.
- Partition key selection is the single most important design decision: a bad key creates hot partitions and caps your throughput.
- Each logical partition is capped at 20 GB — if your key does not distribute data evenly, you will hit this ceiling.
- Choose your API (NoSQL, MongoDB, Cassandra, Gremlin, or Table) before you build; migrating between APIs later is painful.
- Multi-region writes (multi-master) dramatically improve write latency globally but introduce conflict resolution complexity.
- The RU model makes costs predictable but requires careful capacity planning — serverless suits spiky, low-volume workloads, provisioned throughput suits steady high-volume ones.
What Is Azure Cosmos DB?
Azure Cosmos DB is a fully managed, globally distributed, multi-model NoSQL database service from Microsoft. At its core, it stores schema-agnostic JSON documents, but it exposes multiple API surfaces so that existing application code written against MongoDB, Apache Cassandra, or Apache Gremlin can connect without significant rewrites.
The architecture is built around a storage engine called the Atom-Record-Sequence (ARS) model, which stores data as a log-structured merge tree underneath every API. Above that engine, Cosmos DB layered a set of protocol-compatible API translators. This means a MongoDB client talking to Cosmos DB is actually talking to a MongoDB-wire-protocol adapter sitting in front of the same underlying engine — it is not running MongoDB itself.
Key capabilities at a glance:
- 99.999% availability SLA for multi-region accounts with multi-region writes enabled
- Automatic and transparent horizontal partitioning
- Five tunable consistency levels
- Change feed for event-driven architectures
- Integrated analytical store (HTAP) via Azure Synapse Link
- Serverless and provisioned throughput billing modes
Consistency Levels
Cosmos DB lets you choose your consistency level per-account (with the option to relax it per-request). This is one of the most misunderstood features. The five levels, from strongest to weakest, are:
| Level | Read Guarantee | Typical Latency Impact | Best For |
|---|---|---|---|
| Strong | Always reads the latest committed write, globally | Highest — must wait for quorum across regions | Financial ledgers, inventory where stale reads are unacceptable |
| Bounded Staleness | Reads lag writes by at most K versions or T seconds | High for multi-region reads | Globally distributed apps that need predictable lag bounds |
| Session | Read-your-own-writes within a session token | Low — local reads outside the write region | User-facing apps where a user must see their own changes immediately |
| Consistent Prefix | Reads never see out-of-order writes | Low | Social feeds, comment threads where ordering matters more than freshness |
| Eventual | No ordering guarantees — replicas converge eventually | Lowest | Leaderboards, analytics, non-critical counters |
Strong consistency is unavailable for multi-region write accounts. If you need Strong consistency with global writes, you must designate a single write region. This is a hard architectural constraint, not a configuration knob.
Session consistency is the default and covers the majority of web application use cases. A user creates a record and immediately reads it back — session tokens ensure they see their own write. Other users in the same region may see a slightly stale view, which is almost always acceptable.
Partitioning: Logical vs Physical
Cosmos DB partitions data automatically, but you choose the partition key. Understanding the two-tier partitioning model is essential for avoiding the hot partition problem.
Logical partitions are defined by unique values of your partition key. All documents sharing the same partition key value belong to the same logical partition. A logical partition can hold at most 20 GB of data. If any single partition key value accumulates more than 20 GB, writes to that partition will fail.
Physical partitions are the actual storage and compute nodes underneath. Cosmos DB maps logical partitions to physical partitions automatically and rebalances as your container grows. Each physical partition can handle up to 10,000 RU/s of throughput. When you provision 30,000 RU/s, you get at least three physical partitions.
The hot partition problem occurs when a disproportionate share of your requests routes to a single partition key value. If you store IoT telemetry keyed by deviceId and one device sends 10,000 events per second while others send 10, that one device's partition becomes a throughput bottleneck regardless of how much total RU/s you provision. Monitor per-partition RU consumption in Azure Monitor and alert on normalized RU consumption above 80%.
Good partition key candidates have high cardinality (thousands to millions of distinct values), distribute requests evenly, and appear in the majority of your query predicates. For multi-tenant SaaS, tenantId is often ideal. For e-commerce orders, customerId combined with a suffix (synthetic partition key) can spread load if a single customer generates extreme volume.
Use a synthetic partition key if no natural attribute distributes writes evenly. Concatenate two or more fields — for example userId + "_" + date — to create a higher-cardinality key. Cosmos DB SDK v3 includes a built-in helper for this pattern.
API Choices
Cosmos DB exposes six API surfaces. The choice you make at container creation time is permanent for that account, so get it right upfront.
- NoSQL API (formerly Core SQL) — The native API. Query with SQL-like syntax, full support for all Cosmos DB features including change feed, indexing policy control, and integrated functions. Use this for greenfield development.
- MongoDB API — Wire-protocol compatible with MongoDB 4.x and 5.x. Existing MongoDB drivers connect without code changes. Not all MongoDB operators are supported — check the compatibility matrix before migrating.
- Apache Cassandra API — CQL (Cassandra Query Language) compatibility. Useful for migrating Cassandra workloads to a managed service without rewriting application code. Secondary indexes and lightweight transactions have limitations versus native Cassandra.
- Apache Gremlin API — Property graph API using the Gremlin graph traversal language. Suitable for recommendation engines, fraud detection graphs, and knowledge graphs.
- Table API — Azure Table Storage-compatible API. Use this only when migrating existing Azure Table Storage workloads; for new development, prefer the NoSQL API.
- PostgreSQL API (Cosmos DB for PostgreSQL) — Powered by Citus, this is a horizontally sharded PostgreSQL cluster, not the same underlying ARS engine. It runs actual PostgreSQL with the Citus extension for distributed query execution. This is a fundamentally different product than the other Cosmos DB APIs and suits relational workloads that need horizontal scale-out — think multi-tenant SaaS with complex joins.
Cosmos DB for PostgreSQL (Citus) is an excellent choice if you want relational semantics with horizontal sharding but do not want to manage a Citus cluster yourself. It supports standard PostgreSQL tools, extensions, and connection pooling via PgBouncer out of the box.
Global Distribution and Multi-Region Writes
Adding a read region to a Cosmos DB account takes about 30 minutes and requires zero application code changes. The SDK automatically routes reads to the nearest region based on a preferred-region list you configure.
Multi-region writes (also called multi-master) allows any region to accept writes simultaneously. This reduces write latency for geographically distributed users — a user in Tokyo writes to the Tokyo region rather than waiting for a round trip to West US. The trade-off is conflict resolution. When two regions accept conflicting writes to the same document simultaneously, Cosmos DB must resolve the conflict using one of three strategies:
- Last-Writer-Wins (LWW) — Based on a system-managed or user-defined timestamp property. Simple and predictable, works for most use cases.
- Custom (Merge procedure) — A JavaScript stored procedure you write that receives both conflicting versions and outputs the resolved document. Maximum flexibility, requires development effort.
- Custom (Async) — Conflicting writes are logged to a conflict feed for your application to resolve asynchronously. Appropriate when conflict resolution requires business logic that cannot run inside the database.
The 99.999% availability SLA requires at least two write regions. For disaster recovery with a lower cost, a single write region with one or more read replicas achieves 99.99% availability with automatic failover.
Change Feed
Change feed is a persistent, ordered log of every insert and update to a Cosmos DB container (deletes are not included by default, though the full fidelity mode in preview adds soft-delete signals). Consumers read the change feed to trigger downstream processing — cache invalidation, event streaming to Azure Event Hubs, maintaining materialized views in a secondary container, or feeding Azure Functions for real-time processing.
Change feed is partitioned along the same logical partition boundaries as your data. Each partition's changes are ordered, but cross-partition ordering is not guaranteed. Design consumers to be idempotent.
Cost Model: RUs, Serverless, and Provisioned Throughput
Every database operation in Cosmos DB is expressed in Request Units (RUs). One RU equals the cost of a 1 KB point read by primary key. Writes cost approximately 5x more than reads for the same document size. Cross-partition queries, large documents, and queries without an index hit consume substantially more RUs.
| Mode | Billing | Best For | Limits |
|---|---|---|---|
| Provisioned Throughput | Per RU/s per hour (minimum 400 RU/s per container) | Steady, predictable workloads; production systems | Can autoscale between 10% and 100% of max RU/s |
| Serverless | Per RU consumed + storage | Dev/test, sporadic workloads, low-traffic APIs | 5,000 RU/s burst per container; single region only |
| Autoscale Provisioned | Per max RU/s configured per hour | Variable workloads with unpredictable peaks | Scales down to 10% of max automatically |
Use the Cosmos DB capacity calculator before provisioning. Run your queries in the data explorer and examine the RU charge reported in the response headers — this is the single most reliable way to right-size your provisioned throughput. Enable the "Query Stats" tab in the Azure Portal to identify expensive queries before they reach production.
Cost management tips:
- Share throughput at the database level (rather than per-container) when you have many small containers with variable traffic — up to 25 containers can share a database-level RU pool.
- Use the Time-to-Live (TTL) feature to automatically expire and delete stale documents rather than paying for storage and running manual cleanup jobs.
- Set index policy to exclude paths you never query — by default, Cosmos DB indexes every property, which costs RUs on every write.
- Switch to serverless for development and staging environments where throughput demands are low and unpredictable.
- Monitor the "Normalized RU Consumption" metric in Azure Monitor to detect whether you are over- or under-provisioned.
- Cosmos DB's five consistency levels (Strong, Bounded Staleness, Session, Consistent Prefix, Eventual) let you trade consistency for latency and cost — Session is the right default for most applications.
- Partition key selection is the most impactful design decision. Choose a high-cardinality key that distributes both storage and requests evenly. Remember the 20 GB per logical partition hard cap.
- The hot partition problem is silent until it causes throttling. Monitor normalized RU consumption per partition from day one.
- Choose your API at account creation time. NoSQL API gives the richest feature set for greenfield projects. MongoDB API and Cassandra API ease migration of existing workloads.
- Cosmos DB for PostgreSQL (Citus) is a distinct, relational product — consider it when you need horizontal sharding with full SQL semantics rather than document or key-value access patterns.
- Multi-region writes improve write latency globally but require a conflict resolution strategy — Last-Writer-Wins covers most use cases.
- Use serverless for low-traffic or non-production workloads; use autoscale provisioned throughput for variable production workloads; use fixed provisioned throughput for steady, high-volume workloads.
- Tune your index policy, use TTL for expiry, and share database-level throughput across small containers to keep costs under control.
Find the Right Database for Your Architecture with JusDB
Azure Cosmos DB is a powerful choice for globally distributed workloads, but it is not the right fit for every project. Whether you are evaluating Cosmos DB against DynamoDB, deciding between the NoSQL API and Cosmos DB for PostgreSQL, or trying to understand what consistency level your SLA actually requires, JusDB can help you cut through the complexity.
JusDB provides structured guidance on database selection, architecture, and cost optimization — so you spend less time reading documentation and more time building. Explore our database comparison guides and use the JusDB tool to match your workload to the right database engine for your specific access patterns, latency requirements, and budget.