Partition key

A partition key is the value Cosmos DB uses to decide where each item belongs. For example, an order document might use tenantId, customerId, deviceId, or region as its partition key. The choice sounds small, but it controls how data spreads, how point reads work, and whether traffic concentrates in one hot partition. A good partition key matches the way the application writes and reads data. A poor one can be painful because the key cannot be changed directly on the same container.

Aliases: Cosmos DB partition key, logical partition key, partition key path, partition key value, container partition key
Difficulty: intermediate
CLI mappings: 5
Last verified: 2026-05-17

Microsoft Learn

In Azure Cosmos DB, a partition key is the item property and path used to distribute data and request units across logical and physical partitions. Good keys spread storage and throughput evenly and keep common queries efficient. The key cannot be changed in place after container creation.

Microsoft Learn: Partitioning and horizontal scaling in Azure Cosmos DB (2026-05-17)

Technical context

In Azure architecture, a Cosmos DB partition key sits in the data plane of a container. The partition key path is configured when the container is created, and each item supplies a partition key value. Cosmos DB maps those logical partition values onto physical partitions that Microsoft manages. Queries, point reads, stored procedures, throughput consumption, diagnostics, and scaling behavior are all shaped by that mapping. The design connects application data modeling with RU/s capacity, indexing, consistency, and regional distribution decisions.
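The mapping from partition key values to partitions can be pictured with a small sketch. This is an illustration only: Cosmos DB uses its own internal hashing and range assignment, and the function names, the SHA-256 hash, and the partition count of 4 are assumptions made for the example.

```python
import hashlib
from collections import Counter


def physical_partition_for(key_value, partition_count):
    """Map a partition key value to a partition index with a stable hash.

    Illustration only: the real service uses its own hash and range assignment.
    """
    digest = hashlib.sha256(key_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


def placement(key_values, partition_count):
    """Count how many writes land on each partition index."""
    return Counter(physical_partition_for(v, partition_count) for v in key_values)


# High-cardinality key: 1,000 writes from 1,000 different devices.
by_device = placement((f"device-{i}" for i in range(1000)), 4)

# Low-cardinality key: the same 1,000 writes all carry one region value.
by_region = placement(("westeurope" for _ in range(1000)), 4)

print("by deviceId:", dict(by_device))
print("by region:  ", dict(by_region))
```

With many distinct values the writes spread over every index, while a single repeated value sends every write to the same place no matter how many partitions exist.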

Why it matters

A partition key matters because it turns an abstract data model into production capacity behavior. When the key spreads writes and reads evenly, Cosmos DB can use provisioned or serverless throughput efficiently and serve point reads quickly. When the key is skewed, a few logical partitions can become hot while other capacity sits unused, causing throttling, higher latency, and expensive overprovisioning. The key also affects query shape, transaction boundaries, migration effort, and multi-tenant design. Teams that choose it casually often discover the problem only after traffic grows, when changing the design requires data movement, application changes, and careful release planning.

Where you see it

Signals, screens, and Azure surfaces where this term usually becomes operational.

Signal 01

In the Azure portal Cosmos DB container settings, the partition key path appears beside indexing, throughput, unique key, and scaling details for the container.

Signal 02

In ARM, Bicep, Terraform, or Azure CLI output, the container resource includes a partitionKey object showing path, version, and related configuration, which helps during deployment, drift, and migration reviews.

Signal 03

In diagnostic logs and Cosmos DB workbooks, partition key values or ranges appear when teams investigate RU consumption, hot partitions, throttling, and query fan-out during incidents and release reviews.

When this becomes relevant

Specific situations where this term helps solve real Azure design, operations, migration, security, reliability, cost, or governance problems.

Design a Cosmos DB container for tenant, customer, device, order, or session data that must scale horizontally.
Target point reads and queries by passing both item ID and partition key value from application code.
Investigate hot partitions by reviewing diagnostic logs, request unit consumption, throttling, and top partition key values.
Plan migrations when an existing partition key no longer supports growth, tenant distribution, or query patterns.
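The point-read item in the list above can be sketched in application code. The sketch assumes the Python SDK's container client, whose read_item call takes the item ID and the partition key value. The InMemoryContainer class, the tenantId key, and the order documents are hypothetical stand-ins so the example runs without an account.

```python
class InMemoryContainer:
    """Hypothetical stand-in for a container client, keyed by (partition key, id)."""

    def __init__(self):
        self._items = {}

    def upsert_item(self, body):
        # Assumes /tenantId is the partition key path of this container.
        self._items[(body["tenantId"], body["id"])] = body
        return body

    def read_item(self, item, partition_key):
        return self._items[(partition_key, item)]


def read_order(container, order_id, tenant_id):
    """Point read: one item addressed by its id plus its partition key value."""
    return container.read_item(item=order_id, partition_key=tenant_id)


demo = InMemoryContainer()
demo.upsert_item({"id": "order-1", "tenantId": "tenant-a", "total": 10})
demo.upsert_item({"id": "order-1", "tenantId": "tenant-b", "total": 99})

# The same id can exist under different partition key values,
# so the caller must supply both to address exactly one item.
print(read_order(demo, "order-1", "tenant-a"))
```

Passing the partition key value explicitly is what lets the request go to one logical partition instead of being resolved by a query.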

Real-world case studies

Different enterprise-style examples that show the term being used to hit measurable objectives.

Case study 01

Cold-chain telemetry distribution

Scenario, objectives, solution, measured impact, and takeaway.

Scenario

FrostLine Logistics tracked temperature, humidity, and door events for refrigerated containers moving through ports and rail yards. The first container used region as the partition key, and summer traffic overloaded the busiest coastal region.

Business/Technical Objectives

Reduce throttling during high-volume telemetry bursts.
Keep point reads fast for individual container history.
Avoid buying excessive RU/s for quiet regions.
Create evidence for future data-model reviews.

Solution Using Partition key

Engineers rebuilt the telemetry container with containerId as the partition key and kept event time as an indexed field for range queries. A temporary copy pipeline moved recent records into the new container while the application wrote to both locations during cutover. Azure CLI confirmed the new container partition key path, throughput, and diagnostic settings before traffic switched. Log Analytics queries then tracked RU consumption by partition key value and showed whether any container produced unusually hot writes. The team kept regional dashboard filters, but those filters no longer controlled data placement. Runbooks documented why containerId was chosen and how to test future fleet-growth assumptions.

Results & Business Impact

429 throttling during peak telemetry windows dropped by 81%.
Average point-read latency for container history improved from 42 ms to 17 ms.
Provisioned RU/s was reduced by 28% after regional overprovisioning was removed.
Design review time for new containers fell from three days to one checklist session.

Key Takeaway for Glossary Readers

A Cosmos DB partition key should match the workload’s real write and read distribution, not just a convenient reporting label.

Case study 02

Legal workflow tenant scaling

Scenario

Verdant Docket hosted case-management data for law firms of very different sizes. A tenantId partition key worked for small firms, but one national client generated far more activity than every other customer.

Business/Technical Objectives

Reduce hot-tenant throttling without weakening tenant isolation.
Preserve efficient reads for case timelines and document metadata.
Move production data with a controlled cutover window.
Give support teams clear telemetry for tenant-specific incidents.

Solution Using Partition key

Architects introduced a synthetic partition key combining tenantId and caseGroupId for the case-events container. Authorization still checked tenant identity in the application layer, so the partition key was not treated as a security control. A migration job copied active cases into the new container, and SDK calls were updated to send the synthetic key for point reads and timeline queries. Azure CLI exported container settings before and after the migration, while diagnostic logs measured RU consumption by new key values. The largest tenant was spread across many logical partitions, but support dashboards still grouped incidents by tenant and case group.
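A synthetic key like the one described above is usually just a derived string property written onto each item. The following is a minimal sketch under assumptions: the property name partitionKey, the ":" separator, and the sample identifiers are hypothetical and not taken from the case study.

```python
def synthetic_key(tenant_id, case_group_id):
    """Combine two properties into one partition key value.

    Assumes neither identifier contains the ':' separator.
    """
    if not tenant_id or not case_group_id:
        raise ValueError("both tenant_id and case_group_id are required")
    return f"{tenant_id}:{case_group_id}"


def with_partition_key(doc):
    """Return a copy of the document with the synthetic key property added."""
    key = synthetic_key(doc["tenantId"], doc["caseGroupId"])
    return {**doc, "partitionKey": key}


event = with_partition_key(
    {"id": "evt-1", "tenantId": "firm-42", "caseGroupId": "group-7"}
)
print(event["partitionKey"])
```

The original tenantId stays on the document, so authorization checks and tenant-level dashboards can keep using it while placement follows the combined value.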

Results & Business Impact

P99 write latency for the largest client fell by 46%.
Tenant-specific 429 incidents decreased from daily to fewer than two per month.
The migration completed with 22 minutes of read-only application mode.
Support could identify hot case groups from Log Analytics instead of escalating blind tickets.

Key Takeaway for Glossary Readers

Partition key design is part of SaaS scale architecture; tenant isolation still needs authorization, but capacity needs balanced key values.

Case study 03

Interactive game economy events

Scenario

Nebula Forge released a limited-time game economy event with millions of inventory updates. Earlier launches partitioned by eventName, which caused a single hot logical partition during popular events.

Business/Technical Objectives

Sustain write bursts during event launch hours.
Support fast lookup of a player’s recent inventory changes.
Limit serverless query cost for economy analysts.
Validate the design before a worldwide release.

Solution Using Partition key

The data team created a new Cosmos DB container using playerId as the partition key and stored eventName, itemType, and timestamp as indexed properties. Load tests replayed expected launch traffic with realistic player distribution. Azure CLI created the staging container, showed the partition key path, and exported throughput settings for release approval. Application code used playerId for point reads and targeted queries, while analytics jobs summarized event performance through scheduled aggregate pipelines instead of scanning every partition repeatedly. During launch, diagnostic logs watched RU usage by partition key value and alerted if a small group of players or bots created abnormal pressure.
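The hot-key alert mentioned above can be approximated as a share-of-total check over exported RU rows. This is a sketch, not the team's actual alert rule: the 20% threshold and the sample rows are assumptions, and the row fields PartitionKey and RequestCharge mirror the column names used in the Log Analytics query on this page.

```python
from collections import defaultdict


def hot_keys(rows, share_threshold=0.2):
    """Return (key, share) pairs whose share of total RU meets the threshold."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["PartitionKey"]] += row["RequestCharge"]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []
    shares = [(key, ru / grand_total) for key, ru in totals.items()]
    flagged = [pair for pair in shares if pair[1] >= share_threshold]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


sample_rows = [
    {"PartitionKey": "player-1", "RequestCharge": 600.0},
    {"PartitionKey": "player-2", "RequestCharge": 100.0},
    {"PartitionKey": "player-3", "RequestCharge": 100.0},
    {"PartitionKey": "player-4", "RequestCharge": 200.0},
]
for key, share in hot_keys(sample_rows, share_threshold=0.5):
    print(f"{key} consumed {share:.0%} of request units")
```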

Results & Business Impact

The launch processed 3.7 million inventory updates in the first hour without sustained throttling.
Player inventory lookup latency stayed below the 120 ms service objective.
Analyst query cost dropped by 31% after aggregate pipelines replaced repeated broad scans.
Bot-related hot-key alerts identified two abusive accounts within 15 minutes.

Key Takeaway for Glossary Readers

A partition key can protect both customer experience and platform cost when game traffic arrives in sharp, uneven bursts.

Why use Azure CLI for this?

Azure CLI is useful for partition keys because the portal view is not enough during design reviews, incidents, or drift checks. CLI commands can show the exact container partition key path, throughput settings, indexes, and diagnostic configuration across environments. They also make it easier to capture evidence before migration or capacity changes.

CLI use cases

Show a Cosmos DB SQL container and confirm the configured partition key path, indexing policy, and resource identifiers.
Create a development container with the intended partition key path before load testing real query and write patterns.
Query Log Analytics for partition key RU consumption, throttling, and hot partition evidence during performance reviews.
Compare container settings across dev, test, and production to detect drift before releasing application code.
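The drift comparison in the last use case can be sketched as a field-by-field diff of two saved CLI outputs. The sketch assumes each input is the parsed JSON from az cosmosdb sql container show without a --query filter, with settings nested under a resource property. The function name and the sample values are hypothetical.

```python
def container_drift(left, right,
                    fields=("partitionKey", "indexingPolicy", "uniqueKeyPolicy")):
    """Return {field: (left_value, right_value)} for settings that differ."""
    diffs = {}
    for field in fields:
        left_value = left.get("resource", {}).get(field)
        right_value = right.get("resource", {}).get(field)
        if left_value != right_value:
            diffs[field] = (left_value, right_value)
    return diffs


# Abridged, illustrative outputs for two environments.
test_env = {"resource": {"id": "orders",
                         "partitionKey": {"paths": ["/tenantId"], "kind": "Hash"}}}
prod_env = {"resource": {"id": "orders",
                         "partitionKey": {"paths": ["/customerId"], "kind": "Hash"}}}

for field, (left_value, right_value) in container_drift(test_env, prod_env).items():
    print(f"{field}: test={left_value} prod={right_value}")
```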

Before you run CLI

Confirm tenant, subscription, resource group, account, database, container, region, and output format before running inventory commands.
Use read-only commands first; creating a container with the wrong partition key may require rebuilding or migrating data later.
Check whether the identity has Cosmos DB control-plane permissions and access to any Log Analytics workspace used for diagnostics.
Verify provider registration, autoscale or provisioned throughput mode, and expected cost impact before creating test containers.

What output tells you

The partitionKey path shows which item property Cosmos DB uses to route writes, point reads, and targeted queries.
Throughput and indexing fields help indicate whether hot partitions are more likely capacity, query, or data-model problems.
Diagnostic and Log Analytics output shows which partition key values consume the most request units or experience throttling.
Resource IDs and regions help confirm that evidence came from the intended account, database, container, and environment.
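Reading the partitionKey object out of saved CLI output can be scripted for review evidence. The JSON below is an abridged, illustrative sample of what az cosmosdb sql container show returns without a --query filter. The account, database, and container names are hypothetical, and the resource nesting is an assumption to verify against real output.

```python
import json

SAMPLE_SHOW_OUTPUT = """
{
  "id": "/subscriptions/<subscription>/resourceGroups/<resource-group>/providers/Microsoft.DocumentDB/databaseAccounts/<account>/sqlDatabases/<database>/containers/orders",
  "name": "orders",
  "resource": {
    "id": "orders",
    "partitionKey": {"kind": "Hash", "paths": ["/tenantId"], "version": 2}
  }
}
"""


def partition_key_summary(show_output_json):
    """Extract the container name and partition key settings from CLI JSON."""
    resource = json.loads(show_output_json)["resource"]
    partition_key = resource["partitionKey"]
    return {
        "container": resource["id"],
        "paths": partition_key["paths"],
        "kind": partition_key.get("kind"),
        "version": partition_key.get("version"),
    }


print(partition_key_summary(SAMPLE_SHOW_OUTPUT))
```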

Mapped Azure CLI commands

Cosmos DB partition key inspection commands

manage

az cosmosdb sql container show --resource-group <resource-group> --account-name <account> --database-name <database> --name <container> --query "{id:id,partitionKey:resource.partitionKey,indexingPolicy:resource.indexingPolicy}" --output json

az cosmosdb sql container | discover | Databases

az cosmosdb sql container create --resource-group <resource-group> --account-name <account> --database-name <database> --name <container> --partition-key-path /<property> --throughput <ru>

az cosmosdb sql container | provision | Databases

az cosmosdb sql container throughput show --resource-group <resource-group> --account-name <account> --database-name <database> --name <container> --output table

az cosmosdb sql container throughput | discover | Databases

az monitor diagnostic-settings list --resource <cosmos-db-account-resource-id> --output table

az monitor diagnostic-settings | discover | Databases

az monitor log-analytics query --workspace <workspace-id> --analytics-query "CDBPartitionKeyRUConsumption | summarize totalRU=sum(RequestCharge) by PartitionKey" --output table

az monitor log-analytics | discover | Databases
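When the Log Analytics query above is run from automation, building the argument list in code avoids shell quoting problems around the KQL string. This is a sketch under assumptions: the helper name, the time filter on TimeGenerated, and the ordering clause are additions for illustration and are not part of the mapped command.

```python
def ru_by_partition_key_command(workspace_id, hours=1):
    """Build the az argument list for an RU-by-partition-key query."""
    kql = (
        "CDBPartitionKeyRUConsumption "
        f"| where TimeGenerated > ago({hours}h) "
        "| summarize totalRU=sum(RequestCharge) by PartitionKey "
        "| order by totalRU desc"
    )
    return [
        "az", "monitor", "log-analytics", "query",
        "--workspace", workspace_id,
        "--analytics-query", kql,
        "--output", "json",
    ]


command = ru_by_partition_key_command("<workspace-id>", hours=6)
print(command)

# To execute it, pass the list to subprocess.run(command, capture_output=True,
# text=True, check=True) on a machine where the Azure CLI is signed in.
```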

Architecture context

Security

Security impact is indirect because a partition key is not an authorization boundary. Cosmos DB access is still controlled through keys, Microsoft Entra integration, roles, network rules, private endpoints, and application logic. Risk appears when teams treat a tenant-based partition key as proof of tenant isolation without enforcing authorization in code and queries. Partition key values may also include customer identifiers that appear in diagnostics, logs, traces, or support exports. Operators should avoid embedding sensitive secrets in keys, restrict telemetry access (including during incidents and support exports), and review query filters so one tenant cannot request another tenant's partition value through a weak API path.

Cost

Cost impact is direct because Cosmos DB charges are tied to requests, storage, and capacity choices that partitioning strongly influences. A balanced key helps use RU/s across physical partitions, while a hot key can force teams to buy more throughput than the workload should need. Cross-partition queries can scan more data and consume more request units than targeted point reads. Redesigning a poor key can also create migration cost through duplicate containers, copy jobs, testing, and temporary parallel writes. FinOps reviews should connect Cosmos DB spend with top partition key values, throttling evidence, autoscale ceilings, storage growth, and query patterns.
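The overprovisioning effect can be shown with simple arithmetic. The sketch assumes provisioned throughput is divided evenly across physical partitions, as Microsoft Learn describes. The function names and the numbers are hypothetical.

```python
def per_partition_ru(total_ru_per_sec, physical_partitions):
    """RU/s available to each physical partition under an even split."""
    return total_ru_per_sec / physical_partitions


def total_ru_needed_for_hot_key(hot_key_ru_per_sec, physical_partitions):
    """Total RU/s to provision so one partition alone can serve the hot key."""
    return hot_key_ru_per_sec * physical_partitions


# 20,000 RU/s over 4 physical partitions leaves 5,000 RU/s for each one.
print(per_partition_ru(20_000, 4))

# If one key value needs 8,000 RU/s, every partition must be sized for it.
print(total_ru_needed_for_hot_key(8_000, 4))
```

In this example the account would need 32,000 RU/s to satisfy a single busy value, even though the other three partitions may sit mostly idle.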

Reliability

Reliability impact is direct because a bad partition key can create a concentrated failure pattern. Hot logical partitions can produce rate limiting, timeouts, retry storms, and uneven latency even when the account has enough total RU/s. A partition key that does not match write volume can also make failover recovery and catch-up work noisier after regional events. Reliable designs test realistic cardinality, growth, tenant distribution, and query patterns before container creation. Runbooks should include hot-partition detection, retry backoff review, and migration options such as container copy jobs or redesign into a new container when the original key blocks safe scaling.
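For the retry backoff review mentioned above, it helps to have a reference shape to compare against. The sketch below is an assumption-labeled illustration, not SDK behavior: the Throttled exception, the function names, and the base and cap values are hypothetical, and the retry-after value stands in for the hint the service sends with rate-limited responses.

```python
import random
import time


class Throttled(Exception):
    """Hypothetical error for a rate-limited (429) request."""

    def __init__(self, retry_after_ms=None):
        super().__init__("request rate too large")
        self.retry_after_ms = retry_after_ms


def backoff_delay(attempt, retry_after_ms=None, base_ms=100, cap_ms=5000,
                  rng=random.random):
    """Seconds to wait: honor the server hint, else capped exponential jitter."""
    if retry_after_ms is not None:
        return retry_after_ms / 1000.0
    ceiling_ms = min(cap_ms, base_ms * (2 ** attempt))
    return rng() * ceiling_ms / 1000.0


def call_with_retry(operation, max_attempts=5, sleep=time.sleep):
    """Run operation, retrying on Throttled until attempts are exhausted."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Throttled as exc:
            if attempt == max_attempts - 1:
                raise
            sleep(backoff_delay(attempt, exc.retry_after_ms))
```

Randomized delays matter most with a hot partition, because many clients retrying on the same schedule tend to hit the same logical partition again at the same moment.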

Performance

Performance impact is direct because partition keys shape routing, fan-out, and throttling behavior. Point reads are fastest when the application supplies both item ID and partition key value. Queries that include the partition key can target fewer logical partitions; queries without it may fan out across more data and consume more request units. Hot partition values raise latency for users tied to those values even when global metrics look healthy. Good performance testing uses realistic key distribution, write bursts, tenant sizes, and query filters. Operators should compare P50, P95, and P99 latency with RU consumption by partition key, not only account-level averages.
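Per-key percentiles can be computed from any latency export. The sketch below uses the nearest-rank method. The sample format of (partition key value, latency in milliseconds) pairs and the tenant names are assumptions for illustration.

```python
from collections import defaultdict


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an ascending list, for integer pct in 1..100."""
    rank = max(1, (pct * len(sorted_values) + 99) // 100)  # integer ceiling
    return sorted_values[rank - 1]


def latency_by_key(samples):
    """Group (key, latency_ms) samples and report P50, P95, and P99 per key."""
    grouped = defaultdict(list)
    for key, latency_ms in samples:
        grouped[key].append(latency_ms)
    report = {}
    for key, values in grouped.items():
        ordered = sorted(values)
        report[key] = {f"p{p}": percentile(ordered, p) for p in (50, 95, 99)}
    return report


samples = [("tenant-a", ms) for ms in range(1, 101)]
samples += [("tenant-b", 5), ("tenant-b", 7), ("tenant-b", 900)]
for key, stats in latency_by_key(samples).items():
    print(key, stats)
```

Grouping by key makes a slow tail visible for one tenant even when the blended account-level percentile looks acceptable.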

Operations

Operators inspect partition keys in container definitions, ARM or Bicep templates, Azure CLI output, SDK configuration, and Cosmos DB diagnostic logs. Day-to-day work includes checking the partition key path, identifying high-RU partition key values, reviewing throttling, comparing query filters with the key, and documenting why a container was designed that way. Changes are not simple updates; most corrections require a new container, data copy, application cutover, or secondary indexing strategy. Teams should keep evidence of key cardinality, top tenants, request distribution, and query patterns so future engineers understand the tradeoff behind the container. Keep saved examples for each environment and owner, and keep review notes current for every production container.

Common mistakes

Choosing a low-cardinality key, such as status or country, for a workload that writes heavily to a few values.
Assuming a tenant ID partition key is a security boundary instead of enforcing authorization in application logic.
Creating a production container before testing realistic distribution, write bursts, and query filters.
Trying to fix an immutable partition key with a simple update instead of planning a safe migration path.

Operator quick checks

Does the key have enough cardinality for expected tenants, devices, orders, or sessions?
Do the most common reads and writes include the partition key value without awkward application lookups?
Are diagnostics enabled so hot partition and RU consumption evidence can be reviewed later?
What is the migration plan if the chosen key fails under real growth?
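The second check above can be estimated before a container exists by listing the planned access patterns and measuring how much traffic would carry the candidate key. This is a rough sketch: the pattern names, the field lists, and the call volumes are hypothetical.

```python
def targeted_share(access_patterns, partition_key_field):
    """Fraction of daily calls whose filters include the candidate key field."""
    total = sum(p["calls_per_day"] for p in access_patterns)
    if total == 0:
        return 0.0
    targeted = sum(
        p["calls_per_day"]
        for p in access_patterns
        if partition_key_field in p["filter_fields"]
    )
    return targeted / total


patterns = [
    {"name": "order by id", "filter_fields": ["id", "tenantId"], "calls_per_day": 7000},
    {"name": "tenant order list", "filter_fields": ["tenantId"], "calls_per_day": 2000},
    {"name": "status report", "filter_fields": ["status"], "calls_per_day": 1000},
]
print(f"{targeted_share(patterns, 'tenantId'):.0%} of calls include tenantId")
```

Comparing this share for two or three candidate properties gives a quick first signal about which one keeps the most traffic out of cross-partition queries.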

Questions to ask

What workload boundary does this partition key represent: tenant, device, customer, order, session, or time bucket?
Who can create containers, and who reviews partition key choices before production deployment?
What breaks if a few partition key values become much larger or busier than expected?
What metrics, logs, rollback path, or migration plan proves the design remains healthy after launch?
