
Physical partition

A physical partition is the hidden slice of Azure Cosmos DB infrastructure that stores data and serves request units for a container. You do not create or move physical partitions yourself. Azure Cosmos DB creates and splits them as storage or throughput grows. What you control is the partition key and throughput model that determine whether work spreads evenly. When people talk about hot partitions, RU pressure, or uneven storage, they are usually seeing a physical-partition effect caused by logical partition choices.


Aliases: Cosmos DB physical partition, physical partition in Cosmos DB, partition key range
Difficulty: intermediate
CLI mappings: 7
Last verified: 2026-05-17

Microsoft Learn

An Azure Cosmos DB physical partition is the internal storage and throughput unit that hosts one or more logical partitions. Azure Cosmos DB manages physical partitions automatically, splitting them as throughput or data grows, while operators focus on partition-key design, RU distribution, hot partitions, monitoring, and capacity planning.

Microsoft Learn: Partitioning and horizontal scaling - Azure Cosmos DB, 2026-05-17

Technical context

In Azure architecture, a physical partition belongs to the Cosmos DB data plane. Containers hold items grouped by logical partition key values, and Azure Cosmos DB maps those logical partitions onto physical partitions. Each physical partition has storage and RU limits (documented as up to 50 GB and 10,000 RU/s at the time of writing), and the service handles its replica set and regional placement. Provisioned throughput is divided evenly across physical partitions, so an uneven partition key can strand capacity on the quiet partitions. Operators monitor normalized RU consumption, storage growth, partition key distribution, and throttling rather than directly administering the physical partition objects.
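The even split described above can be sketched with a little arithmetic. This is a minimal illustration, not a service API: the function names are hypothetical, the limits are the documented 10,000 RU/s and 50 GB per physical partition and should be checked against current documentation, and the real partition count can be higher than this lower bound because the service decides when to split.

```python
import math

# Assumed limits per physical partition; confirm against current Azure docs.
MAX_RU_PER_PHYSICAL_PARTITION = 10_000
MAX_GB_PER_PHYSICAL_PARTITION = 50


def estimate_physical_partitions(provisioned_ru: int, stored_gb: float) -> int:
    """Rough lower bound on the physical partition count for a container."""
    by_throughput = math.ceil(provisioned_ru / MAX_RU_PER_PHYSICAL_PARTITION)
    by_storage = math.ceil(stored_gb / MAX_GB_PER_PHYSICAL_PARTITION)
    return max(1, by_throughput, by_storage)


def ru_per_partition(provisioned_ru: int, partition_count: int) -> float:
    """Provisioned RU/s is divided evenly across physical partitions."""
    return provisioned_ru / partition_count


if __name__ == "__main__":
    partitions = estimate_physical_partitions(provisioned_ru=30_000, stored_gb=120)
    print("estimated partitions:", partitions)
    print("RU/s budget per partition:", ru_per_partition(30_000, partitions))
```

The per-partition budget is the number that matters for a hot key: a logical partition can never use more than the share assigned to the physical partition that hosts it.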

Why it matters

The physical partition matters because it explains why a Cosmos DB container can have plenty of total RU/s yet still throttle specific requests. If one logical partition key receives too much traffic, the physical partition hosting it can become hot while other partitions are quiet. That affects latency, retry volume, cost efficiency, and user experience. It also shapes long-term design decisions: partition keys, synthetic keys, hierarchical partition keys, throughput scaling, and data retention. Teams that understand physical partitions avoid treating every 429 response as a simple capacity shortage. They look for distribution problems first, then decide whether to redesign keys, scale throughput, or split workload patterns.

Where you see it

Signals, screens, and Azure surfaces where this term usually becomes operational.

Signal 01

In Azure Monitor metrics, normalized RU consumption by partition key range shows one range near saturation while the container still has unused aggregate throughput available.

Signal 02

In application diagnostics, repeated 429 throttling and high retry counts affect requests for a small set of partition key values during traffic spikes or batch jobs.

Signal 03

In Cosmos DB design reviews, architects compare container throughput, logical partition cardinality, storage growth, retry patterns, and hot-key risk before approving a partition key path.

When this becomes relevant

Specific situations where this term helps solve real Azure design, operations, migration, security, reliability, cost, or governance problems.

Diagnosing hot partitions in high-volume Azure Cosmos DB containers.
Reviewing partition-key candidates before launching a new multi-tenant workload.
Explaining why provisioned RU/s does not always translate into usable throughput.

Real-world case studies

Different enterprise-style examples that show the term being used to hit measurable objectives.

Case study 01

Fleet telemetry hot-key redesign

Scenario, objectives, solution, measured impact, and takeaway.

Scenario

RoadPulse operated a logistics telemetry platform that stored location events for delivery vehicles in Azure Cosmos DB. During morning route starts, a few depot identifiers created severe throttling even though the container showed unused total RU/s.

Business/Technical Objectives

Reduce 429 throttling during morning bursts by at least 70 percent.
Keep ingestion latency under two seconds for active vehicle events.
Avoid a permanent RU/s increase unless distribution evidence justified it.
Create an operator playbook for future partition-key reviews.

Solution Using Physical partition

Engineers used Azure CLI to inventory the Cosmos DB account, container, regions, and throughput settings, then linked the container resource ID to Azure Monitor metrics for normalized RU consumption. The evidence showed one partition key range near saturation while other ranges were quiet. Instead of simply increasing provisioned throughput, the team changed new telemetry writes to use a synthetic key combining depot, vehicle group, and hour bucket. Historical data stayed readable through a compatibility layer, while new writes landed in a more balanced pattern. Operators updated dashboards to show retry counts, request charge, and normalized RU by partition key range before any scale request.
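A synthetic key of the kind described above could look like the following sketch. The field names, separator, and hour-bucket format are assumptions for illustration; the case study does not specify them.

```python
from datetime import datetime, timezone


def synthetic_partition_key(depot_id: str, vehicle_group: str, event_time: datetime) -> str:
    """Combine depot, vehicle group, and a UTC hour bucket into one key value.

    The separator and bucket format are hypothetical choices.
    """
    if event_time.tzinfo is None:
        raise ValueError("event_time must be timezone-aware")
    hour_bucket = event_time.astimezone(timezone.utc).strftime("%Y%m%d%H")
    return f"{depot_id}|{vehicle_group}|{hour_bucket}"


if __name__ == "__main__":
    when = datetime(2026, 5, 17, 6, 45, tzinfo=timezone.utc)
    print(synthetic_partition_key("depot-07", "van-north", when))
```

Adding the vehicle group and hour bucket raises the number of distinct key values per depot, which spreads a route-start burst over more logical partitions. The trade-off is that reading all events for one depot now touches several key values.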

Results & Business Impact

Morning 429 throttling fell 82 percent within two weeks of redirecting new telemetry writes.
P95 ingestion latency dropped from 6.8 seconds to 1.7 seconds during route-start bursts.
The team avoided a planned 40 percent permanent RU/s increase after distribution improved.
The new review playbook shortened partition-related incident triage from hours to minutes.

Key Takeaway for Glossary Readers

Physical-partition evidence helps teams fix Cosmos DB distribution problems instead of buying capacity around a hot key.

Case study 02

Gaming profile scale-out without leaderboard stalls


Scenario

PixelForge ran a seasonal multiplayer game with player profiles, match summaries, and leaderboard snapshots in Cosmos DB. Launch-day traffic concentrated around popular regions and caused profile saves to retry heavily.

Business/Technical Objectives

Keep profile save latency below 150 milliseconds during season launches.
Separate leaderboard fan-out reads from player profile writes.
Limit autoscale peaks caused by uneven request distribution.
Give support engineers a clear signal for partition-related incidents.

Solution Using Physical partition

The platform team reviewed logical partition choices against physical-partition behavior. CLI inventory captured the account, database, containers, throughput mode, and tags for each game environment. Metrics showed that leaderboard refreshes and profile writes competed inside the same hot partition key ranges. The team split the workload into separate containers: player profiles keyed by player ID and leaderboard snapshots keyed by season plus shard. Application code routed writes through the profile container and moved high-fan-out reads to cached leaderboard documents. Dashboards compared normalized RU consumption, P95 latency, and retry volume before and after the change.

Results & Business Impact

Profile save P95 latency stayed under 120 milliseconds during the next season launch.
Autoscale peaks fell 28 percent because leaderboard reads no longer amplified hot ranges.
Support engineers could identify partition pressure from one workbook instead of searching logs.
The design supported two additional regions without changing the profile write path.

Key Takeaway for Glossary Readers

Physical partitions are invisible, but their metrics can reveal when one data model is forcing unrelated traffic through the same bottleneck.

Case study 03

Donation analytics workload split


Scenario

HarborAid used Cosmos DB to store donation events and campaign summaries for disaster-response fundraising. Large televised campaigns created a few dominant campaign IDs that slowed both donation writes and analyst queries.

Business/Technical Objectives

Protect donation capture during campaign surges.
Keep analyst dashboards refreshed without starving write throughput.
Reduce emergency capacity changes during fundraising events.
Document partition-key decisions for auditors and future volunteers.

Solution Using Physical partition

Operators used CLI commands to export Cosmos DB resource IDs, throughput settings, and tags, then correlated them with normalized RU metrics and application retry logs. The findings showed campaign ID was acceptable for reporting documents but too skewed for raw donation events. The team kept campaign summaries in the original container and moved incoming donation events to a new container keyed by donor-region plus generated event shard. Azure Functions wrote summary updates asynchronously, and dashboards tracked lag, request charge, and hot-range utilization. The runbook required partition evidence before any emergency throughput increase.
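One way to picture a key built from donor region plus a generated shard is the sketch below. The shard count, key format, and function names are hypothetical; the case study does not say how the shard was generated.

```python
import random

SHARD_COUNT = 16  # hypothetical; size it from measured peak write volume


def donation_partition_key(donor_region: str, rng=random) -> str:
    """Pick a random shard so writes for one region spread over many keys."""
    shard = rng.randrange(SHARD_COUNT)
    return f"{donor_region}-{shard:02d}"


def keys_for_region(donor_region: str) -> list[str]:
    """All key values a reader must query to see one region's events."""
    return [f"{donor_region}-{shard:02d}" for shard in range(SHARD_COUNT)]


if __name__ == "__main__":
    print(donation_partition_key("emea"))
    print(len(keys_for_region("emea")), "keys to read for emea")
```

Random sharding favors write distribution over read locality: every read for a region fans out across all shard keys, which is why the summaries in this case study were maintained separately.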

Results & Business Impact

Donation write failures during campaign peaks dropped from 3.4 percent to less than 0.2 percent.
Dashboard refreshes completed within five minutes without blocking event ingestion.
Emergency RU/s increases fell by 55 percent across the next two fundraising drives.
Audit notes clearly connected partition-key choices to continuity and donor-data protection.

Key Takeaway for Glossary Readers

Understanding physical partitions helps teams separate operational workloads before a reporting pattern damages transactional reliability.

Why use Azure CLI for this?

Azure CLI cannot move or resize individual physical partitions, because Azure Cosmos DB manages them internally. It is still valuable for confirming the account, container, throughput, region, and monitoring scope before deeper metric or query analysis. CLI output also gives repeatable evidence for design reviews and incident records.

CLI use cases

Inventory Cosmos DB accounts, databases, and containers before a partition distribution review.
Show container throughput settings to compare provisioned RU/s with observed hot-partition behavior.
Export account and container resource IDs for Azure Monitor normalized RU metric queries.
Update container throughput only after confirming the issue is capacity, not a skewed partition key.

Before you run CLI

Confirm tenant, subscription, resource group, Cosmos DB account, database, container, API type, and region before collecting evidence.
Use read-only permissions for inventory and metrics, and separate those duties from account-key or throughput-change rights.
Review cost risk before increasing throughput, especially for autoscale containers, multi-region accounts, and high-retention workloads.
Use JSON output for evidence capture and avoid printing account keys, customer identifiers, or sensitive partition key values.
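The last point can be supported with a small redaction step before JSON evidence is stored or shared. This is a sketch under assumptions: the field names below are believed to match the output of the account key listing command, but verify them against real output, and extend the set with any fields that carry customer identifiers in your environment.

```python
import json

# Assumed sensitive field names; verify against actual CLI output.
SENSITIVE_FIELD_NAMES = {
    "primaryMasterKey",
    "secondaryMasterKey",
    "primaryReadonlyMasterKey",
    "secondaryReadonlyMasterKey",
    "connectionString",
}


def redact(value, sensitive=SENSITIVE_FIELD_NAMES):
    """Return a copy of parsed JSON with sensitive fields replaced."""
    if isinstance(value, dict):
        return {
            key: "<redacted>" if key in sensitive else redact(item, sensitive)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item, sensitive) for item in value]
    return value


if __name__ == "__main__":
    raw = '{"name": "example-account", "primaryMasterKey": "not-a-real-key"}'
    print(json.dumps(redact(json.loads(raw)), indent=2))
```

The function copies rather than mutates its input, so the original parsed output stays available in memory while only the redacted copy is written to the incident record.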

What output tells you

Account and container IDs identify the exact monitoring target for normalized RU and partition key range metrics.
Throughput values show whether manual or autoscale capacity is available before attributing throttling to service limits.
Locations and failover settings explain where replicas serve traffic and where regional pressure may appear.
Tags, database names, and container names tie partition investigations to owners, workloads, environments, and cost centers.
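As a sketch of reading throughput values from captured output, the helper below classifies a container as manual or autoscale. The JSON shape (a `resource` object with `throughput` and `autoscaleSettings.maxThroughput`) is an assumption about what the throughput show command returns; check it against your own output before relying on it.

```python
def describe_throughput(settings: dict) -> str:
    """Summarize throughput mode from parsed throughput-show output.

    The field names are assumed, not guaranteed by this glossary.
    """
    resource = settings.get("resource") or {}
    autoscale = resource.get("autoscaleSettings") or {}
    if autoscale.get("maxThroughput"):
        return f"autoscale, max {autoscale['maxThroughput']} RU/s"
    if resource.get("throughput") is not None:
        return f"manual, {resource['throughput']} RU/s"
    return "no dedicated throughput found"


if __name__ == "__main__":
    sample = {"resource": {"throughput": 4000, "autoscaleSettings": None}}
    print(describe_throughput(sample))
```

A result of "no dedicated throughput found" may simply mean the container shares database-level throughput, which changes how per-partition budgets should be interpreted.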

Mapped Azure CLI commands

Azure Cosmos DB operations

direct

az cosmosdb list --resource-group <resource-group>

Group: az cosmosdb | Action: discover | Category: Databases

az cosmosdb show --name <account-name> --resource-group <resource-group>

Group: az cosmosdb | Action: discover | Category: Databases

az cosmosdb create --name <account-name> --resource-group <resource-group>

Group: az cosmosdb | Action: provision | Category: Databases

az cosmosdb sql database list --account-name <account-name> --resource-group <resource-group>

Group: az cosmosdb sql database | Action: discover | Category: Databases

az cosmosdb delete --name <account-name> --resource-group <resource-group>

Group: az cosmosdb | Action: remove | Category: Databases

Cosmos operations

direct

az cosmosdb show --name <account> --resource-group <resource-group>

Group: az cosmosdb | Action: discover | Category: Databases

az cosmosdb create --name <account> --resource-group <resource-group> --locations regionName=<region>

Group: az cosmosdb | Action: provision | Category: Databases

az cosmosdb update --name <account> --resource-group <resource-group> --enable-automatic-failover true

Group: az cosmosdb | Action: configure | Category: Databases

az cosmosdb keys list --name <account> --resource-group <resource-group>

Group: az cosmosdb keys | Action: discover | Category: Databases

az cosmosdb sql database list --account-name <account> --resource-group <resource-group>

Group: az cosmosdb sql database | Action: discover | Category: Databases

az cosmosdb sql database create --account-name <account> --resource-group <resource-group> --name <database>

Group: az cosmosdb sql database | Action: provision | Category: Databases

az cosmosdb sql container list --account-name <account> --resource-group <resource-group> --database-name <database>

Group: az cosmosdb sql container | Action: discover | Category: Databases

az cosmosdb sql container create --account-name <account> --resource-group <resource-group> --database-name <database> --name <container> --partition-key-path <path>

Group: az cosmosdb sql container | Action: provision | Category: Databases

az cosmosdb sql container throughput show --account-name <account> --resource-group <resource-group> --database-name <database> --name <container>

Group: az cosmosdb sql container throughput | Action: discover | Category: Databases

az cosmosdb sql container throughput update --account-name <account> --resource-group <resource-group> --database-name <database> --name <container> --throughput <ru>

Group: az cosmosdb sql container throughput | Action: configure | Category: Databases

Architecture context

In Cosmos DB architecture, a physical partition is a service-managed storage and throughput unit underneath a container. Application teams choose a logical partition key, but Cosmos DB maps those logical partitions onto physical partitions based on storage growth and provisioned throughput. That separation is important: you can see hot logical keys, RU pressure, and uneven distribution in metrics, but you cannot directly manage physical partition placement. Architects use the concept to reason about RU ceilings, cross-partition query cost, storage growth, regional replication, and split behavior. The right design starts with a partition key that spreads writes and reads naturally, then verifies with metrics, diagnostics, and query patterns instead of assuming the service will hide every data-model problem.

Security

Security impact is indirect because users do not assign permissions to physical partitions. Access is still controlled through Cosmos DB account keys, Microsoft Entra authentication, role definitions, managed identities, private endpoints, firewall rules, and data-plane permissions. Risk appears when troubleshooting partition pressure encourages operators to export sensitive items, broaden account access, or run diagnostic queries with excessive privileges. Hot-partition analysis can also reveal tenant IDs, customer identifiers, or operational patterns embedded in partition keys. Operators should limit who can inspect account keys, query data, read metrics, and change throughput. Keep diagnostic exports sanitized, protect partition-key values, and use least-privilege access for monitoring workbooks and automation.

Cost

Cost impact is direct because inefficient physical-partition distribution wastes provisioned throughput. A container may be scaled to a high RU/s level only because one physical partition is hot, while other partitions sit underused. That raises steady-state spend, autoscale peaks, and incident effort without solving the design issue. Storage growth can also force additional partitions, affecting minimum throughput behavior and long-term capacity planning. Teams should compare total RU consumption, normalized RU by key range, request volume, retry cost, and data retention before buying more capacity. Good partition design converts paid throughput into useful work instead of paying for idle headroom around a hot key.
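The cost of scaling around a hot key can be estimated with simple arithmetic. The sketch below assumes the even split of provisioned RU/s across physical partitions; the function names and the sample peak figures are hypothetical.

```python
def required_provisioned_ru(peak_ru_per_partition: list[float]) -> float:
    """Container RU/s needed so the hottest partition's even share covers its peak."""
    return max(peak_ru_per_partition) * len(peak_ru_per_partition)


def idle_headroom_fraction(peak_ru_per_partition: list[float]) -> float:
    """Share of the provisioned RU/s that stays unused at peak."""
    provisioned = required_provisioned_ru(peak_ru_per_partition)
    return 1 - sum(peak_ru_per_partition) / provisioned


if __name__ == "__main__":
    # One hot partition and four quiet ones (illustrative numbers).
    peaks = [4000, 500, 500, 500, 500]
    print("required RU/s:", required_provisioned_ru(peaks))
    print("idle fraction:", round(idle_headroom_fraction(peaks), 2))
```

With a perfectly balanced workload the idle fraction is zero, so the gap between the two cases is a rough measure of what the skew costs.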

Reliability

Reliability impact is direct because physical partitions are where Cosmos DB stores data, hosts replicas, and serves throughput. Azure manages splits without application downtime, but bad distribution can still create localized throttling, high retry rates, and uneven latency. Reliable designs choose high-cardinality partition keys, test representative workloads, monitor normalized RU consumption by partition key range, and avoid single-tenant or time-bucket hot spots. During scale changes, teams should watch 429 responses, server-side latency, and storage growth before assuming the platform is unhealthy. Multi-region accounts still depend on balanced logical partition traffic, because every region must serve the same skewed access pattern under failover or burst conditions.

Performance

Performance impact is direct. Physical partitions determine how Cosmos DB distributes RU/s, storage, and request routing below a container. Balanced traffic can achieve high throughput with predictable latency; skewed traffic creates hot partitions, 429 responses, retries, and slow user-facing operations. Query patterns matter too: cross-partition queries, fan-out scans, and poorly selective filters consume more RUs and can make partition pressure worse. Operators should test partition-key candidates with realistic cardinality, payload sizes, and burst patterns so that capacity changes rest on real traffic evidence. Monitoring P50 and P95 latency, normalized RU consumption, retry counts, and request charge helps separate application inefficiency from physical-partition imbalance before costly scale changes.
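Testing a partition-key candidate can start offline with a sample of representative events. The sketch below counts how many distinct values a candidate field has and how much of the sample the busiest value accounts for. The event shape and field names are hypothetical, and request count is only a proxy for RU consumption.

```python
from collections import Counter


def key_distribution(events: list[dict], key_field: str) -> dict:
    """Summarize cardinality and top-key share for a candidate key field."""
    counts = Counter(event[key_field] for event in events)
    total = sum(counts.values())
    top_key, top_count = counts.most_common(1)[0]
    return {
        "distinct_values": len(counts),
        "top_key": top_key,
        "top_share": top_count / total,
    }


if __name__ == "__main__":
    # Illustrative sample: one large tenant dominates, device IDs are unique.
    sample = [{"tenantId": "big", "deviceId": f"d{i}"} for i in range(8)]
    sample += [{"tenantId": "small", "deviceId": f"d{i}"} for i in range(8, 10)]
    for candidate in ("tenantId", "deviceId"):
        print(candidate, key_distribution(sample, candidate))
```

A candidate whose top value carries a large share of the sample is likely to concentrate load on one physical partition, whatever the total RU/s.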

Operations

Operators inspect physical-partition behavior through Cosmos DB metrics, container throughput settings, partition-key design reviews, query diagnostics, and application retry telemetry. Azure CLI helps inventory accounts, databases, containers, regions, throughput, and resource IDs for monitoring queries. Day-to-day work includes checking normalized RU consumption, identifying hot partition key values, validating autoscale or manual throughput, documenting partition-key rationale, and reviewing storage growth against retention plans. Operators do not rebalance physical partitions manually, so runbooks should focus on evidence gathering, workload shaping, throughput changes, synthetic key options, and escalation paths when service-managed splits or merge previews are relevant. Keep owner runbooks aligned with metric workbooks, alert rules, and design reviews.

Common mistakes

Assuming more total RU/s will fix throttling when one hot logical partition is the real bottleneck.
Choosing tenant ID or date as a partition key without testing tenant size, burst traffic, or retention growth.
Trying to manually rebalance physical partitions instead of redesigning logical partition distribution and workload shape.
Ignoring normalized RU metrics and looking only at average container-level throughput during incidents.

Operator quick checks

Are 429 responses concentrated around a small set of partition key values?
Does normalized RU consumption show one key range near 100 percent while others are low?
Will the partition key still distribute traffic after the largest tenant or day grows?
Is the team scaling throughput because of proven capacity pressure or because of skew?
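The second and fourth checks can be expressed as a small classification over normalized RU percentages per partition key range, for example values exported from Azure Monitor. The thresholds and the input shape are assumptions chosen for illustration.

```python
def find_hot_ranges(
    normalized_ru_by_range: dict[str, float],
    hot_threshold: float = 90.0,    # hypothetical cutoff for "near saturation"
    quiet_threshold: float = 50.0,  # hypothetical cutoff for "mostly idle"
) -> dict:
    """Flag saturated ranges and say whether skew is the likelier cause."""
    hot = sorted(r for r, pct in normalized_ru_by_range.items() if pct >= hot_threshold)
    quiet = sorted(r for r, pct in normalized_ru_by_range.items() if pct < quiet_threshold)
    return {
        "hot_ranges": hot,
        "quiet_ranges": quiet,
        # Hot ranges next to quiet ones point to skew; all ranges hot
        # points to genuine capacity pressure.
        "skew_suspected": bool(hot) and bool(quiet),
    }


if __name__ == "__main__":
    print(find_hot_ranges({"0": 98.0, "1": 12.0, "2": 9.5, "3": 15.0}))
    print(find_hot_ranges({"0": 95.0, "1": 93.0}))
```

When every range is hot, raising throughput is a reasonable response; when only some are, the partition key or workload shape deserves attention first.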

Questions to ask

Which logical partition keys create the busiest physical-partition pressure today?
What breaks if the workload fails over with the same hot key pattern?
Who can change throughput, and who can approve partition-key redesign?
What metrics prove whether a scale change improved distribution or only increased spend?
