Cosmos DB throughput

Cosmos DB throughput is the amount of database work your Cosmos DB workload can perform. Azure describes that work in request units, or RUs, so a point read, write, query, or stored procedure call consumes a measurable amount of capacity. You can provision RU/s, let autoscale adjust within a range, or use serverless billing for consumed RUs. For operators, throughput explains why requests are fast, throttled, expensive, or unevenly distributed across partitions. It is both a performance control and a cost control.

Aliases: RU/s, Cosmos DB throughput
Difficulty: intermediate
CLI mappings: 5
Last verified: 2026-06-02

Microsoft Learn

Microsoft Learn explains Cosmos DB throughput through request units, a normalized measure of CPU, IOPS, and memory needed for database operations. Workloads can use provisioned throughput, autoscale, or serverless capacity models, and operators monitor RU consumption, throttling, and partition distribution.

Microsoft Learn: Request Units in Azure Cosmos DB (2026-06-02)

Technical context

Technically, throughput is configured and observed at different Cosmos DB scopes, depending on API and capacity mode. For NoSQL workloads, it is commonly assigned to a container or shared at a database level, with autoscale and serverless alternatives. RU charges are affected by item size, indexing, consistency, query shape, partition key design, and regions. Throughput also maps onto physical partitions, so normalized RU consumption can show hot partitions even when total account capacity looks healthy. Metrics, SDK response headers, and CLI output all help operators inspect it.

Why it matters

Cosmos DB throughput matters because it is where architecture, performance, and monthly spend meet. Too little throughput creates HTTP 429 throttling, slow user experiences, delayed processors, and noisy incident channels. Too much throughput wastes money every hour, especially across multiple regions or idle containers. The right setting depends on partition design, traffic bursts, consistency, indexing, and whether the workload is steady or spiky. It also shapes migration planning: bulk imports, backfills, and new product launches often need temporary RU increases. For learners, throughput is the clearest way to understand that Cosmos DB is not just storage; every operation consumes capacity.

Where you see it

Signals, screens, and Azure surfaces where this term usually becomes operational.

Signal 01

In the Cosmos DB portal scale settings, throughput appears as manual RU/s, autoscale maximum RU/s, or serverless capacity behavior for databases and containers.

Signal 02

In Azure Monitor metrics, normalized RU consumption, total request units, throttled requests, and partition key range statistics reveal whether capacity is healthy or hot during operations.

Signal 03

In SDK response headers and diagnostics, request charge shows how many RUs a specific read, write, or query consumed during application execution for tuning reviews.

When this becomes relevant

Specific situations where this term helps solve real Azure design, operations, migration, security, reliability, cost, or governance problems.

  • Choose container-level throughput for a critical workload that must not be slowed by noisy neighboring containers.
  • Use autoscale for predictable daily traffic spikes where manual RU changes create operational risk.
  • Investigate HTTP 429 throttling by correlating normalized RU consumption, hot partitions, and inefficient queries.
  • Temporarily raise throughput for a bulk import, backfill, or launch event, then lower it on schedule.
  • Compare serverless and provisioned throughput for a new workload with uncertain or intermittent traffic.

Real-world case studies

Different enterprise-style examples that show the term being used to hit measurable objectives.

Case study 01

Game leaderboard launch surge

Scenario, objectives, solution, measured impact, and takeaway.

Scenario

A multiplayer gaming studio launched a seasonal leaderboard that wrote scores every few seconds and served global ranking queries to mobile players.

Business/Technical Objectives
  • Keep score-submission latency under 100 milliseconds during launch night
  • Prevent hot partitions from throttling the most active regions
  • Control RU spend after the first-week traffic spike
  • Give support teams clear evidence when players reported delayed ranks
Solution Using Cosmos DB throughput

Architects reviewed access patterns and moved the leaderboard container to dedicated autoscale throughput instead of shared database throughput. The partition key was adjusted for tournament and region distribution, and the SDK logged request charge and retry diagnostics for high-volume paths. Operators used Azure CLI to capture the starting throughput setting, raise the autoscale maximum for launch week, and schedule a post-event review. Azure Monitor alerts watched normalized RU consumption and 429 responses by partition key range so the team could distinguish real capacity pressure from inefficient ranking queries.

Results & Business Impact
  • P95 score-write latency stayed below 82 milliseconds during the launch peak
  • HTTP 429 responses remained under 0.3 percent for score submissions
  • Autoscale maximum was lowered after day seven, cutting projected monthly RU cost by 38 percent
  • Support tickets about delayed rankings dropped by 64 percent compared with the prior season
Key Takeaway for Glossary Readers

Cosmos DB throughput planning works best when teams design capacity, partitioning, and post-event scale-down together.

Case study 02

Smart meter backfill window

Scenario

A utilities analytics team needed to backfill two years of smart-meter readings into Cosmos DB while keeping the live outage dashboard responsive.

Business/Technical Objectives
  • Complete historical ingestion within a forty-eight-hour maintenance window
  • Protect live outage lookups from bulk import throttling
  • Avoid leaving elevated throughput active after the backfill
  • Capture RU evidence for the data governance review
Solution Using Cosmos DB throughput

The team separated live outage containers from historical ingestion containers so each had its own throughput setting. Before the backfill, operators used CLI show commands to record existing RU/s and then increased throughput only on the ingestion container. The import pipeline read SDK request charges and slowed batches when normalized RU consumption approached the alert threshold. Live dashboard containers remained on their normal capacity. A scheduled automation job restored the ingestion container to baseline after validation counts matched the source system. They also alerted on unexpected retry spikes during each import phase.

Results & Business Impact
  • The backfill finished in thirty-nine hours, nine hours ahead of the window
  • Live outage dashboard P95 latency stayed under 450 milliseconds during ingestion
  • Temporary RU/s was reduced automatically, avoiding an estimated 11,000 dollars in monthly run-rate waste
  • Governance reviewers received before-and-after throughput JSON and ingestion metrics
Key Takeaway for Glossary Readers

Throughput isolation lets teams run heavy data movement without turning a background migration into a customer-facing reliability incident.

Case study 03

Media catalog cost correction

Scenario

A streaming media platform stored catalog metadata in Cosmos DB and saw RU costs climb after editors added richer descriptions, tags, and localized fields.

Business/Technical Objectives
  • Reduce catalog query RU consumption without slowing editorial workflows
  • Find whether cost growth came from item size, indexing, or traffic
  • Keep browse-page P95 latency below 250 milliseconds
  • Create a capacity review model for future catalog expansions
Solution Using Cosmos DB throughput

Engineers collected SDK request-charge diagnostics and Azure Monitor throughput metrics, then compared high-RU queries with the indexing policy and item growth. The biggest cost driver was a broad query that scanned localized metadata across partitions for editor previews. The team changed the preview workflow to use targeted point reads where possible, adjusted indexing for rarely filtered fields, and kept provisioned throughput steady during validation. Azure CLI exports captured container throughput settings before and after the tuning work for the FinOps review.

Results & Business Impact
  • Average RU charge for editor preview queries dropped by 57 percent
  • Browse-page P95 latency improved from 310 milliseconds to 190 milliseconds
  • Monthly Cosmos DB run-rate fell by 23 percent without reducing customer capacity
  • Future catalog field additions now require request-charge testing before release
Key Takeaway for Glossary Readers

Cosmos DB throughput is not only a scale knob; it is feedback on whether the data model and query design are cost-effective.

Why use Azure CLI for this?

With ten years of Azure delivery work, I like Azure CLI for Cosmos DB throughput because capacity changes need precision and auditability. Portal clicks are fine for one container, but real estates have dozens of accounts, databases, APIs, and environments. CLI lets me show current RU/s, update a known container, migrate to autoscale where supported, export evidence, and schedule temporary changes during backfills. It also reduces confusion between account, database, and container scope. Throughput mistakes are expensive and user-visible, so I want commands that capture exact names, resource groups, locations, and before-and-after values. That record is what keeps a temporary scale-up from turning into permanent spend.

CLI use cases

  • Show current manual or autoscale throughput for a Cosmos DB database or container before a launch review.
  • Raise container RU/s for a scheduled migration, then lower it through an approved automation job.
  • Migrate supported throughput settings between manual and autoscale when workload bursts justify the change.
  • Export account and container names with throughput settings for FinOps and capacity-governance reviews.
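
The raise-then-lower use case above can be sketched as a small wrapper around the mapped `show` and `update` commands. This is a minimal sketch, not a production runbook: it assumes manually provisioned container throughput (an autoscale container would be updated with `--max-throughput` instead), it assumes the `show` output exposes the current value at `resource.throughput`, and the function names and arguments are hypothetical.

```shell
# Hypothetical helpers; names and arguments are illustrative only.

# Print the current manual RU/s of a container.
# Assumes the show output exposes the value at resource.throughput.
current_rus() {
  az cosmosdb sql container throughput show \
    --account-name "$1" --resource-group "$2" \
    --database-name "$3" --name "$4" \
    --query "resource.throughput" --output tsv
}

# Set manual RU/s on a container.
set_rus() {
  az cosmosdb sql container throughput update \
    --account-name "$1" --resource-group "$2" \
    --database-name "$3" --name "$4" \
    --throughput "$5" --output none
}

# Usage: run_with_temporary_rus <account> <rg> <db> <container> <temp-rus> <command...>
run_with_temporary_rus() {
  local account="$1" rg="$2" db="$3" container="$4" temp_rus="$5"
  shift 5
  local baseline status=0
  baseline="$(current_rus "$account" "$rg" "$db" "$container")" || return 1
  set_rus "$account" "$rg" "$db" "$container" "$temp_rus" || return 1
  # Run the job, then restore the baseline even if the job fails.
  "$@" || status=$?
  set_rus "$account" "$rg" "$db" "$container" "$baseline" || return 1
  return "$status"
}
```

A call such as `run_with_temporary_rus <account> <resource-group> <database-name> <container-name> 4000 ./backfill.sh` (where `./backfill.sh` is a placeholder job) would restore the recorded baseline whether or not the job succeeds. A real automation job would likely also add approval checks and logging.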

Before you run CLI

  • Confirm tenant, subscription, resource group, Cosmos DB account, API, database, and container scope before changing throughput.
  • Check whether the workload uses provisioned, autoscale, or serverless capacity and whether database-level shared throughput is involved.
  • Review cost impact, regional replication, minimum RU requirements, and any scheduled scale-down after temporary increases.
  • Use read-only show commands first, and capture current values in JSON before running mutating throughput updates.
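
The last checklist item, capturing current values in JSON before any mutating update, might look like the following. It is a sketch under stated assumptions: the function name, its arguments, and the output file path are hypothetical, and only the read-only `show` command from the mapped list is used.

```shell
# Hypothetical helper: save the current container throughput settings as JSON
# before any mutating update. All argument values are placeholders.
# Usage: snapshot_throughput <account> <rg> <db> <container> <out-file>
snapshot_throughput() {
  local out_file="$5"
  az cosmosdb sql container throughput show \
    --account-name "$1" --resource-group "$2" \
    --database-name "$3" --name "$4" \
    --output json > "$out_file" || return 1
  # Refuse to continue if nothing was captured.
  [ -s "$out_file" ]
}
```

Because the function returns a non-zero status when the snapshot file is empty, a calling script can stop before it reaches the update step.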

What output tells you

  • Throughput output shows current RU/s or autoscale maximums and whether capacity is assigned at database or container scope.
  • Account and database identifiers confirm you are not scaling a similarly named development, test, or disaster-recovery resource.
  • Metric output reveals whether 429 responses, normalized RU consumption, or hot partitions justify a capacity change.
  • Command results provide before-and-after evidence for cost reviews, launch readiness, and post-incident analysis.
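
One way to read the throughput output programmatically is to check whether an autoscale maximum is present. The sketch below assumes the `show` output carries that value at `resource.autoscaleSettings.maxThroughput` and that the field is empty for manually provisioned containers; verify both against your own CLI output before relying on it. The function name is hypothetical.

```shell
# Hypothetical helper: report whether a container uses manual or autoscale throughput.
# Assumes the autoscale maximum appears at resource.autoscaleSettings.maxThroughput.
# Usage: throughput_mode <account> <rg> <db> <container>
throughput_mode() {
  local max
  max="$(az cosmosdb sql container throughput show \
    --account-name "$1" --resource-group "$2" \
    --database-name "$3" --name "$4" \
    --query "resource.autoscaleSettings.maxThroughput" --output tsv)" || return 1
  # An empty or "None" result is treated as "no autoscale settings present".
  if [ -n "$max" ] && [ "$max" != "None" ]; then
    echo "autoscale max=${max}"
  else
    echo "manual"
  fi
}
```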

Mapped Azure CLI commands

Cosmos DB NoSQL throughput operations

direct
Discover: az cosmosdb sql database throughput
az cosmosdb sql database throughput show --account-name <account> --resource-group <resource-group> --name <database-name>
Discover: az cosmosdb sql container throughput
az cosmosdb sql container throughput show --account-name <account> --resource-group <resource-group> --database-name <database-name> --name <container-name>
Configure: az cosmosdb sql container throughput
az cosmosdb sql container throughput update --account-name <account> --resource-group <resource-group> --database-name <database-name> --name <container-name> --throughput <ru-per-second>
Operate: az cosmosdb sql container throughput
az cosmosdb sql container throughput migrate --account-name <account> --resource-group <resource-group> --database-name <database-name> --name <container-name> --throughput-type autoscale
Discover: az monitor metrics
az monitor metrics list --resource <cosmos-account-resource-id> --metric NormalizedRUConsumption TotalRequestUnits TotalRequests

Architecture context

In architecture, Cosmos DB throughput is an explicit capacity model that must be designed with the data model. I start with access patterns, partition key, item size, read/write mix, consistency, regional replication, and expected burst behavior before choosing manual RU/s, autoscale, or serverless. Container-level throughput gives isolation for important workloads; shared database throughput can save money for small containers with uneven traffic. Autoscale helps with bursts, but the maximum RU/s you choose still has to be planned because it bounds both peak capacity and cost. Multi-region writes and stronger consistency levels change capacity math; for example, reads under strong or bounded staleness consistency cost roughly twice the RUs of reads under the more relaxed levels. A strong design includes normalized RU monitoring, alerting on throttling, and planned temporary scale for imports or campaigns.
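
The autoscale planning point can be made concrete with quick arithmetic. Autoscale scales between 10 percent of the configured maximum and the maximum itself, so the chosen maximum also fixes the lowest RU/s the resource can scale down to. The helper below is a minimal sketch with a hypothetical name; it only performs that division.

```shell
# Print the lowest RU/s an autoscale resource scales down to,
# given its configured maximum (10 percent of the maximum).
# Usage: autoscale_floor <max-rus>
autoscale_floor() {
  local max_rus="$1"
  echo $(( max_rus / 10 ))
}
```

For example, `autoscale_floor 4000` prints `400`, so a container with a 4,000 RU/s autoscale maximum never scales below 400 RU/s.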

Security

Security impact is mostly indirect, but throughput can still expose and amplify risk. A poorly protected account can be abused with expensive queries or write floods that consume RUs, degrade legitimate traffic, and increase cost. Least-privilege roles, private endpoints, firewall rules, managed identity, key rotation, and diagnostic logs protect the account that owns the throughput. Operators should watch for, and alert on, unusual RU spikes by identity, region, container, or partition because they may indicate application bugs, scraping, or abuse. Throughput changes are also privileged cost and availability actions, so production scale updates should be controlled through RBAC and change records.

Cost

Throughput is one of the biggest Cosmos DB cost levers. Provisioned RU/s is billed whether fully used or mostly idle, and multi-region designs multiply capacity costs. Autoscale can reduce manual intervention but still requires a maximum that fits peak demand and budget. Serverless can be efficient for low or unpredictable usage, but heavy sustained traffic may cost more than provisioned throughput. Waste often hides in over-provisioned containers, poor queries, excessive indexing, hot partitions, and forgotten temporary scale-ups after migrations. FinOps reviews should pair RU metrics with business traffic so teams can distinguish necessary capacity from avoidable burn. Review exceptions monthly.
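
A rough run-rate estimate helps make the multi-region multiplier visible. The sketch below is an assumption-laden approximation, not a pricing calculator: the hourly rate per 100 RU/s is a value you supply from the current pricing page, the same rate is applied to every region, a month is taken as 730 hours, and autoscale or multi-region write pricing differences are ignored. The function name is hypothetical.

```shell
# Estimate monthly cost of manually provisioned throughput.
# Usage: monthly_ru_cost <ru-per-second> <rate-per-100-ru-hour> <regions> [hours]
# The rate is an input you look up; nothing here reflects actual prices.
monthly_ru_cost() {
  local rus="$1" rate_per_100="$2" regions="$3" hours="${4:-730}"
  awk -v r="$rus" -v p="$rate_per_100" -v n="$regions" -v h="$hours" \
    'BEGIN { printf "%.2f\n", (r / 100) * p * n * h }'
}
```

With a placeholder rate of 0.008 per 100 RU/s per hour, `monthly_ru_cost 1000 0.008 2` prints `116.80`: 10 units of 100 RU/s, times the rate, times 2 regions, times 730 hours.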

Reliability

Throughput strongly affects reliability because RU exhaustion appears as throttling. Cosmos DB SDKs can retry 429 responses, but a workload that constantly exceeds capacity will still miss latency targets and delay background processing. Reliable designs include enough RU headroom for normal peaks, autoscale for unpredictable bursts, partition keys that avoid hot partitions, and alerts on normalized RU consumption. Regional replication and failover also change demand patterns; after a failover, traffic may concentrate differently. Reliability is not simply raising RU/s. Teams must validate query efficiency, index policy, item size, and retry behavior so capacity is used predictably under stress. Test retry budgets under load.

Performance

Performance depends on whether each request can get enough RUs at the right partition at the right time. A container may have plenty of total RU/s and still suffer if one logical partition or query pattern is hot. Point reads with good partition keys are cheap and fast; cross-partition queries, large items, broad indexing, and high consistency can consume more RUs and increase latency. Autoscale helps absorb bursts, but it does not fix inefficient queries or a poor partition key. Operators should track latency, 429 rate, normalized RU consumption, query metrics, and SDK retry behavior together. Validate fixes with load tests.
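
The "plenty of total RU/s but one hot partition" situation can be checked with a simple filter over per-partition normalized RU consumption. This is a sketch under an assumed input format: each input line holds a partition key range ID and its normalized RU consumption percentage, separated by whitespace, for instance as prepared from exported metric data. The function name and the threshold are hypothetical choices.

```shell
# Print the IDs of partition key ranges whose normalized RU consumption
# is at or above a threshold percentage.
# Input (stdin): "<partition-key-range-id> <normalized-ru-percent>" per line.
# Usage: hot_partitions <threshold-percent>
hot_partitions() {
  local threshold="$1"
  awk -v t="$threshold" '$2 + 0 >= t + 0 { print $1 }'
}
```

For example, `printf '0 35\n1 98\n2 40\n' | hot_partitions 90` prints `1`: only range 1 is near saturation even though the average across ranges looks moderate, which points at partition key or query design and not at total RU/s.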

Operations

Operators manage Cosmos DB throughput by inspecting current RU/s, autoscale maximums, serverless consumption, normalized RU metrics, throttled requests, and partition-level behavior. They scale throughput for known events, review cost and performance after releases, and compare actual RU charges from SDK responses against design assumptions. Runbooks should define who can raise or lower RU/s, how long temporary increases last, and what rollback looks like after imports or campaigns. Azure CLI is useful for showing and updating database or container throughput, migrating between manual and autoscale where supported, and capturing evidence during capacity reviews. Automation should log every approved change.

Common mistakes

  • Increasing total RU/s without checking hot partitions, which can leave throttling unchanged for the affected logical partition.
  • Forgetting to reduce temporary RU/s after a migration, turning a short backfill into a recurring cost leak.
  • Choosing shared database throughput for containers that need performance isolation during peak business events.
  • Blaming throughput when the real issue is a cross-partition query, oversized item, inefficient index policy, or missing partition key.