RU/s

RU/s is the shorthand you see in Cosmos DB screens, metrics, pricing conversations, and CLI output for request units per second. It is the capacity rate that says how much database work can be served each second. Developers often notice it when requests are throttled, while finance teams notice it when provisioned capacity sits idle. The abbreviation is small, but it carries a big operational meaning: every query, read, write, and index update competes for this budget unless the workload is isolated or scaled differently.

Aliases: RUs, request units per second, Cosmos DB RU/s, throughput RU/s, provisioned throughput RU/s
Difficulty: fundamentals
CLI mappings: 5
Last verified: 2026-05-22

Microsoft Learn

Microsoft Learn uses RU/s as the abbreviation for request units per second in Azure Cosmos DB. It describes provisioned throughput allocated to a database or container, including manual and autoscale modes, and is central to performance, throttling, and billing decisions.

Microsoft Learn: Provision throughput for containers and databases in Azure Cosmos DB (2026-05-22)

Technical context

In Azure architecture, RU/s is the operational label for Cosmos DB throughput at database or container level. Manual throughput sets a fixed RU/s value, while autoscale sets a maximum RU/s and scales between a floor and that maximum. Shared database RU/s is consumed by multiple containers; dedicated container RU/s isolates capacity. How much RU/s a workload needs is shaped by partitioning, indexing, consistency, regions, and request patterns. Azure CLI, ARM, Bicep, Monitor metrics, and cost reports all surface RU/s, making it a cross-cutting term for data-plane performance and control-plane governance.
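The difference between manual and autoscale values can be made concrete with a little arithmetic. The sketch below assumes the limits Microsoft documents for provisioned throughput: autoscale moves between 10 percent of the configured maximum and the maximum, autoscale maximums are set in 1,000 RU/s steps, and manual throughput starts at 400 RU/s in 100 RU/s steps. The function names are illustrative, and the real minimum for a given container can be higher depending on its storage and scaling history.

```python
from typing import Tuple

AUTOSCALE_STEP = 1000  # autoscale maximums are set in 1,000 RU/s increments
MANUAL_FLOOR = 400     # lowest manual value for a small container (assumed baseline)
MANUAL_STEP = 100      # manual values are set in 100 RU/s increments


def autoscale_range(max_ru_per_s: int) -> Tuple[int, int]:
    """Return the (floor, ceiling) RU/s that autoscale moves between."""
    if max_ru_per_s < AUTOSCALE_STEP or max_ru_per_s % AUTOSCALE_STEP != 0:
        raise ValueError("autoscale maximum must be a multiple of 1000 RU/s")
    # Autoscale scales between 10 percent of the maximum and the maximum.
    return max_ru_per_s // 10, max_ru_per_s


def validate_manual(ru_per_s: int) -> int:
    """Check a manual RU/s value against the assumed floor and step."""
    if ru_per_s < MANUAL_FLOOR or ru_per_s % MANUAL_STEP != 0:
        raise ValueError("manual RU/s must be at least 400 in steps of 100")
    return ru_per_s


if __name__ == "__main__":
    print(autoscale_range(4000))  # (400, 4000)
    print(validate_manual(800))   # 800
```

A 4,000 RU/s autoscale maximum therefore never scales below 400 RU/s, which is the figure to keep in mind when comparing it with a fixed manual setting.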

Why it matters

RU/s matters because it is the number everyone uses when Cosmos DB performance, throttling, and cost collide. An application team may say the API is slow, an SRE may see 429s, and FinOps may see an expensive account; RU/s connects all three conversations. It tells you whether capacity is manual, autoscale, shared, or dedicated, and whether the configured ceiling matches real traffic. Misreading RU/s leads to bad fixes: scaling the wrong container, ignoring hot partitions, or leaving temporary capacity in place. Clear RU/s ownership makes Cosmos DB easier to operate, especially across many teams and environments.

Where you see it

Signals, screens, and Azure surfaces where this term usually becomes operational.

Signal 01

In the Cosmos DB portal, Scale tabs label RU/s as manual throughput or an autoscale maximum and show whether capacity is assigned to a database or a container, which is worth confirming before release approval.

Signal 02

In Azure Monitor and cost reviews, RU/s appears beside normalized consumption, request-unit charges, throttling, latency, and provisioned-throughput spending trends; compare them for the same account, workload, and time window.

Signal 03

In ARM, Bicep, and CLI outputs, RU/s settings show as throughput or autoscale properties that automation can compare for drift across subscriptions and deployment environments.

When this becomes relevant

Specific situations where this term helps solve real Azure design, operations, migration, security, reliability, cost, or governance problems.

Inventory all Cosmos DB containers with manual RU/s to find idle capacity after migration or load testing.
Compare shared database RU/s with dedicated container RU/s before assigning ownership for noisy-neighbor incidents.
Set an autoscale maximum RU/s for unpredictable traffic while keeping a documented cost ceiling.
Prove a throttling incident came from capacity exhaustion rather than a regional service issue.
Create policy or review evidence that production RU/s changes include owner, expiry, and rollback values.

Real-world case studies

Different enterprise-style examples that show the term being used to hit measurable objectives.

Case study 01

Airline loyalty program separates noisy campaign traffic

Scenario, objectives, solution, measured impact, and takeaway.

Scenario

An airline loyalty platform ran campaign enrollment, balance reads, and partner lookups in Cosmos DB. A promotion caused enrollment writes to consume shared RU/s and slow balance checks at airport kiosks.

Business/Technical Objectives

Protect kiosk balance reads during campaign spikes.
Identify which container consumed shared RU/s.
Cap promotion capacity without unlimited spend.
Create campaign-specific monitoring for future launches.

Solution Using RU/s

The platform team used CLI and Monitor data to confirm that the balance and enrollment containers shared database RU/s. They moved campaign enrollment to dedicated container throughput with an autoscale maximum RU/s and left partner reference data on shared capacity. Kiosk balance reads received their own dedicated manual RU/s because availability mattered more than saving a small amount of capacity. Dashboards split normalized consumption and throttling by container, and campaign runbooks included start, scale, and cleanup commands. The team also reviewed item size and indexing for enrollment writes to reduce request charge.

Results & Business Impact

Kiosk balance p95 latency improved from 1.9 seconds to 180 milliseconds during promotions.
Enrollment 429s dropped 87 percent without raising shared database throughput.
Promotion capacity spend stayed within the approved ceiling because autoscale maximums were documented.
Future campaign reviews gained per-container RU/s evidence instead of account-level guesswork.

Key Takeaway for Glossary Readers

RU/s scope matters; separating critical reads from bursty writes can fix reliability and cost at the same time.

Case study 02

Nonprofit grant portal chooses provisioned capacity over serverless

Scenario, objectives, solution, measured impact, and takeaway.

Scenario

A nonprofit grant portal opened applications twice a year and stayed quiet the rest of the time. The team debated whether serverless consumption or provisioned RU/s fit the review workflow better.

Business/Technical Objectives

Support two high-volume application windows per year.
Keep monthly database cost low outside grant season.
Avoid throttling during final-hour submissions.
Give finance a capacity model they could understand.

Solution Using RU/s

Engineers measured request units for draft saves, document metadata reads, and final submissions during a rehearsal. They compared serverless charges with a provisioned container using autoscale maximum RU/s during application windows and a lower manual setting afterward. Azure CLI exported the planned RU/s settings and the schedule for seasonal changes. The portal used alerts on normalized consumption and 429s so operators could raise the autoscale maximum if submissions exceeded the forecast. Finance received a simple model showing quiet-month capacity, application-window capacity, and cleanup commands.

Results & Business Impact

Final-hour submission throttling stayed below 0.2 percent.
Quiet-month database spend was reduced 34 percent compared with the previous always-on setting.
Finance approved the seasonal RU/s plan because each change had an owner and expiry date.
The support desk saw 46 percent fewer timeout tickets than the prior grant cycle.

Key Takeaway for Glossary Readers

RU/s planning is strongest when workload seasonality, cost ownership, and operational cleanup are designed together.

Case study 03

Cold-chain IoT service proves throttling came from a hot partition

Scenario, objectives, solution, measured impact, and takeaway.

Scenario

A cold-chain IoT service tracked refrigerated shipments in Cosmos DB. Operators kept raising RU/s after throttling alerts, but a small set of high-volume routes still missed telemetry targets.

Business/Technical Objectives

Identify whether more RU/s would solve the telemetry delays.
Reduce throttling for high-volume shipping routes.
Avoid doubling account spend without evidence.
Improve alerting for partition-specific pressure.

Solution Using RU/s

The SRE team compared the configured RU/s with normalized RU consumption and partition-level symptoms. CLI showed the container already had enough configured RU/s, while metrics revealed one partition range repeatedly hit saturation. The data team changed the partition-key strategy for new shipments to include route and time bucket, then migrated active high-volume shipments into a new container. RU/s remained nearly unchanged, but autoscale was retained for genuine bursts. Dashboards added alerts for hot partitions, 429s, and request charge by operation so future incidents would not be reduced to a bigger-throughput reflex.

Results & Business Impact

Telemetry delay for high-volume routes dropped from 14 minutes to under 90 seconds.
The team avoided a proposed 2x RU/s increase that would not have fixed the hot partition.
429 alerts fell 76 percent after repartitioning active shipments.
Incident reviews now require partition evidence before approving capacity increases.

Key Takeaway for Glossary Readers

RU/s is the budget, but partition design determines whether the workload can actually use that budget.

Why use Azure CLI for this?

I use Azure CLI for RU/s because the abbreviation shows up everywhere, but the portal can make scope easy to misread. A database-level RU/s value and a container-level RU/s value have very different ownership and noisy-neighbor behavior. CLI commands force me to name the account, database, and container, then show the exact throughput object. That is invaluable during audits, cost reviews, and incidents. I can export RU/s settings across subscriptions, compare autoscale maximums, identify forgotten manual throughput, and prove which value changed. For Cosmos DB, command evidence beats a screenshot because capacity scope is the whole story.

CLI use cases

List and show Cosmos DB resources to identify where RU/s is configured before a cost or reliability review.
Export manual and autoscale RU/s values across containers so teams can find drift and forgotten temporary increases.
Update a documented RU/s value during an incident, then verify metrics and restore the previous setting.
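Once RU/s values have been exported, drift detection is a plain comparison. The following is a minimal sketch that assumes an export script has already reduced each environment to a mapping of container name to (mode, RU/s); that mapping shape, the container names, and the function name are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical inventory shape: container name -> (mode, RU/s).
Settings = Dict[str, Tuple[str, int]]


def find_drift(baseline: Settings, actual: Settings) -> List[str]:
    """List containers whose throughput differs from the documented baseline."""
    findings = []
    for name in sorted(set(baseline) | set(actual)):
        expected = baseline.get(name)
        found = actual.get(name)
        if expected is None:
            findings.append(f"{name}: not in baseline, found {found[0]} {found[1]} RU/s")
        elif found is None:
            findings.append(f"{name}: missing from environment")
        elif expected != found:
            findings.append(
                f"{name}: expected {expected[0]} {expected[1]} RU/s, "
                f"found {found[0]} {found[1]} RU/s"
            )
    return findings


if __name__ == "__main__":
    baseline = {"orders": ("autoscale", 4000), "lookup": ("manual", 400)}
    actual = {"orders": ("autoscale", 10000), "lookup": ("manual", 400)}
    for line in find_drift(baseline, actual):
        print(line)  # orders: expected autoscale 4000 RU/s, found autoscale 10000 RU/s
```

A forgotten temporary increase shows up as a container whose found value exceeds the baseline long after the incident closed.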

Before you run CLI

Confirm tenant, subscription, resource group, Cosmos account, API type, database, container, region, and output format.
Validate permissions and cost approval, and assess change risk: throughput updates delete nothing, but they can affect spend and availability.
Check autoscale mode, shared throughput, partition health, private networking, identity, provider registration, and rollback values.

What output tells you

Throughput output tells whether RU/s is fixed manual capacity or an autoscale maximum for the selected database or container.
Resource IDs and names show the exact scope, preventing teams from scaling a database when the hot workload uses container throughput.
Metric output explains whether RU/s is saturated, idle, or unevenly consumed across partitions, regions, and operation types.
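Reading the throughput object programmatically might look like the sketch below. It assumes the JSON returned by a throughput show command nests its values under a resource object, with throughput and autoscaleSettings.maxThroughput fields; treat that shape, the sample values, and the function name as assumptions to check against real output before relying on them.

```python
import json
from typing import Dict

# Assumed shape of `... throughput show -o json` output; values are made up.
SAMPLE = """
{
  "name": "example-throughput",
  "resource": {
    "throughput": 400,
    "autoscaleSettings": {"maxThroughput": 4000}
  }
}
"""


def describe_throughput(raw: str) -> Dict[str, object]:
    """Classify a throughput object as manual or autoscale."""
    resource = json.loads(raw).get("resource") or {}
    autoscale = resource.get("autoscaleSettings") or {}
    max_ru = autoscale.get("maxThroughput")
    if max_ru:
        # An autoscale maximum is present, so the reported value is not a fixed setting.
        return {
            "mode": "autoscale",
            "max_ru_per_s": max_ru,
            "reported_ru_per_s": resource.get("throughput"),
        }
    return {"mode": "manual", "ru_per_s": resource.get("throughput")}


if __name__ == "__main__":
    print(describe_throughput(SAMPLE))
```

Running the same function against the database-level and container-level objects makes the scope question explicit instead of leaving it to a portal label.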

Mapped Azure CLI commands

RU/s Azure CLI commands

operational

az cosmosdb sql container throughput show --account-name <account> --resource-group <resource-group> --database-name <database> --name <container>

az cosmosdb sql container throughput | discover | Databases

az cosmosdb sql database throughput show --account-name <account> --resource-group <resource-group> --name <database>

az cosmosdb sql database throughput | discover | Databases

az cosmosdb sql container throughput update --account-name <account> --resource-group <resource-group> --database-name <database> --name <container> --max-throughput <ru>

az cosmosdb sql container throughput | configure | Databases

az cosmosdb sql container list --account-name <account> --resource-group <resource-group> --database-name <database> -o table

az cosmosdb sql container | discover | Databases

az monitor metrics list --resource <cosmos-account-resource-id> --metrics TotalRequests TotalRequestUnits NormalizedRUConsumption

az monitor metrics | discover | Databases

Architecture context

Architecturally, RU/s is the capacity language shared by application developers, database engineers, platform teams, and finance. I map RU/s to workload criticality: checkout, ingestion, and control-plane containers deserve isolated or autoscale capacity; small lookup containers can share. I also map it to partition-key design because no RU/s setting can fully rescue a hot logical partition. In multi-region designs, RU/s planning must account for read distribution, failover, writes, and consistency choices. Governance should tag owners, document expected peak windows, and alert when actual normalized consumption stays far below or above the configured capacity for several review cycles.
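The hot-partition point can be shown with arithmetic. This sketch assumes, as Cosmos DB documents, that provisioned RU/s is divided evenly across physical partitions; the partition IDs, demand figures, and function names are invented for illustration.

```python
from typing import Dict


def per_partition_budget(provisioned_ru_per_s: int, physical_partitions: int) -> float:
    """RU/s available to each physical partition under an even split."""
    return provisioned_ru_per_s / physical_partitions


def throttled_partitions(
    provisioned_ru_per_s: int, demand_by_partition: Dict[str, float]
) -> Dict[str, float]:
    """Return the RU/s shortfall for every partition whose demand exceeds its share."""
    budget = per_partition_budget(provisioned_ru_per_s, len(demand_by_partition))
    return {
        pid: demand - budget
        for pid, demand in demand_by_partition.items()
        if demand > budget
    }


if __name__ == "__main__":
    # Total demand is 11,000 RU/s, far below the 30,000 RU/s provisioned.
    demand = {"p0": 900.0, "p1": 1200.0, "p2": 7500.0, "p3": 400.0, "p4": 1000.0}
    print(per_partition_budget(30000, len(demand)))  # 6000.0
    print(throttled_partitions(30000, demand))       # {'p2': 1500.0}
```

Partition p2 is throttled even though the container as a whole uses roughly a third of its budget, which is why raising RU/s is an expensive way to treat a partition-key problem.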

Security

Security impact is indirect, but RU/s can become an availability and abuse-control concern. A compromised key, badly scoped token, or unbounded client can consume RU/s and throttle legitimate users. Network controls, RBAC or keys, private endpoints, and application rate limiting remain the primary security tools. Throughput updates also deserve access control because reducing RU/s can create denial-of-service symptoms, while increasing it can create cost exposure. Monitor abnormal RU/s consumption by operation, container, and region. During security investigations, preserve request metrics and key-rotation evidence so capacity exhaustion is not mistaken for normal seasonal demand or harmless growth.

Cost

Cost impact is direct because RU/s in provisioned Cosmos DB represents paid capacity, not just observed usage. Manual RU/s can waste money when traffic is low. Autoscale can reduce operational risk but must be capped intelligently. Shared RU/s can be economical for small containers, but expensive incidents happen when teams add more containers and forget they compete for the same capacity. Cost reports should be interpreted with throughput scope and business criticality in mind. The best cleanup habit is simple: every temporary RU/s increase needs an owner, expiry time, and verification command. Otherwise small emergency changes become permanent budget leaks.
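The cleanup habit above can be tracked with a very small record. The sketch below is one possible shape, not a prescribed tool: the TemporaryIncrease fields, the container and owner names, and the helper functions are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


@dataclass
class TemporaryIncrease:
    container: str
    owner: str
    previous_ru_per_s: int
    temporary_ru_per_s: int
    expires_at: datetime


def overdue(changes: List[TemporaryIncrease], now: datetime) -> List[TemporaryIncrease]:
    """Return temporary increases whose expiry time has passed."""
    return [c for c in changes if c.expires_at <= now]


def rollback_hint(change: TemporaryIncrease) -> str:
    """Describe what to restore and who to ask."""
    return f"{change.container}: restore {change.previous_ru_per_s} RU/s (owner {change.owner})"


if __name__ == "__main__":
    changes = [
        TemporaryIncrease("orders", "team-a", 4000, 10000,
                          datetime(2026, 5, 1, tzinfo=timezone.utc)),
        TemporaryIncrease("events", "team-b", 1000, 2000,
                          datetime(2026, 7, 1, tzinfo=timezone.utc)),
    ]
    now = datetime(2026, 6, 1, tzinfo=timezone.utc)
    for change in overdue(changes, now):
        print(rollback_hint(change))  # orders: restore 4000 RU/s (owner team-a)
```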

Reliability

Reliability impact is direct because RU/s defines how much work Cosmos DB can accept before throttling. Well-designed clients handle 429 responses, but a workload that constantly exceeds RU/s will still miss latency objectives. Autoscale can absorb bursts, but only up to the configured maximum and only if partition distribution allows it. Shared RU/s can make one container unreliable because another container is noisy. Reliable designs pair RU/s settings with alerts, retry policies, partition analysis, and failover planning. Review RU/s after traffic changes, new regions, indexing changes, and feature launches. Do not wait for users to report throttling first.
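Handling 429 responses usually means waiting for the interval the service suggests before retrying. The official Cosmos DB SDKs already retry throttled requests, so the sketch below only illustrates the logic. It models a response as a (status, headers) pair, which is an assumption and not any SDK's API; x-ms-retry-after-ms is the header Cosmos DB returns with throttled requests, and the fallback backoff values are arbitrary.

```python
import random
import time
from typing import Callable, Dict, Tuple

# A response is modelled as (status_code, headers); real SDKs expose these differently.
Response = Tuple[int, Dict[str, str]]


def call_with_retry(
    operation: Callable[[], Response],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Retry an operation while it returns 429, honoring the suggested wait."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, headers = operation()
        if status != 429 or attempt == max_attempts:
            return status, headers
        retry_ms = headers.get("x-ms-retry-after-ms")
        if retry_ms is not None:
            delay = float(retry_ms) / 1000.0
        else:
            # Fallback: capped exponential backoff with a little jitter.
            delay = min(0.1 * 2 ** attempt, 5.0) + random.uniform(0, 0.1)
        sleep(delay)
    raise RuntimeError("unreachable")


if __name__ == "__main__":
    responses = iter([(429, {"x-ms-retry-after-ms": "250"}), (200, {})])
    waits = []
    print(call_with_retry(lambda: next(responses), sleep=waits.append))  # (200, {})
    print(waits)  # [0.25]
```

Retries hide short bursts, but a workload that retries constantly is still over budget; the retry count itself is worth alerting on.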

Performance

Performance impact is direct because RU/s determines how many request units are available each second for reads, writes, queries, and index work. If requests consume more than the available rate, Cosmos DB returns HTTP 429 (rate-limited) responses and the client waits or retries. Higher RU/s can improve throughput, but only if the workload can use it across partitions. Query tuning, indexing policy, item size, and consistency can reduce request charge without changing capacity. Watch p95 latency, 429 rate, request charge, and normalized consumption together. RU/s alone is not a performance guarantee; it is the budget that good design spends efficiently.
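A rough capacity estimate multiplies each operation's rate by its request charge. In the sketch below only the 1 RU charge for a 1 KB point read comes from Cosmos DB documentation; the write and query charges, the rates, and the function name are hypothetical stand-ins for values you would measure on your own workload.

```python
from typing import Dict, Tuple

# operation name -> (operations per second, request charge in RU per operation)
Workload = Dict[str, Tuple[float, float]]


def required_ru_per_s(workload: Workload) -> float:
    """Sum rate times request charge across all operations."""
    return sum(rate * charge for rate, charge in workload.values())


if __name__ == "__main__":
    workload = {
        "point_read_1kb": (500, 1.0),  # a 1 KB point read costs 1 RU
        "write": (100, 6.0),           # hypothetical measured charge
        "query": (20, 15.0),           # hypothetical measured charge
    }
    print(required_ru_per_s(workload))  # 1400.0

    # Halving the query charge through tuning lowers demand without adding capacity.
    tuned = dict(workload, query=(20, 7.5))
    print(required_ru_per_s(tuned))     # 1250.0
```

The estimate is a steady-state average; bursts and uneven partition distribution still need headroom on top of it.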

Operations

Operators manage RU/s by discovering throughput scope, checking manual versus autoscale settings, comparing values across environments, and correlating them with metrics. They look for normalized RU consumption near 100 percent, throttled requests, request charge outliers, and containers with forgotten temporary scale-ups. Azure CLI is useful for inventory and controlled updates, while Monitor workbooks show whether the setting is right. Runbooks should include approved minimums, autoscale maximums, cost owners, emergency scale values, and rollback commands. After any RU/s change, operators should verify that latency, 429s, and spend move in the expected direction within the approved observation window.
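Checking for normalized consumption near 100 percent can be scripted against exported metric data. The sketch below assumes the JSON layout that a metrics list command is expected to return (a value array of metrics, each with timeseries and data points); verify that layout against real output. The sample timestamps, values, and function name are invented.

```python
import json
from typing import Optional

# Assumed layout of metric output; timestamps and values are made up.
SAMPLE = """
{
  "value": [
    {
      "name": {"value": "NormalizedRUConsumption"},
      "timeseries": [
        {"data": [
          {"timeStamp": "2026-05-01T10:00:00Z", "maximum": 42.0},
          {"timeStamp": "2026-05-01T10:05:00Z", "maximum": 100.0},
          {"timeStamp": "2026-05-01T10:10:00Z"}
        ]}
      ]
    }
  ]
}
"""


def peak(raw: str, metric: str, aggregation: str = "maximum") -> Optional[float]:
    """Return the highest data point for one metric, or None if there is no data."""
    points = []
    for item in json.loads(raw).get("value", []):
        if item.get("name", {}).get("value") != metric:
            continue
        for series in item.get("timeseries", []):
            for point in series.get("data", []):
                value = point.get(aggregation)
                if value is not None:  # intervals without data omit the field
                    points.append(value)
    return max(points) if points else None


if __name__ == "__main__":
    print(peak(SAMPLE, "NormalizedRUConsumption"))  # 100.0
```

A peak of 100 percent alongside low average consumption is a hint to look at partition distribution before raising RU/s.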

Common mistakes

Reading a database RU/s value and assuming every container has dedicated capacity when they actually share one pool.
Setting a high autoscale maximum without a cost owner, alert, or post-event review to confirm it is still needed.
Blaming RU/s alone for latency when request charge, indexing, consistency, item size, or hot partitions are the root cause.

Operator quick checks

Show the throughput object at both database and container scope before deciding which RU/s value matters.
Compare normalized RU consumption, 429s, latency, and request charge for the same time window.
Verify the previous RU/s value, approved new value, cost owner, and rollback command before any update.

Questions to ask

Is this RU/s value shared by multiple containers or dedicated to the workload under review?
What happens to users, cost, and retries if this value is halved or doubled?
Which metric proves the configured RU/s matches the real demand pattern rather than an old emergency setting?
