RU/s, short for request units per second, is the abbreviation you see in Cosmos DB screens, metrics, pricing conversations, and CLI output. It is the capacity rate that says how much database work can be served each second. Developers often notice it when requests are throttled, while finance teams notice it when provisioned capacity sits idle. The abbreviation is small, but it carries a big operational meaning: every query, read, write, and index update competes for this budget unless the workload is isolated or scaled differently.
RUs, request units per second, Cosmos DB RU/s, throughput RU/s, provisioned throughput RU/s
Difficulty
fundamentals
CLI mappings
5
Last verified
2026-05-22
Microsoft Learn
Microsoft Learn uses RU/s as the abbreviation for request units per second in Azure Cosmos DB. It describes provisioned throughput allocated to a database or container, including manual and autoscale modes, and is central to performance, throttling, and billing decisions.
In Azure architecture, RU/s is the operational label for Cosmos DB throughput at database or container level. Manual throughput sets a fixed RU/s value, while autoscale sets a maximum RU/s and adjusts within that range. Shared database RU/s is consumed by multiple containers; dedicated container RU/s isolates capacity. The value is shaped by partitioning, indexing, consistency, regions, and request patterns. Azure CLI, ARM, Bicep, Monitor metrics, and cost reports all surface RU/s, making it a cross-cutting term for data-plane performance and control-plane governance.
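The manual-versus-autoscale distinction above can be sketched as a small model of the RU/s range each mode can serve. This is a minimal illustration, assuming the documented autoscale behavior of scaling between 10 percent of the configured maximum and the maximum itself. The `ThroughputSetting` class and its field names are hypothetical and are not an Azure SDK type.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ThroughputSetting:
    """Hypothetical model of a Cosmos DB throughput setting (not an SDK type)."""

    scope: str  # "database" (shared by containers) or "container" (dedicated)
    manual_rus: Optional[int] = None
    autoscale_max_rus: Optional[int] = None

    def ru_range(self) -> Tuple[int, int]:
        """Return the (lowest, highest) RU/s this setting can serve."""
        # Exactly one of the two modes must be configured.
        if (self.manual_rus is None) == (self.autoscale_max_rus is None):
            raise ValueError("set exactly one of manual_rus or autoscale_max_rus")
        if self.manual_rus is not None:
            # Manual throughput is a fixed value.
            return (self.manual_rus, self.manual_rus)
        # Assumption: autoscale floats between 10% of the maximum and the maximum.
        return (self.autoscale_max_rus // 10, self.autoscale_max_rus)
```

For example, a dedicated container with an autoscale maximum of 4000 RU/s would report a range of 400 to 4000 RU/s under this assumption, while a manual 400 RU/s setting reports a fixed 400.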
Why it matters
RU/s matters because it is the number everyone uses when Cosmos DB performance, throttling, and cost collide. An application team may say the API is slow, an SRE may see 429s, and FinOps may see an expensive account; RU/s connects all three conversations. It tells you whether capacity is manual, autoscale, shared, or dedicated, and whether the configured ceiling matches real traffic. Misreading RU/s leads to bad fixes: scaling the wrong container, ignoring hot partitions, or leaving temporary capacity in place. Clear RU/s ownership makes Cosmos DB easier to operate, especially across many teams and environments.
Where you see it
Signals, screens, and Azure surfaces where this term usually becomes operational.
Signal 01
In the Scale tabs of the Cosmos DB portal, RU/s labels identify manual throughput, autoscale maximums, and whether capacity is assigned to a database or a container, which is worth confirming before release approval.
Signal 02
In Azure Monitor and cost reviews, RU/s appears beside normalized consumption, request-unit charges, throttling, latency, and provisioned-throughput spending trends for the same account, workload, and time window.
Signal 03
In ARM, Bicep, and CLI outputs, RU/s settings show as throughput or autoscale properties that automation can compare for drift across subscriptions and deployment environments.
When this becomes relevant
Specific situations where this term helps solve real Azure design, operations, migration, security, reliability, cost, or governance problems.
Inventory all Cosmos DB containers with manual RU/s to find idle capacity after migration or load testing.
Compare shared database RU/s with dedicated container RU/s before assigning ownership for noisy-neighbor incidents.
Set an autoscale maximum RU/s for unpredictable traffic while keeping a documented cost ceiling.
Prove a throttling incident came from capacity exhaustion rather than a regional service issue.
Create policy or review evidence that production RU/s changes include owner, expiry, and rollback values.
Real-world case studies
Different enterprise-style examples that show the term being used to hit measurable objectives.
Case study 01
Airline loyalty program separates noisy campaign traffic
Scenario, objectives, solution, measured impact, and takeaway.
📌Scenario
An airline loyalty platform ran campaign enrollment, balance reads, and partner lookups in Cosmos DB. A promotion caused enrollment writes to consume shared RU/s and slow balance checks at airport kiosks.
🎯Business/Technical Objectives
Protect kiosk balance reads during campaign spikes.
Identify which container consumed shared RU/s.
Cap promotion capacity without unlimited spend.
Create campaign-specific monitoring for future launches.
✅Solution Using RU/s
The platform team used CLI and Monitor data to confirm that balance and enrollment containers shared database RU/s. They moved campaign enrollment to a dedicated container throughput model with autoscale maximum RU/s and left partner reference data on shared capacity. Kiosk balance reads received their own dedicated manual RU/s because availability mattered more than saving a small amount of capacity. Dashboards split normalized consumption and throttling by container, and campaign runbooks included start, scale, and cleanup commands. The team also reviewed item size and indexing for enrollment writes to reduce request charge.
📈Results & Business Impact
Kiosk balance p95 latency improved from 1.9 seconds to 180 milliseconds during promotions.
Enrollment 429s dropped 87 percent without raising shared database throughput.
Promotion capacity spend stayed within the approved ceiling because autoscale maximums were documented.
💡Key Takeaway for Glossary Readers
RU/s scope matters; separating critical reads from bursty writes can fix reliability and cost at the same time.
Case study 02
Nonprofit grant portal chooses provisioned capacity over serverless
Scenario, objectives, solution, measured impact, and takeaway.
📌Scenario
A nonprofit grant portal opened applications twice a year and stayed quiet the rest of the time. The team debated whether serverless consumption or provisioned RU/s fit the review workflow better.
🎯Business/Technical Objectives
Support two high-volume application windows per year.
Keep monthly database cost low outside grant season.
Avoid throttling during final-hour submissions.
Give finance a capacity model they could understand.
✅Solution Using RU/s
Engineers measured request units for draft saves, document metadata reads, and final submissions during a rehearsal. They compared serverless charges with a provisioned container using autoscale maximum RU/s during application windows and a lower manual setting afterward. Azure CLI exported the planned RU/s settings and the schedule for seasonal changes. The portal used alerts on normalized consumption and 429s so operators could raise the autoscale maximum if submissions exceeded the forecast. Finance received a simple model showing quiet-month capacity, application-window capacity, and cleanup commands.
📈Results & Business Impact
Quiet-month database spend was reduced 34 percent compared with the previous always-on setting.
Finance approved the seasonal RU/s plan because each change had an owner and expiry date.
The support desk saw 46 percent fewer timeout tickets than the prior grant cycle.
💡Key Takeaway for Glossary Readers
RU/s planning is strongest when workload seasonality, cost ownership, and operational cleanup are designed together.
Case study 03
Cold-chain IoT service proves throttling came from a hot partition
Scenario, objectives, solution, measured impact, and takeaway.
📌Scenario
A cold-chain IoT service tracked refrigerated shipments in Cosmos DB. Operators kept raising RU/s after throttling alerts, but a small set of high-volume routes still missed telemetry targets.
🎯Business/Technical Objectives
Identify whether more RU/s would solve the telemetry delays.
Reduce throttling for high-volume shipping routes.
Avoid doubling account spend without evidence.
Improve alerting for partition-specific pressure.
✅Solution Using RU/s
The SRE team compared the configured RU/s with normalized RU consumption and partition-level symptoms. CLI showed the container already had enough configured RU/s, while metrics revealed one partition range repeatedly hit saturation. The data team changed the partition-key strategy for new shipments to include route and time bucket, then migrated active high-volume shipments into a new container. RU/s remained nearly unchanged, but autoscale was retained for genuine bursts. Dashboards added alerts for hot partitions, 429s, and request charge by operation so future incidents would not be reduced to a bigger-throughput reflex.
📈Results & Business Impact
Telemetry delay for high-volume routes dropped from 14 minutes to under 90 seconds.
The team avoided a proposed 2x RU/s increase that would not have fixed the hot partition.
429 alerts fell 76 percent after repartitioning active shipments.
Incident reviews now require partition evidence before approving capacity increases.
💡Key Takeaway for Glossary Readers
RU/s is the budget, but partition design determines whether the workload can actually use that budget.
Why use Azure CLI for this?
I use Azure CLI for RU/s because the abbreviation shows up everywhere, but the portal can make scope easy to misread. A database-level RU/s value and a container-level RU/s value have very different ownership and noisy-neighbor behavior. CLI commands force me to name the account, database, and container, then show the exact throughput object. That is invaluable during audits, cost reviews, and incidents. I can export RU/s settings across subscriptions, compare autoscale maximums, identify forgotten manual throughput, and prove which value changed. For Cosmos DB, command evidence beats a screenshot because capacity scope is the whole story.
CLI use cases
List and show Cosmos DB resources to identify where RU/s is configured before a cost or reliability review.
Export manual and autoscale RU/s values across containers so teams can find drift and forgotten temporary increases.
Update a documented RU/s value during an incident, then verify metrics and restore the previous setting.
Before you run CLI
Confirm tenant, subscription, resource group, Cosmos account, API type, database, container, region, and output format.
Validate permissions, cost approval, and destructive risk; throughput changes are not deletes but can affect spend and availability.
az cosmosdb sql container throughput | configure | Databases
az cosmosdb sql container list --account-name <account> --resource-group <resource-group> --database-name <database> -o table
az cosmosdb sql container | discover | Databases
az monitor metrics list --resource <cosmos-account-resource-id> --metric TotalRequests,TotalRequestUnits,NormalizedRUConsumption
az monitor metrics | discover | Databases
Architecture context
Architecturally, RU/s is the capacity language shared by application developers, database engineers, platform teams, and finance. I map RU/s to workload criticality: checkout, ingestion, and control-plane containers deserve isolated or autoscale capacity; small lookup containers can share. I also map it to partition-key design because no RU/s setting can fully rescue a hot logical partition. In multi-region designs, RU/s planning must account for read distribution, failover, writes, and consistency choices. Governance should tag owners, document expected peak windows, and alert when actual normalized consumption stays far below or above the configured capacity for several review cycles.
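To show why no RU/s setting fully rescues a hot partition, here is a rough sketch of how a provisioned budget is divided. It assumes provisioned throughput is spread evenly across physical partitions, so each one receives an equal share regardless of where the traffic lands. The function names and the demand figures in the usage note are hypothetical and chosen only for illustration.

```python
from typing import List


def per_partition_budget(provisioned_rus: float, physical_partitions: int) -> float:
    """RU/s available to each physical partition under an even split (assumption)."""
    if physical_partitions < 1:
        raise ValueError("physical_partitions must be at least 1")
    return provisioned_rus / physical_partitions


def throttled_partitions(provisioned_rus: float,
                         demand_per_partition: List[float]) -> List[int]:
    """Return indexes of partitions whose demand exceeds their even share."""
    budget = per_partition_budget(provisioned_rus, len(demand_per_partition))
    return [i for i, demand in enumerate(demand_per_partition) if demand > budget]
```

With 30,000 RU/s over three partitions and demand of 2,000, 3,000, and 14,000 RU/s, total demand (19,000) is well under the configured value, yet the third partition exceeds its 10,000 RU/s share and would be the one throttled. That is the pattern that makes partition evidence more useful than a capacity increase.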
Security
Security impact is indirect, but RU/s can become an availability and abuse-control concern. A compromised key, badly scoped token, or unbounded client can consume RU/s and throttle legitimate users. Network controls, RBAC or keys, private endpoints, and application rate limiting remain the primary security tools. Throughput updates also deserve access control because reducing RU/s can create denial-of-service symptoms, while increasing it can create cost exposure. Monitor abnormal RU/s consumption by operation, container, and region. During security investigations, preserve request metrics and key-rotation evidence so capacity exhaustion is not mistaken for normal seasonal demand or harmless growth.
Cost
Cost impact is direct because RU/s in provisioned Cosmos DB represents paid capacity, not just observed usage. Manual RU/s can waste money when traffic is low. Autoscale can reduce operational risk but must be capped intelligently. Shared RU/s can be economical for small containers, but expensive incidents happen when teams add more containers and forget they compete for the same capacity. Cost reports should be interpreted with throughput scope and business criticality in mind. The best cleanup habit is simple: every temporary RU/s increase needs an owner, expiry time, and verification command. Otherwise small emergency changes become permanent budget leaks.
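The cleanup habit described above (owner, expiry time, verification) can be sketched as a small record check. This is only an illustration: the `TemporaryIncrease` record and its fields are hypothetical, and in practice the data might live in tags, a ticket system, or a runbook rather than in code.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)
class TemporaryIncrease:
    """Hypothetical record of an emergency RU/s change awaiting rollback."""

    container: str
    owner: str
    previous_rus: int    # value to restore after the event
    temporary_rus: int   # value set during the event
    expires_at: datetime


def overdue_increases(records: List[TemporaryIncrease],
                      now: datetime) -> List[TemporaryIncrease]:
    """Return increases whose expiry has passed and that still need rollback."""
    return [r for r in records if r.expires_at <= now]
```

A scheduled job could pass `datetime.now(timezone.utc)` as `now` and notify each `owner` with the `previous_rus` value to restore.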
Reliability
Reliability impact is direct because RU/s defines how much work Cosmos DB can accept before throttling. Well-designed clients handle 429 responses, but a workload that constantly exceeds RU/s will still miss latency objectives. Autoscale can absorb bursts, but only up to the configured maximum and only if partition distribution allows it. Shared RU/s can make one container unreliable because another container is noisy. Reliable designs pair RU/s settings with alerts, retry policies, partition analysis, and failover planning. Review RU/s after traffic changes, new regions, indexing changes, and feature launches. Do not wait for users to report throttling first.
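A minimal sketch of the client-side 429 handling mentioned above follows. It assumes the throttled response carries a retry-after hint in milliseconds, which the client waits out before trying again. `ThrottledError` is a hypothetical stand-in for whatever exception or status a given client surfaces; the official SDKs generally ship their own 429 retry handling, so this is an illustration of the idea rather than something to layer on top of them.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ThrottledError(Exception):
    """Hypothetical stand-in for an HTTP 429 response with a retry-after hint."""

    def __init__(self, retry_after_ms: int):
        super().__init__(f"throttled; retry after {retry_after_ms} ms")
        self.retry_after_ms = retry_after_ms


def call_with_retry(operation: Callable[[], T],
                    max_attempts: int = 5,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Run operation, waiting out the server's retry-after hint on throttling."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ThrottledError as err:
            if attempt == max_attempts:
                # Out of attempts: surface the throttle to the caller.
                raise
            sleep(err.retry_after_ms / 1000.0)
    raise ValueError("max_attempts must be at least 1")
```

Retries smooth over short bursts, but a workload that hits the final `raise` regularly is exceeding its RU/s budget or concentrating on one partition, and retrying harder will not fix that.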
Performance
Performance impact is direct because RU/s determines how many request units are available each second for reads, writes, queries, and index work. If requests consume more than the available rate, Cosmos DB returns throttling signals and the client waits or retries. Higher RU/s can improve throughput, but only if the workload can use it across partitions. Query tuning, indexing policy, item size, and consistency can reduce request charge without changing capacity. Watch p95 latency, 429 rate, request charge, and normalized consumption together. RU/s alone is not a performance guarantee; it is the budget that good design spends efficiently.
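The budget arithmetic behind this paragraph can be made concrete with a short sketch. It treats RU/s as a per-second budget and request charge as the cost of one operation, both measured in request units. The function names and the workload figures in the usage note are illustrative assumptions; real request charges should come from measured responses.

```python
from typing import Dict, Tuple


def sustainable_ops_per_second(provisioned_rus: float, request_charge: float) -> float:
    """Operations per second a budget supports at a given charge per operation."""
    if request_charge <= 0:
        raise ValueError("request_charge must be positive")
    return provisioned_rus / request_charge


def required_rus(workload: Dict[str, Tuple[float, float]]) -> float:
    """Sum RU/s demand; workload maps name -> (ops per second, charge per op)."""
    return sum(rate * charge for rate, charge in workload.values())
```

For instance, 1,000 RU/s supports about 200 operations per second at 5 RU each, and a mix of 500 reads per second at 1 RU plus 100 writes per second at 6 RU needs roughly 1,100 RU/s. Halving the charge through query or index tuning doubles the sustainable rate without touching the configured capacity.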
Operations
Operators manage RU/s by discovering throughput scope, checking manual versus autoscale settings, comparing values across environments, and correlating them with metrics. They look for normalized RU consumption near 100 percent, throttled requests, request charge outliers, and containers with forgotten temporary scale-ups. Azure CLI is useful for inventory and controlled updates, while Monitor workbooks show whether the setting is right. Runbooks should include approved minimums, autoscale maximums, cost owners, emergency scale values, and rollback commands. After any RU/s change, operators should verify that latency, 429s, and spend move in the expected direction within the approved observation window.
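One way to turn the normalized-consumption check into a repeatable review step is sketched below. The thresholds (`high_pct`, `low_pct`, `sustained_share`) and the label strings are illustrative assumptions, not Azure recommendations. The samples are assumed to be normalized RU consumption percentages exported from metrics for one container over a review window.

```python
from typing import Sequence


def classify_consumption(samples_pct: Sequence[float],
                         high_pct: float = 90.0,
                         low_pct: float = 20.0,
                         sustained_share: float = 0.5) -> str:
    """Label a review window from normalized RU consumption samples (0-100)."""
    if not samples_pct:
        raise ValueError("at least one sample is required")
    # Peak never approaches the budget: candidate for scaling down.
    if max(samples_pct) < low_pct:
        return "over-provisioned"
    # Share of samples at or above the high-water mark.
    hot = sum(1 for s in samples_pct if s >= high_pct) / len(samples_pct)
    if hot >= sustained_share:
        return "under-provisioned"
    return "ok"
```

A result of "under-provisioned" is a prompt to check partition distribution and request charge before raising RU/s, since a single saturated partition can push normalized consumption to 100 percent on its own.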
Common mistakes
Reading a database RU/s value and assuming every container has dedicated capacity when they actually share one pool.
Setting a high autoscale maximum without a cost owner, alert, or post-event review to confirm it is still needed.
Blaming RU/s alone for latency when request charge, indexing, consistency, item size, or hot partitions are the root cause.