RU per second

RU per second is the throughput budget Azure Cosmos DB gives a database or container every second. Each read, write, query, or stored procedure consumes request units based on how much CPU, memory, I/O, and indexing work it needs. Provisioning more RU per second usually means the workload can handle more operations before throttling, but it also raises cost. The term is useful because it links application behavior, query design, partitioning, and billing into one capacity number that engineers can monitor and change.

Aliases
request units per second, RUs per second, Cosmos throughput rate, provisioned RU rate, Cosmos DB RU per second
Difficulty
intermediate
CLI mappings
5
Last verified
2026-05-22

Microsoft Learn

Microsoft Learn explains that Azure Cosmos DB provisioned throughput is assigned on a per-second basis in request units. RU per second represents the throughput reserved for a database or container, can be scaled, and is billed hourly in provisioned mode.

Microsoft Learn: Request Units in Azure Cosmos DB (2026-05-22)

Technical context

In Azure architecture, RU per second sits in the Cosmos DB data-plane capacity model and is configured at database or container scope, depending on the throughput mode. It applies across logical and physical partitions and interacts with partition-key design, indexing policy, consistency level, multi-region writes, and autoscale settings. Control-plane tools such as Azure CLI and ARM update throughput settings, while metrics show request charge, normalized consumption, throttled requests, and hot partition symptoms. RU per second is not a VM size; it is a provisioned service capacity contract for database operations.

Why it matters

RU per second matters because Cosmos DB performance and cost are inseparable from request-unit capacity. Too little throughput causes 429 throttling, retry storms, long page loads, and failed ingestion jobs. Too much throughput wastes money every hour, especially in environments that only peak briefly. RU per second also exposes bad design: an inefficient query, a skewed partition key, or a heavy indexing policy can consume capacity faster than expected. Teams should size RU per second from measured request charges and traffic patterns, not guesswork. The best operators watch normalized RU consumption, throttling, partition distribution, and application retries together before changing the number.
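As a rough illustration of sizing from measured request charges, the sketch below multiplies each operation's peak rate by its measured charge, adds headroom, and rounds up to a provisioning step. The operation names, rates, charges, and the 25 percent headroom are hypothetical. The 400 RU per second floor and 100 RU per second step are assumptions about manual container throughput and should be checked against current service limits.

```python
import math

def required_ru_per_second(workload, headroom=0.25, step=100, floor=400):
    """Estimate provisioned RU/s from peak ops/sec and measured RU charge per op.

    workload maps an operation name to (peak_ops_per_second, ru_per_operation).
    step and floor are assumed provisioning limits, not values from the source.
    """
    raw = sum(ops_per_sec * ru_per_op for ops_per_sec, ru_per_op in workload.values())
    padded = raw * (1 + headroom)
    stepped = math.ceil(padded / step) * step
    return max(floor, stepped)

# Hypothetical measurements taken from request-charge telemetry.
workload = {
    "point_read": (500, 1.0),
    "item_write": (120, 6.5),
    "partition_query": (40, 12.0),
}
estimate = required_ru_per_second(workload)
print(estimate)
```

The estimate is only as good as the measured charges, so it is worth re-running after query or indexing changes.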

Where you see it

Signals, screens, and Azure surfaces where this term usually becomes operational.

Signal 01

In the Cosmos DB Scale settings, RU per second appears as manual throughput or autoscale maximum for a database or container during capacity planning reviews.

Signal 02

In Azure CLI throughput output, you see offer throughput, autoscale settings, resource IDs, and whether capacity belongs to a database or container before a scaling command runs.

Signal 03

In Azure Monitor metrics, RU pressure appears through normalized consumption, total request units, throttled requests, latency, and partition-level imbalance indicators in dashboards used during production incidents.

When this becomes relevant

Specific situations where this term helps solve real Azure design, operations, migration, security, reliability, cost, or governance problems.

  • Raise container throughput during a planned product launch where measured request charges predict 429 throttling.
  • Lower manual throughput after a migration rehearsal to stop paying for capacity that was only needed temporarily.
  • Decide whether a noisy container needs dedicated RU per second instead of sharing database throughput.
  • Tune indexing and query shape before increasing RU per second for an expensive cross-partition workload.
  • Set autoscale maximum RU per second for a traffic pattern that spikes unpredictably but must meet latency SLOs.
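For the autoscale scenario, a small sketch can make the trade-off concrete: choosing a maximum also fixes the floor the service scales down to. The code assumes autoscale ranges from 10 percent of the maximum up to the maximum and that the maximum is set in steps of 1,000 RU per second; both should be verified against current Microsoft documentation. The expected peak and the 20 percent headroom are hypothetical.

```python
import math

def autoscale_plan(expected_peak_ru, headroom=0.2, step=1000):
    """Pick an autoscale maximum and report the scale-down floor it implies.

    Assumes the floor is 10% of the maximum and the maximum moves in `step` units.
    """
    target = expected_peak_ru * (1 + headroom)
    max_ru = max(step, math.ceil(target / step) * step)
    return {"max_ru": max_ru, "min_ru": max_ru // 10}

# Hypothetical peak taken from load-test telemetry.
plan = autoscale_plan(expected_peak_ru=7400)
print(plan)
```

A higher maximum raises the floor as well, so an oversized maximum still costs money during quiet hours.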

Real-world case studies

Different enterprise-style examples that show the term being used to hit measurable objectives.

Case study 01

Gaming leaderboard survives launch weekend without runaway spend

Scenario, objectives, solution, measured impact, and takeaway.

Scenario

A multiplayer game launched a global leaderboard backed by Cosmos DB. Load tests showed 429 throttling during match-end bursts, but leaving peak capacity on all week would have exceeded the launch budget.

Business/Technical Objectives
  • Keep leaderboard write latency below 80 milliseconds at match end.
  • Avoid sustained overprovisioning after launch weekend.
  • Separate leaderboard capacity from lower-priority profile reads.
  • Give operations a clear rollback and cleanup process.
Solution Using RU per second

The team measured request charges for score writes, rank reads, and seasonal queries, then moved the leaderboard container to dedicated RU per second. Azure CLI captured baseline throughput, raised manual throughput for launch windows, and scheduled a post-event review to lower capacity. Metrics for normalized RU consumption, 429s, and latency were pinned to the release dashboard. Profile and cosmetic data remained in shared throughput containers so leaderboard spikes could not starve unrelated reads. The client retry policy was tuned to respect Cosmos DB retry-after headers instead of hammering the service.
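The retry tuning described above can be sketched as a loop that waits for the interval the service suggests instead of retrying immediately. This is not the Cosmos DB SDK, which ships its own 429 handling; `ThrottledError`, `call_with_retry_after`, and the simulated write are hypothetical stand-ins used only to show the idea.

```python
import time

class ThrottledError(Exception):
    """Hypothetical stand-in for an HTTP 429 response carrying a retry-after hint."""
    def __init__(self, retry_after_ms):
        super().__init__(f"429: retry after {retry_after_ms} ms")
        self.retry_after_ms = retry_after_ms

def call_with_retry_after(operation, max_attempts=5, sleep=time.sleep):
    """Run operation, waiting the server-suggested interval after each 429."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ThrottledError as err:
            if attempt == max_attempts:
                raise  # give up and surface the throttle to the caller
            sleep(err.retry_after_ms / 1000.0)

# Simulated write that is throttled twice before succeeding.
responses = iter([ThrottledError(20), ThrottledError(35), "created"])
waits = []

def fake_write():
    outcome = next(responses)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome

# Record the waits instead of sleeping so the example runs instantly.
result = call_with_retry_after(fake_write, sleep=waits.append)
print(result, waits)
```

Capping the attempts matters as much as honoring the hint, because unbounded retries turn throttling into a retry storm.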

Results & Business Impact
  • Leaderboard write p95 latency stayed at 54 milliseconds during peak match completions.
  • 429 responses fell from 8.6 percent in load test to 0.4 percent in production.
  • Post-launch cleanup reduced provisioned capacity by 63 percent within 48 hours.
  • No profile-read incidents were traced to leaderboard traffic during the first tournament.
Key Takeaway for Glossary Readers

RU per second lets teams buy predictable Cosmos DB capacity for a critical container, then deliberately release it when the peak is over.

Case study 02

Energy analytics platform right-sizes smart-meter ingestion

Scenario

An energy analytics platform ingested smart-meter readings every fifteen minutes. The team saw throttling at the top of each interval and idle capacity for most of the hour.

Business/Technical Objectives
  • Reduce ingestion throttling during interval boundaries.
  • Stop paying for peak manual throughput all day.
  • Keep downstream analytics queries under their latency target.
  • Identify whether shared throughput caused container contention.
Solution Using RU per second

Engineers reviewed request charges for meter writes, aggregation reads, and anomaly queries. They discovered that ingestion and analytics containers shared database throughput, so interval writes starved query workloads. The platform split ingestion into a dedicated container with autoscale maximum RU per second and left stable reference data on shared throughput after validating partition distribution. Azure CLI documented the old shared setting, the new dedicated setting, and the autoscale maximum. Azure Monitor alerts were adjusted to watch normalized RU consumption and throttled requests separately for the ingestion container.

Results & Business Impact
  • Ingestion 429s during interval boundaries dropped by 92 percent.
  • Average hourly RU spend fell 28 percent compared with the previous manual peak setting.
  • Analytics query p95 stayed below 1.7 seconds during meter-write bursts.
  • The team could explain cost changes per container instead of blaming the whole database.
Key Takeaway for Glossary Readers

RU per second design is often a workload-isolation decision, not just a bigger-or-smaller capacity number.

Case study 03

Legal discovery import scales temporarily and cleans up on time

Scenario

A legal discovery team imported millions of case documents into Cosmos DB before a filing deadline. Their first import rehearsal ran overnight, triggered throttling, and delayed attorney review.

Business/Technical Objectives
  • Finish the production import inside a four-hour window.
  • Avoid changing throughput permanently for a one-time event.
  • Keep review-portal reads available during import.
  • Capture evidence for cost and compliance review.
Solution Using RU per second

The data team calculated average request charge for document creates and index updates, then estimated the RU per second needed for the import window. They used Azure CLI to record current container throughput, raise throughput before the job, and restore the previous value after validation. The review portal used a separate container so attorney reads were isolated from bulk writes. Monitor workbooks tracked total request units, 429s, and latency throughout the import. The change ticket included the exact commands, approved cost window, and rollback value.
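The import estimate in this case study follows simple arithmetic: total request units divided by the seconds available, with some slack so the job does not run at 100 percent of the budget. The document count, the 14 RU average create charge, the 80 percent target utilization, and the 100 RU per second step below are hypothetical values, not figures from the case study.

```python
import math

def import_ru_per_second(doc_count, ru_per_create, window_hours,
                         utilization=0.8, step=100):
    """Estimate the RU/s needed to finish a bulk import inside a time window."""
    total_ru = doc_count * ru_per_create   # request units for the whole job
    seconds = window_hours * 3600          # time budget in seconds
    needed = total_ru / seconds / utilization
    return math.ceil(needed / step) * step

# Hypothetical import: 12 million documents in a four-hour window.
estimate = import_ru_per_second(doc_count=12_000_000, ru_per_create=14.0, window_hours=4)
print(estimate)
```

The same function, run with the rehearsal's measured charge, gives the value to record in the change ticket next to the rollback value.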

Results & Business Impact
  • The production import completed in 3.4 hours instead of the projected 11 hours.
  • Attorney portal p95 read latency stayed under 220 milliseconds during the import.
  • Temporary capacity was reduced 30 minutes after validation, avoiding three days of extra spend.
  • The compliance team received command output and metric evidence in the same change record.
Key Takeaway for Glossary Readers

RU per second can be treated as temporary event capacity when teams script both the scale-up and the scale-down.

Why use Azure CLI for this?

With Cosmos DB, I use Azure CLI for RU per second because capacity changes should be repeatable, reviewable, and easy to reverse. The portal is fine for a single adjustment, but incidents and release windows need commands that show current throughput, update a database or container, and capture evidence. CLI also helps compare environments where one container is manual, another is autoscale, and a shared database hides contention between containers. After ten years of Azure work, I do not trust memory during scaling events. I want the account, database, container, throughput mode, region, and output recorded in a change log.

CLI use cases

  • Show current database or container throughput before approving a manual or autoscale capacity change.
  • Update RU per second during a controlled release window and export the exact command for change records.
  • Compare throughput settings across dev, test, and production to identify drift or forgotten temporary scale-ups.
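One way to make these use cases repeatable is to generate the baseline, apply, and rollback commands together so the change record always contains all three. The sketch below only builds command strings from the `az cosmosdb sql container throughput show` and `update` commands mapped later in this entry; it does not run them. The account, resource group, database, and container names are hypothetical.

```python
import shlex

def container_throughput_cmd(action, account, resource_group, database,
                             container, throughput=None):
    """Return the az argv for showing or updating container throughput."""
    if action not in ("show", "update"):
        raise ValueError("action must be 'show' or 'update'")
    argv = [
        "az", "cosmosdb", "sql", "container", "throughput", action,
        "--account-name", account,
        "--resource-group", resource_group,
        "--database-name", database,
        "--name", container,
    ]
    if action == "update":
        if throughput is None:
            raise ValueError("update requires a throughput value")
        argv += ["--throughput", str(throughput)]
    return argv

def change_plan(target, current_ru, new_ru):
    """Commands to capture the baseline, apply the change, and roll it back."""
    return {
        "baseline": shlex.join(container_throughput_cmd("show", **target)),
        "apply": shlex.join(container_throughput_cmd("update", throughput=new_ru, **target)),
        "rollback": shlex.join(container_throughput_cmd("update", throughput=current_ru, **target)),
    }

# Hypothetical target; substitute real names before use.
target = {"account": "contoso-cosmos", "resource_group": "rg-data",
          "database": "game", "container": "leaderboard"}
plan = change_plan(target, current_ru=4000, new_ru=12000)
for step, command in plan.items():
    print(f"{step}: {command}")
```

Generating the rollback at the same moment as the scale-up means the previous value is written down before anyone needs it.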

Before you run CLI

  • Confirm tenant, subscription, resource group, Cosmos account, database, container, region, throughput mode, and output format.
  • Check permissions and cost approval because throughput updates can change hourly spend and affect production throttling immediately.
  • Review partition key, autoscale limits, shared versus dedicated throughput, multi-region writes, metrics, and rollback values first.

What output tells you

  • Throughput output shows the current RU per second or autoscale maximum, revealing the capacity contract for that database or container.
  • Container and database output tells whether throughput is dedicated or shared, which changes isolation, cost, and noisy-neighbor risk.
  • Metric output shows whether capacity is actually saturated through normalized consumption, request charge, throttling, latency, and retry symptoms.
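Reading throughput output can be reduced to one question: is the number a fixed value or an autoscale maximum? The sketch below assumes the JSON returned by a throughput `show` command nests the settings under a `resource` object with `throughput` and `autoscaleSettings.maxThroughput` fields. That shape is an assumption and the samples are trimmed, hypothetical payloads, so compare them with real output before relying on the parser.

```python
import json

def describe_throughput(show_output):
    """Classify throughput settings as manual or autoscale from show output JSON."""
    resource = json.loads(show_output).get("resource") or {}
    autoscale = resource.get("autoscaleSettings") or {}
    if autoscale.get("maxThroughput"):
        return {"mode": "autoscale", "ru_per_second": autoscale["maxThroughput"]}
    return {"mode": "manual", "ru_per_second": resource.get("throughput")}

# Trimmed, hypothetical payloads; real output contains more fields.
manual_sample = '{"resource": {"throughput": 400, "autoscaleSettings": null}}'
autoscale_sample = '{"resource": {"autoscaleSettings": {"maxThroughput": 10000}}}'

print(describe_throughput(manual_sample))
print(describe_throughput(autoscale_sample))
```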

Mapped Azure CLI commands

RU per second Azure CLI commands

operational

az cosmosdb sql database throughput show --account-name <account> --resource-group <resource-group> --name <database>
az cosmosdb sql database throughput (discover, Databases)

az cosmosdb sql container throughput show --account-name <account> --resource-group <resource-group> --database-name <database> --name <container>
az cosmosdb sql container throughput (discover, Databases)

az cosmosdb sql container throughput update --account-name <account> --resource-group <resource-group> --database-name <database> --name <container> --throughput <ru>
az cosmosdb sql container throughput (configure, Databases)

az cosmosdb sql database throughput update --account-name <account> --resource-group <resource-group> --name <database> --throughput <ru>
az cosmosdb sql database throughput (configure, Databases)

az monitor metrics list --resource <cosmos-account-resource-id> --metrics TotalRequestUnits NormalizedRUConsumption
az monitor metrics (discover, Databases)
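Metric output is easier to act on once it is reduced to a peak per metric. The sketch below assumes the JSON from `az monitor metrics list` has a top-level `value` list whose entries carry `name.value` and `timeseries[].data[]` points with a `maximum` field. That shape mirrors my understanding of the Azure Monitor response and should be confirmed against real output; the sample is a trimmed, hypothetical payload.

```python
def peak_by_metric(metrics_output, field="maximum"):
    """Return the highest recorded value per metric, or None when no data exists."""
    peaks = {}
    for metric in metrics_output.get("value", []):
        name = metric["name"]["value"]
        points = [
            point[field]
            for series in metric.get("timeseries", [])
            for point in series.get("data", [])
            if point.get(field) is not None  # skip empty time grains
        ]
        peaks[name] = max(points) if points else None
    return peaks

# Trimmed, hypothetical payload already parsed from JSON.
sample = {
    "value": [
        {"name": {"value": "NormalizedRUConsumption"},
         "timeseries": [{"data": [{"maximum": 42.0}, {"maximum": 97.0}, {"maximum": None}]}]},
        {"name": {"value": "TotalRequestUnits"}, "timeseries": []},
    ]
}
print(peak_by_metric(sample))
```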

Architecture context

Architecturally, RU per second is where database design meets SRE and FinOps. A container with dedicated throughput protects a critical workload from noisy neighbors but costs more than shared database throughput. Autoscale handles variable traffic but requires a sensible maximum RU per second and partition design. Multi-region writes and stronger consistency can change request costs, so the number cannot be copied blindly between architectures. I design RU capacity with partition-key strategy, indexing policy, traffic forecasts, retry policy, and monitoring workbooks. The goal is not the lowest number; it is enough headroom for peak traffic without paying for idle capacity all day.

Security

Security impact is indirect. RU per second does not authenticate callers or encrypt data, but it affects availability under abuse, bad code, or unexpected traffic. An attacker, buggy client, or runaway query can burn capacity and throttle legitimate users. Strong identity, keys or Microsoft Entra authentication where applicable, network restrictions, and rate controls still matter. Operators should restrict who can update throughput because a malicious or careless change can raise cost or reduce service capacity. Monitor unusual request-unit consumption, spikes by operation type, repeated 429s, and suspicious client identity patterns. Treat RU changes as production-capacity changes with approval, audit, and rollback evidence.

Cost

Cost impact is direct in provisioned mode because you pay for the RU per second configured, commonly billed hourly, whether the application uses all of it or not. Shared throughput can reduce cost for small containers, while dedicated throughput isolates important workloads. Autoscale can cost more at high configured maximums but saves operational effort for variable demand. Over-indexed containers, cross-partition queries, and inefficient reads raise RU consumption and push teams toward higher settings. FinOps reviews should compare provisioned RU per second, observed normalized consumption, throttling, peak windows, and business criticality. Temporary scale-ups need expiry dates, named owners, and post-event cleanup.
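A FinOps comparison between manual and autoscale throughput can start from simple arithmetic. The sketch below assumes manual throughput is billed every hour at the configured value and that autoscale is billed each hour on the highest RU per second reached, at a higher unit rate. The 1.5 multiplier and the 0.008 price per 100 RU per second per hour are placeholders, not quoted prices, so substitute the current rates for your region and account type.

```python
def manual_daily_cost(ru_per_second, price_per_100_ru_hour, hours=24):
    """Cost of holding a fixed RU/s setting for the given number of hours."""
    return ru_per_second / 100 * price_per_100_ru_hour * hours

def autoscale_daily_cost(hourly_peak_ru, price_per_100_ru_hour, autoscale_multiplier=1.5):
    """Cost when each hour is billed on the highest RU/s reached in that hour."""
    return sum(
        peak / 100 * price_per_100_ru_hour * autoscale_multiplier
        for peak in hourly_peak_ru
    )

PRICE = 0.008  # placeholder unit price, not a real quote

# Hypothetical day: three busy hours at 10,000 RU/s, 21 quiet hours at the 1,000 RU/s floor.
hourly_peaks = [10_000] * 3 + [1_000] * 21

manual = manual_daily_cost(10_000, PRICE)
autoscale = autoscale_daily_cost(hourly_peaks, PRICE)
print(round(manual, 2), round(autoscale, 2))
```

With these placeholder numbers autoscale is cheaper because the peak is brief; a workload that sits near its maximum most of the day would reverse the result.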

Reliability

Reliability impact is direct because insufficient RU per second leads to throttling rather than graceful infinite scaling. Cosmos DB clients can retry 429 responses, but retries consume time and can amplify pressure if the application is not tuned. Reliable designs use measured request charges, autoscale where traffic is variable, partition keys that spread load, and alerting on normalized RU consumption before users complain. Regional failover and multi-region writes also need capacity planning because traffic can shift suddenly. Do not treat RU per second as a static setup value. Review it after launches, imports, new queries, indexing changes, and traffic seasonality.
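Alerting before users complain usually means reacting to sustained pressure rather than a single spike. The sketch below flags a breach only when normalized RU consumption stays at or above a threshold for several consecutive samples. The 80 percent threshold, the five-sample window, and the sample series are hypothetical and would need tuning per workload.

```python
def sustained_breach(samples, threshold=80.0, consecutive=5):
    """Return True when `consecutive` samples in a row reach the threshold."""
    run = 0
    for value in samples:
        run = run + 1 if value >= threshold else 0  # reset on any sample below threshold
        if run >= consecutive:
            return True
    return False

# Hypothetical one-minute samples of normalized RU consumption (percent).
brief_spike = [35, 40, 95, 100, 42, 38, 41, 39]
sustained = [60, 82, 88, 91, 97, 100, 99, 70]

print(sustained_breach(brief_spike), sustained_breach(sustained))
```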

Performance

Performance impact is direct because RU per second is the capacity pool consumed by database operations. When requests fit within the budget and partitions are balanced, Cosmos DB can deliver predictable low latency. When a query or hot partition exceeds the budget, users see throttling, retries, and slower responses. More RU per second can help, but only if the workload is not constrained by a single hot logical partition or inefficient query pattern. Measure request charge per operation, normalized RU consumption, 429 rate, latency, and partition distribution. Scaling throughput without fixing expensive queries may only move the bottleneck and increase spend.
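The hot-partition point can be shown with arithmetic. Provisioned throughput is divided evenly across physical partitions, so one busy partition can exceed its share while the container as a whole looks underused. In the sketch below the partition IDs, the demand figures, and the 10,000 RU per second setting are hypothetical.

```python
def partition_pressure(provisioned_ru, demand_by_partition):
    """Compare each physical partition's demand with its even share of RU/s."""
    share = provisioned_ru / len(demand_by_partition)
    utilization = {pid: demand / share for pid, demand in demand_by_partition.items()}
    hot = [pid for pid, used in utilization.items() if used > 1.0]
    return {
        "share": share,
        "total_demand": sum(demand_by_partition.values()),
        "peak_utilization": max(utilization.values()),
        "hot": hot,
    }

# Hypothetical RU/s demand per physical partition.
demand = {"0": 900, "1": 1100, "2": 2600, "3": 1000, "4": 1000}
report = partition_pressure(10_000, demand)
print(report)
```

Here total demand is well under the provisioned value, yet partition "2" is over its share, which is why raising RU per second alone is an expensive way to fix a skewed key.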

Operations

Operators inspect RU per second by showing database or container throughput, checking whether throughput is manual, autoscale, shared, or dedicated, and correlating that setting with metrics. They review Total Request Units, normalized RU consumption, throttled requests, partition key ranges, and application retry logs. Azure CLI supports inventory, scaling, evidence export, and drift checks across subscriptions. Runbooks should state who may raise throughput, how long a temporary increase lasts, and which dashboards prove recovery. After a change, operators should verify that 429s decline, latency improves, and cost stays within the approved window, then confirm that the change has a documented owner, a cleanup ticket, and working alerts.
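A drift check can be as simple as comparing the throughput recorded in the change log with what each environment reports today. The sketch below assumes both sides have already been collected into dictionaries, for example from throughput `show` output; the environment names and values are hypothetical.

```python
def find_drift(approved, observed):
    """Return environments whose observed throughput differs from the approved record."""
    drift = {}
    for env, expected in approved.items():
        actual = observed.get(env)  # None when the environment was not inventoried
        if actual != expected:
            drift[env] = {"approved": expected, "observed": actual}
    return drift

# Hypothetical records: (mode, RU per second or autoscale maximum).
approved = {
    "dev": ("manual", 400),
    "test": ("manual", 1000),
    "prod": ("autoscale", 10000),
}
observed = {
    "dev": ("manual", 400),
    "test": ("manual", 8000),   # e.g. a load-test scale-up that was never reverted
    "prod": ("autoscale", 10000),
}

print(find_drift(approved, observed))
```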

Common mistakes

  • Raising RU per second to hide an inefficient query without measuring request charge or partition distribution first.
  • Forgetting to lower a temporary scale-up after imports, load tests, launches, or migration rehearsals are finished.
  • Confusing shared database throughput with dedicated container throughput and blaming the wrong workload for throttling.