Request Unit

A request unit is the way Azure Cosmos DB prices and measures the work behind a database operation. A small point read costs fewer RUs than a complex query, a large write, or a cross-partition scan. Provisioned throughput reserves a number of RUs per second, while serverless charges by consumed RUs. Developers see the RU charge in SDK responses and metrics. Operators use it to decide whether poor performance is caused by low throughput, bad partitioning, expensive queries, or inefficient indexing.

Aliases
RU, RU/s, Cosmos DB request unit, request unit charge, Cosmos throughput currency
Difficulty
fundamentals
CLI mappings
6
Last verified
2026-05-22T04:15:00Z

Microsoft Learn

In Azure Cosmos DB, a request unit is the normalized measure of resource cost for database operations. Reads, writes, and queries consume RUs, and provisioned throughput is expressed as RU/s so teams can plan performance and cost consistently across APIs and workloads.

Microsoft Learn: Request units in Azure Cosmos DB (last verified 2026-05-22T04:15:00Z)

Technical context

In Azure architecture, request units sit in the Cosmos DB data plane and capacity model. Every operation against a container consumes RUs based on CPU, memory, and I/O work abstracted by the service. Throughput can be provisioned at container or database scope, set manually, or configured for autoscale. RU behavior intersects with partition keys, indexing policy, consistency level, item size, query shape, regions, and monitoring. Azure CLI manages the surrounding throughput settings, while application code and metrics reveal the actual request charge and normalized utilization.
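As a rough illustration of how per-operation charges add up to an RU/s figure, the sketch below multiplies an assumed request charge by an expected peak call rate for each operation and then adds headroom. The operation names, charges, rates, and the 20 percent headroom are hypothetical placeholders, not measured values; real charges should come from SDK responses for the actual workload.

```python
from dataclasses import dataclass


@dataclass
class OperationProfile:
    name: str
    ru_per_call: float       # request charge for one call (assumed, not measured)
    calls_per_second: float  # expected peak rate (assumed)


def required_ru_per_second(profiles, headroom=0.2):
    """Sum charge x rate across operations, then add a safety margin."""
    base = sum(p.ru_per_call * p.calls_per_second for p in profiles)
    return base * (1 + headroom)


# Hypothetical workload mix used only to show the arithmetic.
workload = [
    OperationProfile("point_read", 1.0, 500),
    OperationProfile("small_write", 5.0, 100),
    OperationProfile("cross_partition_query", 40.0, 10),
]

estimate = required_ru_per_second(workload)
print(f"Estimated peak need: {round(estimate)} RU/s")
```

In this made-up mix, ten expensive queries per second cost almost as much as five hundred point reads, which is why query shape often matters more than raw request volume.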

Why it matters

Request units matter because they connect performance and cost in one operational language. When an app gets 429 throttling, slow responses, or a bill spike, the answer is rarely just "add more capacity." Teams need to know whether the workload is doing point reads, fan-out queries, oversized writes, inefficient indexing, or hot-partition access. RU visibility lets developers tune queries and lets operators size throughput without guessing. It also makes architecture tradeoffs concrete: serverless versus provisioned, manual versus autoscale, container versus database throughput, and whether a partition-key redesign is cheaper than paying for more RU/s indefinitely. That shared unit keeps tuning and scaling conversations grounded in evidence.

Where you see it

Signals, screens, and Azure surfaces where this term usually becomes operational.

Signal 01

In Cosmos DB SDK responses and diagnostics, request charge shows how many RUs a read, write, query, stored procedure, or transactional batch consumed for each call.
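One way to turn per-call request charges into tuning evidence is to collect them per operation name. The sketch below assumes the charge is available as the `x-ms-request-charge` response header in a plain header mapping; the `ChargeLedger` class and the operation names are hypothetical, and how an application gets at the headers depends on the SDK in use.

```python
from collections import defaultdict

REQUEST_CHARGE_HEADER = "x-ms-request-charge"


def request_charge(headers):
    """Return the RU charge from a response-header mapping, or 0.0 if absent."""
    for key, value in headers.items():
        if key.lower() == REQUEST_CHARGE_HEADER:
            return float(value)
    return 0.0


class ChargeLedger:
    """Hypothetical helper that tracks average RU charge per operation name."""

    def __init__(self):
        self._totals = defaultdict(float)
        self._counts = defaultdict(int)

    def record(self, operation, headers):
        self._totals[operation] += request_charge(headers)
        self._counts[operation] += 1

    def average(self, operation):
        count = self._counts.get(operation, 0)
        return self._totals[operation] / count if count else 0.0


# Illustrative header values; real ones come from live responses.
ledger = ChargeLedger()
ledger.record("get_profile", {"x-ms-request-charge": "1.0"})
ledger.record("recommend", {"x-ms-request-charge": "42.5"})
ledger.record("recommend", {"X-Ms-Request-Charge": "37.5"})
print(ledger.average("get_profile"), ledger.average("recommend"))
```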

Signal 02

In Azure Monitor metrics, normalized RU consumption, throttled requests, and total request units reveal capacity pressure by account, database, container, physical partition, or partition range.

Signal 03

In Azure CLI throughput output, manual throughput, autoscale maximum throughput, database scope, container scope, and offer details show how RU/s capacity is configured today.

When this becomes relevant

Specific situations where this term helps solve real Azure design, operations, migration, security, reliability, cost, or governance problems.

  • Diagnose 429 throttling by comparing normalized RU consumption with partition-key distribution and expensive query patterns.
  • Choose manual throughput, autoscale, or serverless after modeling traffic shape and acceptable cost risk.
  • Reduce RU cost by replacing cross-partition scans with point reads, better filters, or a more suitable partition key.
  • Right-size container throughput before a product launch instead of guessing from average development traffic.
  • Export RU settings across accounts so FinOps can find idle high-throughput containers and unsafe underprovisioning.

Real-world case studies

Different enterprise-style examples that show the term being used to hit measurable objectives.

Case study 01

Streaming platform tunes personalization without buying blind capacity

Scenario, objectives, solution, measured impact, and takeaway.

Scenario

StreamForge used Cosmos DB for viewer profile and recommendation state. During new-release nights, users saw slow personalization because recommendation queries consumed far more RUs than expected.

Business/Technical Objectives
  • Reduce profile lookup latency during premiere traffic.
  • Lower avoidable RU consumption from fan-out recommendation queries.
  • Avoid permanently raising autoscale maximum for one weekly traffic window.
  • Give developers per-operation RU evidence for tuning work.
Solution Using Request Unit

The SRE team collected SDK request-charge diagnostics and Azure Monitor normalized RU consumption by container. They found that a recommendation query crossed partitions because the partition key was not included in the filter. Developers changed the access pattern to point reads for profile state and a narrower query for recommendation candidates. Azure CLI showed the container autoscale maximum and temporarily raised it for two release nights while code changes rolled out. After deployment, operators compared TotalRequestUnits, throttled requests, and p95 latency before returning autoscale limits to the approved value. They also compared portal charts with SDK-level charges for the same requests.

Results & Business Impact
  • P95 personalization latency dropped from 820 milliseconds to 230 milliseconds.
  • Average RU charge for the main recommendation path fell by 63 percent.
  • Throttled requests during premiere windows dropped from 7,400 to fewer than 300.
  • The temporary autoscale increase was removed after tuning, avoiding an estimated 18 percent monthly cost increase.
Key Takeaway for Glossary Readers

Request units show whether a Cosmos DB problem needs more capacity, better queries, or a different access pattern.

Case study 02

City records portal controls serverless RU spikes

Scenario, objectives, solution, measured impact, and takeaway.

Scenario

Brighton Harbor digitized permit records in a public portal backed by Cosmos DB serverless. A new search page caused sudden RU spikes whenever residents filtered by address fragments.

Business/Technical Objectives
  • Keep public permit searches responsive during tax-season traffic.
  • Reduce serverless RU charges from inefficient filters.
  • Avoid forcing residents through a slower batch reporting process.
  • Create a cost and performance review for new query features.
Solution Using Request Unit

Engineers reviewed request-charge telemetry and found that partial address filters caused expensive scans over large JSON documents. They redesigned the search page to use a dedicated normalized address field, adjusted indexing policy for the queried paths, and pushed deep historical searches to an asynchronous report workflow. Azure CLI exported container metadata and monitoring queries tracked TotalRequestUnits before and after release. The city added a rule that new search features must include request-charge measurements from test data before production approval. The team rehearsed rollback to fixed throughput before changing the production container.

Results & Business Impact
  • Serverless RU charges for permit search dropped by 41 percent in the first billing cycle.
  • Median search response time improved from 1.9 seconds to 640 milliseconds.
  • The portal handled 3.2 times more resident searches without moving to provisioned throughput.
  • New feature reviews began including RU evidence instead of only page-load screenshots.
Key Takeaway for Glossary Readers

Request units turn public-service database tuning into measurable work instead of guesswork after the bill arrives.

Case study 03

Biotech lab prevents hot-partition throttling in experiment tracking

Scenario, objectives, solution, measured impact, and takeaway.

Scenario

HelioGene Labs stored experiment events in Cosmos DB for equipment telemetry and sample processing. One high-volume sequencer generated most writes under the same partition key, causing throttling during overnight runs.

Business/Technical Objectives
  • Eliminate write throttling for high-throughput sequencer events.
  • Preserve near-real-time experiment dashboards for lab technicians.
  • Avoid raising RU/s across the whole container unnecessarily.
  • Document retry and partition behavior for regulated experiment records.
Solution Using Request Unit

The application and operations teams compared throttled request metrics with partition-key values and found a hot partition tied to device ID. They introduced a compound partition strategy for new experiment events, using device and time bucket values while keeping reads predictable for dashboards. Azure CLI confirmed container throughput settings and captured the manual RU/s value before a short-term scale increase. SDK diagnostics proved that request charges were reasonable but concentrated. After the schema rollout, dashboard queries were updated to use the new partition pattern, and retry settings were reviewed for safe ingestion during bursts. They validated the changes with production-like data, coordinator filters, the planned regional write pattern, and a rollout rehearsal before enabling the new region.

Results & Business Impact
  • Overnight 429 throttling fell by 96 percent after the new partition pattern went live.
  • Technician dashboard freshness improved from 14 minutes behind to under two minutes.
  • The temporary RU/s increase was rolled back because partition balance solved the real bottleneck.
  • Compliance documentation gained a clear explanation of retry behavior and experiment-event durability.
Key Takeaway for Glossary Readers

Request units must be read with partition behavior; otherwise teams buy more capacity while one hot key keeps failing.

Why use Azure CLI for this?

After ten years of Azure engineering work, I use Azure CLI for request-unit work because capacity tuning needs evidence across accounts, containers, and regions. The portal is helpful for one chart, but CLI can export throughput settings, metrics, throttling counts, and indexing policy details repeatedly. That matters when comparing dev, test, and production or proving that a hot partition, not total RU/s, caused throttling. CLI also fits change control: inspect current throughput, apply autoscale or manual settings, capture before-and-after metrics, and keep evidence after a throttling incident or cost review. It turns RU debates into measured capacity decisions.

CLI use cases

  • Show a container throughput setting and identify whether capacity is manual or autoscale.
  • Update manual RU/s during a planned launch window with a documented rollback value.
  • Migrate a bursty workload from manual throughput to autoscale after traffic-pattern review.
  • List Cosmos DB containers and compare provisioned throughput against monitoring data for cost review.
  • Export database-level throughput settings to detect shared-throughput databases that hide noisy containers.

Before you run CLI

  • Confirm tenant, subscription, resource group, Cosmos DB account, API type, database name, container name, region, and throughput scope.
  • Check permissions to read and update throughput, and verify whether changes are allowed by policy, change window, and application SLA.
  • Understand cost risk before increasing RU/s or autoscale maximum, because provisioned throughput can affect hourly charges immediately.
  • Use JSON output for evidence, and coordinate with developers because RU problems may require code or partition-key changes, not only scaling.

What output tells you

  • Throughput output shows whether RU/s is defined at database or container level and whether autoscale maximum throughput is configured.
  • Provisioning state and resource IDs confirm which account, database, and container will receive any capacity change.
  • Metric output distinguishes sustained high normalized RU consumption from normal overall usage that hides a few inefficient operations or hot partitions.
  • Autoscale and manual values help estimate cost exposure and decide whether emergency scaling should be temporary or permanent.
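The first and last points can be checked mechanically from exported JSON. The sketch below assumes the `throughput show` output carries a `resource` object with `throughput` and `autoscaleSettings.maxThroughput` fields; that shape, the sample documents, and the function name are assumptions to verify against real output. The 10 percent floor reflects autoscale scaling between one tenth of the maximum and the maximum.

```python
import json


def describe_throughput(cli_json):
    """Classify exported throughput JSON as manual or autoscale."""
    doc = json.loads(cli_json)
    resource = doc.get("resource") or {}
    autoscale = resource.get("autoscaleSettings") or {}
    max_ru = autoscale.get("maxThroughput")
    if max_ru:
        return {
            "mode": "autoscale",
            "max_ru_per_second": max_ru,
            "min_ru_per_second": max_ru // 10,  # autoscale floor is 10% of max
        }
    return {"mode": "manual", "ru_per_second": resource.get("throughput")}


# Hand-written samples in the assumed shape, not captured CLI output.
manual_sample = '{"resource": {"throughput": 400, "autoscaleSettings": null}}'
autoscale_sample = (
    '{"resource": {"throughput": 400, "autoscaleSettings": {"maxThroughput": 4000}}}'
)

print(describe_throughput(manual_sample))
print(describe_throughput(autoscale_sample))
```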

Mapped Azure CLI commands

Cosmos DB request-unit throughput operations

direct
az cosmosdb sql container throughput show --resource-group <resource-group> --account-name <account> --database-name <database> --name <container> --output json
az cosmosdb sql container throughput (discover, Databases)
az cosmosdb sql container throughput update --resource-group <resource-group> --account-name <account> --database-name <database> --name <container> --throughput <ru-per-second>
az cosmosdb sql container throughput (configure, Databases)
az cosmosdb sql container throughput update --resource-group <resource-group> --account-name <account> --database-name <database> --name <container> --max-throughput <max-ru-per-second>
az cosmosdb sql container throughput (configure, Databases)
az cosmosdb sql container throughput migrate --resource-group <resource-group> --account-name <account> --database-name <database> --name <container> --throughput-type autoscale
az cosmosdb sql container throughput (operate, Databases)
az cosmosdb sql database throughput show --resource-group <resource-group> --account-name <account> --name <database> --output json
az cosmosdb sql database throughput (discover, Databases)
az monitor metrics list --resource <cosmos-account-resource-id> --metric NormalizedRUConsumption,TotalRequestUnits,TotalRequests --interval PT1M --aggregation Average Total
az monitor metrics (discover, Databases)

Architecture context

A Cosmos DB architect treats request units as both a performance budget and a spending budget. The right design starts with access patterns: point reads, writes, queries, fan-out behavior, and partition-key cardinality. Then the team chooses serverless, manual throughput, autoscale, or shared database throughput. RU consumption should be modeled before migration and observed after launch through normalized RU consumption, throttled requests, and request-charge telemetry. Indexing policy, consistency level, region count, and item size all change the RU profile. If the partition key is wrong, scaling RU/s may only hide a hot partition. Good architecture designs the workload to spend RUs where business value exists.
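To see why a wrong partition key can make extra RU/s ineffective, it helps to estimate how provisioned throughput is split. The sketch below assumes the commonly documented limits of roughly 10,000 RU/s and 50 GB per physical partition and an even split of RU/s across physical partitions. The service decides the real layout, so treat the result as an estimate; the function names and sample numbers are hypothetical.

```python
import math

# Assumed per-physical-partition limits; verify against current service limits.
MAX_RU_PER_PHYSICAL_PARTITION = 10_000
MAX_GB_PER_PHYSICAL_PARTITION = 50


def estimate_physical_partitions(provisioned_ru, storage_gb):
    """Estimate partition count from whichever limit demands more partitions."""
    by_throughput = math.ceil(provisioned_ru / MAX_RU_PER_PHYSICAL_PARTITION)
    by_storage = math.ceil(storage_gb / MAX_GB_PER_PHYSICAL_PARTITION)
    return max(1, by_throughput, by_storage)


def ru_budget_per_partition(provisioned_ru, storage_gb):
    """RU/s available to each physical partition under an even split."""
    partitions = estimate_physical_partitions(provisioned_ru, storage_gb)
    return provisioned_ru / partitions


# Hypothetical container: 20,000 RU/s provisioned, 400 GB stored.
partitions = estimate_physical_partitions(20_000, 400)
budget = ru_budget_per_partition(20_000, 400)
print(partitions, budget)
```

In this example storage drives the partition count, so any single hot key is capped at the per-partition share even though the container as a whole has far more RU/s provisioned.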

Security

Security impact is indirect because a request unit does not grant access or encrypt data. Identity, keys, RBAC, private endpoints, network rules, and encryption controls handle that. RU risk appears when security or governance queries become expensive, when broad keys allow abusive workloads to burn throughput, or when throttling pushes teams to bypass controls. Least-privilege access helps ensure only approved apps can consume provisioned RU/s. Monitor unusual RU spikes by identity, region, or operation where possible. Protect account keys, prefer managed identity and RBAC for supported scenarios, and review diagnostic logs carefully during incidents because request metadata can contain sensitive resource names.

Cost

Cost impact is direct. Cosmos DB throughput is billed around provisioned RU/s or serverless consumption, so inefficient operations become visible in the bill. A query that scans partitions, an overly broad index policy, or an oversized item can spend far more RUs than a point read. Autoscale can reduce manual planning effort, but its maximum setting still defines a cost envelope. Shared database throughput can save money for small containers, while dedicated throughput gives predictable capacity to critical workloads. FinOps reviews should connect RU consumption to features, tenants, regions, and data growth. Saving money usually means tuning queries, partition keys, indexing, and retention so RUs buy useful work.
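The autoscale cost envelope can be sketched with simple arithmetic. The example below assumes autoscale bills each hour for the highest RU/s the container scaled to, never less than 10 percent of the configured maximum, and that autoscale RU/s carries a 1.5x rate multiplier relative to manual throughput. The multiplier, the traffic profile, and the function names are assumptions to check against current pricing; results are in relative RU/s-hour units, not currency.

```python
def autoscale_billed_ru(max_ru, peak_ru_in_hour):
    """RU/s billed for one hour: the peak, clamped to [10% of max, max]."""
    floor = max_ru / 10
    return min(max_ru, max(floor, peak_ru_in_hour))


def compare_units(hourly_peaks, manual_ru, autoscale_max_ru, autoscale_multiplier=1.5):
    """Return (manual, autoscale) cost in manual-equivalent RU/s-hours."""
    manual_units = manual_ru * len(hourly_peaks)
    autoscale_units = autoscale_multiplier * sum(
        autoscale_billed_ru(autoscale_max_ru, peak) for peak in hourly_peaks
    )
    return manual_units, autoscale_units


# Hypothetical day: 20 quiet hours peaking at 500 RU/s, 4 busy hours at 4,000 RU/s.
day = [500] * 20 + [4000] * 4
manual_units, autoscale_units = compare_units(day, manual_ru=4000, autoscale_max_ru=4000)
print(manual_units, autoscale_units)
```

With this bursty profile autoscale comes out well below manual provisioning at the peak value. A workload that sits near its maximum most hours would show the opposite result because of the multiplier.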

Reliability

Reliability impact is direct because insufficient or poorly distributed RU capacity causes throttling, retries, and user-visible latency. Cosmos DB SDKs retry throttled requests automatically, but excessive 429 responses can still break application SLAs, especially under hot partitions or sudden traffic bursts. Reliable designs use correct partition keys, autoscale where demand is bursty, client retry policies, alerting on normalized RU consumption, and capacity tests before launches. Multi-region architectures also need to account for RU consumption during failover or regional traffic shifts. Do not rely only on total RU/s; one overloaded physical partition can hurt reliability while the account looks underused overall.
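A small triage helper can make the "one overloaded partition" case explicit. The sketch below takes normalized RU consumption percentages keyed by partition range and separates three situations: headroom, a hot partition, and genuine capacity pressure. The 90 percent threshold, the 2x skew ratio, the labels, and the sample readings are illustrative assumptions, not service-defined values.

```python
from statistics import mean


def diagnose(normalized_by_partition, hot_threshold=90.0, skew_ratio=2.0):
    """Classify per-partition normalized RU consumption (percent, 0-100)."""
    values = list(normalized_by_partition.values())
    if not values:
        return "no-data"
    peak = max(values)
    average = mean(values)
    if peak < hot_threshold:
        return "headroom"
    # One partition saturated while the rest idle suggests key skew.
    if average > 0 and peak / average >= skew_ratio:
        return "hot-partition"
    # Every partition is busy, so more RU/s is likely to help.
    return "capacity"


# Hypothetical readings keyed by partition range id.
print(diagnose({"0": 98, "1": 22, "2": 18, "3": 25}))
print(diagnose({"0": 95, "1": 93, "2": 96}))
print(diagnose({"0": 40, "1": 35}))
```

The first sample is the case where raising total RU/s mostly buys idle capacity on the cold partitions; the second is the case where scaling is a reasonable first response.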

Performance

Performance impact is direct because RUs represent the capacity budget for reads, writes, and queries. When demand exceeds available RU/s, Cosmos DB throttles requests and clients rely on retry behavior. Adding RU/s can reduce throttling, but it does not fix bad partition keys, fan-out queries, oversized items, or inefficient indexing. Operators should inspect request charge, normalized RU consumption, throttled request metrics, and SDK diagnostics together. The best tuning decisions connect RU cost, partition-key distribution, index policy, and response time. More RU/s helps only when capacity is the real bottleneck; partition-aware tuning keeps useful capacity close to active data.
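The retry behavior mentioned above can be sketched as a loop that waits for the interval the service suggests before trying again. Official SDKs typically handle this already, so the example is only meant to show the mechanics. `ThrottledError` and `call_with_retry` are hypothetical names, and the retry interval is assumed to come from the 429 response (for example, a retry-after header surfaced by the client).

```python
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a 429 response carrying a retry hint."""

    def __init__(self, retry_after_ms):
        super().__init__(f"429 throttled, retry after {retry_after_ms} ms")
        self.retry_after_ms = retry_after_ms


def call_with_retry(operation, max_attempts=5, sleep=time.sleep):
    """Run operation, waiting the suggested interval after each throttle."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ThrottledError as err:
            if attempt == max_attempts:
                raise  # give up and surface the throttle to the caller
            sleep(err.retry_after_ms / 1000.0)


# Simulated operation: throttled twice, then succeeds.
hints = [50, 100]
waits = []


def flaky_read():
    if hints:
        raise ThrottledError(hints.pop(0))
    return "ok"


result = call_with_retry(flaky_read, sleep=waits.append)
print(result, waits)
```

Retries smooth over short bursts, but each wait adds latency. A steady stream of throttles still points back to capacity, partitioning, or query cost rather than to the retry settings.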

Operations

Operators inspect request units through throughput settings, normalized RU metrics, throttled request counts, query diagnostics, SDK request-charge headers, and cost reports. They compare provisioned RU/s with actual consumption, identify containers with chronic 429s, and review indexing policy or query patterns with developers. Azure CLI is useful for showing and updating database or container throughput, migrating between manual and autoscale, and exporting JSON evidence for review. Runbooks should define who can scale RU/s, when emergency scaling is allowed, how to roll back, and how to confirm the real problem is capacity rather than partitioning or query design. They also document baseline charges for key operations and review emergency scaling after incidents. That evidence helps separate workload growth from inefficient query design.

Common mistakes

  • Increasing RU/s repeatedly without checking query shape, partition-key skew, or indexing policy.
  • Relying on average account-level RU use while one physical partition is throttling on a hot key.
  • Using shared database throughput for unrelated containers, then letting one noisy workload starve the others.
  • Ignoring RU charge in SDK diagnostics until a cost spike or 429 incident forces emergency tuning.