Autoscale throughput

Autoscale throughput is Azure Cosmos DB’s way to adjust request-unit capacity for a database or container as traffic changes. In plain terms, you set a maximum RU/s value, and Cosmos DB scales within that range instead of forcing you to pick one fixed manual throughput number. It is useful for workloads with bursts, uneven schedules, or uncertain growth. It does not remove the need for good partition keys, query efficiency, indexing strategy, or cost monitoring. The maximum setting is the guardrail.

Aliases: Cosmos DB autoscale throughput, autoscale RU/s, autoscale provisioned throughput, max RU/s
Difficulty: intermediate
CLI mappings: 4
Last verified: 2026-05-10T00:00:00Z

Microsoft Learn

Microsoft Learn covers this feature in the article "Create Azure Cosmos DB containers and databases with autoscale". Operators should still confirm scope, configuration, dependencies, and production impact for their own workload.

Microsoft Learn: Create Azure Cosmos DB containers and databases with autoscale (2026-05-10T00:00:00Z)

Technical context

Technically, autoscale provisioned throughput can be configured at the database or container level for supported Cosmos DB APIs. The resource scales RU/s based on workload demand up to the configured maximum throughput, with a minimum derived from that maximum and service rules. The most active partition can drive scaling behavior, so nonuniform partition access matters. CLI and ARM options expose max-throughput settings for creation and updates. Operators should review partition key design, indexing policy, consistency, regions, availability, and request charge before blaming capacity alone.
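
To make the scaling range concrete, here is a minimal sketch. It assumes the commonly documented rule that autoscale moves between 10% of the configured maximum and the maximum itself, and that the maximum is set in steps of 1,000 RU/s. The function name `autoscale_range` is hypothetical, and storage-driven minimums are not modeled.

```python
from __future__ import annotations


def autoscale_range(max_ru: int) -> tuple[int, int]:
    """Return the (floor, ceiling) RU/s that autoscale moves between.

    Assumptions: the floor is 10% of the configured maximum, and the maximum
    is at least 1,000 RU/s in steps of 1,000. Storage-driven minimums and
    other service rules are not modeled here.
    """
    if max_ru < 1000 or max_ru % 1000 != 0:
        raise ValueError("max RU/s must be a multiple of 1,000, starting at 1,000")
    return max_ru // 10, max_ru


floor_ru, ceiling_ru = autoscale_range(4000)
print(f"autoscale range: {floor_ru} to {ceiling_ru} RU/s")
```

Under these assumptions, a 4,000 RU/s maximum implies a 400 RU/s floor, so raising the maximum also raises the floor.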

Why it matters

Autoscale throughput matters because Cosmos DB workloads often have peaks that are hard to predict. Manual RU/s can be wasteful when set for peak traffic and risky when set for average traffic. Autoscale gives teams a middle path: capacity can rise for bursts while the maximum keeps spend bounded. It is valuable for SaaS tenants, retail events, scheduled imports, and user-facing apps with uneven usage. The danger is assuming autoscale fixes bad data access. Hot partitions, expensive queries, and poor indexing can still drive scale-ups and throttling. Good design controls both demand and capacity. The safest teams document the owner, expected signal, rollout boundary, and rollback path for Autoscale throughput before production use.

Where you see it

Signals, screens, and Azure surfaces where this term usually becomes operational.

Signal 01

You see autoscale throughput in Cosmos DB database or container settings, where maximum RU/s defines the capacity ceiling.

Signal 02

It appears in provisioning scripts that use max-throughput instead of fixed manual throughput for variable workloads.

Signal 03

It shows up in metrics when normalized RU consumption, throttled requests, or hot partitions explain performance and cost behavior.
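
A rough sketch of how a hot partition can hide behind a healthy-looking total follows. It assumes provisioned RU/s is split evenly across physical partitions and that per-partition consumption figures are available. The partition ids, sample numbers, and 90% threshold are hypothetical.

```python
def normalized_ru_percent(consumed_by_partition: dict, provisioned_ru: float) -> dict:
    """Percent of each partition's even share of RU/s that is in use."""
    share = provisioned_ru / len(consumed_by_partition)
    return {pid: round(100 * used / share, 1)
            for pid, used in consumed_by_partition.items()}


def hot_partitions(consumed_by_partition: dict, provisioned_ru: float,
                   threshold: float = 90.0) -> list:
    """Partition ids whose share of RU/s is at or above the threshold."""
    percents = normalized_ru_percent(consumed_by_partition, provisioned_ru)
    return sorted(pid for pid, pct in percents.items() if pct >= threshold)


# Hypothetical sample: four partitions sharing 4,000 RU/s (1,000 RU/s each).
sample = {"p0": 950, "p1": 200, "p2": 150, "p3": 100}
overall_percent = 100 * sum(sample.values()) / 4000

print("overall utilization %:", overall_percent)
print("per-partition %:", normalized_ru_percent(sample, 4000))
print("hot partitions:", hot_partitions(sample, 4000))
```

In this sample the container as a whole uses about a third of its capacity, yet one partition is close to its share, which is the pattern that produces throttling despite a generous maximum.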

When this becomes relevant

Specific situations where this term helps solve real Azure design, operations, migration, security, reliability, cost, or governance problems.

  • Handle variable SaaS, retail, gaming, or IoT workloads without constant manual RU changes.
  • Bound maximum spend while giving Cosmos DB room to absorb expected bursts.
  • Provision development and production containers with repeatable autoscale settings.
  • Reduce throttling during scheduled imports, campaigns, or unpredictable user activity.

Real-world case studies

Different enterprise-style examples that show the term being used to hit measurable objectives.

Case study 01

Autoscale throughput in action

Scenario, objectives, solution, measured impact, and takeaway.

Scenario

OrbitMart Commerce used Cosmos DB for shopping carts and saw traffic swing dramatically during influencer-driven flash sales.

Business/Technical Objectives
  • Reduce cart throttling during sudden demand spikes.
  • Avoid fixed peak RU/s cost on normal days.
  • Keep checkout latency under 200 milliseconds.
  • Identify hot partition risks before launch.
Solution Using Autoscale throughput

The architecture team moved the cart container from manual throughput to autoscale throughput with a maximum RU/s sized from flash-sale load tests. Partition key distribution was reviewed using synthetic tenant and cart traffic, and client SDK retry policies were tuned for occasional 429 responses. Azure CLI scripts created containers with max-throughput settings and exported throughput configuration for release records. Dashboards tracked normalized RU consumption, request charge, 429s, and p95 latency. The team also rewrote two cross-partition queries before raising the maximum, avoiding capacity spend that would have masked bad query design. Finally, it documented owners, review cadence, rollback steps, acceptance criteria, and the evidence operators should collect during the next production review.

Results & Business Impact
  • Cart-related 429 responses fell by 88% during the next flash sale.
  • Average monthly throughput cost dropped 31% versus fixed peak provisioning.
  • Checkout p95 latency stayed below 180 milliseconds.
  • Two hot-query patterns were fixed before launch.
Key Takeaway for Glossary Readers

Autoscale throughput works best when capacity settings are paired with partition and query discipline.

Case study 02

Autoscale throughput in action

Scenario, objectives, solution, measured impact, and takeaway.

Scenario

CivicLedger SaaS hosted municipal records for small towns and large cities in the same Cosmos DB account, causing uneven daily usage.

Business/Technical Objectives
  • Give variable tenants enough RU/s during business hours.
  • Bound spend with explicit maximum throughput.
  • Separate noisy tenants from shared workloads.
  • Improve 429 incident diagnosis.
Solution Using Autoscale throughput

Engineers reviewed request charge by tenant, partition key, and operation type. Small municipalities remained in shared database autoscale throughput, while the largest city containers received dedicated autoscale throughput with higher maximums. Azure CLI commands documented database and container throughput settings in the deployment pipeline. Alerting distinguished overall RU pressure from hot partition behavior. The application added tenant-aware backoff and dashboards showing which customers drove peak consumption. Monthly reviews adjusted maximum RU/s based on measured growth rather than guesswork.

Results & Business Impact
  • Tenant-related throttling incidents dropped from nine per month to two.
  • Large-city workloads stopped consuming the shared pool during daytime peaks.
  • Cosmos DB spend stayed within 6% of forecast for the quarter.
  • Incident diagnosis time for 429 spikes fell from 70 minutes to 20 minutes.
Key Takeaway for Glossary Readers

Autoscale throughput is a capacity tool, but SaaS reliability still depends on tenant isolation and partition visibility.

Case study 03

Autoscale throughput in action

Scenario, objectives, solution, measured impact, and takeaway.

Scenario

HelioGrid Renewables stored IoT device state in Cosmos DB and experienced unpredictable bursts after solar gateways reconnected.

Business/Technical Objectives
  • Absorb reconnect bursts without permanent overprovisioning.
  • Keep device-state reads fast for operations dashboards.
  • Detect whether bursts came from hot partitions.
  • Control maximum RU/s spend.
Solution Using Autoscale throughput

The platform team configured autoscale throughput on the device-state container and chose a maximum based on gateway reconnect simulations. They redesigned the partition key to spread writes by site and device group, then adjusted indexing to remove unused paths. Azure CLI commands showed throughput settings during release review, while Azure Monitor tracked normalized RU consumption, request charge, 429s, and latency. When a gateway firmware issue caused repeated writes, autoscale absorbed the burst long enough for operators to identify the source and deploy a client-side backoff fix. The team documented owners, review cadence, rollback steps, acceptance criteria, and the evidence operators should collect during the next production review, which made the pattern reusable for adjacent teams rather than a one-off hero effort.

Results & Business Impact
  • Reconnect-window throttling dropped by 79%.
  • Operations dashboard p95 read latency stayed below 95 milliseconds.
  • Indexing changes reduced write request charge by 23%.
  • The maximum RU/s ceiling prevented the firmware bug from exceeding budget limits.
Key Takeaway for Glossary Readers

Autoscale throughput gives Cosmos DB burst capacity, but efficient partitioning and client behavior determine whether that capacity is well spent.

Why use Azure CLI for this?

Azure CLI is useful for autoscale throughput because capacity choices must be explicit and repeatable. Use CLI commands to create containers or databases with max throughput, show current throughput settings, and update limits during controlled operations. CLI output also supports cost and reliability reviews by proving whether a workload uses manual or autoscale provisioning. Pair capacity changes with metrics, partition analysis, and query review; increasing max RU/s may reduce throttling, but it may also hide inefficient access patterns.

CLI use cases

  • Create a Cosmos DB SQL container with autoscale max throughput during environment provisioning.
  • Show current throughput settings to confirm whether a database or container uses autoscale.
  • Update autoscale maximum RU/s after a measured traffic increase or cost review.
  • Export account, database, and container metadata before investigating throttling or hot partitions.
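
The use cases above can be sketched as a small command builder that assembles `az` argument lists without running them, so the result can be reviewed or logged before execution. The helper names and the `demo-*` resource names are hypothetical. The flags are the standard `az cosmosdb sql container` options, with `--max-throughput` selecting autoscale.

```python
import shlex


def container_create_cmd(account, resource_group, database, container,
                         partition_key_path, max_ru):
    """Argument list for creating a SQL API container with autoscale."""
    return ["az", "cosmosdb", "sql", "container", "create",
            "--account-name", account,
            "--resource-group", resource_group,
            "--database-name", database,
            "--name", container,
            "--partition-key-path", partition_key_path,
            "--max-throughput", str(max_ru)]


def throughput_cmd(action, account, resource_group, database, container,
                   max_ru=None):
    """Argument list for `throughput show` or `throughput update`."""
    if action not in ("show", "update"):
        raise ValueError("action must be 'show' or 'update'")
    cmd = ["az", "cosmosdb", "sql", "container", "throughput", action,
           "--account-name", account,
           "--resource-group", resource_group,
           "--database-name", database,
           "--name", container]
    if action == "update":
        if max_ru is None:
            raise ValueError("update needs a max RU/s value")
        cmd += ["--max-throughput", str(max_ru)]
    return cmd


# Hypothetical resource names, printed for review rather than executed.
print(shlex.join(container_create_cmd(
    "demo-account", "demo-rg", "demo-db", "carts", "/cartId", 4000)))
print(shlex.join(throughput_cmd(
    "update", "demo-account", "demo-rg", "demo-db", "carts", max_ru=8000)))
```

Building the list first and passing it to a process runner later keeps the approved maximum visible in release records.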

Before you run CLI

  • Confirm whether throughput is configured at the database or container level.
  • Choose a maximum RU/s based on peak demand, budget, and minimum throughput implications.
  • Review partition key distribution, indexing policy, and request-charge metrics before raising capacity.
  • Make sure the change is approved because max RU/s defines the spend ceiling.

What output tells you

  • Throughput output shows whether autoscale is enabled and what maximum RU/s is configured.
  • Container output confirms partition key path, indexing policy references, and resource identity.
  • Metrics reveal whether throttling comes from overall capacity, hot partitions, or expensive operations.
  • A higher max RU/s with persistent 429s suggests partition or query design needs deeper review.
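
The first check above can be sketched as a small parser for `throughput show` output. The JSON shape used here (`resource.autoscaleSettings.maxThroughput` for autoscale, `resource.throughput` for manual) is an assumption to verify against real CLI output, and both sample documents are hand-written illustrations, not captured output.

```python
import json


def describe_throughput(raw_json: str) -> dict:
    """Classify throughput settings as autoscale or manual.

    Assumes the shape described in the lead-in; unknown fields are ignored.
    """
    resource = json.loads(raw_json).get("resource") or {}
    autoscale = resource.get("autoscaleSettings") or {}
    max_ru = autoscale.get("maxThroughput")
    if max_ru:
        return {"mode": "autoscale", "max_ru": int(max_ru)}
    return {"mode": "manual", "ru": resource.get("throughput")}


# Hand-written samples that follow the assumed shape.
AUTOSCALE_SAMPLE = '{"resource": {"autoscaleSettings": {"maxThroughput": 4000}, "throughput": 400}}'
MANUAL_SAMPLE = '{"resource": {"autoscaleSettings": null, "throughput": 400}}'

print(describe_throughput(AUTOSCALE_SAMPLE))
print(describe_throughput(MANUAL_SAMPLE))
```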

Mapped Azure CLI commands

Azure Cosmos DB operations

direct
az cosmosdb list --resource-group <resource-group>
  az cosmosdb · discover · Databases
az cosmosdb show --name <account-name> --resource-group <resource-group>
  az cosmosdb · discover · Databases
az cosmosdb create --name <account-name> --resource-group <resource-group>
  az cosmosdb · provision · Databases
az cosmosdb sql database list --account-name <account-name> --resource-group <resource-group>
  az cosmosdb sql database · discover · Databases
az cosmosdb delete --name <account-name> --resource-group <resource-group>
  az cosmosdb · remove · Databases

Cosmos operations

direct
az cosmosdb show --name <account> --resource-group <resource-group>
  az cosmosdb · discover · Databases
az cosmosdb create --name <account> --resource-group <resource-group> --locations regionName=<region>
  az cosmosdb · provision · Databases
az cosmosdb update --name <account> --resource-group <resource-group> --enable-automatic-failover true
  az cosmosdb · configure · Databases
az cosmosdb keys list --name <account> --resource-group <resource-group>
  az cosmosdb keys · discover · Databases
az cosmosdb sql database list --account-name <account> --resource-group <resource-group>
  az cosmosdb sql database · discover · Databases
az cosmosdb sql database create --account-name <account> --resource-group <resource-group> --name <database>
  az cosmosdb sql database · provision · Databases
az cosmosdb sql container list --account-name <account> --resource-group <resource-group> --database-name <database>
  az cosmosdb sql container · discover · Databases
az cosmosdb sql container create --account-name <account> --resource-group <resource-group> --database-name <database> --name <container> --partition-key-path <path>
  az cosmosdb sql container · provision · Databases
az cosmosdb sql container throughput show --account-name <account> --resource-group <resource-group> --database-name <database> --name <container>
  az cosmosdb sql container throughput · discover · Databases
az cosmosdb sql container throughput update --account-name <account> --resource-group <resource-group> --database-name <database> --name <container> --max-throughput <max-ru>
  az cosmosdb sql container throughput · configure · Databases

Architecture context

Security

Security is not provided by autoscale itself, but capacity settings can affect operational risk. A compromised client, leaked key, or abusive tenant can generate enough requests to drive RU/s toward the maximum. Use managed identities where supported, role-based access, key rotation, private endpoints, firewall rules, and tenant-level throttling. Monitor unusual request volume by region, partition key, operation type, and identity. Sensitive workloads also need backup, encryption, diagnostic logs, and data governance. Autoscale keeps the database responsive under demand; it does not decide whether that demand is authorized or safe.

Cost

Cost control is one of the main reasons to use autoscale throughput, but the maximum RU/s still matters. Autoscale can cost less than manual peak provisioning for variable workloads, while still supporting bursts. If a workload frequently reaches the maximum, the team may be paying near peak anyway and should revisit design or baseline capacity. Hot partitions and inefficient queries can cause unnecessary scale-ups. Cost reviews should include request charge by operation, query shape, indexing policy, partition key distribution, regions, and storage. The right setting balances user experience, throttling tolerance, and budget.
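
The trade-off can be illustrated with a back-of-the-envelope comparison. This sketch assumes autoscale is billed each hour for the highest RU/s reached in that hour (never below 10% of the maximum) and that autoscale RU/s carries a 1.5x rate multiplier. Both the multiplier and the unit price `RATE` are placeholders to check against current pricing, and the traffic profile is hypothetical.

```python
def manual_cost(hours: int, provisioned_ru: int, rate_per_100ru_hour: float) -> float:
    """Cost of fixed manual throughput held for the whole period."""
    return hours * provisioned_ru / 100 * rate_per_100ru_hour


def autoscale_cost(hourly_peak_ru, max_ru: int, rate_per_100ru_hour: float,
                   autoscale_multiplier: float = 1.5) -> float:
    """Cost when each hour is billed at its peak RU/s, clamped to [10% of max, max]."""
    floor_ru = max_ru // 10
    billed_ru_hours = sum(max(floor_ru, min(peak, max_ru)) for peak in hourly_peak_ru)
    return billed_ru_hours / 100 * rate_per_100ru_hour * autoscale_multiplier


RATE = 0.008  # placeholder price per 100 RU/s per hour, not a quoted rate

# Hypothetical day: 20 quiet hours and a 4-hour burst at the maximum.
bursty_day = [300] * 20 + [4000] * 4
print("autoscale:", round(autoscale_cost(bursty_day, 4000, RATE), 2))
print("manual at peak:", round(manual_cost(24, 4000, RATE), 2))

# A workload pinned at the maximum all day pays the multiplier with no savings.
busy_day = [4000] * 24
print("autoscale, always at max:", round(autoscale_cost(busy_day, 4000, RATE), 2))
```

Under these assumptions, the bursty profile costs well under half of peak manual provisioning, while the always-busy profile costs more than manual. This is why a workload that frequently sits at the maximum deserves a design or baseline review.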

Reliability

Reliability improves when Cosmos DB can absorb traffic variation without manual intervention. Autoscale helps reduce 429 throttling during bursts and keeps applications healthier when demand changes quickly. It still depends on retry logic, partition distribution, regional failover, consistency choices, and downstream capacity. Hot partitions can throttle even when the overall maximum looks generous. Operators should monitor normalized RU consumption, throttled requests, request charge, availability, replication lag where relevant, and client retry behavior. The best reliability pattern combines autoscale throughput with good partition keys, efficient queries, and graceful backoff.
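
Graceful backoff can be sketched generically as below. The Azure SDKs ship their own retry handling for throttled requests, so this is only an illustration of the idea, not a replacement for it. `ThrottledError` and `call_with_backoff` are hypothetical names standing in for a 429 response and an application-level wrapper.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a 429 response with an optional retry hint."""

    def __init__(self, retry_after_s=None):
        super().__init__("request throttled")
        self.retry_after_s = retry_after_s


def call_with_backoff(operation, max_attempts=5, base_delay_s=0.1,
                      max_delay_s=5.0, sleep=time.sleep):
    """Retry a throttled operation, preferring the server's retry hint.

    Without a hint, wait a random time up to an exponentially growing cap
    so that many clients do not retry in lockstep.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThrottledError as err:
            if attempt == max_attempts - 1:
                raise  # give up and surface the throttling to the caller
            cap = min(max_delay_s, base_delay_s * 2 ** attempt)
            delay = err.retry_after_s if err.retry_after_s is not None \
                else random.uniform(0, cap)
            sleep(delay)


# Usage: an operation that is throttled twice, then succeeds.
calls = {"count": 0}


def flaky_write():
    calls["count"] += 1
    if calls["count"] <= 2:
        raise ThrottledError(retry_after_s=0.05)
    return "ok"


print(call_with_backoff(flaky_write))
```

Bounding the number of attempts matters as much as the delay: unbounded retries are one way a brief burst turns into a client retry storm.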

Performance

Performance depends on available RU/s, partition distribution, query design, indexing, consistency, and client behavior. Autoscale throughput can raise capacity during demand spikes, reducing throttling and preserving latency. It cannot make cross-partition scans cheap, fix hot keys, or remove the cost of heavy indexing. Applications should use efficient point reads, bounded queries, continuation handling, and proper retry policies. Monitor p95 latency, request charge, normalized RU consumption, 429s, and per-partition signals where available. If performance remains poor at high RU/s, the data model or query pattern needs attention.

Operations

Operationally, autoscale throughput needs clear ownership and review cadence. Decide whether throughput belongs at database or container level, set maximum RU/s intentionally, and document why that limit fits the workload. Operators should review metrics for normalized RU consumption, 429 responses, hot partitions, storage growth, and regional traffic. During incidents, check whether throttling is caused by insufficient max RU/s, a hot partition, an expensive query, or a client retry storm. Capacity changes should be made through infrastructure as code or scripted commands so audit history and rollback are clear. The safest teams document the owner, expected signal, rollout boundary, and rollback path for Autoscale throughput before production use.
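
The incident check described above can be written down as a first-pass triage rule. Every threshold here is a hypothetical placeholder that a team would tune from its own metrics, and the function name `triage_throttling` is illustrative. It suggests where to look first and does not diagnose anything on its own.

```python
def triage_throttling(overall_pct: float, hottest_partition_pct: float,
                      avg_request_charge: float, retry_share: float,
                      charge_budget: float = 50.0) -> str:
    """Suggest the first place to look when 429s spike.

    overall_pct: consumed RU/s as a percent of the current capacity.
    hottest_partition_pct: utilization of the busiest partition's share.
    avg_request_charge: mean RU charge per operation in the window.
    retry_share: fraction of requests that are client retries (0 to 1).
    """
    if retry_share >= 0.5:
        return "client retry storm"
    if hottest_partition_pct >= 90 and overall_pct < 60:
        return "hot partition"
    if avg_request_charge > charge_budget:
        return "expensive query"
    if overall_pct >= 90:
        return "insufficient max RU/s"
    return "inconclusive"


print(triage_throttling(35, 95, 8, 0.1))
print(triage_throttling(95, 97, 8, 0.1))
```

Checking retries, partitions, and query cost before capacity mirrors the guidance above: raising max RU/s is the last branch because it is the one that spends money.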

Common mistakes

  • Assuming autoscale throughput fixes a poor partition key or hot tenant pattern.
  • Setting max RU/s without understanding the minimum and cost behavior.
  • Putting many noisy containers in shared database throughput without tenant or workload isolation.
  • Increasing capacity before checking request charge, indexing, and client retry storms.