Containers Azure Container Apps premium

Minimum replicas

Minimum replicas is the lowest number of running replicas a scale controller or platform should maintain for a workload. In everyday Azure work, it appears when Container Apps, Kubernetes, or other scalable services need baseline capacity even when requests or queue depth drop. The useful mental model is the floor under autoscale, not the number of replicas you will always have during demand spikes. Treat it as an operating decision, not a loose label: identify the owner, scope, dependent workload, monitoring signal, and rollback path before changing it in production.

Aliases
Container Apps minimum replicas, min replicas, minReplicas, scale floor
Difficulty
intermediate
CLI mappings
4
Last verified
2026-05-16T06:31:43Z

Microsoft Learn

Microsoft Learn describes Minimum replicas as the minimum number of workload replicas that a scaling service keeps running even when demand is low. Teams use it to preserve baseline capacity for an app or service. Operators should verify scope, permissions, monitoring, and rollback evidence.

Microsoft Learn: Set scaling rules in Azure Container Apps2026-05-16T06:31:43Z

Technical context

Technically, Minimum replicas sits in the application hosting and autoscale plane across replicas, revisions, scale rules, health probes, traffic, and workload profiles. Azure represents it through min replica settings, revision configuration, autoscale rules, running replica counts, readiness state, and scale events. It usually depends on platform scale controller, workload profile or compute capacity, traffic patterns, cold-start tolerance, health checks, and budget. The important boundary is that minimum replicas define baseline availability; maximum replicas and scale rules still control growth under load.

Why it matters

Minimum replicas matters because it balances availability, cold-start avoidance, and cost when services should not scale all the way to zero. A weak definition causes teams to change the wrong setting, misread symptoms, or accept defaults that do not fit the workload. The value is not just the feature itself; it is the evidence around it. A strong page explains who owns it, which resource or workflow depends on it, how operators verify health, and what must happen before a production change. That shared understanding makes audits, migrations, scale events, and incidents less chaotic. This keeps owners, operators, and reviewers aligned on the same production evidence.

Where you see it

Signals, screens, and Azure surfaces where this term usually becomes operational.

Signal 01

In the Azure portal, Minimum replicas appears on Container Apps scale settings, Kubernetes deployment views, replica counts, revision health, and metrics dashboards, where operators confirm state, ownership, and release evidence.

Signal 02

In CLI, SDK, REST, or diagnostic output, Minimum replicas appears as min replica values, running replica counts, scale rules, revision status, and workload profile details, helping teams compare live state with design.

Signal 03

In architecture, audit, or incident reviews, Minimum replicas appears when teams discuss autoscale design, cold-start tolerance, baseline cost, availability objectives, deployment rollout, and incident readiness, then decide which evidence proves health.

When this becomes relevant

Specific situations where this term helps solve real Azure design, operations, migration, security, reliability, cost, or governance problems.

  • Keep baseline capacity running for latency-sensitive services.
  • Prevent scale-to-zero when warm instances are required.
  • Tune cost by setting the floor lower after measuring demand.
  • Validate replica counts during deployment and incident response.

Real-world case studies

Different enterprise-style examples that show the term being used to hit measurable objectives.

Case study 01

Retail checkout warm capacity.

Scenario, objectives, solution, measured impact, and takeaway.

Scenario

NorthBridge Retail ran a Container Apps checkout API that scaled to zero overnight, but morning flash sales produced slow first responses and abandoned carts.

Business/Technical Objectives
  • Keep checkout p95 latency under 350 ms at store opening.
  • Avoid running full peak capacity overnight.
  • Reduce cold-start complaints by 80%.
  • Keep release rollback evidence available for support.
Solution Using Minimum replicas

The platform team set minimum replicas to two for the checkout revision and kept maximum replicas aligned with the expected sale burst. They paired the change with HTTP scale rules, health probes, and Azure Monitor metrics for replica count and request latency. A deployment pipeline captured the app JSON before and after the change, and support staff documented when to lower the floor after seasonal events. The team tested revision swaps to ensure the warm replica setting followed the intended production revision. The implementation team captured before-and-after evidence, named the support owner, and added a rollback checkpoint so the change could be repeated safely during later releases.

Results & Business Impact
  • Morning p95 latency dropped from 1.8 seconds to 290 ms.
  • Cold-start tickets fell 91% during the next sale period.
  • Idle compute increased by 11%, within the approved seasonal budget.
  • Rollback validation completed in under ten minutes.
Key Takeaway for Glossary Readers

Minimum replicas are valuable when predictable responsiveness matters more than absolute scale-to-zero savings.

Case study 02

Clinical intake availability.

Scenario, objectives, solution, measured impact, and takeaway.

Scenario

Lakeway Clinics used a Container Apps service for digital intake forms, but scale-to-zero caused staff to wait during early-morning clinic setup.

Business/Technical Objectives
  • Keep the intake service ready before clinics open.
  • Control overnight cost without manual restarts.
  • Prove health through metrics instead of staff reports.
Solution Using Minimum replicas

Engineers configured one minimum replica for the intake service and retained event-driven scale rules for midday traffic. They reviewed workload profile capacity, added health checks, and used CLI output to confirm minReplicas on the active revision. The runbook explained how to temporarily raise the floor during vaccination campaigns and how to return to the normal setting afterward. Monitoring dashboards showed replica count, failed requests, and response time for each revision. The implementation team captured before-and-after evidence, named the support owner, and added a rollback checkpoint so the change could be repeated safely during later releases. They also documented validation commands, expected healthy signals, escalation contacts, and the operational decision that would trigger a rollback during production support.

Results & Business Impact
  • First-use delay fell from 42 seconds to under 4 seconds.
  • Clinic setup complaints dropped 87%.
  • Overnight cost stayed below the approved monthly threshold.
  • Support could verify readiness without logging into the portal.
Key Takeaway for Glossary Readers

A small replica floor can remove painful startup delays without abandoning autoscale economics.

Case study 03

Transit alert burst readiness.

Scenario, objectives, solution, measured impact, and takeaway.

Scenario

MetroPath Transit published delay alerts through Container Apps, but event spikes after service disruptions sometimes hit a cold service and delayed notifications.

Business/Technical Objectives
  • Keep alert processing ready during rush hours.
  • Support event bursts without overprovisioning all day.
  • Reduce missed notification SLA events by 60%.
  • Document an evidence-based scaling pattern.
Solution Using Minimum replicas

The agency configured minimum replicas to three during weekday rush-hour deployments and used scheduled automation to restore the normal floor after service windows. Operators verified scale settings with Azure CLI and watched queue length, replica count, and processing duration. The solution also separated maximum replicas from the minimum floor, so emergencies could still scale higher. Release notes documented the approved scaling values, monitoring signals, and rollback process for each route-alert campaign. The implementation team captured before-and-after evidence, named the support owner, and added a rollback checkpoint so the change could be repeated safely during later releases. They also documented validation commands, expected healthy signals, escalation contacts, and the operational decision that would trigger a rollback during production support.

Results & Business Impact
  • Missed notification SLA events dropped 72%.
  • Average alert processing delay improved by 64%.
  • Idle replica spend rose only during approved rush-hour windows.
  • Incident review had clear CLI and metric evidence.
Key Takeaway for Glossary Readers

Minimum replicas turn autoscale from a guess into an intentional availability and cost decision.

Why use Azure CLI for this?

Azure CLI is useful for Minimum replicas because it turns portal state into repeatable evidence. Operators can inspect scope, identity, configuration, metrics, dependencies, and related resources before approving a change. CLI output also supports automation, audit packages, rollback reviews, and incident handoffs.

CLI use cases

  • Inventory Minimum replicas across the relevant resource, workspace, account, group, endpoint, or scope before a production review.
  • Inspect live Minimum replicas state during troubleshooting, migration planning, access review, release validation, or rollback confirmation.
  • Export JSON output so reviewers can compare actual configuration with architecture diagrams, source-controlled definitions, and approved runbooks.
  • Run read-only commands first; use create, update, or delete commands only through an approved change path.

Before you run CLI

  • Confirm tenant, subscription, resource group, workspace, account, namespace, server, endpoint, or policy scope before running commands.
  • Verify your role assignment allows the read, write, monitoring, data, or governance action you plan to perform.
  • Choose JSON, table, or TSV output intentionally so the result can be reviewed, scripted, or attached as evidence.
  • For production changes, confirm owner approval, maintenance window, rollback path, cost impact, and dependent workloads first.

What output tells you

  • Names, IDs, scopes, and regions confirm whether you are looking at the intended Minimum replicas boundary, not a similarly named test asset.
  • State, SKU, version, identity, network, metric, and configuration fields show whether live behavior matches the approved design.
  • Errors, timestamps, and provisioning states help separate service configuration issues from application, data, identity, or caller problems.
  • Saved output gives release, audit, and incident teams a shared record for comparison after the next change.

Mapped Azure CLI commands

Command bundle

az containerapp show --name <app> --resource-group <group>
az containerappdiscoverContainers
az containerapp update --name <app> --resource-group <group> --min-replicas <count>
az containerappconfigureContainers
az containerapp replica list --name <app> --resource-group <group> --revision <revision>
az containerapp replicadiscoverContainers
az monitor metrics list --resource <container-app-resource-id> --metric Replicas
az monitor metricsdiscoverContainers

Architecture context

Architecturally, Minimum replicas belongs to the application hosting and autoscale plane across replicas, revisions, scale rules, health probes, traffic, and workload profiles. It connects to platform scale controller, workload profile or compute capacity, traffic patterns, cold-start tolerance, health checks, and budget. Treat it as a production boundary with explicit ownership, dependencies, monitoring, and rollback evidence. A diagram or runbook should show who can change it, what resources rely on it, and which outputs prove the intended configuration.

Security

Security for Minimum replicas focuses on baseline replicas that expose endpoints, identities, secrets, and network paths even during low demand. The main risk is treating it as harmless configuration while it may affect access, exposure, data handling, or automated response. Review who can read, create, update, delete, invoke, or bypass the related resource, and whether that permission is direct, inherited, or granted through a deployment pipeline. Prefer managed identity, least privilege, private access, encryption, monitored changes, and clear exception ownership wherever the Azure service supports those controls. Keep evidence in the change record. This keeps owners, operators, and reviewers aligned on the same production evidence.

Cost

Cost for Minimum replicas is driven by always-on compute, memory, CPU, workload profile usage, idle capacity, and savings lost when the floor is set too high. Some costs are direct, such as compute, storage, ingestion, action execution, capacity, or retained data. Other costs are indirect: failed retries, duplicated work, noisy alerts, unused resources, delayed migrations, or engineering time spent troubleshooting unclear ownership. Early FinOps reviews should identify who pays, which metric or SKU drives the bill, and whether a cheaper setting still meets security, reliability, compliance, and performance requirements. Do not cut cost by removing evidence or weakening controls silently.

Reliability

Reliability for Minimum replicas depends on whether enough instances stay warm to absorb routine traffic, probe failures, dependency recovery, and deployment transitions. The concern is not only that the setting exists; it is whether the workload behaves predictably during deployment, scale, maintenance, dependency loss, retry, recovery, and operator error. Production teams should know which metric, log, activity record, or CLI output proves healthy behavior. They should also document what failure looks like, how to roll back, and which dependent services must be checked before the incident is closed. Good reliability practice makes the term operational, not decorative. This keeps owners, operators, and reviewers aligned on the same production evidence.

Performance

Performance for Minimum replicas depends on cold-start time, request latency, queue drain speed, readiness delay, concurrency, and warm capacity during sudden demand. The right signal may be request latency, queue depth, startup time, query duration, chart responsiveness, job runtime, throughput, alert delay, or operator time to isolate a bottleneck. Measure before and after important changes rather than assuming the setting improves speed. Keep enough metrics, logs, and command output to explain whether Azure configuration helped the workload, hid the problem, or simply moved the bottleneck to another component. This keeps owners, operators, and reviewers aligned on the same production evidence.

Operations

Operationally, Minimum replicas requires checking scale settings, current replica counts, revision health, traffic split, logs, and scale-event history. Operators should know which portal blade, CLI command, SDK property, metric, activity log, deployment output, or runbook step shows the live state. Avoid undocumented portal-only edits in production. Use scripts, tags, source-controlled definitions, diagnostics, and change records so support staff can compare actual configuration with the approved design during releases, audits, and incidents. After any change, capture evidence, confirm dependent workloads still behave correctly, and record the owner responsible for follow-up. This keeps owners, operators, and reviewers aligned on the same production evidence.

Common mistakes

  • Changing Minimum replicas without checking dependent resources, owner approval, monitoring signals, and rollback steps first.
  • Assuming a portal label tells the whole story instead of validating live state through CLI, logs, diagnostics, or activity history.
  • Granting broad permissions for convenience when a narrower role, managed identity, group assignment, or read-only path would work.
  • Optimizing cost or speed while ignoring security, reliability, data exposure, recovery behavior, or user-facing impact.