Memory scale rule - Azure Glossary

Microsoft Learn

A memory scale rule is a Container Apps scale rule that uses memory utilization to add or remove running replicas within configured limits. Teams use it when a container workload becomes memory-bound before HTTP traffic, CPU, or queue length fully explains the pressure. In plain English, it gives operators a named control for pressure-based scaling of memory-sensitive services, and it keeps replica bounds visible instead of leaving the decision hidden in a portal setting, script, or deployment file. Treat it as production-ready only when the owner, dependencies, permission boundary, monitoring signal, and rollback evidence are clear.

Microsoft Learn: Set scaling rules in Azure Container Apps (2026-05-16T05:14:53Z)

Technical context

A memory scale rule sits in the Azure Container Apps scale configuration, which is powered by KEDA, and its effect is observed through Azure Monitor metrics. Azure represents it through memory scale-rule metadata, minimum replicas, maximum replicas, revision templates, and container memory settings. It usually interacts with Container Apps revisions, containers, workload profiles, memory limits, metrics, health probes, and downstream dependencies. The key boundary is that memory scaling requires running replicas for measurement, so an app that relies on a memory rule cannot scale to zero, and the rule is not a replacement for event-driven scale-to-zero patterns. Architects should document scope, identity path, network assumptions, deployment method, monitoring hooks, and fallback behavior before production use.
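The replica arithmetic behind a utilization target can be sketched as follows. This is a minimal sketch, assuming the rule behaves like the Kubernetes Horizontal Pod Autoscaler formula that KEDA resource scalers build on. The function name and its arguments are illustrative, and the sketch ignores tolerance bands, stabilization windows, and cooldown periods.

```shell
# Sketch only: mirrors the HPA-style formula
#   desired = ceil(current_replicas * current_utilization / target_utilization)
# and then clamps the result to the configured replica bounds.
# Function and argument names are illustrative, not part of any Azure tool.
desired_replicas() {
  local current=$1 utilization=$2 target=$3 min=$4 max=$5
  # Integer ceiling division: ceil(a / b) == (a + b - 1) / b
  local desired=$(( (current * utilization + target - 1) / target ))
  if [ "$desired" -lt "$min" ]; then desired=$min; fi
  if [ "$desired" -gt "$max" ]; then desired=$max; fi
  echo "$desired"
}

# 2 replicas averaging 90% memory against a 75% target, bounds 1..10
desired_replicas 2 90 75 1 10
```

With these inputs the sketch prints 3, because 2 × 90 / 75 = 2.4 rounds up to three replicas. The minimum bound is what keeps at least one replica running so that a utilization figure exists at all.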

Why it matters

A memory scale rule matters because it makes pressure-based scaling for memory-sensitive services explicit while keeping replica bounds visible, testable, and owned. Without that clarity, teams can change the wrong scope, miss hidden dependencies, or troubleshoot symptoms caused by configuration drift rather than application code. It also gives reviewers a common language for security, reliability, operations, cost, and performance decisions. A good implementation states who owns the setting, what workload depends on it, how changes are approved, and which metric or log proves the result. That keeps audits, migrations, incidents, and release reviews from becoming guesswork. Keep the decision visible in runbooks, diagrams, tags, and support notes.

Where you see it

Signals, screens, and Azure surfaces where this term usually becomes operational.

Signal 01

In the Azure portal, a memory scale rule appears in configuration, monitoring, or access views where teams verify ownership, dependencies, permissions, readiness, and rollback evidence before changes.

Signal 02

In CLI, IaC, or query output, a memory scale rule appears as properties, status, scope, and dependency evidence that operators compare with the approved design during reviews.

Signal 03

In architecture reviews, a memory scale rule appears when teams discuss ownership, access, reliability, cost, performance, and the evidence needed to prove the design is safe.

When this becomes relevant

Specific situations where this term helps solve real Azure design, operations, migration, security, reliability, cost, or governance problems.

Use a memory scale rule to make ownership, configuration evidence, monitoring, and rollback behavior explicit.
Review the memory scale rule during design reviews, release readiness checks, incident response, and post-change validation.
Document the memory scale rule with related identities, network paths, policies, cost drivers, and operational runbooks.

Real-world case studies

Different enterprise-style examples that show the term being used to hit measurable objectives.

Case study 01

Image processor memory autoscale

Scenario, objectives, solution, measured impact, and takeaway.

Scenario

RecipeFlow Labs, a food-tech SaaS organization, ran image-processing containers that hit memory pressure before CPU or request count triggered scale-out. The team used a memory scale rule to create a controlled Azure pattern with clear ownership, measurable evidence, and safer production handoff.

Business/Technical Objectives

Reduce out-of-memory restarts by 80%.
Keep image processing under two minutes.
Limit replicas to storage throughput capacity.
Document scale-to-zero tradeoffs.

Solution Using Memory scale rule

Engineers configured a memory scale rule at a utilization threshold validated by load tests. They set minimum replicas to one because memory scaling needs a running replica, and they set maximum replicas to match Blob Storage throughput and queue worker limits. Azure Monitor tracked working set, replica count, and restart count. Operators used CLI to verify the scale template after every revision deployment. Runbooks captured owners, approval evidence, monitoring signals, and rollback steps so support teams could repeat the pattern without guessing during incidents. The design also included activity-log review and architecture notes that connected the Azure configuration to business accountability.

Results & Business Impact

Out-of-memory restarts dropped 88%.
Average processing time fell to 74 seconds.
Storage throttling stayed below alert thresholds.
Runbooks clearly explained why the app could not scale to zero.

Key Takeaway for Glossary Readers

A memory scale rule is practical for memory-bound containers when replica bounds and resource limits are designed together.

Case study 02

Claim document extraction scaling

Scenario, objectives, solution, measured impact, and takeaway.

Scenario

MetroClaims AI, an insurance automation organization, had a document extraction service slowing down when large claim PDFs arrived in bursts. The team used a memory scale rule to create a controlled Azure pattern with clear ownership, measurable evidence, and safer production handoff.

Business/Technical Objectives

Keep extraction latency below five minutes.
Prevent container restarts on large PDFs.
Avoid unrestricted replica growth.
Improve incident evidence for support.

Solution Using Memory scale rule

The platform team measured memory usage for small, medium, and large PDF batches. A memory scale rule was added with max replicas capped by the downstream database and AI service quota. Request tracing linked each extraction job to replica count and memory metrics. When memory rose but replicas did not increase, operators could check revision scale configuration and metric emission before blaming the model code. Runbooks captured owners, approval evidence, monitoring signals, and rollback steps so support teams could repeat the pattern without guessing during incidents. The design also included CLI validation, activity-log review, and architecture notes that connected the Azure configuration to business accountability.
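Capping maximum replicas by downstream limits comes down to taking the tightest constraint. The sketch below is illustrative only: the function name, the per-replica figures, and the limits are hypothetical values chosen for the example, not numbers from this case study.

```shell
# Sketch only: the replica cap is the tightest of the downstream limits.
# All names and numbers are hypothetical, not figures from the case study.
max_replicas_cap() {
  local db_connection_limit=$1 connections_per_replica=$2
  local ai_quota_per_minute=$3 calls_per_replica_per_minute=$4
  local by_db=$(( db_connection_limit / connections_per_replica ))
  local by_ai=$(( ai_quota_per_minute / calls_per_replica_per_minute ))
  if [ "$by_db" -lt "$by_ai" ]; then echo "$by_db"; else echo "$by_ai"; fi
}

# 200 database connections at 20 per replica, 600 AI calls/min at 50 per replica
max_replicas_cap 200 20 600 50
```

Here the database allows 10 replicas and the AI quota allows 12, so the sketch prints 10 as the value to use for maximum replicas.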

Results & Business Impact

Large-PDF extraction latency dropped 61%.
Container restarts fell from 31 per week to 3.
Replica count stayed within downstream quota limits.
Support triage time decreased 44%.

Key Takeaway for Glossary Readers

Memory scale rule gives bursty document workloads a direct pressure signal when request volume alone misses the real bottleneck.

Case study 03

Route optimization worker scaling

Scenario, objectives, solution, measured impact, and takeaway.

Scenario

HarborOps Analytics, a maritime logistics organization, processed vessel route optimizations in containers that consumed memory unevenly depending on route complexity. The team used a memory scale rule to create a controlled Azure pattern with clear ownership, measurable evidence, and safer production handoff.

Business/Technical Objectives

Complete optimization batches before port cutoffs.
Reduce manual scale changes.
Keep monthly compute spend predictable.
Avoid failed readiness probes.

Solution Using Memory scale rule

Architects configured memory scale rules for the optimization worker and set resource requests that reflected real route complexity. Minimum replicas kept the metric available, while maximum replicas protected the shared Container Apps environment. Operators watched memory metrics, readiness probe failures, and batch duration. A change-control checklist required CLI output showing the memory rule, min/max replicas, and current revision before production release. Runbooks captured owners, approval evidence, monitoring signals, and rollback steps so support teams could repeat the pattern without guessing during incidents. The design also included activity-log review and architecture notes that connected the Azure configuration to business accountability. After release, the platform team reviewed metrics weekly and kept the implementation aligned with security, reliability, and cost expectations.

Results & Business Impact

Batch completion met port cutoff 97% of the time.
Manual scale changes dropped 73%.
Compute spend stayed 11% below forecast.
Readiness probe failures fell by 52%.

Key Takeaway for Glossary Readers

A memory scale rule is strongest when teams pair it with workload-specific resource sizing and clear replica limits.

Why use Azure CLI for this?

Azure CLI is useful for a memory scale rule because it turns the live configuration into repeatable evidence. Operators can inventory scope, compare settings with IaC, confirm identity and network assumptions, and export facts for change reviews or incidents without relying on screenshots.

CLI use cases

Inventory memory scale rule settings across subscriptions or resource groups before reviews, migrations, and ownership cleanup.
Inspect the live memory scale rule configuration before a release, audit, incident, rollback, or support handoff.
Export memory scale rule evidence so teams can compare portal state, IaC intent, activity logs, and monitoring results.

Before you run CLI

Confirm tenant, subscription, resource group, scope, and service-specific permissions before inspecting or changing a memory scale rule.
Know whether the command is read-only or changes production behavior, cost, routing, identity, or network exposure.
Choose JSON, table, or TSV output deliberately so the result can be reviewed, scripted, or attached to evidence.
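A read-only export of the scale template is one way to turn these checks into attachable evidence. The following is a minimal sketch, assuming the Azure CLI is installed and signed in. The function name, the file naming scheme, and the output directory are illustrative choices rather than an established convention.

```shell
# Sketch only: save the live scale template as a timestamped JSON file.
# Function name, file naming, and directory layout are illustrative.
export_scale_evidence() {
  local group=$1 app=$2 outdir=${3:-.}
  local stamp file
  stamp=$(date -u +%Y%m%dT%H%M%SZ)
  file="$outdir/${app}-scale-${stamp}.json"
  # Read-only call; remove the partial file if the CLI call fails.
  az containerapp show \
    --resource-group "$group" --name "$app" \
    --query properties.template.scale \
    --output json > "$file" || { rm -f "$file"; return 1; }
  echo "$file"
}

# Usage (read-only): export_scale_evidence <group> <app> ./evidence
```

The function prints the path of the file it wrote, so the result can be attached to a change record or compared with a later export.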

What output tells you

The output shows whether a memory scale rule exists, where it is scoped, and which resource or workload currently owns it.
Status, identity, network, SKU, policy, metric, or dependency fields reveal whether live configuration matches the intended design.
Repeated output over time can prove drift, confirm remediation, or show that a change reached the correct Azure resource.

Mapped Azure CLI commands

Memory scale rule Azure CLI checks

az containerapp show --resource-group <group> --name <app> --query properties.template.scale

az containerapp | discover | Containers

az containerapp update --resource-group <group> --name <app> --scale-rule-name mem --scale-rule-type memory --scale-rule-metadata type=Utilization value=75

az containerapp | configure | Containers

az containerapp replica list --resource-group <group> --name <app> --revision <revision>

az containerapp replica | discover | Containers

az monitor metrics list --resource <container-app-id> --metric WorkingSetBytes Replicas

az monitor metrics | discover | Containers

Architecture context

Security

Security for a memory scale rule starts with least privilege and clear ownership. The main risk is scaling out memory-bound containers without limiting each replica’s secrets, outbound calls, and downstream data access, because every added replica carries the same reach. Review who can create, update, delete, or read the scale configuration, and whether access comes from direct roles, inherited roles, managed identities, secrets, or deployment pipelines. Prefer managed identity, scoped RBAC, private access, encryption, and logged approvals when the service supports them. For production, keep evidence of permission scope, network exposure, diagnostic logging, and rollback authority so a security review can verify live state rather than trusting documentation alone.

Cost

Cost for a memory scale rule is driven by replica count, workload profile size, memory allocation, idle minimum replicas, logging, and downstream service consumption. The spend may be direct, such as SKU, capacity, storage, throughput, replicas, retention, or network transfer, or indirect through support time and failed changes. FinOps reviews should identify the owner, billing tag, usage metric, and cheaper configuration that still meets the workload requirement. Do not reduce cost by weakening security, durability, compliance, or recovery needs without written approval. Track changes over time so teams can distinguish intentional scaling from forgotten resources, stale test deployments, and inefficient defaults.
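The cost of idle minimum replicas can be estimated before any price is applied. The sketch below assumes consumption-style billing measured in memory GiB-seconds. The function name is illustrative, no price is included, and the result should be multiplied by the current rate for the plan in use.

```shell
# Sketch only: memory GiB-seconds consumed by always-on minimum replicas.
# Assumes per-GiB-second billing; the function name is illustrative.
idle_memory_gib_seconds() {
  local min_replicas=$1 memory_gib=$2 days=$3
  # awk handles fractional memory sizes such as 0.5 GiB
  awk -v r="$min_replicas" -v m="$memory_gib" -v d="$days" \
    'BEGIN { printf "%.0f\n", r * m * d * 86400 }'
}

# One minimum replica with 0.5 GiB of memory over a 30-day month
idle_memory_gib_seconds 1 0.5 30
```

For this input the sketch prints 1296000, which is the baseline memory footprint that a memory scale rule carries because it needs at least one running replica.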

Reliability

Reliability for a memory scale rule depends on replica scale-out, memory pressure, restart count, health probes, backlog, and downstream capacity. Operators should know what happens during deployment, scale changes, failover, maintenance, dependency loss, and operator error. Some effects are direct, such as availability, recovery, or throughput; others are indirect because the setting makes drift easier to detect and reverse. Document region assumptions, backups, health probes, retry behavior, dependency limits, and rollback steps. A reliable implementation lets support teams prove current state quickly before making emergency changes. Review the evidence again after deployment so drift is caught early.

Performance

Performance for a memory scale rule depends on memory utilization, replica count, scale latency, request latency, garbage collection, OOM events, and backlog growth. The effect may appear as latency, throughput, IOPS, connection wait time, replica behavior, query duration, pipeline runtime, or faster operational troubleshooting. Measure before and after important changes instead of assuming the setting helps. Useful evidence includes metrics, logs, traces, activity records, deployment output, load-test results, and user-impact signals. When performance is indirect, state that clearly and focus on how the rule improves diagnosis speed, configuration consistency, or workload routing.

Operations

Operationally, a memory scale rule needs a repeatable inspection path. Teams should know which portal blade, CLI command, Resource Graph query, metric, activity log, workbook, or deployment artifact shows the live state. Runbooks should describe normal ownership, approved change windows, escalation contacts, rollback steps, and evidence to capture after changes. Avoid undocumented portal-only edits in production. Use IaC, tags, CLI exports, and monitoring so operators can compare actual configuration with the intended design during releases, incidents, and audits. Tie every change to an owner, monitoring signal, and rollback path.
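Comparing actual configuration with the intended design can be as simple as diffing two exports. This is a minimal sketch, assuming both files hold the scale template as JSON produced by the same command and output format. The function name and file paths are illustrative, and a textual diff is sensitive to key order and whitespace.

```shell
# Sketch only: compare an approved scale template with a live export.
# Both arguments are JSON files produced the same way; names are illustrative.
check_scale_drift() {
  local approved=$1 live=$2
  if diff -u "$approved" "$live"; then
    echo "no drift"
  else
    echo "drift detected"
    return 1
  fi
}

# Usage: check_scale_drift approved-scale.json live-scale.json
```

The non-zero return on drift lets a pipeline or runbook step stop before a release proceeds with an unreviewed scale change.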

Common mistakes

Changing a memory scale rule without checking dependent resources, owner tags, alerts, permissions, and rollback steps first.
Assuming the portal label is complete instead of validating live state through CLI, IaC, metrics, or activity logs.
Granting broad permissions for convenience, then forgetting to remove temporary access after troubleshooting or deployment.