DevOps Deployment workflows premium

Canary deployment

Canary deployment is a release approach that sends a small portion of real traffic to a new version before rolling it out broadly. In Azure, teams implement it with Container Apps revisions, AKS service mesh routing, App Service slots, Front Door rules, or application gateways depending on the workload. In plain English, it lets the team test production behavior with limited blast radius. If metrics look good, traffic increases gradually; if errors appear, traffic moves back to the previous stable version.

Aliases: No aliases mapped yet
Difficulty: intermediate
CLI mappings: 5
Last verified: 2026-05-12

Microsoft Learn

Microsoft Learn covers canary-style releases under Traffic splitting in Azure Container Apps. Operators should still confirm scope, configuration, dependencies, and production impact for their own workload.

Microsoft Learn: Traffic splitting in Azure Container Apps (2026-05-12)

Technical context

Technically, a canary deployment uses traffic weights, revision labels, slots, routing rules, feature flags, or service mesh policies to expose a new version to selected users or a small percentage of requests. Azure Container Apps supports traffic splitting across active revisions, while other Azure patterns use ingress controllers, Front Door, App Service deployment slots, or release pipelines. Operators inspect traffic percentages, version labels, health probes, logs, metrics, error rates, latency, rollback commands, and user-impact dashboards before increasing exposure.
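
For the Container Apps case, a minimal sketch of a weight change might look like the helper below. It wraps `az containerapp ingress traffic set` so the stable and canary weights always sum to 100. The function name `set_canary_weight` and its argument order are hypothetical, and the sketch assumes the app runs in multiple-revision mode with both revisions active.

```shell
# Hypothetical helper: split ingress traffic between a stable and a canary revision.
# Assumes multiple-revision mode and that both revisions are already active.
set_canary_weight() {
  app="$1"; rg="$2"; stable_rev="$3"; canary_rev="$4"; canary_pct="$5"

  # Accept only a non-negative integer so the arithmetic below is well defined.
  case "$canary_pct" in
    ''|*[!0-9]*)
      echo "canary percentage must be a whole number" >&2
      return 1
      ;;
  esac
  if [ "$canary_pct" -gt 100 ]; then
    echo "canary percentage must not exceed 100" >&2
    return 1
  fi

  stable_pct=$((100 - canary_pct))
  az containerapp ingress traffic set \
    --name "$app" \
    --resource-group "$rg" \
    --revision-weight "${stable_rev}=${stable_pct}" "${canary_rev}=${canary_pct}"
}
```

Calling `set_canary_weight <container-app> <resource-group> <stable-revision> <canary-revision> 5` would request a 95/5 split. Running `az containerapp ingress traffic show` afterwards is one way to confirm the weights that were actually applied.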

Why it matters

Canary deployment matters because production behavior is never fully proven in a test environment. Limited traffic gives teams evidence from real clients, data, dependencies, and scale while keeping rollback simple. It reduces the risk of bad releases, schema mismatches, dependency failures, latency regressions, and feature surprises. The business benefit is faster delivery with controlled risk: customers see fewer major outages, and engineering gets clearer signals before full rollout. A canary must still be planned. Without success criteria, telemetry, rollback authority, and traffic controls, it becomes a slow full release rather than a safety mechanism. Recorded evidence from each step also keeps accountability clear.

Where you see it

Signals, screens, and Azure surfaces where this term usually becomes operational.

Signal 01

In Container Apps, canary deployment appears as active revisions, revision labels, traffic weights, ingress settings, and gradual promotion of a new image.

Signal 02

In AKS or gateway patterns, it appears as routing rules, service mesh policies, header matches, weighted backends, health probes, and rollout manifests.

Signal 03

In release operations, canary evidence appears beside deployment IDs, metric gates, error dashboards, rollback records, customer-impact notes, and approval history.

When this becomes relevant

Specific situations where this term helps solve real Azure design, operations, migration, security, reliability, cost, or governance problems.

Set or verify traffic weights between stable and canary versions.
Capture revision, slot, or route configuration before promoting traffic.
Roll back traffic to the previous stable version after a failed health gate.

Real-world case studies

Different enterprise-style examples that show the term being used to hit measurable objectives.

Case study 01

Canary deployment for payments API

Scenario, objectives, solution, measured impact, and takeaway.

Scenario

Fabrikam Pay, a payment processor, needed to release a new fraud-scoring API without risking all checkout transactions if model latency regressed.

Business/Technical Objectives

Expose the new version to 5 percent of production traffic first
Compare fraud decisions, latency, and error rate against stable
Roll back in under five minutes if health gates failed
Promote gradually only after approved metrics

Solution Using Canary deployment

The platform team deployed the new API version as an Azure Container Apps revision and used traffic splitting to send 5 percent of checkout requests to the canary. Application Insights compared stable and canary latency, dependency failures, fraud-score distribution, and checkout errors. Release managers defined promotion steps of 5, 25, 50, and 100 percent, with a rollback command documented in the runbook. Security verified managed identity, secrets, and WAF rules before traffic exposure. The release channel received automated metric summaries after each observation window. The team reviewed results with application owners before closing the change record. Support notes documented rollback ownership and business approval.
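
The staged promotion described above (5, 25, 50, and 100 percent with a documented rollback) could be sketched roughly as follows. This is an illustration under assumptions, not Fabrikam Pay's runbook: `promote_canary`, `gate_passes`, and `OBSERVE_SECONDS` are hypothetical names, and the placeholder gate fails closed until a real metric check replaces it.

```shell
# Placeholder gate (hypothetical): replace with a real metric check that returns 0 on pass.
# It fails closed so an unconfigured script rolls back instead of promoting.
gate_passes() {
  echo "gate_passes is not implemented for step $1" >&2
  return 1
}

# Hypothetical promotion loop over the 5/25/50/100 percent steps.
promote_canary() {
  app="$1"; rg="$2"; stable_rev="$3"; canary_rev="$4"

  for pct in 5 25 50 100; do
    az containerapp ingress traffic set --name "$app" --resource-group "$rg" \
      --revision-weight "${stable_rev}=$((100 - pct))" "${canary_rev}=${pct}"

    # Observation window before the gate is evaluated (length is an assumption).
    sleep "${OBSERVE_SECONDS:-600}"

    if ! gate_passes "$pct"; then
      echo "gate failed at ${pct} percent; returning all traffic to ${stable_rev}" >&2
      az containerapp ingress traffic set --name "$app" --resource-group "$rg" \
        --revision-weight "${stable_rev}=100" "${canary_rev}=0"
      return 1
    fi
  done
}
```

Keeping the rollback inside the same function means the command that restores the stable revision is exercised on every failed gate rather than typed by hand during an incident.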

Results & Business Impact

A latency regression was caught at 5 percent traffic
Rollback completed in 3 minutes without broad customer impact
Checkout error rate remained within the approved threshold
The corrected release reached 100 percent traffic two days later

Key Takeaway for Glossary Readers

Canary deployment limits release blast radius by making production traffic exposure gradual, measurable, and reversible.

Case study 02

Canary deployment for hospital scheduling

Scenario

Alpine Care Network, a hospital group, updated its patient scheduling application and needed to validate real appointment workflows before all clinics used the new version.

Business/Technical Objectives

Route one clinic group to the canary version first
Monitor appointment creation, cancellations, and page latency
Preserve stable-version access for other clinics
Roll back instantly if scheduling failures increased

Solution Using Canary deployment

Engineers used App Service deployment slots with routing rules and a clinic-based feature flag to direct limited traffic to the new scheduling build. The canary slot had the same managed identity, Key Vault references, private endpoints, and diagnostic settings as production. Operators monitored appointment success rate, dependency timing, SQL errors, and help-desk tickets during the clinic pilot. The release plan required database backward compatibility so stable and canary versions could run together. After two clean observation windows, traffic expanded to additional clinic groups. The team reviewed results with application owners before closing the change record. Support notes documented rollback ownership and business approval.
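
The percentage-based part of slot routing can be sketched with `az webapp traffic-routing`. The helper names and the idea of a slot called `canary` are assumptions for illustration; the clinic-based feature flag from the scenario is application logic and is not shown here.

```shell
# Hypothetical helper: send a percentage of production traffic to a deployment slot.
route_to_slot() {
  app="$1"; rg="$2"; slot="$3"; pct="$4"
  az webapp traffic-routing set --name "$app" --resource-group "$rg" \
    --distribution "${slot}=${pct}"
}

# Hypothetical helper: clear routing rules so all traffic returns to the production slot.
clear_slot_routing() {
  app="$1"; rg="$2"
  az webapp traffic-routing clear --name "$app" --resource-group "$rg"
}
```

For example, `route_to_slot <app-name> <resource-group> canary 10` would request a 10 percent split, and `clear_slot_routing <app-name> <resource-group>` is the matching rollback.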

Results & Business Impact

Pilot clinics completed 4,200 appointments without elevated errors
Scheduling page P95 latency improved by 18 percent
No database compatibility incidents occurred during parallel operation
Full rollout completed with zero emergency rollback events

Key Takeaway for Glossary Readers

Canary deployment is especially useful when real users and business workflows are the best proof that a release is safe.

Case study 03

Canary deployment for manufacturing telemetry service

Scenario

AxisForge Manufacturing, an industrial equipment company, released a new telemetry ingestion service for factory devices and needed to avoid losing high-volume machine data.

Business/Technical Objectives

Send 10 percent of device traffic to the new ingestion path
Detect data loss, duplicate events, and processing latency
Keep stable ingestion available during the experiment
Clean up old revisions after successful rollout

Solution Using Canary deployment

The engineering team deployed a new AKS service version behind an Istio routing policy that sent a small percentage of device traffic to canary pods. Event Hubs metrics, application logs, and downstream database counts compared canary and stable paths. Operators used dashboards for ingest rate, duplicate-event count, dead-letter messages, pod saturation, and device-region distribution. The rollout plan included staged traffic increases and a rollback rule that returned all traffic to stable if event loss exceeded the threshold. After approval, old pods and temporary telemetry dashboards were removed. The team reviewed results with application owners before closing the change record.
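
A weighted Istio route of the kind described above might be generated as in the sketch below. The function `render_canary_route`, the host name passed to it, and the subset names `stable` and `canary` are all illustrative; the sketch assumes a DestinationRule already defines those subsets, and the `apiVersion` may need adjusting to the installed Istio release.

```shell
# Hypothetical renderer: prints an Istio VirtualService that splits traffic by weight.
# Assumes a DestinationRule already defines the "stable" and "canary" subsets.
render_canary_route() {
  host="$1"; canary_pct="$2"
  stable_pct=$((100 - canary_pct))
  cat <<EOF
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: ${host}
spec:
  hosts:
    - ${host}
  http:
    - route:
        - destination:
            host: ${host}
            subset: stable
          weight: ${stable_pct}
        - destination:
            host: ${host}
            subset: canary
          weight: ${canary_pct}
EOF
}
```

Piping the output to `kubectl apply -f -` would apply the route, and rendering with a canary weight of 0 gives the rollback manifest that returns all traffic to stable.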

Results & Business Impact

Duplicate-event bug was found before full rollout
Event loss stayed below the 0.01 percent threshold
Canary promotion completed across three factories in one week
Old revision cleanup reduced temporary compute cost by 22 percent

Key Takeaway for Glossary Readers

Canary deployment protects critical event pipelines by testing the new path under real load before the whole fleet depends on it.

Why use Azure CLI for this?

Use CLI, deployment manifests, and metrics for canary deployment because traffic percentages, version identifiers, and rollback state must be captured exactly.

CLI use cases

Set or verify traffic weights between stable and canary versions.
Capture revision, slot, or route configuration before promoting traffic.
Roll back traffic to the previous stable version after a failed health gate.
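
As one way to capture configuration before promoting traffic, the sketch below saves the revision list and the current traffic rules to files. The function name `capture_canary_state` and the output file names are hypothetical; both `az` commands are read-only.

```shell
# Hypothetical helper: snapshot revisions and traffic rules before a promotion step.
capture_canary_state() {
  app="$1"; rg="$2"; out_dir="$3"
  mkdir -p "$out_dir"

  # Revisions currently known to the app (add --all to include inactive ones).
  az containerapp revision list --name "$app" --resource-group "$rg" \
    --output json > "${out_dir}/revisions.json"

  # Traffic weights as they stand right now.
  az containerapp ingress traffic show --name "$app" --resource-group "$rg" \
    --output json > "${out_dir}/traffic.json"
}
```

Attaching the two files to the change record gives reviewers the exact pre-promotion state to compare against if a rollback is needed.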

Before you run CLI

Confirm tenant, subscription, scope, resource group, region, and environment before collecting or changing production evidence.
Use least-privileged access and avoid exposing keys, tokens, personal data, billing details, or confidential topology in output.
Know whether the command is read-only, mutating, cost-impacting, or security-impacting before running it in production.

What output tells you

Output confirms whether the live configuration exists at the expected Azure scope and matches the approved design.
Returned properties, metrics, or logs help separate healthy service behavior from drift, missing configuration, or workload symptoms.
Differences between environments provide evidence for rollback, tuning, support escalation, audit review, or owner follow-up.

Mapped Azure CLI commands

Containerapp operations

direct

az containerapp list --resource-group <resource-group>

az containerapp / discover / Containers

az containerapp show --name <container-app> --resource-group <resource-group>

az containerapp / discover / Containers

az containerapp up --name <container-app> --image <image>

az containerapp / provision / Containers

az containerapp update --name <container-app> --resource-group <resource-group> --image <image>

az containerapp / configure / Containers

az containerapp logs show --name <container-app> --resource-group <resource-group>

az containerapp logs / discover / Containers

Architecture context

Security

Security for canary deployment focuses on ensuring the new version has the same or stronger controls as production before any user traffic reaches it. Review managed identities, secrets, network rules, authentication, authorization, image provenance, dependency versions, and WAF or ingress policies. A canary should not bypass security just because it serves fewer users. Be careful with feature flags that expose restricted functionality to unintended audiences. Logs from the canary may contain production data, so retention and access still matter. Rollback rights should be controlled, and deployment records should show who approved each traffic increase. Review exceptions after each change.

Cost

Cost impact comes from running more than one version during rollout and from telemetry needed to evaluate the canary. Container revisions, pods, slots, gateways, logs, and duplicated dependencies may increase temporary spend. However, preventing a major incident usually saves more than the extra release capacity costs. Teams should size canary infrastructure carefully and remove inactive versions after the safe retention window. Cost reviews should include compute, ingress, telemetry, storage, test traffic, and engineering time. A canary should not become a permanent duplicate environment unless the business intentionally accepts that operating model. Review outcomes after each billing cycle.

Reliability

Reliability is the core reason for a canary. Define health gates before rollout: error rate, latency, saturation, dependency failures, queue backlog, conversion signals, and customer complaints. Start with low traffic, observe long enough to catch meaningful failures, then increase in planned steps. Make rollback a practiced action, not an improvised portal change. Canary reliability also depends on version compatibility with databases, queues, caches, and contracts shared with the old version. If both versions cannot run safely together, traffic splitting can create inconsistent behavior. A good canary protects users while producing trustworthy release evidence. Test the recovery path regularly.
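
A health gate defined before rollout can be as simple as the sketch below, which checks two of the signals listed above. The function name `metrics_within_gate`, the units, and the thresholds are assumptions; the inputs are expected to come from whatever monitoring query the team already trusts. Integers are used so plain shell arithmetic is enough.

```shell
# Hypothetical gate check using integer inputs only.
# Error rates are in basis points (1 bp = 0.01 percent); latency is in milliseconds.
metrics_within_gate() {
  canary_err_bp="$1"; max_err_bp="$2"
  canary_p95_ms="$3"; stable_p95_ms="$4"; max_regress_pct="$5"

  # Fail when the canary error rate exceeds the agreed ceiling.
  [ "$canary_err_bp" -le "$max_err_bp" ] || return 1

  # Fail when canary P95 latency is more than max_regress_pct above stable.
  limit_ms=$(( stable_p95_ms * (100 + max_regress_pct) / 100 ))
  [ "$canary_p95_ms" -le "$limit_ms" ] || return 1
}
```

With a stable P95 of 200 ms and a 10 percent allowance, the latency limit works out to 220 ms, so a canary at 221 ms fails the gate even if its error rate is within bounds.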

Performance

Performance validation is one of the best uses of canary deployment. The new version should be compared with the stable version for latency percentiles, CPU, memory, dependency timing, request rate, errors, and saturation. Small traffic samples can miss rare issues, so choose sample size and duration based on risk. Watch for cache warming effects, cold starts, autoscale behavior, and database query changes. If the canary is faster but creates more backend load, promotion may still be unsafe. Good performance gates compare user experience and resource efficiency before widening traffic. Document baseline measurements before tuning.

Operations

Operationally, a canary deployment needs a release owner, traffic plan, health dashboard, communication channel, and rollback command. Document the starting percentage, promotion steps, monitoring duration, success criteria, and stop conditions. During rollout, capture deployment IDs, revision names, slot names, route rules, metric snapshots, and incidents. Operators should know whether traffic is percentage-based, header-based, label-based, or feature-flag driven. After rollout, clean up old revisions or slots according to retention policy. Good operations make canary decisions boring and evidence-driven rather than based on optimism or pressure to finish quickly. Keep owners and evidence current.
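
Post-rollout cleanup of an old Container Apps revision could be guarded as in the sketch below, which refuses to deactivate a revision that still receives traffic. The function name `deactivate_if_idle` is hypothetical, and the JMESPath query assumes the traffic list exposes `revisionName` and `weight` fields; verify the output shape in a non-production app first.

```shell
# Hypothetical cleanup helper: deactivate a revision only when it receives no traffic.
# Assumes traffic entries carry revisionName and weight fields.
deactivate_if_idle() {
  app="$1"; rg="$2"; revision="$3"

  weight=$(az containerapp ingress traffic show --name "$app" --resource-group "$rg" \
    --query "[?revisionName=='${revision}'].weight | [0]" --output tsv)

  # Empty or None is treated as "no traffic rule for this revision".
  case "$weight" in
    ''|0|None) ;;
    *)
      echo "revision ${revision} still receives ${weight} percent; not deactivating" >&2
      return 1
      ;;
  esac

  az containerapp revision deactivate --name "$app" --resource-group "$rg" \
    --revision "$revision"
}
```

Deactivation is reversible with `az containerapp revision activate`, which makes it a cautious first step before a revision ages out under the retention policy.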

Common mistakes

Promoting traffic without predefined success metrics and rollback criteria.
Testing only application errors while ignoring latency, dependencies, and saturation.
Leaving old canary resources active indefinitely after rollout completes.

Operator quick checks

Confirm current traffic split, version labels, health probes, and rollback target.
Compare canary and stable metrics for errors, latency, dependencies, and saturation.
Verify release owner, stop conditions, and communication channel before increasing traffic.

Questions to ask

What user percentage or segment is currently receiving the canary version?
Which metric gate must pass before the next traffic increase?
What exact command or action moves all traffic back to stable?
