VM availability set

A VM availability set is a placement guardrail for a small group of related virtual machines. When you create two or more VMs in the same set, Azure spreads them across separate fault and update domains so one rack, power unit, or maintenance wave is less likely to take every instance down together. It is not the same as availability zones, and it does not create backups or replicas. It simply tells Azure these VMs belong to the same resilient application tier.

Aliases: availability set, Azure availability set, fault domain set, update domain set
Difficulty: intermediate
CLI mappings: 5
Last verified: 2026-05-28

Microsoft Learn

A VM availability set is a logical grouping that helps Azure spread related virtual machines across fault domains and update domains. It reduces correlated hardware or maintenance impact for workloads that run multiple VM instances in one region or datacenter area.

Microsoft Learn: Availability sets overview (2026-05-28)

Technical context

In Azure architecture, an availability set is a regional Compute resource that a VM references when it is created. It defines platform fault domain and update domain distribution for those VMs. Availability sets are commonly used with managed disks, load balancers, application tiers, and older regional designs where availability zones are unavailable or not selected. The set is part of the control-plane placement model; the application still needs data replication, health probes, a patching strategy, and traffic distribution. Because a VM can join an availability set only at creation time, existing standalone VMs must be recreated to join the placement model.

Why it matters

Availability sets matter because two VMs in the same region are not automatically protected from correlated infrastructure events. Without a placement signal, related instances can land in ways that expose both to the same maintenance domain or hardware failure. For many legacy or cost-sensitive workloads, an availability set is the simplest way to improve resilience within a region without redesigning for zones. The limit is that it only protects the VM tier from certain platform events. It does not make the application stateless, replicate databases, heal bad deployments, or replace load-balancer health probes. Architects use it when the resilience target is protection from fault-domain and update-domain events inside one region, not from zonal or regional outages.

Where you see it

Signals, screens, and Azure surfaces where this term usually becomes operational.

Signal 01

In the availability set portal blade, operators see fault domain count, update domain count, managed disk alignment, resource location, and the VMs assigned to the set.

Signal 02

In ARM, Bicep, or CLI deployment output, a VM references the availability set resource ID through the availabilitySet property during creation, review, and pipeline validation.

Signal 03

In reliability reviews, diagrams show multiple VM instances behind a load balancer with placement separated across update domains and fault domains, so planned maintenance or a hardware fault reaches only part of the tier at a time.

When this becomes relevant

Specific situations where this term helps solve real Azure design, operations, migration, security, reliability, cost, or governance problems.

Place two legacy web servers in separate fault and update domains when the application cannot yet move to availability zones.
Document regional resilience for a small VM-based application tier behind Azure Load Balancer or Application Gateway.
Avoid simultaneous maintenance impact across active and standby application servers that share the same regional deployment.
Validate migration templates so rebuilt VMs join the intended availability set before production cutover.
Compare availability sets with zones or scale sets when modernizing an older VM architecture.

Real-world case studies

Different enterprise-style examples that show the term being used to hit measurable objectives.

Case study 01

Legal e-discovery portal survives host maintenance

Scenario

A litigation support firm ran a two-node document review portal for active court deadlines. Both VMs were rebuilt during a migration, but the team needed regional resilience without changing the legacy application stack.

Business/Technical Objectives

Keep at least one review server available during planned maintenance.
Avoid a full application redesign before an active trial calendar.
Prove VM placement and load-balancer behavior to the client security team.
Reduce emergency after-hours support caused by host restarts.

Solution Using VM availability set

The migration team created a VM availability set with two fault domains and five update domains before rebuilding the application servers. Both VMs used managed disks, joined the same backend pool behind Azure Load Balancer, and exposed a health endpoint that checked the web process and document cache connection. ARM templates forced new servers to reference the availability set ID. Operators used Azure CLI to show the set, list VM membership, and capture instance-view evidence after test restarts. Session state moved to a small managed cache so either server could handle user traffic.

Results & Business Impact

Planned host maintenance no longer took the review portal fully offline.
Failover testing kept search and document viewing available with one VM stopped.
Client evidence collection dropped from two days to 30 minutes.
After-hours restart incidents fell 46 percent during the next quarter.

Key Takeaway for Glossary Readers

A VM availability set helps legacy tiers gain practical regional resilience when the application can run on more than one server.

Case study 02

Factory control dashboard gains paired collectors

Scenario

A manufacturing plant monitored equipment through two Windows collector VMs that fed a local dashboard. Earlier, a single host issue stopped data collection and hid production-line alarms.

Business/Technical Objectives

Separate collector VMs across fault domains.
Keep dashboard feeds active during Azure maintenance events.
Avoid changing the plant network integration before a scheduled shutdown.
Give operations a simple placement check before each release.

Solution Using VM availability set

The industrial IT team deployed two collector VMs into a VM availability set and configured each to poll a different subset of line gateways with failover coverage. A load-balanced internal endpoint served dashboard requests, while data was written to Azure SQL instead of local files. CLI checks verified the availability set domain counts and confirmed both collectors referenced the same set. Patch windows were staggered by update domain, and health probes tested collector freshness rather than only Windows service status. The design stayed regional because the plant network link was not ready for zone-aware routing.

Results & Business Impact

Dashboard blind spots during maintenance were eliminated in the first two patch cycles.
Collector recovery time fell from 52 minutes to under 9 minutes.
Missed equipment alarms dropped by 71 percent after dual collection went live.
Release checks caught one accidentally standalone VM before production use.

Key Takeaway for Glossary Readers

Availability sets are useful when industrial systems need better VM placement without a larger networking redesign.

Case study 03

Municipal permitting app modernizes one tier at a time

Scenario

A city government hosted an older permitting application on two application VMs and a separate database. Availability zones were not part of the original network design, but permit intake had to stay online during maintenance.

Business/Technical Objectives

Improve application-tier availability before the database modernization phase.
Minimize changes to firewall rules and citizen-facing URLs.
Create auditable evidence for the city resilience plan.
Test failure behavior without disrupting business-hour submissions.

Solution Using VM availability set

The cloud team placed the two application VMs in a VM availability set and rebuilt the deployment template so every replacement server inherited the same placement. Application Gateway sent traffic only to VMs passing a real permit-submission health check. Static files moved to shared storage, and session affinity was reduced so users could continue on either instance. Operators used az vm availability-set show and az vm list queries in the monthly control review. A weekend test stopped each VM in turn while clerks submitted sample permits and monitored response time.

Results & Business Impact

Permit submission stayed available during both single-VM stop tests.
Monthly resilience evidence preparation dropped from 12 screenshots to one CLI export.
Citizen-facing error reports during patch weekends fell 39 percent.
The team identified database failover as the next real availability bottleneck.

Key Takeaway for Glossary Readers

A VM availability set can be a sensible first step when a public-sector workload must improve resilience gradually.

Why use Azure CLI for this?

I use Azure CLI for availability sets because placement mistakes are easiest to catch before VM creation. The portal makes the concept visible, but CLI lets me list all sets, show fault and update domain counts, and verify which VMs are actually attached. In mature environments, I also need repeatable evidence for resilience reviews: two web VMs in a set, managed disks aligned, load balancer configured, and no orphan standalone instance pretending to be redundant. CLI is also useful during migrations because scripts can compare planned templates with actual resources and catch VMs created outside the approved availability pattern before that mismatch turns into an outage.

CLI use cases

List availability sets in a resource group and identify workloads that still depend on regional fault-domain placement.
Show platform fault domain and update domain counts before approving a production VM build.
Create an availability set from infrastructure automation before deploying the first VM in an application tier.
Query VM membership to find related VMs created outside the intended set during migrations.
Collect instance view and activity log evidence when maintenance or hardware events affect only part of a tier.
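The membership use case above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive tool: it assumes the input is the parsed JSON from `az vm list --output json`, where each VM carries a `name` and an optional `availabilitySet.id`. The helper name `vms_outside_set`, the sample VM names, and the subscription ID are hypothetical. IDs are compared case-insensitively because Azure resource IDs are case-insensitive and CLI output does not always preserve the casing used at creation.

```python
import json

def vms_outside_set(vm_list, expected_set_id):
    """Return sorted names of VMs that do not reference expected_set_id."""
    expected = expected_set_id.lower()  # resource IDs compare case-insensitively
    outside = []
    for vm in vm_list:
        # availabilitySet is null/absent for standalone VMs
        ref = (vm.get("availabilitySet") or {}).get("id")
        if ref is None or ref.lower() != expected:
            outside.append(vm["name"])
    return sorted(outside)

# Hypothetical data shaped like `az vm list --output json` (names and IDs are made up).
SET_ID = ("/subscriptions/0000/resourceGroups/rg-demo/providers/"
          "Microsoft.Compute/availabilitySets/as-web")
sample_vms = json.loads("""
[
  {"name": "web-01", "availabilitySet": {"id": "/subscriptions/0000/resourceGroups/RG-DEMO/providers/Microsoft.Compute/availabilitySets/AS-WEB"}},
  {"name": "web-02", "availabilitySet": {"id": "/subscriptions/0000/resourceGroups/rg-demo/providers/Microsoft.Compute/availabilitySets/as-web"}},
  {"name": "web-03", "availabilitySet": null}
]
""")

print(vms_outside_set(sample_vms, SET_ID))  # prints ['web-03']
```

A VM flagged here cannot simply be attached to the set; it has to be rebuilt with the set referenced at creation.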

Before you run CLI

Confirm tenant, subscription, resource group, region, availability set name, and whether the target VM must be created or rebuilt.
Check whether the workload should use availability zones or a scale set instead of a classic availability set pattern.
Verify VM size availability, managed disk usage, load-balancer design, health probes, and application state externalization before claiming resilience.
Understand that creating the set is harmless, but a VM can join it only at VM creation time, so the set must exist and be referenced before the VM is built.
Capture current VM membership, templates, and dependency diagrams before rebuilding standalone VMs into a set.

What output tells you

Availability-set output shows location, SKU (Aligned for managed disks, Classic for unmanaged disks), platform fault domain count, and platform update domain count for placement planning.
VM list output reveals which machines reference the set ID and which related machines were accidentally created outside it.
Instance view output helps connect platform maintenance, restart states, and domain distribution during incident review.
Template output shows whether the availabilitySet property is present before a VM deployment reaches production.
Activity logs identify who created, changed, or deleted the set and when resilience-related drift began.
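As a rough sketch of reading the availability-set output described above, the following Python summarizes one parsed result of `az vm availability-set show`. The field names (`platformFaultDomainCount`, `platformUpdateDomainCount`, `sku.name`, `virtualMachines`) are assumed to match the CLI JSON and should be confirmed against real output. The function names, thresholds, and sample values are illustrative only.

```python
def summarize_set(av_set):
    """Pull the placement-relevant fields out of one availability-set object."""
    return {
        "location": av_set.get("location"),
        "sku": (av_set.get("sku") or {}).get("name"),
        "fault_domains": av_set.get("platformFaultDomainCount"),
        "update_domains": av_set.get("platformUpdateDomainCount"),
        "members": len(av_set.get("virtualMachines") or []),
    }

def review_findings(summary, min_members=2):
    """Return human-readable findings; an empty list means nothing was flagged."""
    findings = []
    if summary["members"] < min_members:
        findings.append(f"only {summary['members']} VM(s) in the set")
    if (summary["fault_domains"] or 0) < 2:
        findings.append("fewer than two fault domains")
    if summary["sku"] != "Aligned":
        findings.append("SKU is not Aligned (check managed disk usage)")
    return findings

# Hypothetical object shaped like `az vm availability-set show` output.
sample_set = {
    "location": "eastus",
    "sku": {"name": "Aligned"},
    "platformFaultDomainCount": 2,
    "platformUpdateDomainCount": 5,
    "virtualMachines": [{"id": "/subscriptions/0000/.../virtualMachines/web-01"}],
}

print(review_findings(summarize_set(sample_set)))
```

With the sample above, the only finding is the single-member set, which matches the point that one VM in a set gains little.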

Mapped Azure CLI commands

VM availability set operations

direct

az vm availability-set list --resource-group <resource-group> --output table

az vm availability-set | discover | Compute

az vm availability-set show --resource-group <resource-group> --name <availability-set-name>

az vm availability-set | discover | Compute

az vm availability-set create --resource-group <resource-group> --name <availability-set-name> --platform-fault-domain-count 2 --platform-update-domain-count 5

az vm availability-set | provision | Compute

az vm list --resource-group <resource-group> --query "[?availabilitySet.id!=null].{name:name,availabilitySet:availabilitySet.id}"

az vm | discover | Compute

az vm get-instance-view --resource-group <resource-group> --name <vm-name>

az vm | discover | Compute
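For scripted evidence collection, the read-only commands above can be wrapped so their JSON lands in Python. The sketch below is one possible shape, assuming `az` is installed, on the PATH, and already logged in. `az_json` and `list_sets_args` are hypothetical helper names; the `run` parameter exists only so the wrapper can be exercised without a live subscription.

```python
import json
import subprocess

def az_json(args, run=subprocess.run):
    """Run an az command with JSON output and return the parsed result."""
    completed = run(
        ["az", *args, "--output", "json"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if az exits non-zero
    )
    return json.loads(completed.stdout) if completed.stdout.strip() else None

def list_sets_args(resource_group):
    """Arguments for listing availability sets in one resource group."""
    return ["vm", "availability-set", "list", "--resource-group", resource_group]

# Example call (needs a logged-in Azure CLI, so it is left commented out):
# sets = az_json(list_sets_args("<resource-group>"))
```

Keeping the argument lists as plain Python lists avoids shell quoting problems with JMESPath queries and makes each command easy to log alongside its output.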

Architecture context

An availability set belongs in the compute placement layer of a regional architecture. I use it when an application tier has multiple VM instances that can serve the same role and needs protection from host or maintenance correlation, but the design is not using availability zones or scale sets. The architecture should include a load balancer or application gateway, health probes that test real readiness, managed disks, patch coordination, and application-level replication where state exists. The availability set is chosen at VM creation, so it must be part of the initial template. Retrofitting it after servers exist usually means planned rebuild, migration, or moving toward a scale-set pattern.
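Because the set must be referenced at creation, a pre-deployment check on the template is a natural guardrail. The following is a minimal sketch, assuming an ARM template whose `resources` element is a JSON array and whose VMs are top-level resources; nested deployments, copy loops, and Bicep sources are not handled. The function name and the sample template are hypothetical.

```python
def vms_missing_availability_set(template):
    """Return names of VM resources that lack properties.availabilitySet.id."""
    missing = []
    for res in template.get("resources", []):
        if res.get("type", "").lower() != "microsoft.compute/virtualmachines":
            continue  # only VM resources are relevant here
        ref = (res.get("properties") or {}).get("availabilitySet") or {}
        if not ref.get("id"):
            missing.append(res.get("name"))
    return missing

# Hypothetical, heavily trimmed template used only to exercise the check.
sample_template = {
    "resources": [
        {"type": "Microsoft.Compute/availabilitySets", "name": "as-web"},
        {
            "type": "Microsoft.Compute/virtualMachines",
            "name": "web-01",
            "properties": {
                "availabilitySet": {
                    "id": "[resourceId('Microsoft.Compute/availabilitySets', 'as-web')]"
                }
            },
        },
        {"type": "Microsoft.Compute/virtualMachines", "name": "web-02", "properties": {}},
    ]
}

print(vms_missing_availability_set(sample_template))  # prints ['web-02']
```

A pipeline could fail the build when this list is non-empty for tiers that are supposed to use the set.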

Security

Security impact is mostly indirect because an availability set controls placement, not access. It does not open ports, grant identities, encrypt disks, or isolate tenants beyond Azure’s platform controls. Risk appears when teams mistake availability for security and skip NSGs, disk encryption, patching, or privileged access controls. Each VM in the set still needs secure management through Bastion or private access, least-privilege identities, hardened images, and monitored admin operations. Availability sets can also create a false sense of safety if every VM shares the same local admin secret, vulnerable image, or exposed public endpoint.

Cost

The availability set itself has no separate charge, but the resilient design around it costs money. You pay for the multiple VMs, disks, load balancer or application gateway, monitoring, backup, patching effort, and any replicated data services. The value is avoiding downtime from correlated platform events without paying for a more complex zonal or cross-region design. Cost mistakes include creating two oversized VMs when one active and one smaller standby would meet the need, or paying for duplicate servers without health probes that can actually shift traffic. FinOps should compare uptime requirements against zones, scale sets, and PaaS alternatives. Idle standby capacity needs ownership.

Reliability

Reliability is the main purpose. Availability sets reduce the chance that planned maintenance or a localized hardware fault affects every related VM at the same time. Update domains spread maintenance impact; fault domains spread hardware failure risk. Reliability still depends on having at least two VMs, externalized state, load-balanced traffic, healthy probes, patch coordination, and tested failover. A single VM in an availability set gains little. Stateful workloads need replication beyond the set. For higher resilience, compare availability sets with availability zones, scale sets, managed database replicas, and application-level active-active patterns. Only a tested failover proves the target is met, so document failover drills and owner actions.
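To make the domain arithmetic concrete, here is a small planning model. It assumes VMs are assigned to fault and update domains in simple round-robin order, which is a simplification for reasoning only; actual placement is decided by the platform and should be read from each VM's instance view. The function names are hypothetical, and the defaults mirror the two fault domains and five update domains used elsewhere in this entry.

```python
from collections import Counter

def model_placement(vm_count, fault_domains=2, update_domains=5):
    """Assumed round-robin model: VM i lands in (i % FD, i % UD)."""
    return [(i % fault_domains, i % update_domains) for i in range(vm_count)]

def worst_case_loss(placement):
    """Largest number of VMs sharing one fault domain and one update domain."""
    fd = Counter(f for f, _ in placement)
    ud = Counter(u for _, u in placement)
    return {
        "fault_domain": max(fd.values(), default=0),
        "update_domain": max(ud.values(), default=0),
    }

for n in (1, 2, 6):
    print(n, worst_case_loss(model_placement(n)))
```

Under this model a single VM loses everything in either event, two VMs lose at most one, and six VMs can lose three to one fault-domain failure, which is why capacity planning should assume the busiest domain goes down.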

Performance

Availability sets do not directly increase CPU, memory, disk throughput, or network speed. Their performance impact comes from distribution and traffic routing. If the application is load-balanced across healthy VMs, users keep service when one domain is unavailable. If health probes are shallow, traffic may keep flowing to a broken instance. Stateful or chatty applications can perform worse if session state is local or file shares are not externalized. Operators should test per-instance latency, load-balancer probe behavior, patch sequencing, and recovery time after one VM is stopped. Performance confidence comes from failover tests run under load, not from the set existing.

Operations

Operators inspect availability sets to confirm VM membership, domain counts, provisioning state, load-balancer association, patch sequence, and whether managed disks align with fault domains. Operational jobs include listing sets before maintenance, checking that newly created VMs joined the intended set, reviewing instance view after platform events, and documenting why a workload uses regional placement instead of zones. During incidents, operators should determine whether one domain failed, whether probes removed unhealthy instances, and whether application state prevented traffic failover. Change management should treat removing or rebuilding a set member as a resilience event. Keep rebuild steps documented so audits and incident reviews can trace membership changes.
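The instance-view review mentioned above can be sketched as a grouping step. This example assumes each item is the parsed JSON from `az vm get-instance-view` and that the assigned domains appear as `instanceView.platformFaultDomain` and `instanceView.platformUpdateDomain`; confirm those field names against real output before relying on them. The helper names and sample VMs are hypothetical.

```python
from collections import defaultdict

def group_by_domain(instance_views, key):
    """Map each domain number to the VM names placed in it."""
    groups = defaultdict(list)
    for vm in instance_views:
        domain = (vm.get("instanceView") or {}).get(key)
        groups[domain].append(vm["name"])
    return dict(groups)

def shared_domains(groups):
    """Domains that hold more than one VM (None means the domain was not reported)."""
    return {d: sorted(names) for d, names in groups.items()
            if d is not None and len(names) > 1}

# Hypothetical instance views for a two-VM tier.
views = [
    {"name": "web-01", "instanceView": {"platformFaultDomain": 0, "platformUpdateDomain": 0}},
    {"name": "web-02", "instanceView": {"platformFaultDomain": 0, "platformUpdateDomain": 1}},
]

print(shared_domains(group_by_domain(views, "platformFaultDomain")))
print(shared_domains(group_by_domain(views, "platformUpdateDomain")))
```

In this made-up sample both VMs share fault domain 0, which is the kind of result worth raising in a resilience review even though their update domains differ.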

Common mistakes

Creating only one VM in an availability set and claiming the application is highly available.
Forgetting that an existing VM cannot be added to an availability set after creation; joining the set means a planned rebuild.
Using an availability set when the business requirement actually needs zonal or regional disaster recovery.
Skipping load-balancer health probes, so traffic keeps reaching a failed VM even though a healthy member survives in another domain.
Replicating neither data nor sessions and then discovering the second VM cannot serve users independently.

Operator quick checks

Are at least two VMs in the set, and do they perform the same application role?
Does a load balancer or application gateway route traffic only to healthy set members?
Is application state externalized so one VM can fail without losing sessions or writes?
Do templates create the availability set before the VM and reference the correct resource ID?
Has the team compared this design with availability zones, scale sets, or a managed PaaS option?

Questions to ask

What failure mode is this availability set meant to survive, and is that enough for the SLA?
Who owns rebuilding VMs if a server was created outside the intended set?
How will traffic drain from a VM during planned maintenance or patching?
Which data, sessions, secrets, and certificates must be shared or replicated across members?
What test proves the application remains usable when one VM is stopped or restarted?
