
VM extension

A VM extension is an add-on that Azure can push into a virtual machine after the machine exists. Instead of signing in and manually installing an agent, script, monitoring tool, or security component, you attach an extension and let the VM agent run it. Extensions are useful, but they are not harmless checkboxes. They execute inside the guest, often with high privilege, so settings, protected settings, version choices, and failure handling need the same care as any other automation.


Aliases: Azure VM extension, virtual machine extension, VM agent extension, extension handler, custom script extension
Difficulty: fundamentals
CLI mappings: 5
Last verified: 2026-05-28

Microsoft Learn

A VM extension is a small software package that Azure installs through the VM agent to perform post-deployment configuration or operational tasks such as scripts, monitoring, security tooling, diagnostics, or domain integration. It is managed as a VM child resource.

Microsoft Learn: Azure virtual machine extensions and features (2026-05-28)

Technical context

In architecture, VM extensions sit between the Azure control plane and the guest operating system. The extension resource is configured through ARM, Bicep, CLI, policy, or the portal, while the VM agent downloads the handler and runs it inside the VM. Extensions commonly support custom scripts, monitoring agents, security agents, domain join, disk encryption, configuration management, and troubleshooting. They depend on agent health, outbound connectivity, supported operating systems, handler versions, and secret handling through protected settings.

Why it matters

VM extensions matter because they are how many Azure VMs become operational after the base image boots. A VM without the right monitoring, security, bootstrap, or access extension may look deployed but fail production readiness. At the same time, a broken extension can block deployments, loop on every update, expose secrets in logs, or create drift between environments. Understanding the term helps teams separate image responsibility from post-deployment configuration, decide which tasks belong in extensions versus custom images, and troubleshoot provisioning failures without blaming the hypervisor, network, or application first. It also gives platform, security, finance, and application teams a shared checklist for ownership, evidence, rollback, and production readiness review.

Where you see it

Signals, screens, and Azure surfaces where this term usually becomes operational.

Signal 01

In the VM Extensions and applications blade, each extension shows publisher, type, status, version, and recent provisioning messages for that specific machine and guest OS.

Signal 02

In ARM and Bicep templates, Microsoft.Compute/virtualMachines/extensions resources define the publisher, type, typeHandlerVersion, settings, and protected settings (Terraform's azurerm provider exposes the same fields through azurerm_virtual_machine_extension), which makes fleet rollouts repeatable and reviewable.
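To make the resource shape concrete, here is a minimal Python sketch that emits one extension child resource as ARM template JSON. It assumes a Linux VM using the Custom Script Extension (publisher Microsoft.Azure.Extensions, type CustomScript, handler version 2.1); the VM name, script URL, and the apiVersion default are illustrative assumptions, so check the current template reference before reusing them.

```python
import json

# Sketch: build one Microsoft.Compute/virtualMachines/extensions resource for an
# ARM template. The apiVersion default is an assumption; check current docs.
def extension_resource(vm_name, ext_name, publisher, ext_type, handler_version,
                       settings, protected_settings=None,
                       location="[resourceGroup().location]",
                       api_version="2023-03-01"):
    properties = {
        "publisher": publisher,
        "type": ext_type,                       # extension type, e.g. CustomScript
        "typeHandlerVersion": handler_version,  # pin major.minor explicitly
        "autoUpgradeMinorVersion": True,
        "settings": settings,                   # public: readable from the VM model
    }
    if protected_settings is not None:
        # Sensitive values belong here; in a real template they should come from
        # secure parameters or Key Vault references, not literals.
        properties["protectedSettings"] = protected_settings
    return {
        "type": "Microsoft.Compute/virtualMachines/extensions",
        "apiVersion": api_version,
        "name": f"{vm_name}/{ext_name}",        # child resource name: parent/child
        "location": location,
        "properties": properties,
    }

# Hypothetical VM name and script location, for illustration only.
resource = extension_resource(
    "app-vm-01", "bootstrap",
    publisher="Microsoft.Azure.Extensions", ext_type="CustomScript",
    handler_version="2.1",
    settings={"fileUris": ["https://example.invalid/setup.sh"]},
    protected_settings={"commandToExecute": "bash setup.sh"},
)
print(json.dumps(resource, indent=2))
```

Keeping the command in protectedSettings means any arguments it carries are not readable from the VM model.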

Signal 03

In guest logs and Azure CLI output, extension provisioning states show whether a script, agent, or handler installed successfully, timed out, retried, or failed during setup.

When this becomes relevant

Specific situations where this term helps solve real Azure design, operations, migration, security, reliability, cost, or governance problems.

Install monitoring or security agents consistently across VM fleets without manually logging in to each operating system.
Run a narrowly scoped custom script after deployment when a setting cannot be baked safely into the image.
Enforce domain join, diagnostics, or policy-required agents during VM provisioning for regulated environments.
Troubleshoot failed application readiness by separating VM power state from extension provisioning state and guest logs.
Remove or update a vulnerable extension handler version across many VMs using auditable automation.

Real-world case studies

Different enterprise-style examples that show the term being used to hit measurable objectives.

Case study 01

Utility company restores missing monitoring after migration


Scenario

An electric utility migrated 300 Windows VMs and discovered that a third of them were running without the required monitoring extension.

Business/Technical Objectives

Identify every VM missing the approved monitoring extension.
Install the extension without manual server login or broad administrative sessions.
Reduce unmonitored production minutes during maintenance windows.
Produce compliance evidence for the operations risk committee.

Solution Using VM extension

The infrastructure team used VM extension inventory as the control point. Azure CLI listed the extensions on each VM, filtered for the approved publisher and type, and exported the gaps to a change plan. The monitoring extension was installed in waves, with settings stored in source control and workspace credentials supplied through protected settings. Each wave checked provisioningState, the VM instance view, and guest extension logs. Machines with agent-health problems were moved to a repair queue instead of repeatedly rerunning the extension against broken guests.
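The gap analysis in this case study can be sketched in a few lines of Python, assuming you have already collected `az vm extension list --output json` for each VM into a dictionary. The approved publisher/type pair and the sample inventory are assumptions, and so is the field holding the extension type: recent CLI versions appear to report it as `typePropertiesType`, while older ones used `virtualMachineExtensionType`, so the helper checks both.

```python
# Assumed approved (publisher, type) pair; substitute your organization's baseline.
APPROVED = ("Microsoft.Azure.Monitor", "AzureMonitorWindowsAgent")

def ext_type(ext):
    # Assumption: the type field name differs across CLI versions.
    return ext.get("typePropertiesType") or ext.get("virtualMachineExtensionType")

def plan_waves(inventory, approved=APPROVED):
    """inventory maps VM name -> parsed `az vm extension list -o json` output.

    Returns (missing, repair): VMs without the approved extension, and VMs where
    it exists but is not Succeeded (candidates for a repair queue rather than
    a blind rerun).
    """
    missing, repair = [], []
    for vm, extensions in sorted(inventory.items()):
        matches = [e for e in extensions
                   if (e.get("publisher"), ext_type(e)) == approved]
        if not matches:
            missing.append(vm)
        elif any(e.get("provisioningState") != "Succeeded" for e in matches):
            repair.append(vm)
    return missing, repair

# Hypothetical sample inventory.
inventory = {
    "vm-a": [{"publisher": "Microsoft.Azure.Monitor",
              "typePropertiesType": "AzureMonitorWindowsAgent",
              "provisioningState": "Succeeded"}],
    "vm-b": [],
    "vm-c": [{"publisher": "Microsoft.Azure.Monitor",
              "typePropertiesType": "AzureMonitorWindowsAgent",
              "provisioningState": "Failed"}],
}
missing, repair = plan_waves(inventory)
print("install wave:", missing)
print("repair queue:", repair)
```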

Results & Business Impact

Monitoring coverage increased from 68 percent to 99 percent in two maintenance windows.
Mean time to identify failed extension installs dropped from hours to 12 minutes.
The risk committee received a VM-by-VM report with extension version and status.
Duplicate agent installations were removed from 27 servers, cutting noisy log ingestion by 14 percent.

Key Takeaway for Glossary Readers

VM extensions turn post-migration readiness into an auditable fleet operation instead of a manual checklist hidden inside each guest.

Case study 02

University research cluster replaces risky bootstrap scripts


Scenario

A university lab used ad hoc SSH commands to prepare Linux VMs for grant-funded simulations, causing inconsistent agents and missing security configuration.

Business/Technical Objectives

Standardize post-deployment configuration across temporary research VMs.
Keep secrets out of researcher shell history and shared scripts.
Shorten new cluster preparation from a full day to under one hour.
Give central IT visibility into extension failures without taking over research code.

Solution Using VM extension

Central IT moved common configuration into VM extensions. A custom script extension installed approved packages and wrote nonsecret configuration, while monitoring and security extensions were deployed from policy for tagged research resource groups. Protected settings were supplied from controlled files during deployment, and workload-specific simulation code remained in the researchers’ own pipeline. CLI checks listed extension state after each cluster build and stored status results with the experiment metadata so failed bootstrap steps were visible before expensive runs started. The team also recorded ownership, approval history, rollback criteria, and verification evidence so later changes followed the same operating model.
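A bootstrap gate like the one described here might look like the following Python sketch. It assumes each node's `az vm extension list --output json` result has been parsed into a dictionary keyed by VM name; the node names, experiment ID, and the shape of the stored status record are hypothetical.

```python
import json

def bootstrap_gate(cluster):
    """cluster maps VM name -> parsed `az vm extension list -o json` output.

    Returns (ready, problems). Any extension that is not Succeeded, or a VM
    with no extensions at all, blocks the run.
    """
    problems = []
    for vm, extensions in sorted(cluster.items()):
        if not extensions:
            problems.append({"vm": vm, "extension": None, "state": "none installed"})
        for e in extensions:
            state = e.get("provisioningState")
            if state != "Succeeded":
                problems.append({"vm": vm, "extension": e.get("name"), "state": state})
    return not problems, problems

def status_record(experiment_id, cluster):
    # Hypothetical record shape to store alongside experiment metadata.
    ready, problems = bootstrap_gate(cluster)
    return json.dumps({"experiment": experiment_id, "ready": ready,
                       "problems": problems}, sort_keys=True)

# Hypothetical two-node cluster.
cluster = {
    "sim-node-1": [{"name": "baseline", "provisioningState": "Succeeded"}],
    "sim-node-2": [{"name": "baseline", "provisioningState": "Failed"}],
}
ready, problems = bootstrap_gate(cluster)
print(status_record("exp-042", cluster))
```

Failing closed on an empty extension list is a deliberate choice: a node with nothing installed is more likely a broken build than a finished one.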

Results & Business Impact

Cluster preparation time fell from eight hours to 46 minutes.
Failed setup steps were caught before simulations consumed paid compute.
No shared scripts contained access tokens after the protected-settings pattern was adopted.
Researchers kept control of experiment code while IT owned baseline extensions.

Key Takeaway for Glossary Readers

VM extensions are most useful when they standardize platform baseline work without turning every VM into a handcrafted server.

Case study 03

Insurance analytics team removes a failed security handler


Scenario

An insurance analytics platform saw VM deployments hang because an outdated security extension version failed on newer Linux images.

Business/Technical Objectives

Find every VM using the failing handler version.
Restore deployment success without disabling required endpoint protection.
Canary the replacement extension before touching production analytics nodes.
Document rollback steps for the security operations team.

Solution Using VM extension

The team treated VM extension state as the incident source of truth. CLI queries exported publisher, type, typeHandlerVersion, provisioningState, and substatus across affected resource groups. A test pool received the new handler version with automatic upgrade disabled until validation finished. Failed old handlers were removed only after the security team confirmed replacement telemetry in Defender. The pipeline then pinned the approved version and added a preflight check that blocks image releases that fail compatibility testing with the pinned extension version.
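Finding every VM on a failing handler version is mostly a careful version comparison. The Python sketch below assumes the same per-VM inventory of `az vm extension list --output json` output; the publisher `Contoso.Security` and type `EndpointAgent` are hypothetical stand-ins for the security extension in this case study. Versions are compared as integer tuples, because a string comparison would wrongly rank "1.9" above "1.10".

```python
def parse_version(version, width=4):
    # "1.2" -> (1, 2, 0, 0) so versions with different part counts compare cleanly.
    parts = [int(p) for p in version.split(".")]
    return tuple(parts + [0] * (width - len(parts)))

def below_minimum(inventory, publisher, ext_type, minimum):
    """Return (vm, version, provisioningState) for every matching handler older
    than `minimum`. inventory maps VM name -> `az vm extension list -o json`."""
    floor = parse_version(minimum)
    hits = []
    for vm, extensions in sorted(inventory.items()):
        for e in extensions:
            # Assumption: the type field name differs across CLI versions.
            actual_type = e.get("typePropertiesType") or e.get("virtualMachineExtensionType")
            if e.get("publisher") == publisher and actual_type == ext_type:
                if parse_version(e["typeHandlerVersion"]) < floor:
                    hits.append((vm, e["typeHandlerVersion"], e.get("provisioningState")))
    return hits

# Hypothetical publisher/type standing in for the failing security extension.
inventory = {
    "analytics-1": [{"publisher": "Contoso.Security", "typePropertiesType": "EndpointAgent",
                     "typeHandlerVersion": "1.4", "provisioningState": "Failed"}],
    "analytics-2": [{"publisher": "Contoso.Security", "typePropertiesType": "EndpointAgent",
                     "typeHandlerVersion": "2.0", "provisioningState": "Succeeded"}],
}
print(below_minimum(inventory, "Contoso.Security", "EndpointAgent", "2.0"))
```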

Results & Business Impact

Deployment failure rate dropped from 29 percent to under 2 percent after the canary rollout.
Security telemetry returned for all production analytics VMs within one business day.
Rollback time was estimated at 15 minutes because delete and set commands were scripted.
Future image updates now run an extension compatibility gate before release.

Key Takeaway for Glossary Readers

Extension troubleshooting is fastest when teams inspect handler state directly and separate guest-agent problems from application deployment errors.

Why use Azure CLI for this?

I use Azure CLI for VM extensions because extension state is much easier to audit from structured output than from portal blades. CLI can list every extension, show publisher, type, handler version, provisioning state, settings shape, and timestamps, then export that evidence during incidents. It also supports repeatable installs, updates, and removals across fleets without relying on someone clicking the right VM. Experienced Azure engineers use CLI carefully here because an extension can run privileged code inside the guest; scripts should inspect first, target narrowly, and keep protected settings out of shell history.

CLI use cases

List extensions on a VM and compare publisher, type, version, and provisioning state.
Install a tested extension with settings and protected settings from controlled files.
Show one extension during an incident to capture detailed status and substatus messages.
Delete a failed or deprecated extension after confirming the owning team no longer needs it.
Inventory extension versions across resource groups for security or compliance reporting.

Before you run CLI

Confirm tenant, subscription, resource group, VM name, OS support, VM agent health, and required extension publisher.
Review settings files for secrets, use protected settings where appropriate, and avoid pasting sensitive values into command history.
Check outbound connectivity, proxy rules, package sources, and extension documentation before assuming installation will succeed.
Understand whether the action changes guest configuration, restarts services, affects monitoring, or could break production readiness.
Use narrow targets and explicit output so bulk extension changes cannot sweep through unmanaged or critical machines.
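One way to apply these checks in automation is to build the `az vm extension set` command as an argument list and refuse any VM that is not on an explicit allowlist. This Python sketch uses only flags that appear in the command mapping for this term plus `--version` and `--output`; the resource group, VM name, allowlist, and file names are hypothetical. Settings are passed as `@file` references so secrets never appear on the command line or in shell history, and the default is a dry run that only prints the command.

```python
import shlex
import subprocess

def extension_set_argv(resource_group, vm_name, publisher, ext_type, version,
                       settings_path, protected_path=None, allowed_vms=()):
    """Build an argv list for `az vm extension set` targeting exactly one VM."""
    if not resource_group or not vm_name:
        raise ValueError("resource group and VM name are required")
    if vm_name not in allowed_vms:
        # Narrow targeting: refuse anything not on an explicit allowlist.
        raise ValueError(f"{vm_name} is not in the approved target list")
    argv = ["az", "vm", "extension", "set",
            "--resource-group", resource_group,
            "--vm-name", vm_name,
            "--publisher", publisher,
            "--name", ext_type,
            "--version", version,
            "--settings", "@" + settings_path,
            "--output", "json"]
    if protected_path:
        # Only the file path reaches the command line; secret values stay in the file.
        argv += ["--protected-settings", "@" + protected_path]
    return argv

def run(argv, dry_run=True):
    if dry_run:
        return shlex.join(argv)
    # No shell=True: arguments are passed verbatim and never re-parsed by a shell.
    return subprocess.run(argv, check=True, capture_output=True, text=True).stdout

argv = extension_set_argv("rg-app", "app-vm-01", "Microsoft.Azure.Extensions",
                          "CustomScript", "2.1", "settings.json", "protected.json",
                          allowed_vms={"app-vm-01"})
print(run(argv))
```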

What output tells you

Extension list output shows installed handlers, publishers, versions, auto-upgrade settings, and provisioning state, so you can spot where a VM differs from its baseline.
Show output provides status and substatus text that usually points to script exit codes, download failures, or agent problems.
VM instance view can reveal whether the VM is healthy while a specific extension remains failed or transitioning.
Activity logs identify who changed an extension resource and whether ARM accepted the create, update, or delete request.
Guest extension logs explain details that Azure resource output cannot show, such as script stderr or package-manager errors.
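Comparing list output against a baseline is easy to script. The sketch below assumes one VM's `az vm extension list --output json` has been parsed into a list of dictionaries; the baseline entry and its version are placeholders rather than recommendations, and the type field fallback is the same CLI-version assumption as elsewhere on this page.

```python
# Placeholder baseline: (publisher, type) -> expected fields. The version here
# is illustrative, not a recommendation.
BASELINE = {
    ("Microsoft.Azure.Monitor", "AzureMonitorLinuxAgent"):
        {"typeHandlerVersion": "1.0", "autoUpgradeMinorVersion": True},
}

def drift(extensions, baseline=BASELINE):
    """Compare one VM's `az vm extension list -o json` output with a baseline."""
    findings, seen = [], set()
    for e in extensions:
        # Assumption: the type field name differs across CLI versions.
        key = (e.get("publisher"),
               e.get("typePropertiesType") or e.get("virtualMachineExtensionType"))
        expected = baseline.get(key)
        if expected is None:
            findings.append((e.get("name"), "not in baseline"))
            continue
        seen.add(key)
        for field, want in expected.items():
            if e.get(field) != want:
                findings.append((e.get("name"), f"{field}={e.get(field)!r}, expected {want!r}"))
        if e.get("provisioningState") != "Succeeded":
            findings.append((e.get("name"), f"provisioningState={e.get('provisioningState')}"))
    for publisher, ext_type in baseline:
        if (publisher, ext_type) not in seen:
            findings.append((ext_type, "missing"))
    return findings

# Hypothetical extension list for one VM.
vm_extensions = [
    {"name": "AzureMonitorLinuxAgent", "publisher": "Microsoft.Azure.Monitor",
     "typePropertiesType": "AzureMonitorLinuxAgent", "typeHandlerVersion": "1.0",
     "autoUpgradeMinorVersion": False, "provisioningState": "Succeeded"},
    {"name": "ad-hoc-script", "publisher": "Microsoft.Azure.Extensions",
     "typePropertiesType": "CustomScript", "typeHandlerVersion": "2.1",
     "autoUpgradeMinorVersion": True, "provisioningState": "Failed"},
]
for finding in drift(vm_extensions):
    print(finding)
```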

Mapped Azure CLI commands

VM extension CLI operations (mapping: direct)

az vm extension list --resource-group <resource-group> --vm-name <vm-name> --output table
Group: az vm extension | Intent: discover | Service: Compute

az vm extension show --resource-group <resource-group> --vm-name <vm-name> --name <extension-name>
Group: az vm extension | Intent: discover | Service: Compute

az vm extension set --resource-group <resource-group> --vm-name <vm-name> --publisher <publisher> --name <extension-type> --settings @settings.json --protected-settings @protected.json
Group: az vm extension | Intent: configure | Service: Compute

az vm extension delete --resource-group <resource-group> --vm-name <vm-name> --name <extension-name>
Group: az vm extension | Intent: remove | Service: Compute

az vm get-instance-view --resource-group <resource-group> --name <vm-name>
Group: az vm | Intent: discover | Service: Compute

Architecture context

Architecturally, a VM extension is a post-provisioning control that should be part of the VM baseline and release model. It is not a replacement for a hardened image, patch strategy, configuration management, or application deployment pipeline. I treat extensions as narrow, well-owned units: monitoring extension owned by platform operations, security extension owned by security engineering, and custom script extension owned by the workload team. Designs should define allowed publishers, supported versions, retry expectations, network requirements, and whether policy deploys or audits the extension across subscription scopes.
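The ownership model above can double as an allowlist. In this Python sketch, an ownership map keyed by (publisher, type) assigns each extension to a team, and anything not in the map is reported as needing an owner or removal. The team names and the vendor extension in the sample are hypothetical, and the type-field fallback is an assumption about CLI output across versions.

```python
# Hypothetical ownership map that doubles as an allowlist of (publisher, type).
OWNERS = {
    ("Microsoft.Azure.Monitor", "AzureMonitorWindowsAgent"): "platform-operations",
    ("Microsoft.Azure.Monitor", "AzureMonitorLinuxAgent"): "platform-operations",
    ("Microsoft.Azure.Security", "IaaSAntimalware"): "security-engineering",
    ("Microsoft.Compute", "CustomScriptExtension"): "workload-team",
    ("Microsoft.Azure.Extensions", "CustomScript"): "workload-team",
}

def ownership_report(extensions, owners=OWNERS):
    """Split one VM's extension list into owned and unowned (not allowlisted)."""
    owned, unowned = [], []
    for e in extensions:
        # Assumption: the type field name differs across CLI versions.
        key = (e.get("publisher"),
               e.get("typePropertiesType") or e.get("virtualMachineExtensionType"))
        owner = owners.get(key)
        if owner:
            owned.append((e.get("name"), owner))
        else:
            unowned.append((e.get("name"), key))
    return owned, unowned

owned, unowned = ownership_report([
    {"name": "AzureMonitorWindowsAgent", "publisher": "Microsoft.Azure.Monitor",
     "typePropertiesType": "AzureMonitorWindowsAgent"},
    {"name": "vendor-tool", "publisher": "Example.Vendor",
     "typePropertiesType": "Tool"},
])
print("owned:", owned)
print("needs an owner or removal:", unowned)
```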

Security

Security impact is direct because extensions can install software, run scripts, change configuration, and handle secrets. Public settings are stored in the VM model and returned in plain text to anyone who can read the VM resource; protected settings are encrypted before they reach the VM and are not returned by the API, but they are decrypted inside the guest and still require careful handling. Limit who can add, update, or delete extensions: a malicious or careless extension can create local users, download untrusted code, weaken defenses, or leak tokens into logs. Use trusted publishers, pin versions when stability matters, store secrets in Key Vault-backed flows when possible, and review guest logs, because extension handlers often run with elevated privileges inside the operating system.
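A cheap preflight is to scan public settings for values that should have been protected before they are deployed. The Python sketch below is a heuristic only: the key-name pattern is an assumption you would tune, it will produce false positives (for example, any key containing "key"), and it also flags strings containing `sig=`, the signature parameter of a SAS URL. The sample settings are hypothetical.

```python
import re

# Heuristic only: key names that often carry secrets. Assumed pattern; tune it.
SUSPICIOUS_KEY = re.compile(r"password|secret|token|key|connectionstring", re.IGNORECASE)

def suspicious_public_settings(settings, path=""):
    """Return paths inside *public* settings that look secret-bearing."""
    hits = []
    if isinstance(settings, dict):
        for name, value in settings.items():
            child = f"{path}.{name}" if path else name
            if SUSPICIOUS_KEY.search(name):
                hits.append(child)
            hits.extend(suspicious_public_settings(value, child))
    elif isinstance(settings, list):
        for i, item in enumerate(settings):
            hits.extend(suspicious_public_settings(item, f"{path}[{i}]"))
    elif isinstance(settings, str) and "sig=" in settings:
        # A SAS URL's signature parameter is a credential.
        hits.append(path)
    return hits

# Hypothetical public settings that should fail review.
public_settings = {
    "fileUris": ["https://acct.blob.core.windows.net/s/setup.sh?sv=2022-11-02&sig=abc"],
    "workspaceId": "00000000-0000-0000-0000-000000000000",
    "adminPassword": "do-not-do-this",
}
print(suspicious_public_settings(public_settings))
```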

Cost

Extensions rarely appear as the main billable object, but they influence cost through the services they enable and the operational effort they create. Monitoring and security extensions can increase Log Analytics ingestion, Defender coverage, storage, or third-party licensing. Broken custom scripts can leave VMs running idle while teams troubleshoot failed provisioning. Overusing extensions for heavy software installation increases build time and support burden compared with baking common components into images. Cost reviews should include extension-driven log volume, duplicate agents, failed deployment retries, and whether fleet-wide extension updates create avoidable labor.
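Duplicate agents are one of the easier cost findings to automate. This Python sketch flags VMs with more than one monitoring agent extension, using the (publisher, type) pairs for the legacy Log Analytics agent and Azure Monitor Agent; treat that set as an assumption to keep current. Running both during a migration can be intentional, so the output is a review list rather than a removal list. The sample inventory is hypothetical.

```python
# Monitoring agent (publisher, type) pairs: legacy Log Analytics agent and
# Azure Monitor Agent. Treat this set as an assumption to keep current.
MONITORING_AGENTS = {
    ("Microsoft.EnterpriseCloud.Monitoring", "MicrosoftMonitoringAgent"),
    ("Microsoft.EnterpriseCloud.Monitoring", "OmsAgentForLinux"),
    ("Microsoft.Azure.Monitor", "AzureMonitorWindowsAgent"),
    ("Microsoft.Azure.Monitor", "AzureMonitorLinuxAgent"),
}

def duplicate_agents(inventory):
    """Return {vm: [agent types]} for VMs running more than one monitoring agent."""
    flagged = {}
    for vm, extensions in sorted(inventory.items()):
        agents = []
        for e in extensions:
            # Assumption: the type field name differs across CLI versions.
            key = (e.get("publisher"),
                   e.get("typePropertiesType") or e.get("virtualMachineExtensionType"))
            if key in MONITORING_AGENTS:
                agents.append(key[1])
        if len(agents) > 1:
            flagged[vm] = sorted(agents)
    return flagged

# Hypothetical inventory: web-01 still has both agents installed.
inventory = {
    "web-01": [
        {"publisher": "Microsoft.EnterpriseCloud.Monitoring",
         "typePropertiesType": "MicrosoftMonitoringAgent"},
        {"publisher": "Microsoft.Azure.Monitor",
         "typePropertiesType": "AzureMonitorWindowsAgent"},
    ],
    "web-02": [
        {"publisher": "Microsoft.Azure.Monitor",
         "typePropertiesType": "AzureMonitorWindowsAgent"},
    ],
}
print(duplicate_agents(inventory))
```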

Reliability

Reliability depends on the VM agent, extension handler, guest OS support, network reachability, version compatibility, and idempotent configuration. An extension can fail because the agent is unhealthy, the script exits nonzero, a download URL is blocked, protected settings are malformed, or another extension or process holds a lock the handler needs, such as the package manager. Repeated extension failures can leave a VM half-configured even when Azure says the VM was created. Reliable designs use small scripts, staged rollouts, pinned or tested versions, clear timeout expectations, and monitoring that distinguishes VM power state from extension provisioning success.
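Separating power state, agent health, and extension success can be done from a single `az vm get-instance-view --output json` call. The Python sketch below reads the `instanceView.statuses`, `instanceView.vmAgent.statuses`, and `instanceView.extensions` fields and returns a verdict. The ordering of checks (power, then agent, then extensions) is a design choice: it stops teams from debugging an extension package when the agent itself is down. The sample instance view, including its status codes and messages, is illustrative.

```python
def readiness(vm_view):
    """Classify parsed `az vm get-instance-view -o json` output.

    Separates power state, VM agent health, and extension provisioning so a
    running VM with a failed extension is not reported as ready.
    """
    iv = vm_view.get("instanceView") or {}
    codes = [s.get("code", "") for s in iv.get("statuses") or []]
    power = next((c.split("/", 1)[1] for c in codes if c.startswith("PowerState/")),
                 "unknown")
    agent_codes = [s.get("code", "").lower()
                   for s in (iv.get("vmAgent") or {}).get("statuses") or []]
    agent_ready = "provisioningstate/succeeded" in agent_codes
    not_succeeded = []
    for ext in iv.get("extensions") or []:
        for s in ext.get("statuses") or []:
            code = s.get("code", "")
            if (code.lower().startswith("provisioningstate/")
                    and code.lower() != "provisioningstate/succeeded"):
                # Status and substatus messages often carry script stdout/stderr.
                details = [s.get("message", "")] + [
                    sub.get("message", "") for sub in ext.get("substatuses") or []]
                not_succeeded.append((ext.get("name"), code, details))
    if power != "running":
        verdict = "not-running"
    elif not agent_ready:
        verdict = "agent-unhealthy"  # fix the agent before debugging extensions
    elif not_succeeded:
        verdict = "extension-not-succeeded"
    else:
        verdict = "ready"
    return verdict, power, not_succeeded

# Illustrative instance view: the VM runs, the agent is Ready, one extension failed.
view = {"instanceView": {
    "statuses": [{"code": "ProvisioningState/succeeded"},
                 {"code": "PowerState/running", "displayStatus": "VM running"}],
    "vmAgent": {"statuses": [{"code": "ProvisioningState/succeeded",
                              "displayStatus": "Ready"}]},
    "extensions": [{
        "name": "bootstrap",
        "statuses": [{"code": "ProvisioningState/failed/1", "message": "Enable failed"}],
        "substatuses": [{"code": "ComponentStatus/StdErr/failed",
                         "message": "curl: (6) Could not resolve host"}],
    }],
}}
print(readiness(view))
```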

Performance

Performance impact depends on what the extension does inside the guest. Lightweight monitoring agents are usually modest, while custom scripts can consume CPU, memory, disk I/O, or network bandwidth during installation. Extension startup can lengthen VM readiness time, especially when scripts download packages, join domains, or wait on external services. Multiple extensions running together can compete during first boot and make autoscaling slower. Measure boot-to-ready time, agent health, handler execution duration, and application performance after extension changes so automation does not silently become the bottleneck.

Operations

Operators inspect VM extensions when deployments stall, monitoring data disappears, security agents stop reporting, or a custom script behaves differently across environments. Common tasks include listing extension names and publishers, checking provisioning state and handler version, reading guest extension logs, removing failed handlers, forcing a rerun of unchanged configuration (for example, with az vm extension set --force-update), and comparing settings against a golden baseline. Runbooks should document which extensions are mandatory, who owns each one, what outbound endpoints they need, and how to recover when the VM agent itself is unhealthy.
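Comparing settings against a golden baseline is a recursive dictionary diff. The Python sketch below reports keys that are missing, unexpected, or different. It can only cover public settings: protected settings are not returned by the API, so their correctness has to be verified from the source files or secret store instead. The setting names in the sample are hypothetical.

```python
def diff_settings(actual, golden, path="settings"):
    """Return (path, problem) tuples describing how `actual` differs from `golden`."""
    if isinstance(actual, dict) and isinstance(golden, dict):
        diffs = []
        for key in sorted(set(actual) | set(golden)):
            child = f"{path}.{key}"
            if key not in actual:
                diffs.append((child, "missing"))
            elif key not in golden:
                diffs.append((child, "unexpected"))
            else:
                diffs.extend(diff_settings(actual[key], golden[key], child))
        return diffs
    if actual != golden:
        return [(path, f"{actual!r} != {golden!r}")]
    return []

# Hypothetical settings: one changed value, one missing key, one extra key.
golden = {"logLevel": "info", "collectors": {"syslog": True, "perf": True}}
actual = {"logLevel": "debug", "collectors": {"syslog": True}, "extra": 1}
for item in diff_settings(actual, golden):
    print(item)
```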

Common mistakes

Putting secrets in visible settings instead of protected settings or a stronger managed identity and Key Vault pattern.
Assuming VM creation succeeded means all required extensions installed and completed their guest-side work.
Using Custom Script Extension for long-running application deployment instead of a proper release or configuration pipeline.
Updating handler versions fleet-wide without canaries, rollback notes, or proof that the new version supports the OS.
Ignoring VM agent health and troubleshooting the extension package when the agent cannot communicate correctly.

Operator quick checks

List extensions and confirm mandatory monitoring, security, and diagnostics handlers exist with approved publishers.
Check provisioningState and substatus before declaring a deployment complete.
Review whether settings or protectedSettings contain the right values and no accidental plaintext secret.
Confirm the guest can reach required download endpoints, package repositories, or private artifact stores.
Read guest extension logs when Azure output is too generic to explain the failure.

Questions to ask

What exact guest change does this extension make, and who owns that behavior after deployment?
Which identity can install or remove extensions, and is that permission too broad?
What happens if this extension fails while the VM itself is running?
Where are settings, protected settings, and guest logs stored or exposed?
How will version changes be canaried, monitored, and rolled back?
