Virtual machine

A virtual machine is a server you rent from Azure instead of buying hardware. You choose the size, operating system image, disks, network, region, and access method, then run your software on it. Azure operates the physical host and fabric, but you still own the guest operating system, patching approach, installed agents, backups, and workload configuration. VMs are useful when an application needs full OS control, legacy software support, custom drivers, or a migration path that looks close to an existing server.

Aliases: Azure VM, VM, cloud virtual server, IaaS virtual machine, virtual machine
Difficulty: beginner
CLI mappings: 5
Last verified: 2026-05-28T00:00:00Z

Microsoft Learn

An Azure virtual machine is an on-demand compute resource that runs a guest operating system and workload in Azure. It combines a VM size, image, disks, network interfaces, identity, extensions, and availability settings so teams can operate server-based applications without owning physical hardware.

Microsoft Learn: Overview of virtual machines in Azure, 2026-05-28T00:00:00Z

Technical context

Azure virtual machines are infrastructure-as-a-service compute resources. They sit in a subscription and resource group, use a region and availability design, attach managed disks, connect through network interfaces, and depend on virtual networks, subnets, NSGs, public or private IPs, identities, extensions, and monitoring agents. A VM can be created from platform images, shared gallery images, or custom images. Provisioning and lifecycle operations go through the Azure control plane, while the operating system and workload behavior live inside the guest.

Why it matters

Virtual machines matter because many production workloads still need server-level control. They support lift-and-shift migrations, domain-joined services, specialized agents, database engines, vendor appliances, build workers, GPU jobs, and software that is not ready for platform services or containers. They also carry responsibility: a VM left unpatched, exposed over RDP or SSH, oversized, or missing backup can quickly create security, cost, and reliability problems. Understanding VMs helps teams decide when IaaS is the right fit, when a managed service would reduce toil, and which operational controls must exist before the server becomes a production dependency. That judgment keeps cloud migrations from becoming unmanaged server sprawl with better billing, and it prevents avoidable toil after migration.

Where you see it

Signals, screens, and Azure surfaces where this term usually becomes operational.

Signal 01

In the Azure portal Virtual machines blade, the VM shows power state, size, operating system, public IP, private IP, resource group, and subscription. Operators check these fields during release reviews and troubleshooting.

Signal 02

In az vm get-instance-view output, operators see provisioning state, power state, extension status, patch status, boot diagnostics signals, and recent platform events. This output is useful during release reviews and troubleshooting.

Signal 03

In ARM or Bicep templates, a VM appears with hardwareProfile, storageProfile, osProfile, networkProfile, identity, diagnosticsProfile, and extension resources. Reviewers read these properties during release reviews, production validation, and incident triage.

When this becomes relevant

Specific situations where this term helps solve real Azure design, operations, migration, security, reliability, cost, or governance problems.

Migrate a legacy Windows or Linux server when the application needs full operating system control before modernization.
Run vendor software, agents, or appliances that require kernel, driver, licensing, or domain-join behavior not available in PaaS.
Host GPU, HPC, or specialized compute jobs where VM size family and disk layout are workload-critical decisions.
Create temporary build, test, or troubleshooting machines with scripted creation and scheduled deallocation to control spend.
Isolate a sensitive workload in a private subnet while controlling OS hardening, patching, backup, and access paths.

Real-world case studies

Different enterprise-style examples that show the term being used to hit measurable objectives.

Case study 01

Animation studio bursts render capacity

Scenario

An animation studio needed short-term Linux render workers for a streaming release. Buying hardware would miss the deadline, but artists needed predictable output before final review.

Business/Technical Objectives

Provision render servers within one day instead of waiting for hardware.
Keep project files on private networking with restricted operator access.
Measure CPU, disk, and job throughput before committing to a larger run.
Shut down idle capacity immediately after delivery.

Solution Using Virtual machine

The studio created a pool of Azure virtual machines from a hardened gallery image that included the render engine, monitoring agent, and required fonts. VMs were placed in a private subnet with NSG rules allowing only Bastion and render-controller traffic. Managed disks used a tier matched to frame-cache I/O, and each VM received tags for project, owner, and shutdown date. Operators used Azure CLI to create, list, and deallocate machines, then captured metrics for CPU utilization, disk latency, and failed jobs. The runbook required deallocation after each overnight batch and deletion after final acceptance, preventing abandoned render nodes from surviving the project.

Results & Business Impact

Initial render capacity was available in 9 hours instead of the six-week hardware lead time.
Frame throughput improved 46 percent after disk caching and VM size were tuned.
Idle compute cost fell 58 percent compared with leaving workers running between batches.
No public SSH exposure was found in the security review because access used Bastion and private IPs.

Key Takeaway for Glossary Readers

Virtual machines are practical when a workload needs server-level control, fast capacity, and strict shutdown discipline.

Case study 02

Insurance platform keeps a legacy rating engine alive

Scenario

An insurer had a licensed rating engine that required a Windows service, fixed registry settings, and vendor support tooling. Rewriting it before a regulatory deadline was not realistic.

Business/Technical Objectives

Move the rating engine out of an aging datacenter before lease expiration.
Preserve vendor-supported Windows configuration and domain integration.
Add backup, monitoring, and controlled administrator access.
Document a future modernization path without disrupting policy issuance.

Solution Using Virtual machine

The infrastructure team migrated the rating engine to an Azure virtual machine in a private application subnet. The VM used managed disks, a system-assigned identity for Key Vault access, Azure Backup, Update Manager, and monitoring alerts for CPU, service status, disk latency, and heartbeat loss. Administrators connected through Azure Bastion, and NSG rules allowed only application-tier traffic to the rating service port. CLI scripts exported the VM configuration, extension state, NIC, disk, and tag details for change records. A recovery test restored the VM to an isolated subnet and replayed sample rating requests before the datacenter cutover.

Results & Business Impact

Datacenter exit finished 19 days before the lease deadline.
Rating-service availability held at 99.94 percent during the first quarter in Azure.
Restore testing reduced expected recovery time from 14 hours to 3.5 hours.
Public administrative exposure dropped to zero after Bastion replaced direct RDP.

Key Takeaway for Glossary Readers

A VM can be the right modernization step when the immediate business need is safe migration, not instant platform redesign.

Case study 03

Research lab controls GPU experimentation costs

Scenario

A university research lab needed GPU capacity for simulation and model experiments. Previous cloud tests left expensive machines running after students finished jobs.

Business/Technical Objectives

Provide approved GPU VMs for short research windows.
Prevent accidental public access to experiment machines.
Track spend by grant and project.
Reduce wasted compute hours from forgotten instances.

Solution Using Virtual machine

The lab published a small set of approved Azure virtual machine sizes and images for research teams. Each VM was created in a private subnet, tagged with grant, principal investigator, student, and expiration date, and configured with monitoring plus auto-shutdown policy. A managed identity allowed access to approved storage containers without embedding keys. Operators used CLI to list running VMs by tag, deallocate idle machines, and export monthly utilization evidence for grant administrators. Students received documented commands for starting experiments and checking status, while deletion required owner approval so data disks were not lost accidentally.

Results & Business Impact

Forgotten running GPU hours dropped 72 percent in the first billing cycle.
Grant chargeback reports were produced in 30 minutes instead of two days of manual spreadsheet work.
Security scans found no public IP exposure on approved research VM templates.
Average experiment setup time fell from 1.5 days to under 2 hours.

Key Takeaway for Glossary Readers

Virtual machines give researchers flexibility, but tagging, identity, networking, and deallocation rules make that flexibility affordable and safe.

Why use Azure CLI for this?

I use Azure CLI for virtual machines because server work demands repeatable evidence. The portal is fine for one VM, but real operations involve showing power state, resizing, starting, stopping, deallocating, checking extensions, exporting inventory, comparing images, and scripting changes across many resource groups. After ten years in Azure, I do not trust screenshots for incident response or change review. CLI output captures IDs, zones, sizes, NICs, disk names, identities, tags, and provisioning states in a form that can be logged, diffed, and automated. It also reduces mistakes when a change touches production and nonproduction VMs together, and it makes emergency evidence collection much faster than copying portal screenshots. That consistency matters most when emergency actions span many servers.

CLI use cases

Inventory VM sizes, zones, power states, identities, NICs, disks, tags, and owners across subscriptions before maintenance.
Start, stop, deallocate, restart, or resize a VM from an approved runbook with logged command output.
Inspect extension status, boot diagnostics, serial console settings, and instance view during an outage.
Attach monitoring, backup, managed identity, or tags consistently after a migration wave.
Export evidence for rightsizing, reservation planning, vulnerability cleanup, or access reviews.
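
A runbook step with logged command output, as described in the use cases above, could be sketched roughly as follows. The function name deallocate_with_evidence and the log-file argument are hypothetical and not part of Azure CLI; only az vm get-instance-view and az vm deallocate are real commands. The sketch assumes the caller has already confirmed the target and has permission to deallocate it.

```shell
# Hypothetical runbook helper: record the VM status codes before and after
# a deallocation, appending everything (including errors) to a log file.
# Arguments: <vm-name> <resource-group> <log-file>
# Note: deallocation interrupts anything running on the VM.
deallocate_with_evidence() {
  _vm="$1"; _rg="$2"; _log="$3"
  _query="instanceView.statuses[].code"
  {
    echo "# before deallocate: $_vm ($_rg)" &&
    az vm get-instance-view --name "$_vm" --resource-group "$_rg" \
      --query "$_query" --output tsv &&
    az vm deallocate --name "$_vm" --resource-group "$_rg" &&
    echo "# after deallocate: $_vm ($_rg)" &&
    az vm get-instance-view --name "$_vm" --resource-group "$_rg" \
      --query "$_query" --output tsv
  } >> "$_log" 2>&1
}
```

Calling deallocate_with_evidence <vm-name> <resource-group> change-record.log would append both snapshots to the same file, which keeps the evidence for one change in one place. The steps are chained with &&, so a failed step stops the sequence and the function returns a non-zero status.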

Before you run CLI

Confirm tenant, subscription, resource group, VM name, region, zone, owner tags, and whether the command affects production.
Check permissions, locks, backup status, disk dependencies, public IP exposure, and whether restart or deallocation will interrupt users.
Review quota, target size availability, licensing, reservation impact, and application maintenance windows before resizing or moving workloads.
Use JSON output for evidence, and double-check destructive commands such as delete, disk detach, redeploy, or extension removal.
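
The first check in this list, confirming the subscription, can be turned into a small guard. This is a minimal sketch: require_subscription is a hypothetical helper name, and it assumes subscriptions are compared by display name as returned by az account show.

```shell
# Hypothetical guard: stop unless the active subscription name matches
# the one the runbook expects. Argument: <expected-subscription-name>
require_subscription() {
  _actual="$(az account show --query name --output tsv)" || return 1
  if [ "$_actual" != "$1" ]; then
    echo "Active subscription is '$_actual', expected '$1'" >&2
    return 1
  fi
}
```

A runbook might call require_subscription "<subscription-name>" || exit 1 before any command that changes state, so a wrong context fails early instead of touching the wrong environment.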

What output tells you

VM show output identifies the resource ID, location, hardware profile, storage profile, OS type, network interfaces, identity, and tags.
Instance view output shows power state, provisioning state, boot diagnostics, extension status, and platform-level health signals useful during incidents.
Network and disk fields show which subnet, NIC, public IP, OS disk, and data disks are attached to the workload.
Activity and metrics output reveal who changed the VM, when it restarted, and whether CPU, disk, or network pressure matches the complaint.
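
To pull a single field out of instance view instead of reading the full JSON, a JMESPath query can filter the status list. The sketch below assumes the power state appears as a status code prefixed with PowerState/, which is how instance view normally reports it; vm_power_state is a hypothetical helper name.

```shell
# Hypothetical helper: print the bare power state for one VM.
# Arguments: <vm-name> <resource-group>
vm_power_state() {
  _code="$(az vm get-instance-view --name "$1" --resource-group "$2" \
    --query "instanceView.statuses[?starts_with(code, 'PowerState/')].code | [0]" \
    --output tsv)" || return 1
  # Strip the "PowerState/" prefix so callers can compare plain words.
  echo "${_code#PowerState/}"
}
```

Comparing the result against a value such as deallocated lets a script branch on state without parsing JSON in the shell.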

Mapped Azure CLI commands

Virtual machine Azure CLI operations

direct

az vm show --name <vm-name> --resource-group <resource-group> --show-details

az vm | discover | Compute

az vm get-instance-view --name <vm-name> --resource-group <resource-group>

az vm | discover | Compute

az vm deallocate --name <vm-name> --resource-group <resource-group>

az vm | operate | Compute

az vm resize --name <vm-name> --resource-group <resource-group> --size <vm-size>

az vm | operate | Compute

az monitor metrics list --resource <vm-resource-id> --metrics "Percentage CPU" "Disk Read Bytes" "Disk Write Bytes"

az monitor metrics | discover | Compute

Architecture context

Architecturally, a VM is the compute boundary for a workload that needs an operating system. It should appear in diagrams with its subnet, NSG, route table, disks, identity, backup policy, patching method, monitoring agent, and access path such as Bastion or private jump host. Good VM architecture avoids treating the server as a lonely box. It designs availability zones or sets, recovery, image management, extension governance, secrets access, logging, and cost controls from the start. VMs are often the bridge between legacy datacenter patterns and cloud-native services, so the architecture should show what is temporary, what is strategic, and what is automated. Without that context, the VM becomes a mystery box during incidents; with it, ownership stays clear during incidents and audits.

Security

Security for a virtual machine covers the Azure resource and the guest operating system. Resource roles control who can start, stop, resize, delete, or run commands. Network rules decide whether management ports are private, bastion-only, or dangerously exposed. Managed identity should replace stored credentials when the VM calls Azure services. Disk encryption, secure boot options, patching, endpoint protection, vulnerability scanning, and logging all matter. The biggest mistake is treating a cloud VM like a server hidden in a trusted rack; public IPs, weak local accounts, stale extensions, and broad contributor roles create a large attack surface. Security review should confirm the exact permission boundary before any production configuration or access path changes.

Cost

Virtual machine cost is driven by size, running hours, disk tier, snapshots, backup, bandwidth, licensing, monitoring, and support effort. A VM that is stopped but still allocated can continue to incur compute charges, while a deallocated VM stops compute charges but still pays for disks and related resources. Oversized VMs, premium disks on low-I/O workloads, forgotten public IPs, and unmanaged dev/test schedules are common waste. Reserved instances, savings plans, Spot VMs, auto-shutdown, right-sizing, and tagging can reduce spend. FinOps reviews should include utilization, power state, disk attachment, backup retention, and whether the workload still needs a VM at all. Cost reviews should connect each VM's size and schedule to workload demand, ownership, and cleanup responsibilities.
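
The stopped-versus-deallocated distinction can be checked across a resource group with a short script. This is a sketch under assumptions: the helper names are hypothetical, and it relies on az vm list --show-details reporting powerState as display strings such as "VM stopped" and "VM deallocated".

```shell
# Hypothetical helper: translate a power state display string into a billing note.
compute_billing_note() {
  case "$1" in
    "VM deallocated") echo "deallocated: no compute charges, disks still billed" ;;
    "VM stopped")     echo "stopped but allocated: compute charges continue" ;;
    "VM running")     echo "running: compute charges apply" ;;
    *)                echo "transitional or unknown state: recheck" ;;
  esac
}

# Hypothetical helper: one line per VM in a resource group.
# Argument: <resource-group>
report_billing_state() {
  az vm list --resource-group "$1" --show-details \
    --query "[].[name, powerState]" --output tsv |
  while IFS="$(printf '\t')" read -r _name _state; do
    printf '%s: %s\n' "$_name" "$(compute_billing_note "$_state")"
  done
}
```

Any VM reported as stopped but allocated is a candidate for az vm deallocate, provided the owner confirms it does not need to keep its allocation.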

Reliability

Reliability for a virtual machine depends on placement, recovery design, and operational discipline. A single VM can be a single point of failure unless the workload tolerates downtime or is paired with availability zones, scale sets, load balancing, clustering, backup, or replication. Managed disks are durable, but the guest OS, application, and data consistency still need recovery planning. Operators should monitor heartbeat, disk space, CPU, memory, extension failures, and backup status. Maintenance events, patch restarts, capacity issues, and accidental deallocation must be planned so one server action does not become a business outage. Teams should validate failure behavior before the dependency becomes part of a critical user path.

Performance

Virtual machine performance is shaped by VM size, CPU generation, memory, disk IOPS, network throughput, accelerated networking, guest tuning, and noisy application behavior. Resize decisions should be based on metrics, not guesses. A faster CPU will not fix slow storage, and premium disks will not fix single-threaded code. Operators should watch p95 CPU, available memory, disk queue depth, data disk latency, network drops, and application counters. Boot diagnostics and run-command output help diagnose startup issues. For steady fleets, performance baselines reveal whether scaling up, scaling out, caching, or moving to a managed service is the better answer. Baseline tests should be repeated after changes so latency or throughput regressions are caught early.

Operations

Operators inspect virtual machines through instance view, activity logs, boot diagnostics, Azure Monitor metrics, guest logs, update status, backup jobs, and extension state. Common work includes resizing, deallocating, patching, attaching disks, rotating credentials, reviewing network exposure, and documenting ownership. Automation should tag VMs by environment, cost center, backup criticality, and shutdown policy. Troubleshooting often requires both Azure control-plane checks and guest OS investigation. Good runbooks explain how to access the VM safely, what changes require approval, and how to recover if an extension, patch, disk, or network change fails. The strongest runbooks name the owner, the expected state, and the command evidence required after each change.

Common mistakes

Leaving SSH or RDP open to the internet because it was convenient during migration testing.
Stopping the OS but not deallocating the VM, then wondering why compute charges continue.
Resizing or rebooting a stateful production VM without confirming application failover or maintenance windows.
Assuming Azure backs up every VM automatically without assigning and testing a backup policy.
Embedding secrets in custom data, scripts, or images instead of using managed identity and Key Vault.

Operator quick checks

Is this VM protected by backup, monitoring, patching, endpoint security, and documented owner tags?
Does the VM have any public IP, open NSG rule, or unmanaged access path that should be closed?
Are CPU, memory, disk, and network metrics aligned with the selected VM size and disk tier?
Can the team restore the VM or rebuild it from image and automation if the OS fails?
Is deallocation scheduled for nonproduction VMs that do not need to run continuously?
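
The public IP question in these checks can be answered from the same az vm show --show-details call mapped earlier. The sketch below assumes the publicIps field is empty when the VM has no public address; flag_public_exposure is a hypothetical helper name, and it does not inspect NSG rules or other access paths.

```shell
# Hypothetical helper: report whether a VM has any public IP attached.
# Arguments: <vm-name> <resource-group>
# Returns 0 when none is found, 1 when at least one exists, 2 if the lookup fails.
flag_public_exposure() {
  _ips="$(az vm show --name "$1" --resource-group "$2" --show-details \
    --query publicIps --output tsv)" || return 2
  if [ -n "$_ips" ]; then
    echo "REVIEW: $1 has public IP(s): $_ips"
    return 1
  fi
  echo "OK: $1 has no public IP"
}
```

A clean result here only covers the public IP part of the check; open NSG rules and unmanaged access paths still need their own review.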

Questions to ask

What workload runs here, and why is a VM better than a managed service or container platform?
Who owns guest OS patching, application updates, backup validation, and emergency access?
What fails if this VM is rebooted, resized, redeployed, or isolated from the network?
Which metrics prove the VM is correctly sized and healthy enough for production?
What is the recovery path if the VM, disk, zone, or image becomes unusable?
