VM restart operation

A VM restart operation is a controlled reboot of an Azure virtual machine. It is similar to restarting a physical server, except the command comes through Azure management APIs or the portal. The VM resource stays the same, and persistent disks remain attached. Restart is useful after patches, configuration changes, agent fixes, or application recovery steps that require a clean boot. It is still downtime for anything depending on that VM, so operators should not treat it as harmless just because it is common.

Aliases: VM restart operation, vm restart operation
Difficulty: intermediate
CLI mappings: 5
Last verified: 2026-05-29

Microsoft Learn

A VM restart operation reboots an Azure virtual machine through the Azure control plane. It stops and starts the guest operating system on the same VM resource, preserving disks, network interfaces, identity settings, and configuration while causing a planned service interruption.

Microsoft Learn: Virtual Machines - Restart - REST API (2026-05-29)

Technical context

Technically, restart is a Microsoft.Compute action on an existing virtualMachines resource. It affects the guest runtime and power state but does not replace the VM, change the VM size, modify the image reference, or detach managed disks and NICs. Azure exposes restart through portal, CLI, PowerShell, SDK, and REST operations. It is related to start, stop, deallocate, redeploy, and run command, but the semantics are different: restart reboots the guest OS and normally keeps the VM on its current host, whereas deallocate releases the compute allocation and redeploy moves the VM to a new host.

Why it matters

VM restart operation matters because rebooting is simple until it happens to the wrong workload at the wrong time. Restart can apply patches, clear hung services, refresh agents, and recover from transient guest failures. It can also drop active sessions, break long-running jobs, trigger cluster failover, or reveal that an application cannot start cleanly. Operators need to know whether restart is safe, who owns the application, whether traffic can be drained, and what health checks prove success. The operation is basic, but it is one of the most common points where poor runbooks create avoidable outages. Record owner approval and post-reboot health evidence in the ticket.

Where you see it

Signals, screens, and Azure surfaces where this term usually becomes operational.

Signal 01

In the Azure portal VM Overview blade, Restart appears beside start, stop, connect, redeploy, and run command as a disruptive lifecycle action.

Signal 02

In Azure CLI and the Activity log, az vm restart leaves a record of the target resource, caller, timestamp, and operation status, and instance view shows the resulting power state during maintenance.

Signal 03

In monitoring dashboards, restart shows as heartbeat gaps, boot events, service restarts, connection resets, and application health probe failures during the reboot window.

When this becomes relevant

Specific situations where this term helps solve real Azure design, operations, migration, security, reliability, cost, or governance problems.

Apply guest OS patches or application prerequisites that require a reboot after Update Manager or manual maintenance.
Recover a hung service when run command or guest access confirms the OS needs a clean restart.
Restart VMs in a defined order to maintain cluster quorum, dependency sequencing, or rolling availability.
Refresh VM agent or extension behavior after configuration repair without moving the VM to another host.
Produce audit evidence that a scheduled maintenance reboot occurred within the approved change window.

Real-world case studies

Different enterprise-style examples that show the term being used to hit measurable objectives.

Case study 01

Insurance analytics cluster applies monthly patches safely

Scenario

An insurance analytics team ran a three-node calculation cluster on Azure VMs, and prior patch nights caused user-visible failures because all nodes were rebooted together.

Business/Technical Objectives

Apply monthly security patches without breaking cluster quorum.
Keep at least two calculation nodes available during business validation.
Reduce manual reboot tracking errors during maintenance.
Capture evidence for the internal risk committee.

Solution Using VM restart operation

The platform owner redesigned the maintenance runbook around VM restart operation. Nodes were tagged with restart rings and cluster roles, and Azure CLI queried each VM for owner, power state, and patch status before action. The team drained one node from the calculation scheduler, ran az vm restart, waited for instance view to return running, and then used a synthetic actuarial job to verify service readiness. Only after the node rejoined the cluster did the next ring begin. Activity-log entries, command output, and scheduler health screenshots were stored in the change record. The team also added a rule that a failed post-restart health check stopped the remaining reboots automatically.
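
The one-node-at-a-time sequence described above can be sketched in shell. This is a minimal illustration under stated assumptions, not the team's actual runbook: restart_in_order and the check_health callback are hypothetical names, and the sketch assumes az vm restart is called without --no-wait, so it returns only after Azure reports the operation finished.

```shell
# Restart VMs one at a time and stop at the first failed health check.
# check_health is a hypothetical caller-supplied command that takes a VM name
# and returns 0 only when the workload on that VM is ready again.
restart_in_order() {
  local rg="$1" check_health="$2" vm
  shift 2
  for vm in "$@"; do
    echo "Restarting $vm"
    # Without --no-wait, the CLI blocks until the restart operation completes.
    az vm restart --resource-group "$rg" --name "$vm" || return 1
    if ! "$check_health" "$vm"; then
      echo "Health check failed for $vm; remaining restarts skipped" >&2
      return 1
    fi
  done
}
```

A call might look like restart_in_order <resource-group> my_health_check node-1 node-2 node-3, where my_health_check stands in for whatever synthetic job or probe proves the node has rejoined the cluster.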

Results & Business Impact

Patch maintenance completed with zero user-visible calculation failures.
Cluster quorum remained healthy throughout all four maintenance cycles measured.
Manual tracking errors dropped from five per quarter to none after ring automation.
Audit evidence preparation fell from six hours to 45 minutes per patch cycle.

Key Takeaway for Glossary Readers

Restart is routine only when sequencing, health checks, and evidence make it safe for the workload.

Case study 02

Museum ticketing service recovers after certificate reload

Scenario

A museum group rotated TLS certificates for a ticketing service, but the Windows VM continued serving the old certificate because several dependent services did not reload cleanly.

Business/Technical Objectives

Load the new certificate before an online exhibition sale.
Keep the outage under ten minutes outside visiting hours.
Validate the public ticketing endpoint after reboot.
Avoid manual registry edits on the production VM.

Solution Using VM restart operation

The operations engineer first tried service-level reloads, then confirmed the application still presented the old certificate through an external probe. They scheduled a short VM restart operation after closing time and warned the digital ticketing team. Before running az vm restart, the engineer captured the current power state, certificate thumbprint, and application health. After the restart, Azure CLI confirmed the VM was running, and synthetic browser tests verified the new certificate chain, login page, payment callback, and monitoring heartbeat. Because the VM had a reliable startup script and all data lived in managed storage, no extra repair steps were needed. The timeline and proof of certificate change went into the change ticket.

Results & Business Impact

The correct certificate was live seven minutes after the maintenance began.
The exhibition sale opened on time with no TLS warnings reported by customers.
Operations avoided risky manual fixes and reduced the change scope to one reboot.
The team added certificate validation to future restart completion checks.

Key Takeaway for Glossary Readers

A restart is often the cleanest way to apply guest-level changes when service reloads leave hidden state behind.

Case study 03

Logistics company fixes stuck agent without redeploying

Scenario

A logistics company depended on a VM-hosted route optimizer, and a monitoring extension stopped sending heartbeat after a failed third-party agent update.

Business/Technical Objectives

Restore monitoring heartbeat before the overnight routing batch.
Avoid redeploying or resizing a VM that was otherwise healthy.
Confirm the route optimizer service returned automatically.
Collect enough evidence to improve the agent update process.

Solution Using VM restart operation

The cloud team compared VM metrics, extension status, guest event logs, and application queue length. The route optimizer was still processing jobs, so they chose a scheduled VM restart operation rather than redeploy or repair. Dispatchers were warned that new route submissions would queue for a few minutes. The engineer captured az vm get-instance-view output, restarted the VM, and watched boot diagnostics until the guest returned. After startup, they verified extension heartbeat, route optimizer service status, private database connectivity, and completion of a test route calculation. The failed agent update logs were preserved for vendor escalation, while the restart restored monitoring before the nightly batch began.

Results & Business Impact

Monitoring heartbeat returned in 11 minutes and stayed stable for the next 30 days.
The overnight routing batch completed 24 minutes earlier than the prior failed run.
No VM redeploy, disk repair, or application reinstall was needed.
The agent rollout process gained a precheck that detects pending reboot state.

Key Takeaway for Glossary Readers

Restart can be the right middle path when the guest needs a clean boot but the Azure host and VM configuration are sound.

Why use Azure CLI for this?

I use Azure CLI for restart because the safest reboot is the one you can prove and repeat. A seasoned Azure engineer captures the target VM, current power state, owner tag, health signal, and exact restart timestamp before touching production. CLI is faster than navigating many portal blades, and it works well in maintenance scripts that restart only approved machines. It can also wait, query status, and export evidence for the change ticket. The important discipline is not typing az vm restart quickly; it is proving the right VM, scope, and rollback path before issuing a disruptive action.

CLI use cases

Restart one named VM after confirming owner approval and maintenance timing.
Query power state before and after restart for a change ticket or incident timeline.
Restart a controlled ring of VMs by tag or resource group while preserving service capacity.
Combine restart with boot diagnostics checks when a VM fails to return healthy after patching.

Before you run CLI

Confirm subscription, resource group, VM name, owner approval, and whether the VM is part of a cluster or load-balanced pool.
Drain traffic, stop schedulers, or pause queues if active sessions or jobs could be lost.
Check backups, startup dependencies, and the expected time for the application to become healthy again.
Choose output format and logging so the restart command and follow-up state checks become part of the maintenance record.
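
The first of these checks can be scripted so they run the same way every time. The sketch below is illustrative only: confirm_subscription and vm_power_state are hypothetical helper names, and the expected subscription name is whatever the change record specifies.

```shell
# Pre-restart checks: confirm the active subscription and read the VM power state.

# Fail unless the active CLI subscription name matches the expected one.
confirm_subscription() {
  local expected="$1" actual
  actual="$(az account show --query name --output tsv)" || return 1
  if [ "$actual" != "$expected" ]; then
    echo "Active subscription is '$actual', expected '$expected'" >&2
    return 1
  fi
}

# Print the display power state, for example "VM running".
vm_power_state() {
  local rg="$1" vm="$2"
  az vm get-instance-view --resource-group "$rg" --name "$vm" \
    --query "instanceView.statuses[?starts_with(code, 'PowerState/')].displayStatus | [0]" \
    --output tsv
}
```

Running confirm_subscription before any restart command, and saving the vm_power_state output, gives the maintenance record a before state without relying on memory of which subscription was active.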

What output tells you

A successful command response means Azure accepted the restart request, not necessarily that the application is healthy.
Instance view shows power state, provisioning status, VM agent readiness, and extension issues after the reboot.
Activity-log entries identify the caller, target resource, operation status, and time window for audit or incident analysis.
Boot diagnostics reveal whether the guest OS reached startup or became stuck before application checks could run.
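
Because an accepted restart request is not proof of a healthy VM, a follow-up poll of instance view is a common next step. The sketch below assumes a hypothetical helper named wait_for_running; the attempt count and interval defaults are illustrative values, not Azure recommendations, and a passing result still says nothing about application health.

```shell
# Poll the power state until it reads "VM running" or the attempts run out.
wait_for_running() {
  local rg="$1" vm="$2" attempts="${3:-30}" interval="${4:-10}" state i=1
  while [ "$i" -le "$attempts" ]; do
    state="$(az vm get-instance-view --resource-group "$rg" --name "$vm" \
      --query "instanceView.statuses[?starts_with(code, 'PowerState/')].displayStatus | [0]" \
      --output tsv)"
    if [ "$state" = "VM running" ]; then
      return 0
    fi
    sleep "$interval"
    i=$((i + 1))
  done
  echo "VM $vm did not report running after $attempts checks (last state: $state)" >&2
  return 1
}
```

If the function returns non-zero, boot diagnostics and the VM agent status are the next places to look before any application checks are attempted.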

Mapped Azure CLI commands

VM restart operations

direct

az vm get-instance-view --resource-group <resource-group> --name <vm-name> --output json

az vm | discover | Compute

az vm restart --resource-group <resource-group> --name <vm-name>

az vm | operate | Compute

az vm wait --resource-group <resource-group> --name <vm-name> --updated

az vm | operate | Compute

az vm boot-diagnostics get-boot-log --resource-group <resource-group> --name <vm-name>

az vm boot-diagnostics | discover | Compute

az monitor activity-log list --resource-group <resource-group> --offset 2h --query "[?operationName.value=='Microsoft.Compute/virtualMachines/restart/action']"

az monitor activity-log | discover | Compute

Architecture context

Architecturally, restart is a guest-runtime recovery and maintenance action. It should fit inside a broader resilience design: load-balanced instances, zone distribution, backup, monitoring, and application startup automation. If restarting one VM takes down a business service, the architecture has a dependency that should be documented and eventually improved. For clustered applications, restart order matters because quorum, session state, and database locks can be affected. For pets rather than cattle, restart may be the safest action available, but architects should still define maintenance sequencing, health probes, startup dependencies, and operator approvals.

Security

Security impact is mostly operational. Restart does not change access rights, encryption, or secrets, but it can apply pending security updates, restart protection agents, and reload configuration that was previously staged. It can also create risk when a server does not return, leaving monitoring gaps or emergency access pressure. Confirm who has permission to restart production VMs, and capture identity evidence in the Activity log. After restart, verify endpoint protection, vulnerability scanning, managed identity consumers, firewall services, and just-in-time access behavior. A reboot that disables a security agent should be treated as a failed change, not a success.

Cost

Restart has little direct billing effect because the VM generally returns to running state and compute billing continues. The cost impact comes from downtime, support effort, missed processing windows, and repeated manual maintenance. Restarting a batch server during a job can waste compute already spent. Restarting a customer-facing VM without drain can cause incidents that cost far more than the VM. On the positive side, a planned restart after patching can prevent expensive emergency recovery later. Cost-aware operations schedule restarts, batch them sensibly, and use automation to avoid after-hours toil.

Reliability

Reliability impact is direct because restart is planned downtime for the VM and a test of startup readiness. A healthy architecture absorbs it through multiple instances, failover, or queued work. A fragile architecture turns it into an outage. Before restarting, drain connections, check backups, confirm cluster roles, and know the service dependency order. Afterward, verify not only power state but application health, guest agent status, extensions, log ingestion, and client path connectivity. Repeated restarts are not reliability engineering; they are a signal to diagnose memory leaks, service hangs, patch problems, or capacity limits.

Performance

Performance impact is usually temporary and diagnostic. Restart clears process memory, reloads services, reapplies startup scripts, and may restore performance when a guest has leaked resources or a service is stuck. It does not increase CPU, memory, disk throughput, or network limits. If performance improves only after restart, investigate the underlying cause rather than accepting reboot as the fix. Measure startup time, application warmup, cache rebuild duration, and post-restart latency. For autoscaled or load-balanced systems, restart sequencing should preserve enough warm capacity so users do not experience cold-start delays.

Operations

Operators restart VMs after patching, configuration changes, troubleshooting, extension recovery, and application maintenance. A good procedure captures current state, warns owners, drains traffic, runs the restart, waits for the VM to report running, and checks the real workload. For fleets, operators should group restarts by update domain, zone, role, or maintenance ring instead of rebooting everything together. Tagging and change records matter because restart is easy to do and easy to forget. When a restart fails, operators inspect boot diagnostics, serial console, VM agent status, extensions, and recent configuration changes. Record owner approval and post-reboot health evidence in the ticket.
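
Grouping by maintenance ring usually starts with a tag query. The sketch below assumes a hypothetical tag key named restartRing and a hypothetical helper named vms_in_ring; substitute whatever tag convention the fleet already uses.

```shell
# List the names of VMs in a resource group that carry a given ring tag value.
# restartRing is an assumed tag key, not an Azure built-in.
vms_in_ring() {
  local rg="$1" ring="$2"
  az vm list --resource-group "$rg" \
    --query "[?tags.restartRing=='$ring'].name" --output tsv
}
```

The output is one VM name per line, which can be reviewed against the approved change list before any restart command is issued for that ring.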

Common mistakes

Restarting the VM instead of the application service when a service-level restart would have been safer and faster.
Rebooting every node in a cluster together and losing quorum, cache warmth, or user sessions.
Assuming power state Running means the application, agent, log pipeline, and health probe are all healthy.
Forgetting to verify the active CLI subscription and restarting a similarly named VM in the wrong environment.

Operator quick checks

Check owner tags, resource group, and VM name before running a disruptive restart command.
Confirm traffic drain, job pause, or failover status for the workload role.
Capture instance view and recent alerts before restart so the before state is not lost.
After restart, test the application endpoint, monitoring heartbeat, extension state, and relevant logs.
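
Capturing the before and after state is easier when it is a single repeatable step. The sketch below saves instance view JSON to a timestamped file; capture_instance_view, the evidence directory, and the label argument are hypothetical names chosen for illustration.

```shell
# Save the VM instance view as JSON and print the path of the saved file.
# label is free text such as "before" or "after".
capture_instance_view() {
  local rg="$1" vm="$2" dir="$3" label="$4" file
  mkdir -p "$dir" || return 1
  file="$dir/${vm}-${label}-$(date -u +%Y%m%dT%H%M%SZ).json"
  az vm get-instance-view --resource-group "$rg" --name "$vm" \
    --output json > "$file" || return 1
  echo "$file"
}
```

Calling it once with the label before and once with after yields two files that can be attached to the change ticket alongside the Activity log entry.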

Questions to ask

What change or symptom requires a full VM restart instead of a smaller service restart?
What dependency order matters if this VM is part of a cluster or application tier?
Who will confirm that the application, not just the VM power state, is healthy afterward?
What evidence should be captured if the VM fails to boot or takes longer than expected?
