VM run command

VM run command lets an authorized Azure operator run a script inside a VM without opening RDP or SSH first. For Windows, that usually means PowerShell; for Linux, shell scripts are common. It is not a general replacement for configuration management, but it is powerful during diagnostics, emergency fixes, and evidence collection. The VM agent must be healthy enough to receive the command and return results. Because the command runs inside the guest, operators should treat it like privileged remote administration.

Aliases: VM run command, vm run command
Difficulty: intermediate
CLI mappings: 5
Last verified: 2026-05-29

Microsoft Learn

VM run command is an Azure feature for running scripts inside a virtual machine through Azure management access, commonly for diagnostics or remediation. It relies on the VM agent and is useful when normal remote access is blocked, unavailable, or operationally inconvenient.

Microsoft Learn: Run scripts in your Windows VM by using action Run Commands (2026-05-29)

Technical context

Technically, run command is exposed under the Microsoft.Compute VM resource and surfaced through portal, CLI, PowerShell, SDK, and REST. The operation sends a script request through Azure control-plane channels to the VM agent, which executes it in the guest context and returns output, errors, and status. It sits between Azure management and guest operating-system administration. Run command is often used with boot diagnostics, extensions, Update Manager, serial console, and log collection when normal network paths or credentials are not enough.
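As a rough illustration of the REST surface mentioned above, the sketch below builds the URL and JSON body for a run command request without sending it. The URL layout and the `commandId` and `script` body fields reflect the Microsoft.Compute REST reference as best understood here and should be checked against the current documentation; the subscription ID, resource names, and the `<api-version>` placeholder are illustrative assumptions, and `build_run_command_request` is a hypothetical helper name.

```python
import json

# Assumption: public Azure Resource Manager endpoint.
ARM_ENDPOINT = "https://management.azure.com"


def build_run_command_request(subscription_id, resource_group, vm_name,
                              command_id, script_lines, api_version):
    """Return (url, body) for a POST to the VM's runCommand action.

    api_version is a required argument on purpose: take the value from the
    current Microsoft.Compute REST reference instead of hard-coding one.
    """
    url = (
        f"{ARM_ENDPOINT}/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Compute/virtualMachines/{vm_name}"
        f"/runCommand?api-version={api_version}"
    )
    # The script is sent as a list of lines, one string per line.
    body = {"commandId": command_id, "script": list(script_lines)}
    return url, body


# Placeholder identifiers, not real resources.
url, body = build_run_command_request(
    "00000000-0000-0000-0000-000000000000", "rg-demo", "vm-demo",
    "RunShellScript", ["df -h", "systemctl is-active sshd"], "<api-version>",
)
print(url)
print(json.dumps(body, indent=2))
```

Nothing here authenticates or sends the request; the point is only that the call targets the VM resource through the management plane rather than a guest network path.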

Why it matters

VM run command matters because production access is often constrained by firewalls, private networks, broken SSH, expired credentials, or emergency policy. Run command gives teams a controlled way to ask the guest OS what is happening or apply a narrow fix without exposing inbound ports. It can save an outage when agents, routes, local firewall rules, or services need inspection. It is also risky: scripts can change files, leak secrets in output, or make a broken server worse. Strong teams limit permissions, log usage, keep scripts small, and prefer read-only diagnostics before remediation. Record the script version and reviewed intent with incident evidence.

Where you see it

Signals, screens, and Azure surfaces where this term usually becomes operational.

Signal 01

In the Azure portal Run command blade, built-in commands and custom scripts are available to operators who need guest actions without SSH or RDP access.

Signal 02

In Azure CLI, az vm run-command invoke shows script status, returned output, error text, timeout behavior, and VM-agent delivery failures for the selected target.

Signal 03

In Activity Log and security workbooks, run command appears as a privileged Compute action tied to caller, resource ID, timestamp, and incident context.
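One way to surface that signal from exported Activity Log data is sketched below. It assumes JSON shaped like the output of az monitor activity-log list, with `operationName.value`, `caller`, `resourceId`, `eventTimestamp`, and `status.value` fields, and it assumes the run command operation name shown in the constant. The sample events and `run_command_events` are hypothetical.

```python
import json

# Assumption: operation name recorded for run command invocations.
RUN_COMMAND_OPERATION = "Microsoft.Compute/virtualMachines/runCommand/action"


def run_command_events(activity_log_json):
    """Return caller, resource, time, and status for run command events only."""
    rows = []
    for event in json.loads(activity_log_json):
        operation = (event.get("operationName") or {}).get("value", "")
        if operation.lower() != RUN_COMMAND_OPERATION.lower():
            continue
        rows.append({
            "caller": event.get("caller"),
            "resource_id": event.get("resourceId"),
            "timestamp": event.get("eventTimestamp"),
            "status": (event.get("status") or {}).get("value"),
        })
    return rows


# Hypothetical sample data standing in for exported Activity Log entries.
sample_log = json.dumps([
    {
        "caller": "oncall@example.com",
        "resourceId": "/subscriptions/0000/resourceGroups/rg-demo/providers/Microsoft.Compute/virtualMachines/vm-demo",
        "eventTimestamp": "2026-01-02T03:04:05Z",
        "operationName": {"value": RUN_COMMAND_OPERATION},
        "status": {"value": "Succeeded"},
    },
    {
        "caller": "deploy@example.com",
        "resourceId": "/subscriptions/0000/resourceGroups/rg-demo/providers/Microsoft.Compute/virtualMachines/vm-demo",
        "eventTimestamp": "2026-01-02T03:10:00Z",
        "operationName": {"value": "Microsoft.Compute/virtualMachines/restart/action"},
        "status": {"value": "Succeeded"},
    },
])

events = run_command_events(sample_log)
for row in events:
    print(row["timestamp"], row["caller"], row["status"])
```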

When this becomes relevant

Specific situations where this term helps solve real Azure design, operations, migration, security, reliability, cost, or governance problems.

  • Collect disk, service, or firewall evidence from a private VM when SSH or RDP is unavailable.
  • Repair a local guest firewall rule that accidentally blocked normal administrative access.
  • Run a small approved diagnostic across selected VMs during an incident without opening public inbound ports.
  • Restart a broken guest agent or application service after confirming the script is safe and scoped.
  • Export emergency evidence for support while preserving a clear Azure Activity Log trail.

Real-world case studies

Different enterprise-style examples that show the term being used to hit measurable objectives.

Case study 01

Fintech operations repairs a locked-down Linux VM

Scenario

A fintech payment processor locked down inbound SSH on production VMs, and one Linux gateway became unreachable after a local firewall deployment blocked the private admin subnet.

Business/Technical Objectives

  • Restore administrative access without opening public SSH.
  • Capture an auditable record of the emergency guest change.
  • Limit remediation to one affected gateway VM first.
  • Validate payment routing before applying any wider fix.

Solution Using VM run command

The platform team chose VM run command because Azure management access was still available and the VM agent reported healthy. They wrote a short Bash script that printed current firewall rules, added the missing private subnet allow rule, and restarted the local firewall service. The script was peer-reviewed in chat, pasted into a secure terminal, and invoked with az vm run-command invoke against the exact VM resource ID. Output was saved in the incident ticket, with sensitive values redacted. After access returned, engineers connected through the normal private path, verified the rule file, restarted the payment gateway service, and ran synthetic authorization tests. A fleet-wide check used read-only run command to confirm no other gateway had the bad rule.

Results & Business Impact

  • Private administrative access returned in 14 minutes without exposing public SSH.
  • Payment authorization errors fell from 18 percent to normal baseline within one probe cycle.
  • Only one VM received the remediation script before validation succeeded.
  • The change record included caller, script, output, and Activity-log evidence.

Key Takeaway for Glossary Readers

Run command is powerful when network access is broken, but it must be treated as audited privileged guest administration.

Case study 02

University research team collects evidence from isolated Windows VMs

Scenario

A university research department placed Windows analysis VMs in private subnets, and several machines stopped uploading experiment results after a failed agent update.

Business/Technical Objectives

  • Collect service and disk evidence without changing network rules.
  • Identify whether the upload agent or storage credentials caused the outage.
  • Keep researchers out of local administrator sessions.
  • Restore result uploads before a grant-reporting deadline.

Solution Using VM run command

The cloud administrator used VM run command to execute a read-only PowerShell diagnostic across six selected VMs. The script checked service status, recent event log entries, free disk space, installed agent version, and connectivity to the private storage endpoint. Because output could include file paths and project names, the team saved it to a restricted incident workspace. The results showed two VMs had a stopped upload service and four had an outdated agent configuration. A second, reviewed run command restarted the stopped service only on the two affected machines. The configuration issue was fixed later through the normal management pipeline rather than ad hoc scripting.

Results & Business Impact

  • Diagnostic evidence was collected from six VMs in under 20 minutes.
  • Result uploads resumed for the stopped-service machines the same afternoon.
  • No inbound RDP rules or temporary public IPs were created.
  • The agent configuration fix moved into source control instead of staying as an emergency script.

Key Takeaway for Glossary Readers

Run command is strongest when it gathers precise evidence first and leaves durable configuration changes to managed pipelines.

Case study 03

Media platform verifies disk pressure during live event

Scenario

A media streaming platform saw encoding failures during a live sports event, but the affected encoder VM was in a restricted subnet with no standing administrator access.

Business/Technical Objectives

  • Check disk pressure and encoder service state within five minutes.
  • Avoid changing NSGs or Bastion policy during the live event.
  • Restart only the failed encoder process if evidence supported it.
  • Keep output small enough for rapid incident review.

Solution Using VM run command

The incident commander approved a predefined VM run command script from the operations repository. The script reported free space, top temporary folders, encoder process status, recent service errors, and current stream queue length. Azure CLI invoked the script against one encoder VM, and output showed the temporary work folder was full from an abandoned transcode job. A second approved script removed only files older than the active stream window and restarted the encoder process. The team watched application telemetry and stream health before touching any other instance. After the event, they replaced the manual cleanup script with an automated retention rule and alert.

Results & Business Impact

  • Diagnostics returned in three minutes, meeting the live-event response target.
  • Encoder error rate dropped from 22 percent to under 1 percent after cleanup.
  • No network security policy exception was needed during the broadcast.
  • The post-event fix reduced similar disk-pressure alerts by 87 percent.

Key Takeaway for Glossary Readers

Run command can safely bridge urgent guest diagnostics when scripts are preapproved, narrow, and tied to application health checks.

Why use Azure CLI for this?

I use Azure CLI for run command because the value is precision and repeatability. An experienced Azure engineer wants the exact script, target VM, parameters, output, and timestamp captured in a terminal or pipeline record. The portal is fine for a quick one-off, but CLI lets you run the same diagnostic across selected VMs, store scripts in source control, redact output, and fail fast when the VM agent is unhealthy. It also helps avoid opening inbound SSH or RDP just to run a small check. The discipline is to keep commands scoped, reviewed, and reversible.

CLI use cases

  • Invoke a read-only PowerShell or shell diagnostic script against a named VM and save the output.
  • Pass parameters to a reviewed remediation script stored in source control.
  • List available run commands or inspect command status when troubleshooting agent execution.
  • Automate a narrow fleet check by tag or resource group while limiting concurrency and output size.
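The use cases above can be sketched as a small helper that assembles the az vm run-command invoke argument list from a script kept in source control. It only builds and prints the command; nothing is executed. The `--scripts @<file>` and `--parameters name=value` forms follow the Azure CLI reference as understood here and should be confirmed with az vm run-command invoke --help. The function name, script path, and `threshold` parameter are hypothetical.

```python
import shlex

# The two command IDs that appear in this page's CLI mappings.
ALLOWED_COMMAND_IDS = {"RunShellScript", "RunPowerShellScript"}


def build_invoke_argv(resource_group, vm_name, command_id, script_path,
                      parameters=None):
    """Build the argv for one scoped invocation against a single named VM."""
    if command_id not in ALLOWED_COMMAND_IDS:
        raise ValueError(f"unexpected command id: {command_id}")
    argv = [
        "az", "vm", "run-command", "invoke",
        "--resource-group", resource_group,
        "--name", vm_name,
        "--command-id", command_id,
        # Assumption: '@' loads the script body from a local file.
        "--scripts", f"@{script_path}",
        "--output", "json",
    ]
    if parameters:
        argv.append("--parameters")
        argv.extend(f"{name}={value}" for name, value in parameters.items())
    return argv


# Hypothetical script path and parameter, for illustration only.
argv = build_invoke_argv(
    "rg-demo", "vm-demo", "RunShellScript",
    "scripts/disk_check.sh", {"threshold": "90"},
)
print(shlex.join(argv))
```

Passing the list to a process runner as separate arguments, rather than as one shell string, avoids quoting surprises when a parameter value contains spaces.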

Before you run CLI

  • Confirm the active subscription, VM name, resource group, OS type, and RBAC permission to invoke guest commands.
  • Review the script for destructive actions, secrets, long-running loops, and output that could expose sensitive data.
  • Check whether the VM agent is healthy enough to run commands and return results.
  • Prefer a read-only diagnostic first, then seek approval before remediation that changes files, services, firewall rules, or users.
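Two of the checks above lend themselves to a quick automated pass, sketched here under stated assumptions. The first reads JSON shaped like az vm get-instance-view output and assumes a healthy agent reports a `displayStatus` of "Ready" under `instanceView.vmAgent.statuses` and a running VM reports `PowerState/running`. The second flags script lines that match a few risky patterns. The pattern list, function names, and sample data are hypothetical and are no substitute for human review.

```python
import json
import re

# Hypothetical patterns for illustration; a real list would be owned and
# tuned by the team that reviews the scripts.
RISKY_PATTERNS = {
    "recursive delete": re.compile(r"\brm\s+-\w*r\w*|Remove-Item\b.*-Recurse", re.IGNORECASE),
    "inline secret": re.compile(r"(password|secret|token)\s*[=:]", re.IGNORECASE),
    "unbounded loop": re.compile(r"\bwhile\s*\(?\s*\$?true\b", re.IGNORECASE),
}


def agent_is_ready(instance_view_json):
    """Return True when the VM is running and its agent reports Ready."""
    vm = json.loads(instance_view_json)
    view = vm.get("instanceView") or {}
    agent_statuses = (view.get("vmAgent") or {}).get("statuses") or []
    vm_statuses = view.get("statuses") or []
    agent_ready = any(s.get("displayStatus") == "Ready" for s in agent_statuses)
    vm_running = any(s.get("code") == "PowerState/running" for s in vm_statuses)
    return agent_ready and vm_running


def review_script(script_text):
    """Return (line_number, label) pairs for lines that need a closer look."""
    findings = []
    for number, line in enumerate(script_text.splitlines(), start=1):
        for label, pattern in RISKY_PATTERNS.items():
            if pattern.search(line):
                findings.append((number, label))
    return findings


# Hypothetical instance view and script.
healthy_view = json.dumps({
    "instanceView": {
        "vmAgent": {"statuses": [{"displayStatus": "Ready"}]},
        "statuses": [{"code": "PowerState/running"}],
    }
})
sample_script = "df -h\nrm -rf /var/tmp/cache\n"

print("agent ready:", agent_is_ready(healthy_view))
print("findings:", review_script(sample_script))
```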

What output tells you

  • Run-command status shows whether Azure accepted the request and whether the guest agent reported success or failure.
  • Standard output and error streams show script results, but they should be treated as sensitive operational evidence.
  • Timeouts or empty output suggest VM agent issues, guest OS problems, blocked outbound connectivity, or a script that hung.
  • Activity-log entries identify who invoked the command and when, which matters for audit and incident reconstruction.
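The interpretation above can be sketched as a small parser for the JSON returned by az vm run-command invoke. It assumes two output shapes: a Linux-style single status whose message embeds `[stdout]` and `[stderr]` markers, and a Windows-style pair of statuses whose codes contain `/StdOut/` and `/StdErr/`. Both shapes are assumptions to verify against real output, and the function names and state labels are hypothetical.

```python
import json


def split_streams(statuses):
    """Separate stdout and stderr text from the returned status entries."""
    stdout, stderr = [], []
    for status in statuses:
        code = status.get("code", "")
        message = status.get("message") or ""
        if "/StdOut/" in code:
            stdout.append(message)
        elif "/StdErr/" in code:
            stderr.append(message)
        elif "[stdout]" in message:
            # Linux-style message: text after [stdout], then after [stderr].
            _, _, rest = message.partition("[stdout]")
            out, _, err = rest.partition("[stderr]")
            stdout.append(out.strip())
            stderr.append(err.strip())
    return "\n".join(stdout).strip(), "\n".join(stderr).strip()


def summarize(raw_json):
    """Classify one invocation result as empty, failed, or succeeded."""
    if not raw_json.strip():
        return {"state": "empty", "stdout": "", "stderr": ""}
    statuses = json.loads(raw_json).get("value") or []
    stdout, stderr = split_streams(statuses)
    failed = any(not s.get("code", "").lower().endswith("succeeded") for s in statuses)
    if not statuses:
        state = "empty"
    elif failed:
        state = "failed"
    elif not (stdout or stderr):
        state = "empty"
    elif stderr:
        state = "succeeded-with-stderr"
    else:
        state = "succeeded"
    return {"state": state, "stdout": stdout, "stderr": stderr}


# Hypothetical Linux-style result.
linux_result = json.dumps({"value": [{
    "code": "ProvisioningState/succeeded",
    "message": "Enable succeeded: \n[stdout]\n/dev/sda1 42%\n\n[stderr]\n",
}]})
summary = summarize(linux_result)
print(summary["state"], "|", summary["stdout"])
```

An "empty" result is worth treating as a prompt to check agent health and outbound connectivity, not as proof that the script ran and printed nothing.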

Mapped Azure CLI commands

VM run command operations

direct
az vm run-command invoke --resource-group <resource-group> --name <vm-name> --command-id RunShellScript --scripts "<script>"
az vm run-command, operate, Compute

az vm run-command invoke --resource-group <resource-group> --name <vm-name> --command-id RunPowerShellScript --scripts "<script>"
az vm run-command, operate, Compute

az vm run-command list --location <region> --output table
az vm run-command, discover, Compute

az vm get-instance-view --resource-group <resource-group> --name <vm-name> --output json
az vm, discover, Compute

az monitor activity-log list --resource-group <resource-group> --offset 2h
az monitor activity-log, discover, Compute

Architecture context

Architecturally, run command is an operational bridge between the Azure control plane and guest administration. It belongs in break-glass diagnostics, controlled remediation, and fleet inspection, not in everyday application deployment pipelines that should use images, extensions, desired-state tools, or CI/CD. Designs with private subnets, no public IPs, and strong just-in-time access can still support emergency guest checks through run command if RBAC is tightly managed. The feature also influences incident architecture: teams can collect evidence when network paths are broken, but they must document command authority, output handling, and script approval.

Security

Security impact is direct and high. Anyone who can invoke run command can potentially execute powerful guest actions, depending on OS, agent behavior, and script contents. Use least-privilege RBAC, Privileged Identity Management, change approval, and logging for production use. Do not place secrets in command text or allow sensitive values to appear in returned output. Validate scripts before running them, especially when copying from tickets or chats. Network isolation does not remove this risk because the command arrives through Azure management channels. Treat run command as privileged remote administration with an audit trail.
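Because returned output can carry sensitive values, a redaction pass before output reaches a ticket is one possible safeguard. The sketch below masks a few common secret shapes with regular expressions. The pattern list is a hypothetical starting point and will miss many formats, so it complements, and does not replace, keeping secrets out of scripts and output in the first place.

```python
import re

# Hypothetical patterns; each keeps its prefix (group 1) and masks the value.
REDACTION_PATTERNS = [
    re.compile(r"(?i)(\b(?:password|passwd|secret|token|api[_-]?key)\b\s*[=:]\s*)\S+"),
    re.compile(r"(?i)(AccountKey=)[^;\s]+"),
    re.compile(r"(?i)(Bearer\s+)[A-Za-z0-9._~+/=-]+"),
]


def redact(text):
    """Replace values that look like secrets with a fixed marker."""
    for pattern in REDACTION_PATTERNS:
        text = pattern.sub(lambda match: match.group(1) + "[REDACTED]", text)
    return text


# Hypothetical output lines.
raw_output = "\n".join([
    "db password=Hunter2! ok",
    "conn: DefaultEndpointsProtocol=https;AccountKey=abc123==;EndpointSuffix=core.windows.net",
    "Authorization: Bearer eyJhbGciOi.abc",
    "disk /dev/sda1 42%",
])
clean_output = redact(raw_output)
print(clean_output)
```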

Cost

Run command has no typical standalone compute charge, but the cost path is operational risk and labor. It can save money by avoiding rebuilds, support escalations, Bastion sessions, or emergency firewall openings. It can also create expensive mistakes if a script deletes data, stops revenue services, or exposes secrets that trigger incident response. Fleet-wide run command use should be scoped carefully because running diagnostics on many VMs consumes operator time and guest resources. Cost-aware teams maintain small, reviewed scripts that collect exactly the needed evidence and avoid broad changes during business hours.

Reliability

Reliability impact depends on script behavior and VM agent health. A read-only diagnostic command can improve reliability by shortening troubleshooting. A poorly tested remediation script can stop services, fill disks, change firewall rules, or hang during execution. Run command is also unavailable when the VM agent is broken, blocked from required outbound access, or the guest OS is too unhealthy to respond. Keep scripts short, idempotent, timed, and targeted. For high-availability workloads, run commands against one instance at a time and verify application health before expanding to more machines.
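The one-instance-at-a-time advice can be sketched as a loop that stops at the first unhealthy result. The `run_on_vm` and `is_healthy` callables are hypothetical hooks; in practice they might wrap an az vm run-command invoke call and an application health probe, both of which are stubbed here so the control flow is visible.

```python
def staged_rollout(vm_names, run_on_vm, is_healthy):
    """Run against one VM at a time and stop at the first failed health check."""
    completed = []
    for vm in vm_names:
        run_on_vm(vm)
        if not is_healthy(vm):
            # Do not touch the remaining VMs until someone investigates.
            return {"completed": completed, "stopped_at": vm}
        completed.append(vm)
    return {"completed": completed, "stopped_at": None}


# Stubs standing in for the real invocation and health probe.
invoked = []


def fake_run(vm):
    invoked.append(vm)


def fake_health(vm):
    return vm != "enc-02"  # Pretend the second encoder fails validation.


result = staged_rollout(["enc-01", "enc-02", "enc-03"], fake_run, fake_health)
print(result)
```

In this stubbed run the third VM is never invoked, which is the behavior a canary pattern is meant to guarantee.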

Performance

Performance impact is usually temporary and script-dependent. Run command itself is not a performance tuning feature, but scripts can inspect CPU, memory, disk, network, services, and logs without establishing remote sessions. Heavy scripts can consume CPU, read large logs, lock files, or compete with the workload. Returned output can also be truncated or slow if operators try to dump too much data. Use targeted commands, timeouts, and summary output. For performance incidents, collect the minimum evidence needed, then use monitoring and profiling tools for sustained analysis.

Operations

Operators use run command to collect logs, inspect services, repair local firewall rules, restart agents, validate disk space, or recover access when normal management paths fail. Good operations begin with a read-only script and explicit output handling. Store known-good scripts in a repository, include ownership and risk notes, and avoid improvising complex commands during a stressful incident. Capture command ID, caller, VM, script version, output, and follow-up validation. When run command fails, check VM agent status, extension health, network access to Azure endpoints, guest OS boot state, and RBAC permissions.
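The capture list above maps naturally onto a small record type, sketched below. The field names and `make_evidence` are hypothetical, and a SHA-256 hash of the script text is used as a stand-in for "script version"; a team that stores scripts in a repository might record a commit ID instead.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class RunCommandEvidence:
    command_id: str
    caller: str
    vm_resource_id: str
    script_sha256: str      # Stand-in for the script version.
    output: str
    validation_note: str
    recorded_at_utc: str


def make_evidence(command_id, caller, vm_resource_id, script_text, output,
                  validation_note, now=None):
    """Bundle one invocation's details into an immutable record."""
    now = now or datetime.now(timezone.utc)
    digest = hashlib.sha256(script_text.encode("utf-8")).hexdigest()
    return RunCommandEvidence(
        command_id=command_id,
        caller=caller,
        vm_resource_id=vm_resource_id,
        script_sha256=digest,
        output=output,
        validation_note=validation_note,
        recorded_at_utc=now.isoformat(),
    )


# Hypothetical values for illustration.
evidence = make_evidence(
    "RunShellScript", "oncall@example.com",
    "/subscriptions/0000/resourceGroups/rg-demo/providers/Microsoft.Compute/virtualMachines/vm-demo",
    "df -h\n", "/dev/sda1 42%", "synthetic checks passed",
    now=datetime(2026, 1, 2, 3, 4, 5, tzinfo=timezone.utc),
)
print(json.dumps(asdict(evidence), indent=2))
```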

Common mistakes

  • Using run command as an unreviewed deployment mechanism instead of proper images, extensions, or application pipelines.
  • Pasting secrets, tokens, or passwords into script text and then exposing them in logs or command history.
  • Running a remediation script across every VM at once without canary testing or health validation.
  • Assuming failure means the command is wrong when the VM agent or outbound connectivity may be broken.