Compute instance

A compute instance is a managed Azure Machine Learning cloud workstation that one data scientist or engineer uses for notebooks, experimentation, development, and lightweight testing. Teams use it to give practitioners a ready development environment close to workspace data while IT controls size, identity, network access, schedules, and lifecycle. Operators usually encounter it in portal settings, deployment output, metrics, logs, and runbooks. The practical questions are who owns it, what scope it affects, and what evidence proves it is working.

Aliases: No aliases mapped yet
Difficulty: Beginner
CLI mappings: 3
Last verified: 2026-05-12

Microsoft Learn

An Azure Machine Learning compute instance is a managed cloud-based workstation for data scientists, with one owner and development tools for machine learning workflows.

Microsoft Learn: What is an Azure Machine Learning compute instance? (2026-05-12)

Technical context

Technically, a compute instance is a managed compute resource in an Azure Machine Learning workspace that is optimized for development and can run notebooks and development jobs. Engineers verify it through its service configuration, resource IDs, logs, metrics, request records, and deployment evidence. Important configuration includes VM size, owner, assigned identity, virtual network, SSH access, idle shutdown, schedules, setup scripts, tags, and workspace permissions. Before changing behavior in production, reviewers should capture owner, scope, region, identity, limits, recent changes, and diagnostics.
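The review fields listed above can be held in a small record so that gaps are visible before a change. This is a minimal sketch in Python, assuming a hypothetical `ReviewRecord` shape; the class, the field names, and the sample values are illustrative and are not an Azure schema.

```python
from dataclasses import dataclass, fields


@dataclass
class ReviewRecord:
    """Pre-change evidence for one compute instance (hypothetical shape)."""
    owner: str = ""
    scope: str = ""           # workspace or resource group the change affects
    region: str = ""
    identity: str = ""        # assigned identity, if any
    limits: str = ""          # quota or VM-size limits that apply
    recent_changes: str = ""
    diagnostics: str = ""


def missing_fields(record: ReviewRecord) -> list:
    """Return the names of review fields that are still empty."""
    return [f.name for f in fields(record) if not getattr(record, f.name).strip()]


# Sample values are placeholders, not real resources.
record = ReviewRecord(owner="dana@example.com", scope="ws-research", region="westeurope")
print(missing_fields(record))
# prints ['identity', 'limits', 'recent_changes', 'diagnostics']
```

A reviewer could refuse to approve a change until `missing_fields` returns an empty list.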

Why it matters

Compute instances matter because unmanaged ones can expose data, run up cost while idle, or blur the line between development and production jobs. The business impact is rarely abstract: when the term is misunderstood, users see slower workflows, missing data, failed automation, audit gaps, support delays, or unexpected cost. A strong glossary entry gives architects, developers, security reviewers, and operators the same language for design reviews and incident handoffs. It connects Azure configuration to measurable objectives, ownership, rollback paths, and evidence, so teams treat the compute instance as an operational control rather than a portal label. That discipline helps teams make safer changes under pressure.

Where you see it

Signals, screens, and Azure surfaces where this term usually becomes operational.

Signal 01

You see Compute instance in Azure Machine Learning workspaces, notebooks, compute pages, and user assignments when confirming VM size, owner, running state, schedule, storage, and network access for release, audit, or incident evidence.

Signal 02

You see Compute instance during troubleshooting, when notebooks cannot start or idle instances keep billing, and operators must connect portal state, CLI output, logs, metrics, owners, and rollback notes.

Signal 03

You see Compute instance in architecture reviews when teams decide which personal development compute is allowed and monitored, how evidence is gathered, and how it affects security, reliability, operations, cost, and performance.

When this becomes relevant

Specific situations where this term helps solve real Azure design, operations, migration, security, reliability, cost, or governance problems.

  • Provide a governed notebook workstation for a data scientist.
  • Find idle or oversized instances during cost reviews.
  • Confirm identity and network settings before accessing sensitive datasets.
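The cost-review situation above can be sketched as a simple filter. This is a minimal Python sketch, assuming each instance has already been exported as a dictionary with hypothetical keys `name`, `vm_size`, `state`, and `last_activity`; the keys, the seven-day threshold, and the sample VM sizes are assumptions for illustration, not the shape of any Azure API response.

```python
from datetime import datetime, timedelta


def flag_for_cost_review(instances, approved_sizes, now, idle_after=timedelta(days=7)):
    """Return (name, reason) pairs for instances that look idle or use an unapproved size."""
    findings = []
    for inst in instances:
        idle_for = now - inst["last_activity"]
        # A running instance with no recent activity is still billing for VM time.
        if inst["state"] == "Running" and idle_for > idle_after:
            findings.append((inst["name"], "idle"))
        if inst["vm_size"] not in approved_sizes:
            findings.append((inst["name"], "unapproved size"))
    return findings


now = datetime(2026, 5, 12, 9, 0)
sample = [  # hypothetical export, not real resources
    {"name": "ci-dana", "vm_size": "Standard_DS3_v2", "state": "Running",
     "last_activity": now - timedelta(days=10)},
    {"name": "ci-lee", "vm_size": "Standard_NC6s_v3", "state": "Stopped",
     "last_activity": now - timedelta(days=1)},
]
print(flag_for_cost_review(sample, {"Standard_DS3_v2"}, now))
# prints [('ci-dana', 'idle'), ('ci-lee', 'unapproved size')]
```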

Real-world case studies

Different enterprise-style examples that show the term being used to hit measurable objectives.

Case study 01

Data science workstation standardization

Scenario, objectives, solution, measured impact, and takeaway.

Scenario

Proseware Analytics had data scientists building models on unmanaged laptops with inconsistent packages and poor access controls.

Business/Technical Objectives

  • Provide managed notebook environments
  • Reduce local data downloads
  • Standardize development VM sizes
  • Cut idle workstation cost by 25 percent

Solution Using Compute instance

The platform team provisioned Azure Machine Learning compute instances for each data scientist, using standard VM sizes, managed identity, and private workspace networking. Setup scripts installed approved packages and security tools. Datastore permissions allowed access to curated datasets without downloading raw files to laptops. Idle shutdown schedules stopped instances after business hours, while tags mapped each instance to project and owner. Operators reviewed state, owner, and runtime monthly before resizing or deleting unused instances. The runbook captured owner, environment, approval link, rollback condition, and the exact Azure evidence operators had to collect before and after each change. A dashboard tracked adoption, exceptions, and operational signals so support, security, and finance teams could review outcomes without relying on informal notes. The team reviewed results after the pilot and kept the design in the standard platform checklist for future deployments.

Results & Business Impact

  • Local dataset downloads dropped 74 percent
  • Notebook setup time fell from days to hours
  • Idle workstation cost dropped 31 percent
  • Every instance had project and owner tags

Key Takeaway for Glossary Readers

Compute instances give data scientists flexible development workstations without abandoning enterprise controls.

Case study 02

Healthcare model development access

Scenario, objectives, solution, measured impact, and takeaway.

Scenario

Meadowbrook Health needed researchers to explore clinical datasets while keeping workstations inside approved Azure network boundaries.

Business/Technical Objectives

  • Keep research notebooks inside private networking
  • Limit dataset access by study role
  • Provide reproducible setup scripts
  • Remove access quickly when studies end

Solution Using Compute instance

Architects deployed compute instances in a secured Azure Machine Learning workspace with private endpoints and role-based access. Each researcher received an owned instance tied to study membership, and managed identities controlled datastore access. Setup scripts configured approved libraries and disabled unsupported package sources. Logs captured start, stop, and owner changes. At study closeout, automation stopped instances, exported required artifacts, removed datastore permissions, and deleted machines after review.

Results & Business Impact

  • Notebook traffic stayed inside approved network paths
  • Study-based access reviews passed without findings
  • Research setup became repeatable across teams
  • Closed-study instances were removed within five days

Key Takeaway for Glossary Readers

Compute instances can support sensitive research when ownership, networking, and lifecycle controls are explicit.

Case study 03

Retail experimentation cost control

Scenario, objectives, solution, measured impact, and takeaway.

Scenario

Fourth Coffee’s recommendation team used oversized development machines for exploratory notebooks and created unpredictable monthly Azure spend.

Business/Technical Objectives

  • Reduce idle compute-instance spend
  • Right-size development workstations
  • Separate experiments from batch training
  • Improve cost visibility by product team

Solution Using Compute instance

Cloud architects audited Azure Machine Learning compute instances by owner, VM size, runtime hours, and project tags. They moved long-running training jobs to compute clusters and kept compute instances for notebook development only. Idle shutdown schedules, approved VM-size policies, and monthly cleanup reports were added. Product teams received dashboards showing instance cost, last activity, and owner. Operators used CLI checks before stopping machines so active notebook work was not lost unexpectedly.

Results & Business Impact

  • Compute-instance spend fell 38 percent
  • Training jobs moved to autoscaling clusters
  • Ownerless instances dropped to zero
  • Product teams received monthly cost evidence

Key Takeaway for Glossary Readers

Compute instances stay cost-effective when they are treated as personal development workstations, not hidden production servers.

Why use Azure CLI for this?

Use Azure ML CLI checks to verify instance ownership, state, size, identity, and network settings before troubleshooting notebooks or cost.

CLI use cases

  • List compute instances to find idle or ownerless development workstations.
  • Show instance configuration during access or network troubleshooting.
  • Start, stop, or delete an instance after confirming project ownership and saved work.
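The use cases above map onto the `az ml compute` command group. The following Python sketch only builds the argument list and does not run anything; the helper name `compute_command` and the sample workspace, resource group, and instance names are hypothetical, and it assumes the Azure CLI `ml` extension is installed wherever the command is eventually executed.

```python
import shlex

READ_ONLY = {"list", "show"}
NEEDS_NAME = {"show", "start", "stop", "delete"}


def compute_command(action, workspace, resource_group, name=None):
    """Build an `az ml compute` argument list for the given action."""
    if action not in READ_ONLY | NEEDS_NAME:
        raise ValueError(f"unsupported action: {action}")
    args = ["az", "ml", "compute", action,
            "--workspace-name", workspace,
            "--resource-group", resource_group]
    if action in NEEDS_NAME:
        # Every action except `list` targets a single named instance.
        if not name:
            raise ValueError(f"'{action}' needs an instance name")
        args += ["--name", name]
    return args


# Placeholder names; print the command for review instead of executing it.
print(shlex.join(compute_command("stop", "ws-research", "rg-ml", name="ci-dana")))
```

Returning a list rather than a string keeps the arguments safe to pass to `subprocess.run` later, once ownership and saved work have been confirmed.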

Before you run CLI

  • Confirm the active tenant, subscription, resource group, workspace, account, or region before running commands.
  • Use least-privileged access and avoid storing secrets, prompts, certificates, tokens, or personal data in command output.
  • Know whether the command is read-only, mutating, cost-impacting, security-impacting, or destructive before production use.
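The last check above, knowing a command's impact before running it, can be sketched as a lookup on the command verb. This is a minimal Python sketch; the `IMPACT_BY_VERB` table is an assumption about how a team might label verbs, not an official Azure classification, and it only looks at commands of the form `az ml <group> <verb>`.

```python
# Hypothetical labels a team might agree on; adjust to local policy.
IMPACT_BY_VERB = {
    "list": "read-only",
    "show": "read-only",
    "create": "cost-impacting",
    "start": "cost-impacting",
    "stop": "mutating",
    "delete": "destructive",
}


def classify(command: str) -> str:
    """Return the impact label for an `az ml <group> <verb> ...` command line."""
    parts = command.split()
    if len(parts) < 4 or parts[:2] != ["az", "ml"]:
        return "unknown"
    return IMPACT_BY_VERB.get(parts[3], "unknown")


print(classify("az ml compute list --workspace-name <workspace> --resource-group <resource-group>"))
# prints read-only
```

Treating every "unknown" result as requiring manual review is the conservative default.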

What output tells you

  • Output confirms whether the live Azure configuration exists at the expected scope and matches the approved design.
  • Returned IDs, settings, metrics, timestamps, or logs help separate configuration drift from application behavior.
  • Differences between expected and actual state create evidence for rollback, escalation, audit, or owner follow-up.
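Comparing expected and actual state, as described above, reduces to a keyed difference. This is a minimal Python sketch, assuming both the approved design and the live configuration have been flattened into dictionaries; the key names and values are hypothetical and do not mirror real CLI output.

```python
def config_drift(expected: dict, actual: dict) -> dict:
    """Map each drifted key to an (expected, actual) pair; missing keys show as None."""
    return {
        key: (want, actual.get(key))
        for key, want in expected.items()
        if actual.get(key) != want
    }


# Hypothetical flattened settings for one instance.
approved = {"vm_size": "Standard_DS3_v2", "ssh_enabled": False, "idle_shutdown_minutes": 60}
live = {"vm_size": "Standard_DS3_v2", "ssh_enabled": True}
print(config_drift(approved, live))
# prints {'ssh_enabled': (False, True), 'idle_shutdown_minutes': (60, None)}
```

An empty result means the live configuration matches the approved design for every key that was checked.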

Mapped Azure CLI commands

ML operations

direct

az ml workspace list --resource-group <resource-group>
    az ml workspace | discover | AI and Machine Learning

az ml workspace show --name <workspace> --resource-group <resource-group>
    az ml workspace | discover | AI and Machine Learning

az ml workspace create --name <workspace> --resource-group <resource-group> --location <region>
    az ml workspace | provision | AI and Machine Learning

az ml compute list --workspace-name <workspace> --resource-group <resource-group>
    az ml compute | discover | AI and Machine Learning

az ml model list --workspace-name <workspace> --resource-group <resource-group>
    az ml model | discover | AI and Machine Learning

az ml online-endpoint list --workspace-name <workspace> --resource-group <resource-group>
    az ml online-endpoint | discover | AI and Machine Learning

Architecture context

Security

Security for Compute instance starts with understanding owner assignment, workspace RBAC, managed identity, datastore access, SSH settings, notebook content, secrets, network isolation, and local files on the VM. Review identities, roles, secrets, network paths, data classification, logs, and who can change the setting. Prefer least privilege, private access when available, managed identity or protected credentials, and audit evidence. Watch for broad permissions, sensitive data in logs, shared keys, public endpoints, stale owners, and exceptions without expiry. Production use should include an approved owner, access boundary, alert routing, and a revocation process operators can execute during an incident. Security reviewers should tie every exception to risk acceptance and expiry.
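Several of the warning signs above can be checked mechanically once instance settings are exported. This is a minimal Python sketch with hypothetical keys (`ssh_enabled`, `managed_identity`, `public_network_access`, `owner_active`); the keys are assumptions for illustration, and a real review would read them from whatever export the team already trusts.

```python
def security_findings(inst: dict) -> list:
    """Return human-readable findings for one exported instance record."""
    findings = []
    if inst.get("ssh_enabled"):
        findings.append("SSH access enabled")
    if not inst.get("managed_identity"):
        findings.append("no managed identity assigned")
    if inst.get("public_network_access"):
        findings.append("public network access allowed")
    # A missing owner_active flag is treated as active to avoid false alarms.
    if not inst.get("owner_active", True):
        findings.append("stale owner")
    return findings


# Hypothetical record for illustration only.
sample = {"name": "ci-dana", "ssh_enabled": True, "managed_identity": "",
          "public_network_access": False, "owner_active": False}
print(security_findings(sample))
# prints ['SSH access enabled', 'no managed identity assigned', 'stale owner']
```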

Cost

Cost for Compute instance comes from VM runtime hours, idle notebooks, GPU workstation choices, storage, monitoring, setup scripts, and abandoned instances after projects or employees move on. Direct costs may be obvious, but indirect costs can appear as retries, duplicate processing, idle capacity, data movement, investigation time, or support effort. Review budgets, tags, usage metrics, quota, retention, SKU, and forecasts before enabling or scaling it. Connect spend to business-unit ownership and expected workload value. Define normal usage, alert thresholds, cleanup rules, and exception approval before the feature becomes a hidden default across environments. Finance teams need evidence that the cost aligns to real demand, not leftover experiments.
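The runtime part of this cost is simple arithmetic: hourly rate times running hours. The Python sketch below compares an always-on week with a scheduled week; the rate of 0.50 per hour and the 50-hour schedule are made-up inputs rather than Azure prices, and the sketch covers VM runtime only, not storage or monitoring.

```python
HOURS_PER_WEEK = 24 * 7  # 168


def weekly_cost(hourly_rate: float, running_hours: float) -> float:
    """VM runtime cost for one week at a flat hourly rate."""
    return hourly_rate * running_hours


def shutdown_savings(hourly_rate: float, scheduled_hours: float) -> float:
    """Weekly saving from running only `scheduled_hours` instead of all week."""
    always_on = weekly_cost(hourly_rate, HOURS_PER_WEEK)
    scheduled = weekly_cost(hourly_rate, scheduled_hours)
    return always_on - scheduled


# Hypothetical inputs: 0.50 per hour, 10 hours a day on 5 working days.
print(weekly_cost(0.50, HOURS_PER_WEEK))   # prints 84.0
print(shutdown_savings(0.50, 10 * 5))      # prints 59.0
```

With these inputs the schedule removes 59 of 84 cost units per week, which is the kind of baseline a cleanup rule or alert threshold can be measured against.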

Reliability

Reliability for Compute instance depends on instance start and stop behavior, owner availability, notebook environment consistency, setup scripts, workspace dependencies, and recovery if the workstation is deleted. Operators should know the expected failure mode, dependency chain, recovery target, and whether retries, failover, reprocessing, or manual approval are required. Monitor health, latency, quota, backlog, error rates, stale state, and downstream failures. Test behavior during maintenance, regional incidents, expired credentials, schema changes, and burst traffic. Runbooks should explain how to validate current state, preserve evidence, reduce blast radius, and restore service without duplicate work or data loss. Reliability reviews should include the human handoff path, not only platform health.

Performance

Performance for Compute instance is about VM size, CPU or GPU capability, memory, notebook responsiveness, data access latency, package environment, and whether development jobs overload the workstation. Measure signals that reflect the user's experience, such as notebook responsiveness, instance startup time, data access latency, and the throughput of development jobs. Avoid tuning one setting in isolation when identity, network path, region, or the package environment may be the real bottleneck. Compare baseline and peak results after changes, then document which limit would be reached first as demand grows. Keep tests close to real development patterns.

Operations

Operationally, Compute instance needs clear ownership, naming, tagging, change records, and repeatable verification. Teams should know where it appears, which commands or queries prove state, which dashboard shows health, and what is safe to change during business hours. Keep examples, approvals, rollback notes, and exception records with the service runbook rather than personal notes. For production changes, capture before-and-after evidence, including resource IDs, region, tenant, policy assignment, deployment version, and linked services. Review stale resources and permissions regularly. Escalation contacts should stay current as teams reorganize. This prevents tribal knowledge from becoming the only support path. It also helps new operators support the service with confidence.
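Before-and-after evidence is easier to compare when each capture is stored with a timestamp and a digest. This is a minimal Python sketch; the function names `evidence_entry` and `changed`, and the keys inside the sample state, are hypothetical, and the state itself would come from whatever command or query the runbook names.

```python
import hashlib
import json
from datetime import datetime, timezone


def evidence_entry(label, state, captured_at=None):
    """Wrap a captured state with a UTC timestamp and a SHA-256 digest."""
    # Sorting keys makes the digest independent of dictionary ordering.
    canonical = json.dumps(state, sort_keys=True, separators=(",", ":"))
    return {
        "label": label,
        "captured_at": (captured_at or datetime.now(timezone.utc)).isoformat(),
        "digest": hashlib.sha256(canonical.encode("utf-8")).hexdigest(),
        "state": state,
    }


def changed(before, after):
    """True when the two captures describe different states."""
    return before["digest"] != after["digest"]


# Placeholder values; a real capture would hold the actual resource ID and settings.
before = evidence_entry("before", {"resource_id": "<resource-id>", "region": "westeurope", "state": "Running"})
after = evidence_entry("after", {"resource_id": "<resource-id>", "region": "westeurope", "state": "Stopped"})
print(changed(before, after))
# prints True
```

Storing both entries with the change record gives reviewers the same view the operator had at the time of the change.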

Common mistakes

  • Using a compute instance as unattended production infrastructure.
  • Leaving GPU instances running after notebook experimentation ends.
  • Granting broad datastore access because the instance owner changes often.