
Worker instance

A worker instance is one running compute copy behind an App Service plan. When you scale out a web app or the plan that hosts it, Azure adds more worker instances so requests can be spread across more machines. Each instance has its own CPU, memory, local temporary storage, process state, and app workers. Developers should assume requests can land on any healthy instance. Operators care about worker instances when diagnosing uneven load, memory pressure, cold starts, unhealthy instances, deployment restarts, and scale-out cost.

Aliases: App Service worker, App Service instance, web app instance, worker VM instance
Difficulty: fundamentals
CLI mappings: 6
Last verified: 2026-05-29

Microsoft Learn

In Azure App Service, a worker instance is one compute instance in the App Service plan that can run hosted apps. Scaling out increases the number of instances, while plan SKU and platform settings determine available CPU, memory, and scale limits.

Microsoft Learn: Scale up features and capacities - Azure App Service, 2026-05-29

Technical context

Technically, worker instances belong to the App Service plan, which defines the compute SKU, operating system, scale limits, and pricing boundary for hosted Web Apps, API apps, and some Functions scenarios. A scaled-out plan runs multiple workers, and by default every app in the plan runs on all of them. In supported tiers, per-app scaling can limit how many of the plan's instances a specific app uses. Worker behavior interacts with Always On, health checks, deployment slots, ARR affinity, autoscale rules, local cache, application settings, diagnostics, and runtime-specific process models.

Why it matters

Worker instances matter because App Service scale is not abstract when production traffic spikes. If an app relies on local memory, sticky sessions, local files, or single-instance assumptions, adding workers can expose hidden design flaws. If too few workers run, customers see slow responses, queue buildup, or failed requests. If too many run, the plan becomes an expensive idle fleet. Understanding worker instances helps teams separate scale-up from scale-out, tune autoscale, investigate noisy neighbors inside a shared plan, and design stateless apps that survive restarts, deployments, and platform maintenance without surprising users. It also guides realistic availability and capacity commitments.
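The local-memory problem described above can be sketched with a small simulation. This is a minimal illustration, not how App Service routes traffic: the `Worker` class, the round-robin rotation, and the dict standing in for an external store such as Redis are all assumptions made for the example.

```python
from itertools import cycle


class Worker:
    """One worker instance with its own private in-process memory."""

    def __init__(self, name):
        self.name = name
        self.local_sessions = {}  # invisible to other workers, lost on restart


def run_requests(workers, store_for):
    """Create a session on the first request, then look it up on the next two.

    store_for(worker) returns the dict that worker uses for session state.
    Returns how many follow-up requests found the session.
    """
    rotation = cycle(workers)  # simplified stand-in for the platform load balancer
    store_for(next(rotation))["user-1"] = {"cart": ["ticket"]}
    hits = 0
    for _ in range(2):
        if "user-1" in store_for(next(rotation)):
            hits += 1
    return hits


workers = [Worker("instance-a"), Worker("instance-b"), Worker("instance-c")]

# Session kept in each worker's local memory: later requests land elsewhere and miss it.
local_hits = run_requests(workers, lambda w: w.local_sessions)

# Session kept in one shared external store (a plain dict stands in for it here).
shared_store = {}
shared_hits = run_requests(workers, lambda w: shared_store)

print(local_hits, shared_hits)
```

With a single worker both approaches would appear to work, which is why the flaw tends to surface only after scale-out.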

Where you see it

Signals, screens, and Azure surfaces where this term usually becomes operational.

Signal 01

In App Service plan scale-out settings, operators see current instance count, autoscale rules, SKU limits, and per-app scaling options during production capacity reviews and planning.

Signal 02

In Azure CLI or portal instance views, web apps show individual worker instance identifiers, status, and allocation clues useful during uneven-load investigations or slot swaps.

Signal 03

In metrics and logs, worker instances appear through per-instance CPU, memory, requests, restarts, HTTP errors, and health-check failures during incidents, deployments, or scale event reviews.

When this becomes relevant

Specific situations where this term helps solve real Azure design, operations, migration, security, reliability, cost, or governance problems.

  • Scale a production web app horizontally during seasonal traffic while keeping each instance stateless and replaceable.
  • Diagnose one unhealthy or overloaded App Service instance when aggregate plan metrics look normal.
  • Use per-app scaling to keep a low-priority app from consuming every worker in a shared plan.
  • Separate a memory-heavy application into its own plan after it creates noisy-neighbor pressure for other apps.
  • Right-size App Service spend by comparing worker count, SKU, utilization, autoscale history, and business demand.

Real-world case studies

Different enterprise-style examples that show the term being used to hit measurable objectives.

Case study 01

Ticketing platform survives a stadium onsale rush

Scenario

A ticketing platform hosted its checkout API on Azure App Service. A stadium onsale created sudden traffic bursts that overloaded two workers and produced long payment queues.

Business/Technical Objectives

  • Keep checkout p95 latency below 800 milliseconds during onsale peaks.
  • Avoid session loss when requests land on different workers.
  • Give operations a safe emergency scale-out command path.
  • Return capacity to normal after the sales window to control spend.

Solution Using Worker instance

The engineering team redesigned the app to treat each worker instance as disposable. Session data moved to Azure Cache for Redis, uploaded artifacts moved to Blob Storage, and health checks were tuned to fail only when checkout dependencies were actually unhealthy. The App Service plan scaled out from two to eight workers for onsale windows, with autoscale rules watching CPU, HTTP queue length, and p95 latency. CLI runbooks showed the plan, updated worker count when needed, listed web app instances, and recorded the scale-back step. Deployment slots warmed the new release before traffic switched.

Results & Business Impact

  • Checkout p95 latency stayed at 680 milliseconds during the largest onsale.
  • Payment timeout errors dropped by 46 percent compared with the prior event.
  • No customer sessions were lost during scale-out testing or production traffic spikes.
  • Automated scale-back reduced post-event idle worker spend by an estimated 38 percent.

Key Takeaway for Glossary Readers

Worker instances only help under pressure when the application is designed to run correctly across many replaceable copies.

Case study 02

Legal SaaS isolates a memory-hungry document renderer

Scenario

A legal technology company ran its customer portal and document rendering service in the same App Service plan. Large court filings caused memory spikes that slowed unrelated portal requests.

Business/Technical Objectives

  • Stop document rendering from degrading portal response time.
  • Identify which worker instances and processes were consuming memory.
  • Move the noisy workload without a customer-visible outage.
  • Reduce over-scaling caused by one app affecting the whole plan.

Solution Using Worker instance

Operators used worker-instance metrics, web app instance listings, and application logs to prove that the renderer saturated memory on several shared workers. The architecture team moved rendering to a dedicated Premium plan with its own worker instances and kept the portal on the original plan with lower autoscale limits. Deployment slots and health checks validated the move before traffic was routed. CLI captured both plan configurations, listed apps sharing each plan, and documented the new ownership boundary. The renderer also received a queue-based throttle so it could not grow beyond planned worker capacity.

Results & Business Impact

  • Portal p95 latency improved from 2.7 seconds to 640 milliseconds during large filing uploads.
  • Emergency scale-out events on the portal plan dropped from nine per month to one.
  • Renderer failures became isolated to its own plan and no longer triggered portal incident bridges.
  • Monthly App Service spend fell 22 percent after the original plan stopped scaling for renderer spikes.

Key Takeaway for Glossary Readers

Understanding worker-instance boundaries helps teams find noisy-neighbor problems that aggregate plan metrics can hide.

Case study 03

City permitting portal fixes uneven load after migration

Scenario

A city government migrated its permitting portal to App Service. After launch, residents saw intermittent timeouts even though average CPU on the plan looked healthy.

Business/Technical Objectives

  • Find why only some requests failed during permit submission peaks.
  • Remove assumptions that depended on a single web server.
  • Improve availability without immediately moving to a larger SKU.
  • Give the support team clear evidence for public incident updates.

Solution Using Worker instance

The operations team listed live web app instances and found one worker showing high memory and repeated restarts while other instances were healthy. Logs showed that temporary PDF files were written to local instance storage and reused by later requests. Engineers moved generated PDFs to Blob Storage, disabled the local-file assumption, and added a health check path that detected stuck rendering threads. The plan stayed at three worker instances, but autoscale rules were adjusted for queue length. CLI output and per-instance charts were added to the incident report so leadership could understand the failure mode.

Results & Business Impact

  • Submission timeout rate fell from 6.4 percent to 0.7 percent after local file state was removed.
  • The team avoided an unnecessary SKU upgrade that was projected to add 41 percent to monthly cost.
  • Per-instance restart alerts caught a later rendering bug before residents reported failures.
  • Public status updates became more specific because support could reference affected instance behavior.

Key Takeaway for Glossary Readers

Per-instance visibility is often the difference between blindly scaling out and fixing the actual App Service failure mode.

Why use Azure CLI for this?

I use Azure CLI for worker-instance work because scaling and inspection need to be fast, repeatable, and unambiguous under pressure. After ten years running Azure workloads, I want exact instance counts, plan SKU, app allocation, and live instances available without clicking through portal blades. CLI lets me show the App Service plan, update worker count, inspect web app instances, compare slots, export app settings, and feed autoscale reviews. It also reduces mistakes during incidents because the command history records what changed and which subscription and resource group were targeted. That matters when everyone is watching the production incident bridge.

CLI use cases

  • Show an App Service plan to confirm SKU, worker count, operating system, and resource group before scaling.
  • Update the number of workers when an approved manual change or emergency scale-out is required.
  • List web app instances to identify whether traffic and process health are distributed as expected.
  • Compare web apps sharing a plan to find noisy-neighbor risk or consolidation opportunities.
  • Export plan and app settings as evidence before moving workloads to another plan or tier.

Before you run CLI

  • Verify tenant, subscription, resource group, App Service plan name, and whether production slots are involved.
  • Confirm permissions because updating worker count or SKU changes cost and can trigger platform movement.
  • Check tier limits; some worker counts, autoscale features, and per-app scaling options depend on the SKU.
  • Know the current autoscale rules so a manual change does not fight automated scale-in or scale-out.
  • Use output filters carefully; instance IDs and plan IDs are easy to confuse during multi-app incidents.
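The tier-limit and autoscale checks above can be turned into a small pre-flight guard. The sketch below is illustrative only: `check_manual_count` and its parameters are hypothetical names, and the limits would have to be read from the plan's actual SKU and autoscale settings rather than hard-coded.

```python
def check_manual_count(requested, autoscale_min, autoscale_max, sku_max):
    """Return warnings for a proposed manual worker count (empty list means no conflict)."""
    warnings = []
    if requested > sku_max:
        warnings.append(f"{requested} exceeds the SKU limit of {sku_max}")
    if requested < autoscale_min:
        warnings.append(f"autoscale may scale back out to at least {autoscale_min}")
    if requested > autoscale_max:
        warnings.append(f"autoscale may scale back in to at most {autoscale_max}")
    return warnings


# Hypothetical values: autoscale profile allows 2 to 6 workers, SKU allows up to 10.
for message in check_manual_count(8, autoscale_min=2, autoscale_max=6, sku_max=10):
    print("WARNING:", message)
```

A guard like this does not replace reading the autoscale profile itself; it only makes the conflict visible before the update command is run.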

What output tells you

  • Plan output shows SKU, location, worker count, operating system, and identifiers that define the compute boundary.
  • Web app instance output shows the actual scaled-out instances serving the app or slot at the time of inspection.
  • App lists reveal which workloads share the same plan and may compete for worker resources.
  • Metrics tied to instance dimensions show whether one worker is hotter, leakier, or less healthy than others.
  • Update output confirms the control-plane request, while metrics and logs prove the new workers are actually helping.
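As a rough illustration of why per-instance dimensions matter, the sketch below flags instances whose CPU sits well above the fleet median even though the average looks moderate. The sample readings, the `ratio` and `floor` thresholds, and the function name are all assumptions for the example, not platform defaults.

```python
from statistics import median


def find_hot_instances(cpu_by_instance, ratio=1.5, floor=50.0):
    """Return instance IDs whose CPU is at least `floor` and above `ratio` times the median."""
    typical = median(cpu_by_instance.values())
    return sorted(
        instance_id
        for instance_id, cpu in cpu_by_instance.items()
        if cpu >= floor and cpu > typical * ratio
    )


# Hypothetical per-instance CPU percentages for one plan.
sample = {"instance-a": 31.0, "instance-b": 28.0, "instance-c": 93.0}
average = sum(sample.values()) / len(sample)

print("plan average:", round(average, 1))
print("hot instances:", find_hot_instances(sample))
```

The same comparison could be applied to memory, restarts, or request counts; the median is used here only because a single outlier distorts it less than the mean.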

Mapped Azure CLI commands

App Service worker and plan commands (direct)

az appservice plan show --name <plan-name> --resource-group <resource-group>
  az appservice plan | discover | Web

az appservice plan update --name <plan-name> --resource-group <resource-group> --number-of-workers <count>
  az appservice plan | configure | App platform

az webapp list-instances --name <app-name> --resource-group <resource-group> --output table
  az webapp | discover | App platform

az appservice plan list --resource-group <resource-group> --output table
  az appservice plan | discover | Web

az webapp list --resource-group <resource-group> --query "[].{name:name,state:state,plan:serverFarmId}" --output table
  az webapp | discover | App platform

az monitor metrics list --resource <app-service-plan-resource-id> --metrics CpuPercentage MemoryPercentage
  az monitor metrics | discover | Monitoring and Observability
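To find which apps share a plan, the result of the `az webapp list` query above can be grouped by plan. The sketch below assumes the same query was run with `--output json`, so each entry has `name`, `state`, and `plan` keys; the sample data, resource IDs, and the `apps_by_plan` helper are hypothetical.

```python
import json
from collections import defaultdict


def apps_by_plan(webapp_list_json):
    """Group app names by the plan name at the end of each plan resource ID."""
    grouped = defaultdict(list)
    for app in json.loads(webapp_list_json):
        plan_id = app.get("plan") or "unknown"  # tolerate a missing plan field
        plan_name = plan_id.rstrip("/").split("/")[-1]
        grouped[plan_name].append(app["name"])
    return {plan: sorted(names) for plan, names in grouped.items()}


# Hypothetical query output; subscription, group, plan, and app names are made up.
prefix = "/subscriptions/0000/resourceGroups/rg-demo/providers/Microsoft.Web/serverfarms/"
sample_output = json.dumps([
    {"name": "portal", "state": "Running", "plan": prefix + "plan-shared"},
    {"name": "renderer", "state": "Running", "plan": prefix + "plan-shared"},
    {"name": "reports", "state": "Stopped", "plan": prefix + "plan-batch"},
])

for plan, names in apps_by_plan(sample_output).items():
    print(plan, "->", ", ".join(names))
```

Plans that list more than one app are the candidates to check for noisy-neighbor pressure.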

Architecture context

Architecturally, worker instances are the horizontal compute layer of App Service. They sit below the web app resource and above the managed platform fabric. The plan is the scale and billing boundary; the app is the deployment and runtime boundary. Architects decide whether multiple apps should share workers, whether a critical app needs its own plan, whether per-app scaling is appropriate, and whether state should move to Redis, Storage, SQL, or another external service. Worker count affects availability, blast radius, cold-start behavior, deployment rollout, memory isolation, and cost. Good designs treat each instance as disposable, and that rule applies to every hosted production app.

Security

Security impact is mostly about isolation and exposure boundaries. A worker instance itself is managed by Azure, but apps sharing an App Service plan share the same worker instances and the same operational fate. Sensitive workloads should avoid unnecessary plan sharing with unrelated apps, especially when teams, compliance scopes, or deployment cadences differ. Instance count does not replace network controls, managed identity, private endpoints, TLS, or app authentication. Operators should protect Kudu and deployment credentials, avoid writing secrets to local instance storage, and remember that logs from all instances may contain sensitive request details. Review plan sharing and log access during forensic and access audits.

Cost

Cost is direct because App Service plan billing is tied to the selected SKU and number of workers, not just request volume. Scaling out from two to six workers can triple plan compute spend if the SKU stays the same. Idle apps in the same plan still ride on paid workers. Per-app scaling can reduce waste in supported tiers, but it does not remove the need to right-size the plan. FinOps reviews should look for over-scaled plans, low-utilization worker fleets, orphaned apps, expensive isolated tiers, and autoscale rules that never scale back after traffic events. Review these points before budget owners approve a scale change.
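The two-to-six arithmetic can be checked with a few lines. The hourly rate and hours per month below are placeholders chosen for easy arithmetic, not real App Service prices; only the ratio matters.

```python
def plan_compute_cost(hourly_rate_per_worker, workers, hours):
    """Plan compute cost when every worker is billed for the whole period."""
    return hourly_rate_per_worker * workers * hours


HOURLY_RATE = 0.25  # hypothetical price per worker per hour
HOURS = 720         # assumed 30-day month

before = plan_compute_cost(HOURLY_RATE, workers=2, hours=HOURS)
after = plan_compute_cost(HOURLY_RATE, workers=6, hours=HOURS)

print("before:", before)
print("after:", after)
print("multiplier:", after / before)
```

Because cost scales with worker count at a fixed SKU, the multiplier is simply the ratio of worker counts, regardless of the actual rate.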

Reliability

Reliability is directly tied to worker-instance design. More than one instance can absorb individual instance restarts, platform maintenance, and traffic spikes, but only if the app is stateless and health checks are meaningful. A single worker is a single point of failure for app-process crashes, memory leaks, and deployment restarts. Uneven load, sticky sessions, unhealthy instances, or long warm-up paths can still cause failures even with scale-out. Reliable operations include autoscale rules, Always On where appropriate, health check paths, deployment slots, externalized session state, and monitoring per-instance CPU, memory, requests, and restarts. Validate these controls under production traffic, deployments, platform maintenance, and incidents.

Performance

Performance depends on whether enough healthy worker instances exist for the traffic and workload shape. More workers can increase request concurrency and reduce per-instance CPU or memory pressure, but they do not fix slow code, database bottlenecks, synchronous dependency calls, or poor caching. Scaling out can also reveal session affinity problems and cold-start paths. Operators should track response time, requests, CPU, memory working set, queue length, HTTP errors, and per-instance distribution. The goal is not maximum instance count; it is stable latency with enough headroom and no hidden hotspot. Measure each critical user path before, during, and after scaling events.
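One rough way to reason about "enough instances with headroom" is a back-of-the-envelope estimate like the one below. It is a planning heuristic under stated assumptions, not a platform formula: the per-instance throughput would have to come from load testing, the target utilization is a team choice, and the result says nothing about downstream bottlenecks.

```python
import math


def estimate_workers(peak_rps, per_instance_rps, target_utilization_pct=70, minimum=2):
    """Estimate the worker count that keeps each instance below the target utilization."""
    usable_rps = per_instance_rps * target_utilization_pct / 100
    needed = math.ceil(peak_rps / usable_rps)
    return max(needed, minimum)  # keep a floor so one restart does not take the app down


# Hypothetical numbers: 900 requests/s at peak, 200 requests/s measured per instance.
print(estimate_workers(peak_rps=900, per_instance_rps=200))
```

The estimate is only a starting point for autoscale limits; measured latency and per-instance distribution should confirm or correct it.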

Operations

Operators inspect worker instances when a web app is slow, unstable, or expensive. They list App Service plans, show worker count and SKU, list live web app instances, watch per-instance metrics, and compare logs across instance IDs. Changes include scaling out, scaling in, changing SKU, enabling health checks, tuning autoscale, or moving apps to a dedicated plan. Troubleshooting often means finding one hot instance, one leaking process, one slot warming poorly, or one app consuming shared plan resources. Documentation should record the intended minimum and maximum instance counts, the autoscale rules, and the owner of each production plan. Update it after emergency scale events and ownership handoffs.

Common mistakes

  • Assuming scale-out fixes a database bottleneck or slow downstream dependency.
  • Storing sessions, files, or cache only on local instance storage and then adding workers.
  • Scaling a shared plan without checking which other apps will consume the extra paid capacity.
  • Ignoring warm-up, health check, or slot behavior before adding instances during a peak event.
  • Leaving emergency scale-out in place after traffic returns to normal and creating avoidable monthly spend.