Per-site scaling lets apps in the same App Service plan use different numbers of workers. Normally, when a plan scales out to five instances, every app in that plan can run on all five. With per-site scaling enabled, one quiet app might use one worker while a busy app uses four. This is useful when many related apps share a paid plan but do not need equal capacity. It is not autoscale by itself; it is a placement and worker-count control.
per-app scaling, PerSiteScaling, App Service per-app scaling, per site scaling, high-density App Service scaling
Difficulty: intermediate
CLI mappings: 4
Last verified: 2026-05-17
Microsoft Learn
Per-site scaling, also called per-app scaling, lets an Azure App Service plan scale individual apps to different worker counts after the plan enables the PerSiteScaling setting. It supports high-density hosting by letting selected apps use fewer or more plan instances instead of every app occupying every worker.
In Azure architecture, per-site scaling belongs to the App Service control plane. The App Service plan enables the PerSiteScaling property, and individual web apps carry worker-count settings that limit how many plan instances they use. The feature interacts with SKU tier, plan worker count, deployment slots, Always On, automatic scaling, health checks, custom domains, and application telemetry. It changes capacity allocation inside one plan, but identity, networking, certificates, content deployment, and app settings remain configured per app or slot.
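As a rough mental model of the relationship between the plan's worker count and an app's worker setting, an app's effective instance count can be thought of as the smaller of the two values. The sketch below assumes that simplification. The function name `effective_workers` is hypothetical, and the real platform also decides which specific instances host each app.

```shell
# Simplified model (an assumption, not platform code): an app cannot run on
# more instances than the plan has, so its effective count is the minimum
# of the app-level worker setting and the plan-level worker count.
effective_workers() {
    plan_workers=$1   # instances in the App Service plan
    app_workers=$2    # worker count configured on the app
    if [ "$app_workers" -lt "$plan_workers" ]; then
        echo "$app_workers"
    else
        echo "$plan_workers"
    fi
}

effective_workers 5 1   # quiet app on a five-instance plan
effective_workers 5 4   # busy app on the same plan
effective_workers 5 8   # a setting above the plan size is bounded by the plan
```

The third call illustrates why raising an app's setting alone may not add capacity: the plan's worker count remains the ceiling.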
Why it matters
Per-site scaling matters because App Service plans are billed and operated as shared compute pools. Without per-site scaling, a low-traffic app can occupy capacity on every worker when the plan scales out for another app. That wastes capacity and makes noisy-neighbor analysis harder. With it, operators can host many small apps densely while granting extra workers only to apps that need them. The tradeoff is governance: the plan must have enough total workers, app limits must be documented, and teams must understand that per-site scaling does not replace load testing, autoscale rules, or isolation for sensitive workloads. It turns shared hosting from guesswork into a managed capacity model.
Where you see it
Signals, screens, and Azure surfaces where this term usually becomes operational.
Signal 01
In the App Service plan scale settings, the per-site scaling flag appears alongside SKU, worker count, zone, and scale-out configuration during plan consolidation reviews and capacity exceptions.
Signal 02
In Azure CLI or ARM output, the App Service plan exposes perSiteScaling while web app properties show worker-related limits or instance allocation during rightsizing reviews, owner handoffs, and traffic investigations.
Signal 03
In monitoring workbooks, one app may show high HTTP queue length or CPU while other apps in the same plan remain lightly loaded, a pattern that automation audits and shared-hosting governance checks should surface.
When this becomes relevant
Specific situations where this term helps solve real Azure design, operations, migration, security, reliability, cost, or governance problems.
Host many low-traffic apps on one Standard or Premium App Service plan without giving every app all workers.
Give a busy app additional workers while keeping admin portals or test tools at lower counts.
Review shared-plan density before deciding whether to create separate App Service plans.
Troubleshoot capacity when an app is limited by app-level worker allocation rather than plan-level scale.
Real-world case studies
Different enterprise-style examples that show the term being used to hit measurable objectives.
Case study 01
Tournament apps share capacity
Scenario, objectives, solution, measured impact, and takeaway.
📌Scenario
ArenaBoard ran tournament registration, bracket, replay, and sponsor microsites in Azure App Service. Major finals needed extra capacity for bracket views, but archive and sponsor apps were quiet.
🎯Business/Technical Objectives
Host twelve event apps in one Premium App Service plan without giving every app all workers.
Keep bracket response time under 250 milliseconds during finals traffic.
Reduce idle worker cost by at least 25 percent compared with separate plans.
Document capacity ownership for event operations and finance teams.
✅Solution Using Per-site scaling
The platform team enabled PerSiteScaling on a Premium v3 App Service plan and scaled the plan to six workers for finals weekend. Using Azure CLI, they recorded plan SKU, worker count, app IDs, and app-level worker allocations. The bracket app used five workers, registration used three, and low-traffic archive apps stayed at one. Application Insights dashboards separated app latency from plan CPU so operators could see when one app needed adjustment without moving every site.
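The density effect of this allocation can be sketched with simple arithmetic. The plan size, app count, and the bracket and registration allocations come from the case study. The assumption that all ten remaining apps run on exactly one worker is illustrative, since the text only says the low-traffic archive apps stayed at one.

```shell
# Figures from the case study: 12 apps, 6 plan workers,
# bracket app on 5 workers, registration app on 3.
# Assumption: the remaining 10 apps each run on 1 worker.
plan_workers=6
total_apps=12
bracket=5
registration=3
other_apps=$((total_apps - 2))

# Without per-site scaling, every app runs on every worker.
without=$((total_apps * plan_workers))
# With per-site scaling, each app runs only on its allocated workers.
with=$((bracket + registration + other_apps * 1))

echo "app instances without per-site scaling: $without"
echo "app instances with per-site scaling:    $with"
```

Under these assumptions the plan hosts 18 app instances instead of 72, which is the headroom the busy apps draw on.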
📈Results & Business Impact
Finals traffic peaked at 18,000 concurrent viewers while bracket P95 response time stayed at 210 milliseconds.
Monthly compute spend fell 31 percent compared with the prior design that used separate plans for each microsite.
Capacity reviews took 20 minutes because app allocations and plan workers were exported as JSON before each event.
No sponsor or archive app consumed capacity needed by the live tournament workloads.
💡Key Takeaway for Glossary Readers
Per-site scaling lets shared App Service plans act like managed capacity pools when each app’s worker allocation is deliberate.
Case study 02
Legal portals avoid noisy neighbors
📌Scenario
LexHarbor hosted document intake, e-signature callbacks, and admin portals for law firms on App Service. Intake spikes during filing deadlines slowed the admin portal because all apps rode the same scaled-out plan.
🎯Business/Technical Objectives
Allocate more workers to intake without moving every low-traffic app to a new plan.
Maintain admin portal availability during court filing deadline spikes.
Create evidence for which client workload owned each capacity decision.
Avoid increasing compute spend by more than 15 percent during the busy season.
✅Solution Using Per-site scaling
Engineers enabled per-site scaling on the shared plan and mapped each app to an owner, cost center, and expected worker allocation. Azure CLI inventory jobs exported perSiteScaling, plan workers, app resource IDs, and tags before changes. Intake ran on four workers, e-signature callbacks on two, and admin on two workers inside the same plan. A workbook monitored HTTP queue length, CPU, and failed requests by app so operators could adjust allocation without assuming the entire plan was unhealthy.
📈Results & Business Impact
Admin portal availability stayed above 99.95 percent during the deadline week.
The busy-season plan used 12 percent more compute instead of the projected 38 percent for separate plans.
Support escalations about slow document intake dropped by 64 percent after app-level worker limits were tuned.
Monthly chargeback reports showed which client workflow consumed extra workers during peak periods.
💡Key Takeaway for Glossary Readers
Per-site scaling is most useful when shared-plan density is paired with ownership, monitoring, and app-specific capacity limits.
Case study 03
Permit services scale seasonally
📌Scenario
Riverton Digital Services ran park permits, business licenses, and construction inspection portals in one App Service plan. Spring park reservations spiked sharply, while other municipal apps stayed mostly idle.
🎯Business/Technical Objectives
Keep public reservation pages responsive during the annual spring opening.
Avoid creating short-lived plans for every seasonal service.
Give operations a rollback path if worker allocation caused errors.
Keep cost growth under the approved seasonal budget envelope.
✅Solution Using Per-site scaling
The city enabled per-site scaling only on the supported Standard plan used for public portals. Before the release, Azure CLI captured plan worker count, perSiteScaling status, app tags, and previous app worker values. Park permits received four workers during the reservation window, while license and inspection portals stayed at one or two. Operators set alerts on HTTP 5xx, queue length, and CPU per app, and the runbook kept the previous worker values for fast rollback after the surge ended.
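A rollback path like the one in this runbook could be sketched as a helper that turns saved worker values back into commands. Everything here is hypothetical: the function name, the app names, the saved counts, and the assumption that the app-level worker count is settable through the property path held in `WORKER_PROPERTY`. The helper only prints commands so an operator can review them before running anything.

```shell
# Hypothetical rollback helper: prints (does not run) one az command per app.
# Assumptions: resource group, app names, and worker values are placeholders,
# and the app-level worker count is settable via the WORKER_PROPERTY path.
RESOURCE_GROUP="${RESOURCE_GROUP:-<resource-group>}"
WORKER_PROPERTY="${WORKER_PROPERTY:-siteConfig.numberOfWorkers}"

print_rollback_commands() {
    # Reads "app-name previous-worker-count" pairs from standard input.
    while read -r app count; do
        [ -n "$app" ] || continue
        echo "az webapp update --name $app --resource-group $RESOURCE_GROUP --set $WORKER_PROPERTY=$count"
    done
}

# Example snapshot captured before the seasonal change (made-up values).
print_rollback_commands <<'EOF'
park-permits 2
business-licenses 1
EOF
```

Keeping the snapshot as plain "name count" pairs makes it easy to store next to the change record and diff against the current values.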
📈Results & Business Impact
Reservation P95 latency improved from 1.8 seconds to 430 milliseconds during opening morning.
The city avoided three temporary App Service plans and saved an estimated 27 percent for the season.
Operators reverted seasonal worker settings in under ten minutes after traffic normalized.
Residents submitted 42 percent more reservations without a corresponding rise in support calls.
💡Key Takeaway for Glossary Readers
Per-site scaling can make public-sector seasonal workloads cheaper and steadier when worker counts are planned before the rush.
Why use Azure CLI for this?
Azure CLI is useful for per-site scaling because the important evidence spans both the App Service plan and individual apps. CLI output can reveal the plan PerSiteScaling flag, worker count, SKU, app limits, tags, and resource IDs in one repeatable review instead of relying on portal navigation.
CLI use cases
Create or update an App Service plan with per-site scaling enabled for high-density hosting.
Inventory plans and apps to compare perSiteScaling, worker count, SKU, and ownership tags.
Adjust plan worker count after validating that app-specific worker limits match expected traffic.
Export plan and web app settings before a capacity change or FinOps density review.
Before you run CLI
Confirm tenant, subscription, resource group, plan name, app names, region, and SKU support before changing scale settings.
Check permissions because plan updates are control-plane changes that affect every app sharing the plan.
Review cost risk, autoscale rules, health checks, deployment slots, and traffic patterns before raising worker counts.
Use JSON output when comparing settings and avoid changing production plans without documented rollback values.
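One way to act on the checks above is to wrap each command in a dry-run guard so it is printed before it is ever executed. This is a sketch of a workflow convention, not an Azure CLI feature: `run` and `DRY_RUN` are hypothetical names, and `my-plan` and `my-rg` are placeholders.

```shell
# Minimal dry-run guard (a sketch): with DRY_RUN=1 the command is only
# printed, so it can be reviewed or pasted into a change record first.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "DRY RUN: $*"
    else
        "$@"
    fi
}

# Placeholder names; replace them before setting DRY_RUN=0.
run az appservice plan show --name my-plan --resource-group my-rg --output json
```

Defaulting to the dry-run mode means a forgotten variable results in printed text rather than a control-plane change.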
What output tells you
The perSiteScaling value shows whether apps can use different worker counts inside the same plan.
Plan SKU, location, and numberOfWorkers explain the total capacity pool available to hosted apps.
Web app worker-related fields and IDs help identify which app is constrained or overallocated.
Tags and resource groups show ownership, cost center, and whether the plan matches approved density policy.
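Reading these fields across many plans can be sketched as a small filter over tab-separated output. The function name `report_plans` and the sample rows are hypothetical. The suggested query in the comment is an assumption about a convenient column order and should be checked against the installed CLI version before use.

```shell
# Sketch: summarize plan rows in the form "name<TAB>perSiteScaling<TAB>workers".
# Rows like these might come from a command such as (assumption, untested here):
#   az appservice plan list --query "[].[name, perSiteScaling, sku.capacity]" --output tsv
report_plans() {
    while IFS="$(printf '\t')" read -r name flag workers; do
        case "$flag" in
            true|True)
                echo "$name: per-site scaling on, $workers plan workers" ;;
            *)
                echo "$name: per-site scaling off, apps can run on all $workers workers" ;;
        esac
    done
}

# Made-up sample rows standing in for real CLI output.
printf 'plan-a\ttrue\t6\nplan-b\tfalse\t3\n' | report_plans
```

A plan reported as off is one where app-level worker settings have no effect yet, which is worth knowing before tuning individual apps.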
Mapped Azure CLI commands
App Service per-site scaling commands
direct
az appservice plan create --name <plan-name> --resource-group <resource-group> --location <region> --sku P1V3 --number-of-workers 3 --per-site-scaling
az appservice plan | provision | App platform
az appservice plan update --name <plan-name> --resource-group <resource-group> --set perSiteScaling=true --output json
az appservice plan | configure | App platform
az appservice plan show --name <plan-name> --resource-group <resource-group> --query "{sku:sku,workers:numberOfWorkers,perSiteScaling:perSiteScaling}"
az appservice plan | discover | App platform
az webapp update --name <app-name> --resource-group <resource-group> --set siteConfig.numberOfWorkers=<count>
az webapp | configure | App platform
Architecture context
In Azure architecture, per-site scaling is an App Service plan control-plane behavior that lets individual apps choose how many workers from a shared plan they can consume. It matters when one plan hosts several sites with uneven traffic or different isolation expectations. The plan still defines SKU, total worker pool, zone redundancy eligibility, and cost boundary, while each app’s worker count affects placement, capacity, and noisy-neighbor risk. It is best treated as a capacity-partitioning tool, not a substitute for separate plans. It should be documented with deployment slots, Always On, health checks, autoscale rules, and diagnostic settings so operators know whether a slow app needs more workers, plan scale-out, or isolation into another App Service plan.
Security
Security impact is indirect because per-site scaling does not change authentication, authorization, TLS, managed identity, or network access by itself. Risk appears through shared hosting. Apps in the same plan share the same worker instances, so an overloaded or poorly governed plan can create operational exposure for unrelated apps. Sensitive apps may need a dedicated plan instead of density optimization. Operators should verify who can change plan and app scale settings, enforce least-privilege roles, separate production from nonproduction plans, protect slot settings, and document which workloads are allowed to share compute. Document tenant, data-sensitivity, and deployment ownership before consolidation or ownership transfer.
Cost
Cost impact is direct because App Service plan workers are the billed compute unit. Per-site scaling can improve density by letting small apps share a larger plan without consuming every worker. It may delay the need for extra plans, reduce idle capacity, and make chargeback clearer when one app needs more workers. It can also create hidden cost if teams scale the plan high but forget to right-size app worker limits. FinOps reviews should compare plan SKU, worker count, app allocation, utilization, tags, and ownership before deciding whether density or isolation is cheaper. Review chargeback or showback so shared capacity has accountable owners.
Reliability
Reliability impact is direct because worker allocation affects whether an app has enough instances during load, maintenance, or failures. Setting an app too low can cause high latency or leave too few warm instances during bursts, while setting every app high defeats the purpose and overloads the plan. Reliable use requires testing each app’s worker count, monitoring per-app health, and confirming that plan-level scale-out still has headroom. Operators should pair per-site scaling with deployment slots, health checks, autoscale strategy, and rollback plans. Critical apps may still need a dedicated App Service plan to reduce blast radius. During production incidents, alerting should be app-specific, not only plan-wide.
Performance
Performance impact is direct for apps whose worker count changes. More workers can improve concurrency and reduce queueing for a busy app, while fewer workers can conserve capacity for quieter apps. The setting does not improve CPU speed, memory per worker, startup time, database performance, or code efficiency. Operators should measure response time, CPU, memory, HTTP queue length, cold starts, and instance distribution after changes. A per-site scaling adjustment should be tested with realistic traffic because an app limited to too few workers can bottleneck even when the plan itself has spare instances. Benchmark with realistic concurrency before treating the allocation as safe.
Operations
Operators manage per-site scaling by inventorying App Service plans, checking the PerSiteScaling flag, reviewing each app’s worker limits, and comparing those settings with live metrics. Azure CLI is useful because portal views can hide the relationship between plan capacity and app-specific worker allocation. Runbooks should state which apps share the plan, which app owns capacity spikes, who can raise worker counts, and how changes are validated. Incident responders should inspect both plan-level worker count and app-level worker settings before blaming code, networking, or downstream dependencies. Scheduled reviews should check stale apps, retired sites, owner tags, undocumented limits, and noisy neighbors.
Common mistakes
Enabling per-site scaling but forgetting to review app-level worker limits for each hosted app.
Expecting per-site scaling to behave like autoscale rules or automatic scaling under changing traffic.
Putting critical and experimental apps in one dense plan without monitoring or ownership boundaries.
Scaling the plan to more workers and missing that one app is still capped at too few workers.
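The last mistake in this list can be caught with a simple comparison after each plan scale-out. The sketch below uses a hypothetical function name and made-up app names and counts. It only flags apps whose own setting is below the plan's worker count and leaves the judgment about whether that cap is intentional to the operator.

```shell
# Sketch: after a plan scale-out, list apps whose own worker setting is
# still below the new plan worker count. App names and counts are made up.
find_capped_apps() {
    plan_workers=$1
    # Reads "app-name app-worker-count" pairs from standard input.
    while read -r app count; do
        [ -n "$app" ] || continue
        if [ "$count" -lt "$plan_workers" ]; then
            echo "$app is capped at $count of $plan_workers plan workers"
        fi
    done
}

find_capped_apps 6 <<'EOF'
storefront 2
admin-portal 1
reports 6
EOF
```

A low cap on a quiet app is usually the intended outcome, so the output is a review list rather than a list of errors.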