Requests in application queue

Requests in application queue is a metric that shows requests waiting at the application layer for an App Service web app or function app. It is a clue that the app is receiving work faster than it is handling work. The reason might be slow code, too few workers, cold starts, blocked threads, long downstream calls, or scale-out lag. The metric does not fix anything by itself, but it tells operators that users may already be waiting before a request reaches useful application processing.

Aliases: RequestsInApplicationQueue, App Service application queue, HTTP requests in application queue, Web App request queue metric, Function App request queue metric
Difficulty: intermediate
CLI mappings: 6
Last verified: 2026-05-22T04:15:00Z

Microsoft Learn

Requests In Application Queue is an Azure Monitor metric for Microsoft.Web/sites that reports how many Web App or Function App requests are waiting in the application request queue. It helps diagnose saturation, slow handlers, cold starts, thread exhaustion, or delayed scale-out.

Microsoft Learn: Supported metrics for Microsoft.Web/sites, 2026-05-22T04:15:00Z

Technical context

In Azure architecture, Requests In Application Queue belongs to Azure Monitor metrics for Microsoft.Web/sites. It is observed per Web App or Function App and can be split by instance where supported. The metric connects app runtime health with App Service plan capacity, worker count, scale rules, Always On, cold start behavior, downstream dependency latency, and application telemetry. Operators often review it beside HTTP response times, HTTP 5xx, CPU, memory, thread count, App Service plan HTTP queue length, Application Insights requests, and autoscale activity.

Why it matters

Requests in application queue matters because it is an early sign that the application layer is falling behind. CPU can look acceptable while requests are still piling up behind slow database calls, thread starvation, synchronous I/O, or cold-start behavior. Users experience that as spinning pages, timeouts, duplicate submissions, and support tickets. The metric helps teams decide whether to scale out, scale up, tune code, warm instances, fix dependencies, or adjust autoscale rules. It is especially useful during launches and incidents because it separates incoming demand from actual processing. A rising queue means time is being spent waiting, not serving.

Where you see it

Signals, screens, and Azure surfaces where this term usually becomes operational.

Signal 01

In Azure Monitor metrics for a Web App or Function App, Requests In Application Queue appears with metric name RequestsInApplicationQueue, aggregation options, and instance dimension.

Signal 02

In Azure CLI metric output, time series values show average queued requests over the selected interval, helping compare launch, incident, deployment, and baseline windows quickly.

Signal 03

In autoscale or alert reviews, this metric appears beside CPU, memory, HTTP errors, response time, worker count, instance health, and App Service plan scale actions.

When this becomes relevant

Specific situations where this term helps solve real Azure design, operations, migration, security, reliability, cost, or governance problems.

Detect app-layer saturation during a launch when CPU alone does not explain slow pages or timeouts.
Tune autoscale rules by comparing request queue growth with instance count changes and scale-out delay.
Separate platform capacity problems from slow downstream dependencies by correlating queue length with dependency duration.
Identify cold-start or deployment-warm-up issues when queue spikes appear immediately after slot swaps or restarts.
Create alerts that warn operators before queued requests become customer-visible timeout storms.

Real-world case studies

Different enterprise-style examples that show the term being used to hit measurable objectives.

Case study 01

Ticketing platform stops launch-day timeout spiral

Scenario, objectives, solution, measured impact, and takeaway.

Scenario

SeatRiver hosted high-demand event sales on Azure App Service. During a stadium presale, users saw spinning checkout pages while CPU stayed below the team’s alert threshold.

Business/Technical Objectives

Detect application-layer backlog before checkout timeouts spread.
Decide whether to scale the App Service plan or tune checkout code.
Alert the incident team on queue growth instead of waiting for complaints.
Reduce duplicate purchase attempts caused by user retries.

Solution Using Requests in application queue

SREs pulled RequestsInApplicationQueue with Azure CLI for the incident window and compared it with Application Insights request duration and payment gateway dependency calls. The queue rose sharply only on checkout routes, while CPU remained moderate. Engineers found synchronous payment-status polling holding request threads. The team added an alert on the application queue, changed polling to an asynchronous status check, and adjusted autoscale to add workers earlier during presales. The App Service plan was scaled temporarily for the next event, but the code fix removed the sustained backlog. Engineers also rehearsed a rollback path, payment throttling rule, and operations bridge with payment owners before the second sales wave reopened.

Results & Business Impact

Checkout timeout rate dropped from 9.8 percent to 1.1 percent during the next presale.
Duplicate payment attempts fell by 54 percent because users were no longer stuck behind queued requests.
Queue alerts fired seven minutes before the first support spike, giving responders time to scale safely.
The team avoided a permanent plan upgrade after code changes reduced thread blocking.

Key Takeaway for Glossary Readers

Requests in application queue can reveal app-layer saturation even when traditional CPU alerts look calm.

Case study 02

University exam portal survives morning login surge

Scenario

Westbridge Online ran exam registration on App Service. At 8:00 a.m., students logged in together, and the app queue spiked after every deployment restart.

Business/Technical Objectives

Keep registration pages responsive during scheduled student surges.
Reduce post-deployment queue spikes before exam windows opened.
Tune autoscale rules using a metric tied to user waiting.
Prove whether cold start or steady-state capacity caused delays.

Solution Using Requests in application queue

The platform team reviewed RequestsInApplicationQueue by instance and found spikes immediately after slot swaps and worker restarts. CPU rose later, after students were already waiting. Engineers enabled Always On, added warm-up checks to deployment slots, and adjusted autoscale rules to include the queue metric alongside CPU. Azure CLI exported metric time series for the registrar’s readiness review, and deployment pipelines paused registration releases during the first hour of exam windows. Application Insights confirmed that downstream identity calls were normal, so the problem was app warm-up and capacity timing. They kept a decision log showing why adding more instances was no longer the default action.

Results & Business Impact

Post-deployment queue spikes fell from 420 queued requests to fewer than 45.
Median registration-page response time during the morning surge improved from 3.6 seconds to 780 milliseconds.
Autoscale added workers 11 minutes earlier than the prior CPU-only rule.
Student support tickets about stuck registration pages dropped by 73 percent.

Key Takeaway for Glossary Readers

Queue metrics help teams catch the waiting users feel before CPU or generic availability checks tell the full story.

Case study 03

Food delivery admin app finds slow dependency behind backlog

Scenario

DashFork used an App Service admin portal for restaurant onboarding. Support agents blamed Azure capacity when queue metrics climbed during lunch hours, but the plan still had headroom.

Business/Technical Objectives

Identify why requests queued while CPU and memory looked acceptable.
Avoid scaling the App Service plan without proof of platform saturation.
Reduce agent wait time when opening restaurant account pages.
Create an incident checklist for queue-driven investigations.

Solution Using Requests in application queue

Operators pulled RequestsInApplicationQueue, CPU, memory, and response time from Azure Monitor with CLI, then compared the data with Application Insights dependency telemetry. Queue growth matched slow calls to a third-party tax-validation API, not worker exhaustion. Developers added shorter timeouts, caching for repeated tax-zone lookups, and a fallback review state for accounts requiring manual validation. The team kept the existing App Service plan size but added an alert combining queue growth with dependency duration. Runbooks were updated to check route, instance, dependency, and scale activity before requesting more capacity.

Results & Business Impact

Average account-page load time fell from 5.2 seconds to 1.4 seconds during lunch-hour onboarding.
The team avoided a proposed plan upgrade that would have increased monthly platform cost by 27 percent.
Queue-related support escalations dropped from 31 per week to six.
Incident responders could identify dependency-driven backlog within ten minutes using the new checklist.

Key Takeaway for Glossary Readers

A rising application queue is a symptom; correlating it with dependency telemetry prevents expensive and ineffective scaling.
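The correlation step in this case study can be sketched in a few lines. This is a minimal illustration, not the method any of the teams above used: it assumes the queue metric, dependency duration, and CPU have already been exported as equal-length per-minute series, and every sample value below is invented.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient for two equal-length numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length and at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no correlation signal
    return cov / sqrt(var_x * var_y)

# Hypothetical per-minute samples for one incident window.
queue = [0, 1, 4, 18, 35, 12, 2]
dependency_ms = [120, 130, 400, 1500, 2900, 1100, 200]
cpu_percent = [41, 44, 43, 45, 42, 44, 43]

print("queue vs dependency duration:", round(pearson(queue, dependency_ms), 2))
print("queue vs CPU:", round(pearson(queue, cpu_percent), 2))
```

A coefficient near 1 for dependency duration and near 0 for CPU points toward a slow dependency rather than worker exhaustion. Correlation alone does not prove cause, so the result is a prompt to inspect dependency telemetry, not a verdict.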

Why use Azure CLI for this?

After ten years of Azure engineering work, I use CLI for this metric because queue problems happen during pressure, not during calm portal browsing. CLI can pull a time range, filter by instance, compare with CPU and response-time metrics, and feed incident notes quickly. It also lets teams script alerts, export evidence after an outage, and compare production with staging during load tests. There is no value in clicking around while users are waiting. CLI turns queue suspicion into repeatable data that an incident channel can act on, keeps capacity discussions grounded, and supports the same checks across every app environment.

CLI use cases

Pull RequestsInApplicationQueue for one app over the incident window and export the time series as JSON.
List metric definitions to confirm the exact metric name and supported dimensions before creating alerts.
Create a metric alert that fires when the application queue exceeds the team’s tested threshold.
Show App Service plan SKU and worker count when deciding whether scale-out or code tuning is more likely.
Compare queue metrics with CPU, memory, and response time for multiple apps sharing one App Service plan.

Before you run CLI

Confirm tenant, subscription, resource group, web app or function app name, slot, resource ID, region, and App Service plan.
Check permissions for Azure Monitor metrics, web app read access, alert rule creation, and App Service plan inspection.
Set the correct time window, interval, aggregation, and output format; queue signals can disappear if the interval is too coarse.
Understand cost and customer impact before scaling the plan, because more workers may not fix slow code or downstream dependencies.

What output tells you

Metric values show whether requests were waiting at the application layer during the selected time window and on which instances.
Aggregation and interval reveal whether the queue spike was sustained, brief, hidden by averaging, or isolated to one worker.
Plan SKU and worker count output show the capacity envelope available before a scale-up or scale-out decision.
Alert output confirms threshold, evaluation frequency, action group, and whether operators will be notified before queue pressure escalates.
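The effect of a coarse interval is easy to show with arithmetic. The sketch below assumes that a coarser average is simply the mean of the finer one-minute averages it covers, which is a simplification, and the sample values are invented.

```python
def reaggregate(samples, factor):
    """Average consecutive groups of `factor` samples to mimic a coarser interval."""
    groups = [samples[i:i + factor] for i in range(0, len(samples), factor)]
    return [sum(group) / len(group) for group in groups]

# Ten hypothetical one-minute averages with a single sharp spike.
one_minute = [0, 0, 0, 90, 0, 0, 0, 0, 0, 0]
five_minute = reaggregate(one_minute, 5)

print(max(one_minute))  # 90
print(five_minute)      # [18.0, 0.0]
```

A one-minute spike of 90 queued requests shows up as 18 at a five-minute interval, which can sit below an alert threshold that the finer series would have crossed.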

Mapped Azure CLI commands

App Service application queue monitoring

direct

az monitor metrics list --resource <webapp-resource-id> --metric RequestsInApplicationQueue --interval PT1M --aggregation Average --output json

az monitor metrics | discover | Web
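Once the JSON is exported, a short script can pull out the peak. The sketch below assumes the output follows the usual `value[].timeseries[].data[]` layout with `timeStamp` and `average` fields; confirm that against a real export before relying on it. The embedded sample and the helper names `queue_points` and `peak` are illustrative only.

```python
import json

# Invented sample shaped like `az monitor metrics list --output json`.
SAMPLE = """
{
  "interval": "PT1M",
  "value": [
    {
      "name": {"value": "RequestsInApplicationQueue"},
      "unit": "Count",
      "timeseries": [
        {
          "metadatavalues": [],
          "data": [
            {"timeStamp": "2026-05-22T04:00:00Z", "average": 0.0},
            {"timeStamp": "2026-05-22T04:01:00Z", "average": 12.5},
            {"timeStamp": "2026-05-22T04:02:00Z", "average": 48.0},
            {"timeStamp": "2026-05-22T04:03:00Z"},
            {"timeStamp": "2026-05-22T04:04:00Z", "average": 3.0}
          ]
        }
      ]
    }
  ]
}
"""

def queue_points(payload):
    """Return (timestamp, average) pairs, skipping intervals that have no value."""
    points = []
    for metric in payload.get("value", []):
        for series in metric.get("timeseries", []):
            for point in series.get("data", []):
                average = point.get("average")
                if average is not None:
                    points.append((point["timeStamp"], average))
    return points

def peak(points):
    """Return the pair with the largest average, or None for an empty list."""
    return max(points, key=lambda p: p[1]) if points else None

points = queue_points(json.loads(SAMPLE))
print(peak(points))  # ('2026-05-22T04:02:00Z', 48.0)
```

Skipping empty intervals rather than treating them as zero keeps gaps in the data from looking like a healthy queue.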

az monitor metrics list-definitions --resource <webapp-resource-id> --output table

az monitor metrics | discover | Web

az monitor metrics alert create --name <alert-name> --resource-group <resource-group> --scopes <webapp-resource-id> --condition "avg RequestsInApplicationQueue > <threshold>" --window-size 5m --evaluation-frequency 1m --action <action-group-id>

az monitor metrics alert | provision | Web
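Before settling on a threshold and window size, it can help to replay a past incident against the proposed rule. The sketch below is a simplified model of a static average-over-window condition, not Azure Monitor's actual evaluation logic. It assumes one sample per minute, and the function name and sample values are hypothetical.

```python
def alert_fires(samples, threshold, window=5):
    """Return the sample indexes at which the mean of the last `window`
    samples exceeds `threshold` (one evaluation per sample)."""
    fired = []
    for end in range(window, len(samples) + 1):
        recent = samples[end - window:end]
        if sum(recent) / window > threshold:
            fired.append(end - 1)
    return fired

# Invented one-minute queue averages from a past incident.
samples = [0, 2, 5, 20, 40, 60, 80, 30, 5, 0]

print(alert_fires(samples, threshold=25))            # [5, 6, 7, 8, 9]
print(alert_fires(samples, threshold=25, window=1))  # [4, 5, 6, 7]
```

With a five-minute window the rule first fires one minute after the first sample above the threshold and keeps firing after the queue has drained. That is the trade-off between smoothing noise and reacting quickly.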

az webapp show --resource-group <resource-group> --name <webapp-name> --query "{state:state,serverFarmId:serverFarmId,hostNames:hostNames}" --output json

az webapp | discover | Web

az appservice plan show --resource-group <resource-group> --name <plan-name> --query "{sku:sku,workerCount:numberOfWorkers,maximumNumberOfWorkers:maximumNumberOfWorkers}" --output json

az appservice plan | discover | Web

az webapp config show --resource-group <resource-group> --name <webapp-name> --query "{alwaysOn:alwaysOn,linuxFxVersion:linuxFxVersion,numberOfWorkers:numberOfWorkers}" --output json

az webapp config | discover | Web

Architecture context

Architecturally, Requests In Application Queue sits at the App Service worker boundary. It is not a business metric and it does not identify the slow method by itself. It tells you requests are waiting before the application can process them. A seasoned Azure architect uses it with CPU, memory, HTTP 5xx, response time, dependency latency, thread-pool behavior, deployment restarts, and autoscale activity. It is especially valuable for web apps that look healthy at the infrastructure level while users experience slow pages. For critical apps, queue alerts should be tested during load drills and autoscale rules should reflect this demand signal before a public launch.

Security

Security impact is indirect. The metric does not expose a secret or change access, but queue pressure can create security and compliance side effects. During overload, teams may disable authentication checks, extend timeouts, bypass WAF rules, or log excessive request details while troubleshooting. Any such changes should be reviewed after stabilization. Attackers can also cause queue growth through abusive traffic, expensive requests, or bot activity. Operators should review access restrictions, WAF and Front Door signals, authentication failures, and request patterns when the queue rises unexpectedly. Metric data should be scoped through Azure Monitor roles, and diagnostic logs should avoid exposing personal data while still supporting investigation during incidents. Stable overload handling keeps protective controls from being weakened.

Cost

Cost impact is indirect but important. A rising queue often pushes teams to scale out or move to a larger App Service plan, which can be the right answer or an expensive distraction. If the queue is caused by slow database queries, blocked threads, or downstream API latency, more workers may increase cost without solving the root cause. Conversely, under-scaling a customer-facing app creates revenue loss, support load, and incident labor. FinOps reviews should compare queue metrics with plan SKU, worker count, autoscale history, and application fixes. The best cost outcome is targeted: tune code where possible and buy capacity where it truly removes waiting.

Reliability

Reliability impact is direct because a sustained application queue means the service cannot keep up with current demand. If the queue continues rising, users see latency, timeouts, failed health probes, retries, and sometimes cascading downstream load. Reliable designs alert before the queue becomes an outage, scale out with enough headroom, and include graceful degradation for expensive features. Deployment slots, warm-up, Always On, and health checks help reduce queue spikes after releases. Operators should correlate queue growth with scale events, restarts, dependency failures, and deployment timestamps. A queue spike after a release is a change-safety signal, not just a capacity graph.

Performance

Performance impact is direct because queued requests are already delayed before useful work completes. A low or zero queue does not guarantee a fast app, but a rising queue is a strong signal of constrained concurrency or slow processing. Performance tuning should compare this metric with response-time percentiles, thread count, CPU, memory, dependency duration, and instance count. Queue buildup during cold start may suggest Always On, pre-warmed instances, or deployment-slot warm-up. Queue buildup during steady traffic may require scale-out, async code, connection-pool fixes, or dependency tuning. The goal is not only lower averages, but fewer users waiting behind stuck requests.
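A rough way to turn queue length into time is Little's law, W = L / λ: average wait equals average queue length divided by the completion rate. Applying it to this metric is an assumption made here for illustration. It holds only for long-run averages in a stable system, and both input figures below are invented.

```python
def estimated_wait_seconds(avg_queue_length, completed_per_second):
    """Estimate average queueing delay with Little's law (W = L / lambda)."""
    if completed_per_second <= 0:
        raise ValueError("completion rate must be positive")
    return avg_queue_length / completed_per_second

# Hypothetical figures: 45 requests queued on average, 30 completed per second.
print(estimated_wait_seconds(45, 30))  # 1.5
```

Under those assumptions, a request waits about 1.5 seconds before any useful work starts. That is why a modest-looking queue can already be visible to users.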

Operations

Operators inspect this metric during incidents, load tests, launch windows, and autoscale reviews. They pull queue length over time, split by instance when possible, and compare it with request duration, HTTP errors, CPU, memory, thread count, dependency latency, and scale-out activity. Azure CLI supports metric collection, alert creation, App Service plan inspection, web app configuration review, and evidence export. Practical runbooks include checking whether all instances are healthy, whether a deployment just restarted workers, whether downstream dependencies slowed, and whether autoscale rules fired soon enough. The metric should be documented with the app team’s expected normal range. Operators should also preserve metric exports for postmortems and capacity planning, because a historical baseline makes comparisons meaningful under pressure and prevents guesswork.

Common mistakes

Treating a zero queue as proof the app is healthy while response times or dependency failures are still high.
Scaling out immediately without checking whether slow database calls, thread exhaustion, or external APIs are causing the backlog.
Creating alerts with long aggregation windows that smooth out short queue spikes users actually feel.
Comparing app-level queue metrics with plan-level HTTP queue length without understanding they are different signals.

Operator quick checks

Pull the metric for the exact incident window and compare it with response-time percentiles and HTTP errors.
Check whether queue growth aligns with deployment, restart, cold start, or scale-out timestamps.
Look at per-instance dimensions before assuming every worker is equally overloaded.
Verify downstream dependency latency before buying more App Service capacity.
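The per-instance check can be made mechanical. The sketch below assumes peak queue values have already been collected per instance. The instance names, the `ratio` and `floor` defaults, and the median-based rule are all illustrative choices, not a documented Azure heuristic.

```python
from statistics import median

def hot_instances(peaks, ratio=3.0, floor=5.0):
    """Return instances whose peak queue is at least `floor` and more than
    `ratio` times the median peak across all instances."""
    if not peaks:
        return []
    typical = median(peaks.values())
    return sorted(
        name for name, value in peaks.items()
        if value >= floor and value > ratio * typical
    )

# Invented peak queue lengths per instance for one incident window.
skewed = {"instance-a": 4.0, "instance-b": 6.0, "instance-c": 58.0, "instance-d": 5.0}
uniform = {"instance-a": 50.0, "instance-b": 55.0, "instance-c": 60.0}

print(hot_instances(skewed))   # ['instance-c']
print(hot_instances(uniform))  # []
```

One hot instance suggests a stuck or unhealthy worker. An empty result with high values everywhere points toward overall capacity or a shared dependency.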

Questions to ask

Is the queue rising because traffic increased, workers restarted, code slowed, or dependencies stopped responding?
Which users, routes, or instances are affected when the queue grows?
What threshold was load-tested, and does the alert fire before customer timeouts?
What rollback or scale action is safe if queue growth starts after a deployment?
Which metric proves the chosen fix reduced waiting instead of only hiding symptoms?
