
Availability test

An availability test is an outside-in monitor that repeatedly calls a public HTTP or HTTPS endpoint and records whether it responded quickly enough. In Azure, product teams use it when they need synthetic checks for websites, APIs, or critical external dependencies before users report an outage. The useful questions are what behavior the test proves, who owns it, and what should happen when the signal changes. Good operators tie each availability test to service limits, monitoring, access controls, and rollback steps so decisions stay visible during reviews, incidents, and planning.


Aliases: Application Insights availability test, web test, standard test, synthetic test
Difficulty: intermediate
CLI mappings: 4
Last verified: 2026-05-11T00:00:00Z

Microsoft Learn

An availability test is an outside-in monitor that repeatedly calls a public HTTP or HTTPS endpoint and records whether it responded quickly enough. Microsoft Learn covers it under "Application Insights availability tests - Azure Monitor". Operators should use that source to confirm scope, configuration, dependencies, and production impact.

Microsoft Learn: Application Insights availability tests - Azure Monitor (2026-05-11T00:00:00Z)

Technical context

An availability test depends on Application Insights web tests, public endpoint reachability, test frequency, probe locations, success criteria, and Azure Monitor alerting. Azure exposes it through availability test definitions, AppAvailabilityResults rows, metric charts, workbooks, and alert rules. The important settings or fields are test URL, expected response, frequency, timeout, enabled state, locations, SSL checks, tags, and linked component. Architects should verify that the probe path represents a meaningful user journey and not just a shallow homepage response, because a shallow probe can report success while the real path is failing, add cost without adding signal, or leave a production change without supporting evidence.
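The settings listed above can be reviewed mechanically before a test is created. The following is a minimal sketch, not an Azure SDK type: the class AvailabilityTestSpec, the function review_spec, and the example URL are hypothetical names used for illustration, and the default of five locations is an assumption that teams should replace with their own standard.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class AvailabilityTestSpec:
    """Hypothetical local model of one availability test (not an Azure SDK type)."""
    name: str
    url: str
    frequency_seconds: int
    timeout_seconds: int
    locations: list = field(default_factory=list)
    expected_status: int = 200
    check_ssl: bool = True
    enabled: bool = True


def review_spec(spec, min_locations=5):
    """Return a list of human-readable findings; an empty list means no concerns.

    min_locations is an assumed team standard, not a platform limit.
    """
    findings = []
    parsed = urlparse(spec.url)
    if parsed.scheme not in ("http", "https"):
        findings.append("URL must use http or https")
    if spec.check_ssl and parsed.scheme != "https":
        findings.append("SSL check is enabled on a non-HTTPS URL")
    if parsed.path in ("", "/"):
        findings.append("probe targets the root path; confirm it reflects a real user journey")
    if spec.timeout_seconds >= spec.frequency_seconds:
        findings.append("timeout should be shorter than the probe frequency")
    if len(set(spec.locations)) < min_locations:
        findings.append(f"fewer than {min_locations} distinct locations")
    if not spec.enabled:
        findings.append("test is disabled")
    return findings


if __name__ == "__main__":
    draft = AvailabilityTestSpec(
        name="checkout-home",
        url="https://shop.example.com/",  # placeholder URL
        frequency_seconds=300,
        timeout_seconds=30,
        locations=["loc-a", "loc-b"],  # placeholder location names
    )
    for finding in review_spec(draft):
        print(finding)
```

A check like this fits a pull-request review of test definitions, where a shallow homepage probe or a thin location list is cheaper to catch than after an incident.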

Why it matters

Availability test matters because synthetic monitoring finds customer-visible reachability or latency problems even when application servers are still running. It gives teams a shared reference for deciding whether the service is healthy, correctly configured, and ready for production scale. When it is misunderstood, engineers often chase the wrong symptom: trusting infrastructure metrics while the public route, certificate, dependency, or login path is already broken. When it is governed well, owners can explain the control, measure business impact, and act before customers notice. That clarity helps reviewers connect cloud settings to uptime, compliance, release quality, and support cost. Document the owner, region, change window, and rollback step before production use.

Where you see it

Signals, screens, and Azure surfaces where this term usually becomes operational.

Signal 01

You see availability tests in Application Insights, where each test defines a URL, probe frequency, locations, timeout, and success criteria.

Signal 02

They appear in Azure Monitor alerts when repeated failed probes or slow responses cross the threshold configured for an endpoint.

Signal 03

They show up in release checklists when teams validate external routes, certificates, dependencies, and health pages before customer traffic changes.

When this becomes relevant

Specific situations where this term helps solve real Azure design, operations, migration, security, reliability, cost, or governance problems.

Monitor public websites and REST APIs from multiple locations.
Alert support teams when a business-critical endpoint stops responding.
Compare release health before and after a traffic cutover.

Real-world case studies

Different enterprise-style examples that show the term being used to hit measurable objectives.

Case study 01

Availability test protects checkout

Scenario, objectives, solution, measured impact, and takeaway.

Scenario

VerdeCart Market, an online grocery retailer, had a problem: customers were discovering checkout outages before support teams did. It needed to fix that while protecting customer experience and audit commitments. The platform team had a narrow change window and no tolerance for vague ownership.

Business/Technical Objectives

Detect checkout endpoint failures within five minutes.
Monitor from regions matching customer concentration.
Alert only when repeated probes show real risk.
Track response-time percentiles during weekly promotions.

Solution Using Availability test

The architecture team used Availability test as the practical control point. Engineers created Application Insights availability tests for checkout, inventory lookup, and payment authorization paths. Tests ran from selected locations, alerts required repeated failures, and AppAvailabilityResults fed a workbook used during promotion launches. Maintenance suppression was documented for planned payment-gateway changes. They integrated the configuration with Azure Monitor dashboards, deployment notes, and role-based access review so support engineers could see the same evidence as architects. CLI checks were added to the release runbook to confirm the resource scope, current settings, and recent health signals before any production change. The design also included rollback criteria, escalation contacts, and a weekly review of exceptions so the term stayed connected to measurable operations instead of becoming tribal knowledge.
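The rule that alerts require repeated failures can be sketched as a small function. This is an illustration of the idea only: the function name should_page, the five-minute window, and the threshold of two failing locations are assumptions, and real alerting would be configured in Azure Monitor alert rules and not in application code.

```python
from datetime import datetime, timedelta


def should_page(results, now, window=timedelta(minutes=5), min_failed_locations=2):
    """Decide whether recent probe results justify paging someone.

    results: iterable of (timestamp, location, success) tuples.
    Pages only when at least min_failed_locations distinct locations
    reported a failure inside the window, which filters out a single
    location having a transient network problem.
    """
    cutoff = now - window
    failed_locations = {
        location
        for timestamp, location, success in results
        if cutoff <= timestamp <= now and not success
    }
    return len(failed_locations) >= min_failed_locations


if __name__ == "__main__":
    now = datetime(2026, 5, 11, 12, 0)
    sample = [
        (now - timedelta(minutes=1), "loc-a", False),
        (now - timedelta(minutes=2), "loc-b", False),
        (now - timedelta(minutes=3), "loc-c", True),
    ]
    print(should_page(sample, now))
```

Counting distinct locations instead of raw failures is the design choice that reduces false paging: one location failing three times in a row is weaker evidence than two locations failing once each.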

Results & Business Impact

Mean time to detect checkout issues fell from 31 minutes to 4 minutes.
False paging dropped by 38 percent after threshold tuning.
Promotion-day p95 endpoint duration stayed under 900 milliseconds.
Support teams had endpoint-specific evidence before customer calls spiked.

Key Takeaway for Glossary Readers

Availability tests catch public-path failures that infrastructure dashboards can miss.

Case study 02

Availability test validates telehealth access

Scenario, objectives, solution, measured impact, and takeaway.

Scenario

Northstar Telehealth, a virtual care provider, had patients who were sometimes unable to reach video-session setup pages. It needed to fix that while protecting customer experience and audit commitments. The platform team had a narrow change window and no tolerance for vague ownership.

Business/Technical Objectives

Probe the login and session-preparation path continuously.
Alert clinical support before appointments are affected.
Track certificate and timeout failures separately.
Provide release evidence for patient-safety review.

Solution Using Availability test

The architecture team used Availability test as the practical control point. The platform team configured standard availability tests against the HTTPS session setup endpoint, including timeout and certificate checks. Alerts routed to clinical technology support, while deployment reviews compared pre-release and post-release probe duration from the same locations. They integrated the configuration with Azure Monitor dashboards, deployment notes, and role-based access review so support engineers could see the same evidence as architects. CLI checks were added to the release runbook to confirm the resource scope, current settings, and recent health signals before any production change. The design also included rollback criteria, escalation contacts, and a weekly review of exceptions so the term stayed connected to measurable operations instead of becoming tribal knowledge.
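The comparison of pre-release and post-release probe duration from the same locations can be sketched as follows. The function name duration_regressions and the 20 percent tolerance are assumptions chosen for illustration; the input is expected to be durations in milliseconds that a team has already exported from its availability results.

```python
from statistics import median


def duration_regressions(before, after, tolerance=1.2):
    """Flag locations whose median probe duration grew beyond the tolerance.

    before / after: dicts mapping location name -> list of durations in ms.
    Only locations present in both periods are compared, so the result
    is a like-for-like comparison.
    Returns {location: (median_before, median_after)} for flagged locations.
    """
    flagged = {}
    for location in sorted(before.keys() & after.keys()):
        if not before[location] or not after[location]:
            continue  # nothing to compare for this location
        base = median(before[location])
        current = median(after[location])
        if base > 0 and current > base * tolerance:
            flagged[location] = (base, current)
    return flagged


if __name__ == "__main__":
    pre = {"loc-east": [100, 110, 120], "loc-west": [200, 210, 190]}
    post = {"loc-east": [150, 160, 170], "loc-west": [205, 200, 210]}
    print(duration_regressions(pre, post))
```

The median is used here because a handful of slow probes can distort an average; teams with percentile targets may prefer to compare p95 values instead.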

Results & Business Impact

Session setup incidents were detected 18 minutes earlier on average.
Certificate misconfiguration was caught before business hours.
Patient support tickets related to access fell by 22 percent.
Release reviews gained a repeatable synthetic-health checkpoint.

Key Takeaway for Glossary Readers

A well-chosen availability test represents a real user journey, not just server uptime.

Case study 03

Availability test monitors permit APIs

Scenario, objectives, solution, measured impact, and takeaway.

Scenario

RiverGate Permits, a municipal permitting platform, had third-party inspection APIs failing without clear customer-visible evidence. It needed to fix that while protecting customer experience and audit commitments. The platform team had a narrow change window and no tolerance for vague ownership.

Business/Technical Objectives

Monitor both the portal endpoint and external dependency path.
Separate internal application failures from partner API failures.
Create a shared incident runbook for inspectors.
Keep availability reporting aligned to service commitments.

Solution Using Availability test

The architecture team used Availability test as the practical control point. Operations created two availability tests: one for the public portal and one for the inspection dependency route. Workbooks summarized failures by test name and location, and the incident runbook used those results to route issues either to internal developers or the partner support channel. They integrated the configuration with Azure Monitor dashboards, deployment notes, and role-based access review so support engineers could see the same evidence as architects. CLI checks were added to the release runbook to confirm the resource scope, current settings, and recent health signals before any production change. The design also included rollback criteria, escalation contacts, and a weekly review of exceptions so the term stayed connected to measurable operations instead of becoming tribal knowledge.
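The routing step in the runbook, where results by test name decide whether an issue goes to internal developers or the partner channel, can be sketched as a lookup. The test names, queue names, and the function route_failures are hypothetical; an actual runbook would use the team's own test names and paging targets.

```python
from collections import defaultdict

# Hypothetical mapping from availability test name to the owning queue.
OWNERS = {
    "public-portal": "internal-developers",
    "inspection-dependency": "partner-support",
}


def route_failures(failed_tests, owners, default_owner="platform-on-call"):
    """Group failed test names by the queue that should respond.

    Tests without a mapped owner fall back to default_owner so that an
    unowned probe still reaches a human instead of being dropped.
    """
    queues = defaultdict(set)
    for test_name in failed_tests:
        queues[owners.get(test_name, default_owner)].add(test_name)
    return {owner: sorted(names) for owner, names in queues.items()}


if __name__ == "__main__":
    print(route_failures(["inspection-dependency", "public-portal"], OWNERS))
```

Keeping the mapping in one reviewed place supports the takeaway of this case study: every probe has a named owner, and the fallback queue makes missing ownership visible.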

Results & Business Impact

Partner-caused incidents were identified 44 percent faster.
Internal developers avoided three unnecessary emergency rollbacks.
Inspector-facing availability reports became consistent across departments.
The runbook cut after-hours confusion during two regional outages.

Key Takeaway for Glossary Readers

Availability tests are strongest when each probe maps to a clear owner and response path.

Why use Azure CLI for this?

CLI and KQL make availability tests inspectable during incidents, showing which endpoint, location, and threshold produced the alert.

CLI use cases

List availability tests for an Application Insights component.
Show the configuration of one test before editing alerts.
Query recent AppAvailabilityResults for failures and latency.
Review availability metrics during a release or incident.

Before you run CLI

Confirm the endpoint is publicly reachable from Azure monitoring locations.
Choose a URL that represents a meaningful customer path.
Decide whether authentication, certificates, or dependencies need explicit checks.
Coordinate maintenance windows so planned changes do not create false incidents.
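Before creating a test, a quick local pre-check of the candidate URL can confirm that it answers with the expected status inside the intended timeout. The sketch below uses only the Python standard library; the function name probe_once and the example URL are assumptions, and a probe from one workstation does not prove reachability from Azure monitoring locations.

```python
import time
import urllib.error
import urllib.request


def probe_once(url, timeout_seconds=30.0, expected_status=200):
    """Call the URL once and report status, duration, and success.

    Success means the request completed and returned expected_status.
    Redirects are followed by urllib, so the status is the final one.
    """
    started = time.monotonic()
    status = None
    error = None
    try:
        with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
            status = response.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with a 4xx or 5xx status.
        status = exc.code
    except OSError as exc:
        # Covers URLError, connection failures, TLS errors, and timeouts.
        error = str(exc)
    duration_ms = (time.monotonic() - started) * 1000.0
    return {
        "success": error is None and status == expected_status,
        "status": status,
        "duration_ms": duration_ms,
        "error": error,
    }


if __name__ == "__main__":
    # Placeholder URL; replace with the customer path you intend to monitor.
    print(probe_once("https://example.com/", timeout_seconds=10))
```

If this local call already fails or runs close to the timeout, the availability test will likely be noisy from day one, which is worth resolving before alerts are attached.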

What output tells you

Web test output shows URL, enabled state, frequency, timeout, and locations.
Log query output shows failed probes, duration, and failing locations.
Metric output shows availability percentage and response-time trends.
Alert output links probe failures to the configured incident workflow.

Mapped Azure CLI commands

Operational CLI checks

direct

az monitor app-insights web-test list --resource-group <resource-group> --component-name <component-name>

az monitor app-insights web-test | discover | Monitoring and Observability

az monitor app-insights web-test show --resource-group <resource-group> --name <test-name>

az monitor app-insights web-test | discover | Monitoring and Observability

az monitor log-analytics query --workspace <workspace-id> --analytics-query "AppAvailabilityResults | where TimeGenerated > ago(1h) | summarize availability=100.0*countif(Success)/count(), p95=percentile(DurationMs,95) by Name"

az monitor log-analytics | discover | Monitoring and Observability

az monitor metrics list --resource <application-insights-resource-id> --metric availabilityResults/availabilityPercentage

az monitor metrics | discover | Monitoring and Observability
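The log query among the commands above summarizes availability percentage and p95 duration per test name. The sketch below reproduces that calculation on exported rows so the numbers can be checked by hand. It assumes each row is a dict with the Name, Success, and DurationMs columns used in the query; the function names are illustrative, and the nearest-rank percentile used here may differ slightly from the approximation that the query engine computes.

```python
import math
from collections import defaultdict


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


def summarize_availability(rows):
    """Group probe rows by test name and compute availability % and p95 duration."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["Name"]].append(row)
    summary = {}
    for name, items in grouped.items():
        passed = sum(1 for item in items if item["Success"])
        summary[name] = {
            "availability": 100.0 * passed / len(items),
            "p95": percentile_nearest_rank([item["DurationMs"] for item in items], 95),
        }
    return summary


if __name__ == "__main__":
    sample_rows = [
        {"Name": "checkout", "Success": True, "DurationMs": 100},
        {"Name": "checkout", "Success": True, "DurationMs": 200},
        {"Name": "checkout", "Success": False, "DurationMs": 400},
        {"Name": "checkout", "Success": True, "DurationMs": 300},
    ]
    print(summarize_availability(sample_rows))
```

With only a few probes in the window, p95 is effectively the slowest probe, so short windows should be read with that in mind.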

Architecture context

Security

Security for Availability test starts with knowing which identities, data paths, and administrators can influence it. The main risk is testing sensitive endpoints with exposed secrets, weak test identities, or payloads that appear in logs and alerts. Use least privilege, managed identities where available, private networking when required, logging, and change approval for production settings. Review test URLs, authentication approach, alert payloads, certificate checks, firewall rules, and who can edit web tests before granting access or accepting a recommendation. Security teams should also confirm that alerts, audit trails, and exception records explain who changed the configuration, why it changed, and what evidence proves the change stayed inside policy.

Cost

Cost impact for Availability test comes from the resources, telemetry, storage, compute, and engineering time connected to it. The most common waste pattern is creating many shallow or duplicate tests that generate alert noise and telemetry without improving response. Estimate the billable resources before enabling features, and compare the expense with the business risk being reduced. Track test count, log ingestion, alert volume, incident time, and whether each test maps to an SLO so optimization work does not quietly damage reliability or security. For production, pair cost reviews with ownership, budgets, Advisor signals where relevant, and a policy for retiring unused capacity or stale monitoring data.
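Test count translates into telemetry volume through simple arithmetic, because each location runs the test once per frequency interval. The sketch below estimates probe runs per month for a set of tests. The function names and the example test list are assumptions, the estimate treats each run as one result row, and it deliberately includes no prices, which should come from the current Azure pricing pages.

```python
def monthly_probe_runs(frequency_seconds, location_count, days=30):
    """Estimated probe executions in the period for one test.

    Each location is assumed to run the test once per frequency interval.
    """
    runs_per_location = (days * 24 * 60 * 60) // frequency_seconds
    return runs_per_location * location_count


def total_monthly_runs(tests, days=30):
    """tests: iterable of (frequency_seconds, location_count) pairs."""
    return sum(monthly_probe_runs(freq, locs, days) for freq, locs in tests)


if __name__ == "__main__":
    # One test every 5 minutes from 5 locations: 8,640 runs per location.
    print(monthly_probe_runs(300, 5))
    # Hypothetical estate: three tests with different settings.
    print(total_monthly_runs([(300, 5), (300, 5), (900, 3)]))
```

Running the numbers before adding a duplicate or shallow test makes the waste pattern described above concrete: a second five-minute test from five locations adds roughly 43,200 runs a month without adding new signal.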

Reliability

Reliability depends on whether Availability test is designed for the failure modes the workload actually faces. For this term, the common reliability question is whether the test catches the real failure path early enough to page the right support team. Set measurable thresholds, test during planned change, and make sure incidents have a clear owner and escalation path. Watch availability percentage, failed locations, timeout counts, SSL failures, and correlated request or dependency telemetry so teams can distinguish platform behavior from application defects. A reliable design also includes rollback, regional assumptions, dependency health, and documented limits instead of hoping the default setting will cover every outage.
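Measurable thresholds are easier to set once the availability target is converted into an error budget. The sketch below shows that arithmetic. The function names are illustrative, the 30-day period is an assumption, and the second function is a rough planning estimate that treats the endpoint as down for the whole interval between failed probes.

```python
def error_budget_minutes(slo_percent, days=30):
    """Minutes of allowed unavailability in the period for a given SLO."""
    if not 0 < slo_percent < 100:
        raise ValueError("slo_percent must be between 0 and 100, exclusive")
    return (100.0 - slo_percent) / 100.0 * days * 24 * 60


def budget_used_by_failures(consecutive_failures, frequency_seconds, slo_percent, days=30):
    """Approximate share of the error budget implied by a run of failed probes.

    Assumes one location and that the outage lasted for every interval
    in the run, which overstates short blips.
    """
    outage_minutes = consecutive_failures * frequency_seconds / 60.0
    return outage_minutes / error_budget_minutes(slo_percent, days)


if __name__ == "__main__":
    print(round(error_budget_minutes(99.9), 1))               # minutes per 30 days
    print(round(budget_used_by_failures(3, 300, 99.9), 2))    # share of budget
```

Under these assumptions, a 99.9 percent target over 30 days allows about 43 minutes of downtime, so three consecutive failures at a five-minute frequency already represent roughly a third of the monthly budget. That is a useful argument for paging early.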

Performance

Performance depends on how Availability test affects latency, throughput, concurrency, or decision speed in the surrounding workload. The performance risk is a test that only checks success while slow responses quietly violate the user experience target. Measure before and after changes using representative traffic, not only averages from a quiet period. Tune frequency, timeout, selected locations, expected response checks, and percentile thresholds while watching error rates, saturation, and customer-facing response time. Performance work should include capacity limits, regional placement, retry behavior, and clear evidence that the optimized path still meets security and reliability requirements. Document the owner, region, change window, and rollback step before production use.
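The risk named above, probes that succeed while quietly missing the latency target, can be measured directly from result rows. The sketch below assumes rows are dicts with Success and DurationMs fields, as in AppAvailabilityResults; the function name and the 900 millisecond target in the example are illustrative assumptions.

```python
def slow_success_share(rows, target_ms):
    """Share of successful probes whose duration exceeded the latency target.

    Returns 0.0 when there are no successful probes, because there is
    nothing to classify as slow.
    """
    successes = [row for row in rows if row["Success"]]
    if not successes:
        return 0.0
    slow = [row for row in successes if row["DurationMs"] > target_ms]
    return len(slow) / len(successes)


if __name__ == "__main__":
    sample = [
        {"Success": True, "DurationMs": 400},
        {"Success": True, "DurationMs": 1200},
        {"Success": True, "DurationMs": 950},
        {"Success": False, "DurationMs": 30000},
        {"Success": True, "DurationMs": 700},
    ]
    print(slow_success_share(sample, target_ms=900))
```

In the sample, availability looks healthy at four successes out of five, yet half of those successes miss the target, which is the gap a success-only test cannot show.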

Operations

Operationally, Availability test should appear in runbooks, dashboards, and release checks rather than living only in a portal page. Operators should review test enablement, location coverage, alert thresholds, maintenance suppression, and ownership for every monitored endpoint on a scheduled cadence and after major incidents. Use tags, resource inventory, activity logs, Azure Monitor, and CLI queries to keep the setting or signal discoverable. During handoffs, explain which endpoint is checked, what failure threshold pages the team, and what runbook follows the alert so the next engineer can make a safe decision quickly. Good operations turn the term into a repeatable checklist item with an owner, evidence, and a known path for remediation.

Common mistakes

Testing only the homepage while the critical API path fails.
Creating tests without alert ownership or a runbook.
Using too few locations for a globally used service.
Ignoring certificate or latency signals because the endpoint returns HTTP 200.

Operator quick checks

Confirm each business-critical endpoint has a meaningful synthetic test.
Summarize failures by location after every availability alert.
Review test frequency and timeout against the service SLO.
Disable or suppress tests only with documented maintenance approval.
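The check that summarizes failures by location can be sketched in a few lines. It assumes rows exported from AppAvailabilityResults as dicts with Location and Success fields; the function name and the location names in the example are placeholders.

```python
from collections import Counter


def failures_by_location(rows):
    """Count failed probes per location, most affected location first."""
    counts = Counter(row["Location"] for row in rows if not row["Success"])
    return counts.most_common()


if __name__ == "__main__":
    sample = [
        {"Location": "loc-a", "Success": False},
        {"Location": "loc-a", "Success": False},
        {"Location": "loc-b", "Success": False},
        {"Location": "loc-c", "Success": True},
    ]
    for location, count in failures_by_location(sample):
        print(location, count)
```

Failures concentrated in one location usually point to a regional or network issue, while failures spread across all locations point to the endpoint itself.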

Questions to ask

Which user journey does this availability test represent?
What failure count or latency threshold should page the team?
Which locations matter for the customer base?
Who updates the test when the endpoint or authentication path changes?

