
Online endpoint

An online endpoint is the web address and management boundary that clients use to get real-time predictions from Azure Machine Learning. Applications send a request to the endpoint, and Azure routes it to one or more deployments behind that endpoint. The endpoint handles the stable name, scoring URI, authentication mode, and traffic split, while deployments provide the actual model-serving backends. This separation lets teams update models while keeping client integrations steady and observable. That makes endpoint governance practical during releases and incidents.
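The endpoint/deployment separation can be pictured with a small Python model. This is a conceptual sketch, not how Azure implements routing. The class, the endpoint name fraud-scoring, its scoring URI, and the blue/green deployment names are all hypothetical. A request is assigned to a deployment by walking cumulative traffic percentages.

```python
from dataclasses import dataclass, field


@dataclass
class OnlineEndpoint:
    """Conceptual model of the stable contract clients see (not an Azure SDK class)."""

    name: str
    scoring_uri: str
    auth_mode: str
    # Deployment name -> percentage of live traffic; assumed to sum to 100.
    traffic: dict = field(default_factory=dict)

    def route(self, roll: float) -> str:
        """Map a roll in [0, 100) to a deployment using cumulative percentages."""
        if not 0 <= roll < 100:
            raise ValueError("roll must be in [0, 100)")
        upper = 0
        for deployment, percent in self.traffic.items():
            upper += percent
            if roll < upper:
                return deployment
        raise ValueError("traffic does not cover this roll; check that it sums to 100")


# Hypothetical endpoint with a 90/10 blue/green split.
endpoint = OnlineEndpoint(
    name="fraud-scoring",
    scoring_uri="https://fraud-scoring.eastus.inference.ml.azure.com/score",
    auth_mode="key",
    traffic={"blue": 90, "green": 10},
)
print(endpoint.route(42.0))  # blue
print(endpoint.route(95.0))  # green

# Shifting all traffic to green changes the backend, not the client contract:
# name and scoring_uri are untouched.
endpoint.traffic = {"blue": 0, "green": 100}
print(endpoint.route(42.0))  # green
```

Clients only ever hold name, scoring_uri, and credentials. Everything that changes during a release lives in the traffic map.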


Aliases: Azure ML online endpoint, managed online endpoint, real-time inference endpoint, scoring URI, online endpoint
Difficulty: fundamentals
CLI mappings: 6
Last verified: 2026-05-17

Microsoft Learn

An online endpoint in Azure Machine Learning is an HTTPS endpoint for real-time inference. It exposes deployed models to synchronous clients, defines endpoint name, region, authentication mode, scoring URI, and traffic allocation, and can host one or more online deployments.

Microsoft Learn: Online endpoints for real-time inference (2026-05-17)

Technical context

Technically, an online endpoint is an Azure Machine Learning workspace resource for synchronous inference over HTTP. Managed online endpoints provide fully managed serving infrastructure, while Kubernetes online endpoints use attached Kubernetes compute. The endpoint contains authentication settings, scoring URI, identity, network isolation options, traffic rules, and deployment relationships. It integrates with Azure Monitor, Log Analytics, Application Insights, private endpoints, managed virtual networks, key or token authentication, Azure RBAC, model deployments, quotas, and endpoint-level cost analysis. Endpoint names must be unique within an Azure region and become part of the scoring URI, so naming must be planned before production clients depend on it.

Why it matters

An online endpoint matters because it is the contract between an application and a machine learning service. If the endpoint name, authentication mode, scoring URI, or traffic routing changes unexpectedly, every client that calls the model can fail. A well-designed endpoint gives teams a stable integration point while allowing deployments behind it to change safely. It also becomes the place where operators monitor inference latency, failures, availability, and cost. For architects, the endpoint defines the boundary for network isolation and client authentication. For learners, it clarifies why model deployment is not only a data science task but also an application platform responsibility. It also gives support teams a single place to start when real-time prediction calls fail.

Where you see it

Signals, screens, and Azure surfaces where this term usually becomes operational.

Signal 01

In Azure Machine Learning studio, the online endpoint pages show the endpoint name, scoring URI, authentication mode, deployments, traffic split, a Test tab, metrics, and logs used in operational reviews.

Signal 02

In az ml online-endpoint show output, operators see provisioning state, location, identity, auth mode, scoring URI, and the traffic and mirror_traffic rules for each deployment, which runbooks can capture as rollout evidence.

Signal 03

In client application configuration, the endpoint appears as the scoring URL, a credential reference, the request schema, a timeout setting, and the environment-specific inference target. These are the first values to check when troubleshooting a deployment.
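Those client-side values come together when an application builds a scoring call. The Python sketch below uses only the standard library and assumes key authentication, where the endpoint key is sent as a bearer token. The optional azureml-model-deployment header pins a call to one deployment, which is useful for testing a deployment before it receives traffic. The scoring URI, key placeholder, and {"data": ...} payload shape are hypothetical; the real request schema is defined by the deployment's scoring script.

```python
import json
import urllib.request


def build_scoring_request(scoring_uri, key, payload, deployment=None):
    """Build (but do not send) a JSON POST to an online endpoint scoring URI.

    Assumes key authentication: the endpoint key travels as a bearer token.
    If deployment is given, the azureml-model-deployment header targets that
    deployment directly instead of following the endpoint's traffic split.
    """
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {key}",
    }
    if deployment:
        headers["azureml-model-deployment"] = deployment
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(scoring_uri, data=body, headers=headers, method="POST")


# Hypothetical values; in production the key comes from a secret store, not code.
SCORING_URI = "https://churn-scoring.westeurope.inference.ml.azure.com/score"
TIMEOUT_SECONDS = 10  # client-side timeout, passed to urlopen when sending

request = build_scoring_request(SCORING_URI, "<endpoint-key>", {"data": [[0.2, 1.4, 3.1]]})
# To send it: urllib.request.urlopen(request, timeout=TIMEOUT_SECONDS)
```

Keeping the URI, credential reference, and timeout as configuration means the same code can target dev, test, and production endpoints.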

When this becomes relevant

Specific situations where this term helps solve real Azure design, operations, migration, security, reliability, cost, or governance problems.

Expose a real-time classification model to an internal application through a stable HTTPS scoring URI.
Route traffic between blue and green deployments while keeping the endpoint URL unchanged.
Secure inbound scoring through private endpoint access for regulated inference workloads.

Real-world case studies

Different enterprise-style examples that show the term being used to hit measurable objectives.

Case study 01

Private inference endpoint for a tax filing platform

Scenario, objectives, solution, measured impact, and takeaway.

Scenario

LedgerSpring processed uploaded tax forms with a model hosted in Azure Machine Learning. Security reviewers rejected the original public scoring path because requests contained taxpayer identifiers.

Business/Technical Objectives

Move inference traffic onto a private network path.
Keep the application-facing endpoint stable during model updates.
Restrict credential retrieval to approved platform identities.
Monitor latency during the busiest filing weeks.

Solution Using Online endpoint

The platform team created a managed online endpoint in the production workspace and configured inbound access through the workspace private endpoint. Application services reached the scoring URI from an approved virtual network. Endpoint keys were stored in Key Vault, and a custom role limited who could list credentials. Two deployments sat behind the endpoint so model changes could be released through traffic splits. Azure Monitor dashboards tracked request count, latency, failures, and deployment traffic while Application Insights correlated calls with application workflows. Product owners also sampled predictions from common filing scenarios before traffic moved beyond the initial pilot group. The runbook also verified DNS, private endpoint approval, and client subnet routing before each filing-season deployment change.
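The DNS check in that runbook can be sketched in Python: after resolving the scoring host from inside the approved virtual network, confirm that every returned address is private. This is a minimal sketch. The addresses are hypothetical, and the resolution step (for example, socket.getaddrinfo run on a host in the client subnet) is left to the runbook.

```python
import ipaddress


def non_private_addresses(resolved):
    """Return any resolved addresses that are NOT in private ranges."""
    return [a for a in resolved if not ipaddress.ip_address(a).is_private]


# Hypothetical addresses, e.g. gathered with socket.getaddrinfo() from a
# host inside the approved client subnet.
resolved = ["10.20.4.7", "10.20.4.8"]
problems = non_private_addresses(resolved)
if problems:
    print("Scoring host resolves to non-private addresses:", problems)
else:
    print("Scoring host resolves only to private addresses")
```

A public address in the result usually means the client is not using the private DNS zone, and its requests would not take the private path.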

Results & Business Impact

All tax-form inference calls moved off the public internet path.
Credential retrieval permissions were reduced to three managed identities.
Median scoring latency stayed below 220 milliseconds during peak filing week.
Model updates required no client URL change because the endpoint remained stable.

Key Takeaway for Glossary Readers

An online endpoint is the security and integration boundary for real-time inference, not just a convenient model URL.

Case study 02

Traffic-controlled recommendations for a streaming app


Scenario

StreamNest delivered personalized content recommendations from a single Azure Machine Learning online endpoint. A new ranking model improved engagement in tests but increased compute use per request.

Business/Technical Objectives

Roll out the new recommendation backend gradually.
Keep mobile and web clients on the same scoring URI.
Watch engagement, latency, and cost at each traffic level.
Roll back quickly if recommendation quality or response time worsened.

Solution Using Online endpoint

Engineers added a green deployment under the existing endpoint and kept blue as the default. Azure CLI moved green's share of endpoint traffic from 5% to 25%, then 60%, and finally 100% after dashboards passed review. Product analytics measured watch-through rate, while platform metrics tracked latency and CPU utilization. The endpoint configuration was stored as YAML so traffic changes were reviewable. A rollback command set blue back to 100% if failure rates, p95 latency, or engagement metrics crossed guardrails. Security reviewers tested access from an unapproved network to confirm private controls blocked recommendation-scoring attempts before production approval. The launch checklist also captured endpoint owner, rollback command, and approval contact before each traffic increase.
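The staged rollout and rollback logic can be sketched in Python. One helper formats the value passed to az ml online-endpoint update --traffic at each stage, and another applies the guardrails. Apart from the 450 millisecond p95 objective, the guardrail names and thresholds are hypothetical. The observed metrics are assumed to come from dashboards, not from the CLI.

```python
def traffic_arg(blue: int, green: int) -> str:
    """Format the value passed to az ml online-endpoint update --traffic."""
    return f"blue={blue} green={green}"


def rollout_plan(green_steps):
    """One --traffic value per stage, shifting the given percentage to green."""
    return [traffic_arg(100 - green, green) for green in green_steps]


# Rollback: send everything back to the known-good blue deployment.
ROLLBACK = traffic_arg(100, 0)


def should_roll_back(metrics, guardrails):
    """True if any observed metric crosses its guardrail (inputs are hypothetical)."""
    return (
        metrics["failure_rate"] > guardrails["max_failure_rate"]
        or metrics["p95_ms"] > guardrails["max_p95_ms"]
        or metrics["watch_through"] < guardrails["min_watch_through"]
    )


GUARDRAILS = {"max_failure_rate": 0.01, "max_p95_ms": 450, "min_watch_through": 0.30}

for value in rollout_plan([5, 25, 60, 100]):
    print(f'--traffic "{value}"')

observed = {"failure_rate": 0.004, "p95_ms": 471, "watch_through": 0.33}
if should_roll_back(observed, GUARDRAILS):
    print(f'roll back with --traffic "{ROLLBACK}"')
```

Generating the stage values from one list keeps every step reviewable alongside the YAML, and the rollback value is fixed in advance rather than typed during an incident.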

Results & Business Impact

Watch-through rate improved by 6% after the final rollout.
p95 endpoint latency remained below the 450 millisecond service objective.
No client deployment was needed because the endpoint URI did not change.
One early traffic step was rolled back within five minutes after a dependency alert fired.

Key Takeaway for Glossary Readers

Online endpoints give product teams a stable interface while operators manage model traffic with measurable guardrails.

Case study 03

Standard endpoint runbooks for municipal services


Scenario

CivicWorks Department used Azure Machine Learning endpoints for permit triage, call routing, and service-ticket classification. Each team had different naming, credential, and monitoring habits.

Business/Technical Objectives

Create consistent endpoint naming and ownership records.
Reduce incident time for failed real-time inference calls.
Separate credential access from model deployment permissions.
Add comparable metrics across all citizen-service endpoints.

Solution Using Online endpoint

The central platform group wrote endpoint templates that defined naming, tags, auth mode, alert rules, and required deployment labels. Azure CLI runbooks listed endpoints, invoked sample requests, retrieved nonsecret configuration, and checked traffic assignments. Custom roles separated operators who could change traffic from developers who could update deployments. Every endpoint had a dashboard showing request volume, p50, p95, failures, and active deployment. The group also created a cleanup review for endpoints with no requests in thirty days. Researchers had one week to object before deletion, and archived metadata preserved model references for published experiments. Monthly governance reports compared endpoint inventory with active application registrations, budget owners, and support escalation paths before retirement decisions.
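The thirty-day cleanup review and the tagging standard can be sketched in Python over a hypothetical inventory. In practice, last-request dates would come from Azure Monitor request metrics, and tags would come from endpoint metadata. The record type and sample endpoints are assumptions. The one-week objection window remains a human step.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Optional

# Tags every production endpoint must carry, per the CivicWorks standard.
REQUIRED_TAGS = {"owner", "application", "data-sensitivity"}


@dataclass
class EndpointRecord:
    """One row of a hypothetical endpoint inventory."""

    name: str
    last_request: Optional[date]  # None means no requests were ever recorded
    tags: dict = field(default_factory=dict)


def cleanup_candidates(records, today, idle_days=30):
    """Names of endpoints with no requests in the last idle_days days."""
    cutoff = today - timedelta(days=idle_days)
    return [
        r.name
        for r in records
        if r.last_request is None or r.last_request < cutoff
    ]


def missing_tags(records):
    """Map endpoint name -> sorted list of required tags it lacks."""
    result = {}
    for r in records:
        missing = REQUIRED_TAGS - set(r.tags)
        if missing:
            result[r.name] = sorted(missing)
    return result


inventory = [
    EndpointRecord("permit-triage", date(2026, 5, 15),
                   {"owner": "permits", "application": "triage", "data-sensitivity": "internal"}),
    EndpointRecord("call-routing-old", date(2026, 3, 2), {"owner": "311"}),
    EndpointRecord("ticket-classifier-test", None, {}),
]
print(cleanup_candidates(inventory, today=date(2026, 5, 17)))
print(missing_tags(inventory))
```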

Results & Business Impact

Mean time to identify endpoint versus deployment failures dropped from fifty minutes to eighteen minutes.
All production endpoints gained owner, application, and data-sensitivity tags.
Credential-access assignments were reduced by 40% after role separation.
Three unused endpoints were retired, removing idle deployments, stale alerts, and monitoring noise.

Key Takeaway for Glossary Readers

Online endpoint operations improve when endpoint ownership, credential access, metrics, and deployment routing are standardized together.

Why use Azure CLI for this?

Azure CLI is useful for online endpoints because endpoint configuration is part of production application integration. The ml extension lets operators create, show, list, invoke, update, and delete endpoints, retrieve credentials, and change traffic rules without clicking through studio. CLI output also supports runbooks, incident evidence, environment comparison, and source-controlled endpoint definitions.

CLI use cases

Create a managed online endpoint from YAML with the intended name, authentication mode, and workspace scope.
Show endpoint scoring URI, provisioning state, identity, auth mode, and traffic allocation before release.
Invoke an endpoint with a sample request file to validate authentication, schema, routing, and response shape.
Update traffic rules to move requests between blue and green deployments during rollout or rollback.

Before you run CLI

Confirm tenant, subscription, resource group, workspace, endpoint name, region, authentication mode, and whether clients already depend on the scoring URI.
Check Azure Machine Learning permissions, provider registration, private endpoint requirements, managed network settings, deployment health, and quota availability.
Review destructive and cost risk: deleting an endpoint also deletes its deployments, and creating deployments under an endpoint starts billable serving capacity.
Use output filtering carefully so runbooks capture scoring_uri, provisioning_state, auth_mode, traffic, identity, and endpoint location without exposing secrets.
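The allow-list approach in the last point can be sketched in Python. The sketch parses the JSON that az ml online-endpoint show prints and keeps only named fields, so any field added to the output later is dropped rather than captured by accident. The field names are the ones listed in this entry, and the sample output is hypothetical and trimmed. Endpoint keys are retrieved with a separate credentials command (az ml online-endpoint get-credentials) and should never be written to runbook logs.

```python
import json

# Allow-list of nonsecret fields a runbook records from
# az ml online-endpoint show --output json.
SAFE_FIELDS = (
    "name", "location", "scoring_uri", "provisioning_state",
    "auth_mode", "traffic", "mirror_traffic", "identity",
)


def summarize_endpoint(show_json: str) -> dict:
    """Keep only allow-listed fields; any field missing from the output maps to None."""
    data = json.loads(show_json)
    return {key: data.get(key) for key in SAFE_FIELDS}


# Trimmed, hypothetical show output.
sample = """{
  "name": "permit-triage",
  "location": "eastus2",
  "scoring_uri": "https://permit-triage.eastus2.inference.ml.azure.com/score",
  "provisioning_state": "Succeeded",
  "auth_mode": "key",
  "traffic": {"blue": 100},
  "mirror_traffic": {},
  "tags": {"owner": "permits"}
}"""
print(json.dumps(summarize_endpoint(sample), indent=2))
```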

What output tells you

scoring_uri identifies the address clients call and confirms whether the endpoint name and region are the intended ones.
auth_mode (key, aml_token, or aad_token) and credential output show whether callers need endpoint keys, Azure Machine Learning tokens, or Microsoft Entra ID tokens to invoke the endpoint.
traffic and mirror_traffic fields show which deployments receive production or shadow requests and at what percentage.
provisioning_state, identity, and network fields help separate endpoint readiness problems from deployment, RBAC, or private connectivity failures.
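The traffic and mirror_traffic fields can be checked mechanically before a rollout. The Python sketch below encodes three rules as we understand Microsoft Learn's safe-rollout guidance. First, live percentages total 100, or 0 before any deployment takes traffic. Second, mirrored traffic totals no more than 50%. Third, a deployment that receives mirrored traffic gets no live traffic. Treat those rules as assumptions to confirm against current documentation. The deployment names are hypothetical.

```python
def traffic_issues(traffic, mirror_traffic):
    """Return problems found in an endpoint's traffic settings (empty list = none).

    The three rules are assumptions for this sketch; confirm them against
    current Azure Machine Learning documentation.
    """
    issues = []
    live_total = sum(traffic.values())
    if live_total not in (0, 100):
        issues.append(f"live traffic sums to {live_total}, expected 0 or 100")
    mirror_total = sum(mirror_traffic.values())
    if mirror_total > 50:
        issues.append(f"mirrored traffic totals {mirror_total}%, above 50%")
    for deployment, percent in mirror_traffic.items():
        if percent > 0 and traffic.get(deployment, 0) > 0:
            issues.append(f"{deployment} receives both live and mirrored traffic")
    return issues


print(traffic_issues({"blue": 100, "green": 0}, {"green": 10}))  # []
print(traffic_issues({"blue": 90, "green": 5}, {}))
```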

Mapped Azure CLI commands

Online endpoint operator commands


az ml online-endpoint list --workspace-name <workspace-name> --resource-group <resource-group> --output table


az ml online-endpoint show --workspace-name <workspace-name> --resource-group <resource-group> --name <endpoint-name>


az ml online-endpoint create --workspace-name <workspace-name> --resource-group <resource-group> --file endpoint.yml


az ml online-endpoint invoke --workspace-name <workspace-name> --resource-group <resource-group> --name <endpoint-name> --request-file sample-request.json


az ml online-endpoint update --workspace-name <workspace-name> --resource-group <resource-group> --name <endpoint-name> --traffic <deployment-name>=100


az ml online-endpoint delete --workspace-name <workspace-name> --resource-group <resource-group> --name <endpoint-name>


Architecture context

Security

Security impact is direct because online endpoints expose inference behavior to callers. Authentication can use keys, Azure Machine Learning tokens, or Microsoft Entra ID-based access depending on configuration. Inbound isolation can require private endpoint access to the workspace, while outbound communication from deployments can be restricted through a managed virtual network. Risks include leaked keys, overly broad custom roles that can list credentials, public endpoints serving sensitive models, and logs that retain regulated input data. Secure endpoints use least privilege, protected credentials, private networking where required, input validation, responsible logging, and monitoring for unusual invocation patterns or repeated authentication failures.
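For the logging risk, one common approach is to mask regulated fields before a request or response is written to logs. The following is a minimal Python sketch. It assumes JSON payloads and uses hypothetical field names such as taxpayer_id; the real list must come from the workload's data classification.

```python
# Hypothetical field names for regulated inputs; adapt to the real request schema.
SENSITIVE_KEYS = frozenset({"taxpayer_id", "ssn", "date_of_birth"})


def redact(value, sensitive=SENSITIVE_KEYS, mask="[REDACTED]"):
    """Return a copy of a JSON-like payload with sensitive fields masked.

    Walks nested dicts and lists; other values are returned unchanged,
    and the original payload is never modified.
    """
    if isinstance(value, dict):
        return {
            key: mask if key in sensitive else redact(item, sensitive, mask)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item, sensitive, mask) for item in value]
    return value


request_body = {"taxpayer_id": "123-45-6789", "forms": [{"type": "W-2", "ssn": "123-45-6789"}]}
print(redact(request_body))
```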

Cost

Cost impact is mostly indirect at the endpoint level and direct through the deployments it hosts. The endpoint itself organizes scoring and traffic, while billable compute is usually tied to managed online deployments, VM SKUs, instance counts, networking, and logging. However, endpoint sprawl creates hidden spend when every experiment keeps active deployments, private networking, and monitoring resources. Cost analysis should group spend by endpoint, deployment, SKU, owner, model purpose, and traffic level. Good FinOps practice retires unused endpoints, removes orphan deployments, right-sizes active serving capacity, and keeps endpoint-level cost evidence visible to product owners instead of only ML platform teams.
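Grouping spend the way this paragraph suggests can be sketched in Python over exported cost rows. The row format, tag keys, SKUs, and amounts are hypothetical. The sketch assumes cost data has already been exported with endpoint, deployment, SKU, and owner tags attached. Rows without a value for the grouping key appear as untagged so they can be traced to an owner.

```python
from collections import defaultdict


def spend_by(rows, key):
    """Total cost per value of key; rows missing the key count as 'untagged'."""
    totals = defaultdict(float)
    for row in rows:
        totals[row.get(key) or "untagged"] += row["cost"]
    # Largest spend first, so owners see the biggest items at the top.
    return dict(sorted(totals.items(), key=lambda item: item[1], reverse=True))


# Hypothetical cost rows, already tagged with endpoint, deployment, SKU, and owner.
rows = [
    {"endpoint": "recs", "deployment": "blue", "sku": "Standard_DS3_v2", "owner": "product-recs", "cost": 412.0},
    {"endpoint": "recs", "deployment": "green", "sku": "Standard_F8s_v2", "owner": "product-recs", "cost": 268.5},
    {"endpoint": "exp-ranker", "deployment": "blue", "sku": "Standard_DS3_v2", "cost": 96.25},
]
print(spend_by(rows, "endpoint"))
print(spend_by(rows, "owner"))
```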

Reliability

Reliability impact is direct because client applications depend on endpoint availability and routing. Managed online endpoints help by handling serving, scaling, securing, and monitoring infrastructure, but reliability still depends on healthy deployments, enough instances, quota, dependency access, and correct traffic splits. Reliable teams use more than one deployment for safe rollout, allocate traffic carefully, test scoring requests before release, and monitor latency, errors, and availability. They also document how to restore credentials, reroute traffic, or recreate endpoints after configuration damage. During incidents, operators must distinguish endpoint authentication failure from deployment container failure, model error, network isolation issue, or downstream dependency outage. Clients should know which failures require retry and which require operator action.
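Separating retryable failures from ones that need an operator can be sketched as a status-code policy in Python. The mapping is an assumption, loosely based on the status codes Microsoft Learn lists for online endpoint troubleshooting (for example 401 for authentication failures, 424 for model errors, and 429 for too many requests). Confirm it against current documentation and the deployment's own behavior before adopting it.

```python
# Assumed client policy, not an Azure-defined contract.
RETRYABLE = {408, 429, 503, 504}
OPERATOR_ACTION = {
    401: "authentication failed: check the key or token and auth_mode",
    403: "forbidden: check RBAC and private network access",
    404: "not found: check the scoring URI and endpoint name",
    424: "model error: check deployment container logs",
    500: "server error: check deployment logs",
    502: "scoring code or container failure: check deployment logs",
}


def classify(status: int) -> str:
    """Tell a client whether to proceed, retry, or raise an operator alert."""
    if 200 <= status < 300:
        return "ok"
    if status in RETRYABLE:
        return "retry with backoff"
    return OPERATOR_ACTION.get(status, "unexpected status: escalate to the endpoint owner")


for code in (200, 429, 401, 424):
    print(code, "->", classify(code))
```

Retrying 401 or 424 only adds load. Those failures point to credentials or model code, which is where operators should start.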

Performance

Performance impact is visible to every client because endpoint routing, authentication, network path, deployment choice, and traffic split all affect response time. The endpoint can route traffic to a faster or slower deployment, mirror traffic for testing, or expose a private path that adds network dependencies. Performance tuning starts with endpoint metrics, latency percentiles, failure rates, and request volume, then narrows into deployment compute, model code, concurrency, and downstream calls. Operators should test with realistic payloads and clients, not only studio examples. A stable endpoint helps clients avoid integration churn while teams optimize backends for latency, throughput, and reliability goals. Slow client networks can hide deployment improvements, so every test result should record the client location and network path it used.
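Endpoint metrics in Azure Monitor already report latency. When operators replay realistic payloads from a real client, computing percentiles over the recorded timings makes runs comparable. The Python sketch below uses the nearest-rank method, which is one of several percentile definitions, so results can differ slightly from Azure Monitor's. The sample timings and client label are hypothetical.

```python
import math


def percentile(samples_ms, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples_ms:
        raise ValueError("no latency samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples_ms)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical timings (ms), recorded together with the client location and path.
run = {
    "client": "app-service-westeurope via private endpoint",
    "samples_ms": [120, 95, 180, 210, 130, 99, 150, 400, 140, 160],
}
print(run["client"], "p50:", percentile(run["samples_ms"], 50),
      "p95:", percentile(run["samples_ms"], 95))  # p50: 140 p95: 400
```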

Operations

Operators manage online endpoints by creating endpoint definitions, checking provisioning state, invoking test requests, retrieving credentials, updating traffic splits, and collecting logs and metrics. Azure CLI and studio both show endpoint status, scoring URI, authentication mode, and deployment relationships. Daily operations include certificate and key handling where applicable, RBAC review, private endpoint validation, alert tuning, traffic change approvals, and cleanup of unused endpoints. Incident runbooks should include sample request files, expected response examples, key retrieval rules, network path checks, and commands for routing traffic back to a known healthy deployment. Documentation should identify endpoint owners, consuming applications, and escalation contacts. Operators should review endpoint records whenever a client application, model deployment, or network boundary changes.
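Comparing an invoke result with the runbook's expected response example can be automated with a structural check. The following is a Python sketch. It assumes JSON responses and compares keys and value types rather than exact values, because predictions legitimately change between model versions. The response schema shown is hypothetical; the input would be, for example, the parsed result of an az ml online-endpoint invoke run with the runbook's sample request file.

```python
def _kind(value):
    """Coarse JSON type name; bool is checked before numbers on purpose."""
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if value is None:
        return "null"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    return type(value).__name__


def shape_mismatches(expected, actual, path="$"):
    """List places where actual differs in structure or type from expected."""
    if _kind(expected) != _kind(actual):
        return [f"{path}: expected {_kind(expected)}, got {_kind(actual)}"]
    problems = []
    if isinstance(expected, dict):
        for key, exp_value in expected.items():
            if key not in actual:
                problems.append(f"{path}.{key}: missing")
            else:
                problems.extend(shape_mismatches(exp_value, actual[key], f"{path}.{key}"))
    elif isinstance(expected, list) and expected and actual:
        # Compare every returned item against the first expected item.
        for i, item in enumerate(actual):
            problems.extend(shape_mismatches(expected[0], item, f"{path}[{i}]"))
    return problems


expected = {"predictions": [{"label": "approve", "score": 0.91}]}
print(shape_mismatches(expected, {"predictions": [{"label": "review"}]}))
```

Extra keys in the response are ignored, so adding fields such as a model version does not fail the check.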

Common mistakes

Treating the endpoint and deployment as the same object, then changing a backend without understanding client traffic impact.
Publishing endpoint keys into application settings or logs without Key Vault or secret-management controls.
Deleting an endpoint for cleanup without realizing it also removes underlying deployments and rollback options.
Changing traffic splits before the target deployment has passed health, latency, schema, and security checks.
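A guard against the cleanup mistake in this list can be sketched in Python. The runbook first states which deployments the delete will also remove. It then builds the az ml online-endpoint delete command only after the operator retypes the endpoint name. The confirmation flow and the workspace, resource group, and endpoint names are hypothetical runbook conventions. The --yes flag only suppresses the CLI's own prompt.

```python
def delete_warning(name, deployments):
    """Text to show the operator before asking for confirmation."""
    listed = ", ".join(deployments) if deployments else "none"
    return f"Deleting endpoint {name} also deletes its deployments: {listed}"


def delete_endpoint_argv(workspace, resource_group, name, typed_name):
    """Build the az CLI delete command only if the operator retyped the endpoint name."""
    if typed_name != name:
        raise PermissionError(f"confirmation {typed_name!r} does not match {name!r}; not deleting")
    return [
        "az", "ml", "online-endpoint", "delete",
        "--workspace-name", workspace,
        "--resource-group", resource_group,
        "--name", name,
        "--yes",  # skip the CLI prompt; the retyped-name check replaces it
    ]


print(delete_warning("permit-triage", ["blue", "green"]))
argv = delete_endpoint_argv("ws-prod", "rg-ml-prod", "permit-triage", typed_name="permit-triage")
# To run it: subprocess.run(argv, check=True)
```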

Operator quick checks

Do the endpoint name, scoring URI, region, and workspace match the environment clients will call?
Is the authentication mode compatible with the consuming application and approved security model?
Is at least one healthy deployment available, along with a rollback path, before traffic changes?
Do metrics, logs, alerts, and sample invoke tests prove the endpoint is ready for production requests?

Questions to ask

What application boundary and client population depend on this endpoint?
Who can retrieve credentials, invoke the endpoint, change traffic, or delete it?
What breaks if authentication, private networking, scoring URI, or deployment routing is misconfigured?
What dashboard, sample request, alert, and rollback traffic command will be used after a change?
