Foundry Models is the catalog and deployment experience in Microsoft Foundry where teams discover, compare, deploy, and use AI models from Microsoft and partner providers. Teams use it to select models for chat, reasoning, embeddings, multimodal workloads, agents, and application inference without rebuilding the application around each provider. In Azure reviews, it matters when someone must approve access, troubleshoot behavior, estimate cost, or explain why the configuration exists. Treat it as a design choice tied to owners, users, evidence, and rollback.
Microsoft Foundry Models, Foundry model catalog, AI Foundry models, model catalog
Difficulty: intermediate
CLI mappings: 5
Last verified: 2026-05-14
Microsoft Learn
Foundry Models is the Microsoft Foundry model catalog and deployment experience for discovering, evaluating, and using cloud AI models through managed endpoints.
Technically, Foundry Models is understood through model catalog entries, deployment types, Foundry resources, model endpoints, quota, model cards, provider terms, Azure OpenAI compatibility, keyless authentication, and monitoring. Important settings include model family, deployment name, region, endpoint, authentication mode, content filters, capacity, rate limits, provider agreement, and evaluation evidence. Operators inspect it with Foundry portal model cards, deployment lists, endpoint configuration, usage metrics, quota output, model benchmark notes, and application inference traces. The term becomes operational when resource IDs, permissions, diagnostic settings, network paths, quotas, deployment history, and change approvals can be tied to one environment and one accountable service owner.
Why it matters
Foundry Models matters because it changes how teams design, approve, troubleshoot, and explain an Azure workload. If the concept is misunderstood, teams may grant the wrong access, hide an unhealthy dependency, overbuild capacity, miss audit evidence, or create a user-facing failure that looks like an application bug. It affects security, reliability, operations, cost, and performance because one setting can influence who reaches the workload, how traffic behaves, what gets logged, how much capacity is consumed, and how quickly support can recover. A strong definition helps architects and operators ask the practical questions before the change reaches production. Always tie the review to one subscription, environment, owner, and measurable business outcome.
Where you see it
Signals, screens, and Azure surfaces where this term usually becomes operational.
Signal 01
A Foundry project shows the model catalog selections, deployment names, endpoint details, provider terms, quota, evaluations, and team access that application builders work with and that reviewers collect as production evidence.
Signal 02
Application telemetry records model names, deployment IDs, token usage, latency, throttling, content filter events, and retry behavior for each inference call, which reviewers use as production evidence.
Signal 03
Architecture reviews compare model capability, region, support terms, authentication, cost, responsible AI controls, and fallback options before approving production use.
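The telemetry signal becomes review evidence once calls are aggregated per deployment. The sketch below assumes the application already logs one record per inference call with the fields shown; the `InferenceRecord` schema is hypothetical, and HTTP 429 is treated as the throttling signal.

```python
import math
from dataclasses import dataclass


@dataclass
class InferenceRecord:
    """One inference call as an application might log it (hypothetical schema)."""
    deployment: str
    model: str
    prompt_tokens: int
    completion_tokens: int
    latency_ms: float
    status_code: int  # HTTP status returned by the endpoint, e.g. 200 or 429


def percentile(values, pct):
    """Nearest-rank percentile; returns 0.0 for an empty list."""
    if not values:
        return 0.0
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(records):
    """Group records by deployment: call count, total tokens, p95 latency, throttle rate."""
    summary = {}
    for deployment in sorted({r.deployment for r in records}):
        rows = [r for r in records if r.deployment == deployment]
        throttled = sum(1 for r in rows if r.status_code == 429)
        summary[deployment] = {
            "calls": len(rows),
            "total_tokens": sum(r.prompt_tokens + r.completion_tokens for r in rows),
            "p95_latency_ms": percentile([r.latency_ms for r in rows], 95),
            "throttle_rate": throttled / len(rows),
        }
    return summary


if __name__ == "__main__":
    sample = [
        InferenceRecord("chat-prod", "model-a", 120, 80, 900.0, 200),
        InferenceRecord("chat-prod", "model-a", 300, 150, 1400.0, 200),
        InferenceRecord("chat-prod", "model-a", 0, 0, 50.0, 429),
    ]
    print(summarize(sample))
```

The nearest-rank percentile keeps the example dependency-free; a monitoring backend such as Azure Monitor would normally compute percentiles for you.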
When this becomes relevant
Specific situations where this term helps solve real Azure design, operations, migration, security, reliability, cost, or governance problems.
Design and review Foundry Models for a production Azure workload before traffic, data, or model behavior depends on it.
Troubleshoot Foundry Models by comparing live configuration, logs, metrics, ownership, and downstream service health.
Document Foundry Models in architecture, security, cost, and support runbooks so teams share the same operating language.
Use Foundry Models during release planning to confirm prerequisites, access, rollback, monitoring, and customer-impact assumptions.
Real-world case studies
Different enterprise-style examples that show the term being used to hit measurable objectives.
Case study 01
Foundry Models in action for online retail
Scenario, objectives, solution, measured impact, and takeaway.
📌Scenario
BrightCart Retail, an online retail organization, faced a concrete production challenge: product teams were testing several generative models for shopping assistance but lacked a governed way to compare deployment cost, latency, and response quality. Leaders wanted a practical Azure design that support, security, and business owners could understand.
🎯Business/Technical Objectives
Compare approved models
Reduce test setup time
Track token usage
Choose one production deployment
✅Solution Using Foundry Models
The team used Foundry Models as the control point for the change. Architects browsed candidate model families in the catalog, deployed two approved options in a Foundry resource, and evaluated responses against real product questions. The application called deployments by name through a stable endpoint pattern, so engineers could switch models without rewriting core code. Azure Monitor and deployment usage output captured latency, throttling, and token trends. Security reviewed provider terms, access assignments, and content filtering before production approval. Before release, engineers captured read-only evidence, confirmed owners and access, checked diagnostic settings and logs, and documented rollback steps. Operations monitored the first production window with metrics that matched the stated objectives, not just generic resource health. The change record linked configuration evidence to measurable outcomes so later audits and incident reviews could reconstruct the decision quickly.
📈Results & Business Impact
Model evaluation cycle dropped 50 percent
One deployment met latency targets
Token cost estimates were documented
Security approval moved into the release gate
💡Key Takeaway for Glossary Readers
Foundry Models helps teams compare capability, cost, and governance before a model becomes a production dependency.
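The BrightCart case credits a stable deployment-name pattern with letting engineers switch models without rewriting core code. A minimal sketch of that indirection, assuming the application resolves a logical role to a deployment name from configuration; the role names, default deployment names, and `DEPLOYMENT_<ROLE>` environment variables are all hypothetical.

```python
import os

# Hypothetical defaults: each application role maps to a deployment name.
DEFAULT_DEPLOYMENTS = {
    "shopping-assistant": "assistant-candidate-a",
    "product-embeddings": "embeddings-candidate-a",
}


def resolve_deployment(role, env=None):
    """Return the deployment name the application should call for `role`.

    An environment variable named DEPLOYMENT_<ROLE> (a hypothetical scheme)
    overrides the default, so promoting another candidate is a configuration
    change instead of a code change.
    """
    env = os.environ if env is None else env
    override_key = "DEPLOYMENT_" + role.upper().replace("-", "_")
    if env.get(override_key):
        return env[override_key]
    try:
        return DEFAULT_DEPLOYMENTS[role]
    except KeyError:
        raise KeyError(f"no deployment configured for role {role!r}") from None


if __name__ == "__main__":
    print(resolve_deployment("shopping-assistant"))
```

Keeping the mapping outside the code also gives the change record a single value to compare before and after a model switch.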
Case study 02
Foundry Models in action for financial services
📌Scenario
FidelityBridge Bank, a financial services organization, faced a concrete production challenge: fraud analysts needed a controlled model deployment for case summarization without giving every team direct access to model keys.
🎯Business/Technical Objectives
Use Entra-based access
Summarize fraud cases faster
Preserve audit evidence
Avoid unmanaged provider accounts
✅Solution Using Foundry Models
The bank deployed selected Foundry Models in a controlled Foundry resource, and an internal case application called them with keyless (Microsoft Entra ID) authentication. Model deployments were named by environment, and access was limited through managed identities and role assignments. The team monitored token usage, content filter events, and response latency. Analysts received summaries with citations to internal case fields, while final fraud decisions remained in the existing workflow system. Before release, engineers captured read-only evidence, confirmed owners and access, and documented rollback steps; the change record tied that evidence to the measured outcomes.
📈Results & Business Impact
Case summarization time fell 29 percent
No shared model keys were distributed
Audit logs tied calls to identities
Analysts retained final decision authority
💡Key Takeaway for Glossary Readers
Foundry Models is strongest when model access is governed through identity, deployments, and measurable usage evidence.
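The FidelityBridge case names deployments by environment. The check below is a sketch of enforcing such a convention before a release; the `<environment>-<workload>-<suffix>` pattern and the allowed environment list are hypothetical and should be replaced with your own standard.

```python
import re

# Hypothetical convention: <environment>-<workload>-<suffix>, lowercase,
# for example "prod-fraudsumm-01".
ALLOWED_ENVIRONMENTS = ("dev", "test", "prod")
NAME_PATTERN = re.compile(r"^(?P<env>[a-z]+)-(?P<workload>[a-z0-9]+)-(?P<suffix>[a-z0-9]+)$")


def check_deployment_name(name, expected_env):
    """Return a list of problems; an empty list means the name passes."""
    match = NAME_PATTERN.match(name)
    if match is None:
        return [f"{name!r} does not match <env>-<workload>-<suffix>"]
    problems = []
    env = match.group("env")
    if env not in ALLOWED_ENVIRONMENTS:
        problems.append(f"unknown environment prefix {env!r}")
    elif env != expected_env:
        problems.append(f"{name!r} is a {env} deployment, expected {expected_env}")
    return problems


if __name__ == "__main__":
    for candidate in ("prod-fraudsumm-01", "dev-fraudsumm-01", "FraudSummary"):
        print(candidate, check_deployment_name(candidate, "prod"))
```

Deployment names for one resource can be listed with `az cognitiveservices account deployment list --name <foundry-resource> --resource-group <resource-group> --query "[].name" --output tsv` and passed to the function one by one.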
Case study 03
Foundry Models in action for public sector
📌Scenario
MetroPermit Office, a public sector organization, faced a concrete production challenge: staff wanted a language model for permit questions but needed proof that the chosen deployment could handle civic terminology and seasonal demand.
🎯Business/Technical Objectives
Evaluate domain accuracy
Prepare for seasonal spikes
Keep deployment ownership clear
Document fallback behavior
✅Solution Using Foundry Models
The architecture team used Foundry Models to deploy a candidate model and run evaluation prompts built from permit FAQs, zoning terms, and escalation examples. A second deployment was kept as a fallback and for comparison. Operators captured deployment lists, usage, and throttling data before the public launch. The application logged model name, deployment name, response time, and escalation flag, which gave support teams a clear incident trail. Before release, engineers captured read-only evidence, confirmed owners and access, and documented rollback steps; the change record tied that evidence to the measured outcomes.
📈Results & Business Impact
FAQ handling accuracy reached 87 percent
Peak-season quota was requested early
Fallback deployment was tested
Support tickets dropped 22 percent
💡Key Takeaway for Glossary Readers
Foundry Models turns model choice into an auditable operational decision instead of a one-time developer preference.
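The MetroPermit incident trail relied on logging model name, deployment name, response time, and escalation flag for each request. One way to emit that as a single JSON line is sketched below; the field names and the request ID format are hypothetical.

```python
import json
from datetime import datetime, timezone


def build_log_record(request_id, model, deployment, started, finished, escalated):
    """Return one JSON log line for an inference call.

    `started` and `finished` are monotonic timestamps in seconds, for example
    from time.perf_counter(). Field names are hypothetical.
    """
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "request_id": request_id,
        "model": model,
        "deployment": deployment,
        "response_time_ms": round((finished - started) * 1000, 1),
        "escalated": bool(escalated),
    }
    return json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    import time

    start = time.perf_counter()
    # Call the deployment here; the escalation flag comes from application logic.
    end = time.perf_counter()
    print(build_log_record("req-001", "model-a", "prod-permits-01", start, end, False))
```

One JSON object per line keeps the records easy to query once they reach a log workspace.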
Why use Azure CLI for this?
CLI checks make Foundry Models review repeatable because they capture scoped evidence for configuration, ownership, dependencies, health, and change impact before operators modify production.
CLI use cases
List or show the Azure resources related to Foundry Models before selecting a target for deeper review.
Capture read-only evidence for Foundry Models during release approval, incident response, access review, or cost investigation.
Compare configuration, metrics, logs, and dependent resources for Foundry Models across environments before approving a mutating command.
Before you run CLI
Confirm the tenant, subscription, resource group, Foundry resource, project, and deployment scope before trusting command output.
Run list and show commands first, then save evidence before create, update, purge, restart, delete, scale, or access changes.
Check whether the command affects customer traffic, model behavior, quota, cost, or compliance evidence.
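Scope mistakes are easier to prevent than to undo. The guard below is a sketch that reads the active CLI context from `az account show` and stops a script when the subscription or tenant differs from what the change record expects. The expected IDs are inputs you supply, and the `az` executable is assumed to be on PATH.

```python
import json
import subprocess


def current_account(run=subprocess.run):
    """Return the active Azure CLI account as a dict (requires the az CLI)."""
    result = run(
        ["az", "account", "show", "--output", "json"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def assert_expected_scope(account, expected_subscription_id, expected_tenant_id):
    """Raise RuntimeError if the CLI points at a different subscription or tenant."""
    if account.get("id") != expected_subscription_id:
        raise RuntimeError(
            f"active subscription {account.get('name')!r} ({account.get('id')}) "
            f"is not the expected {expected_subscription_id}"
        )
    if account.get("tenantId") != expected_tenant_id:
        raise RuntimeError(
            f"active tenant {account.get('tenantId')} is not {expected_tenant_id}"
        )


if __name__ == "__main__":
    # Placeholders: replace with the IDs recorded in the change record.
    assert_expected_scope(current_account(), "<subscription-id>", "<tenant-id>")
    print("scope confirmed")
```

Running the guard at the top of any script that later issues create, update, or delete commands turns the checklist item into an enforced step.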
What output tells you
Names, resource IDs, locations, SKUs, enabled states, and parent relationships show whether you are inspecting the intended target.
Settings, identities, deployments, endpoints, and model metadata explain how requests and workloads behave today.
Timestamps, metrics, usage, health state, and logs help separate Azure configuration issues from application or downstream failures.
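Resource IDs in command output encode the subscription, resource group, and parent account, so a parsed ID can confirm the target before deeper review. A minimal sketch, assuming the standard ARM resource ID layout; the IDs used to exercise it are placeholders.

```python
def parse_resource_id(resource_id):
    """Split an ARM resource ID into named segments.

    Example shape:
    /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.CognitiveServices/accounts/<name>/deployments/<dep>
    """
    parts = [p for p in resource_id.strip("/").split("/") if p]
    if len(parts) < 4 or parts[0].lower() != "subscriptions" or parts[2].lower() != "resourcegroups":
        raise ValueError(f"not a resource-group scoped ID: {resource_id}")
    result = {"subscription": parts[1], "resource_group": parts[3]}
    if len(parts) >= 6 and parts[4].lower() == "providers":
        result["provider"] = parts[5]
        # Remaining segments alternate type/name, e.g. accounts/<name>/deployments/<dep>.
        rest = parts[6:]
        for i in range(0, len(rest) - 1, 2):
            result[rest[i]] = rest[i + 1]
    return result


def is_expected_target(resource_id, subscription, resource_group, account):
    """True when the ID points at the intended subscription, group, and account."""
    parsed = parse_resource_id(resource_id)
    return (
        parsed["subscription"].lower() == subscription.lower()
        and parsed["resource_group"].lower() == resource_group.lower()
        and parsed.get("accounts", "").lower() == account.lower()
    )
```

Comparisons are case-insensitive because resource group names are not case-sensitive in Azure Resource Manager.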
Mapped Azure CLI commands
Foundry Models operational checks (mapping: direct)
az cognitiveservices account list --resource-group <resource-group> --output table
az cognitiveservices account · discover · AI and Machine Learning
az cognitiveservices account deployment list --name <foundry-resource> --resource-group <resource-group>
az cognitiveservices account deployment · discover · AI and Machine Learning
az cognitiveservices account deployment show --name <foundry-resource> --resource-group <resource-group> --deployment-name <deployment-name>
az cognitiveservices account deployment · discover · AI and Machine Learning
az cognitiveservices account list-models --name <foundry-resource> --resource-group <resource-group>
az cognitiveservices account · discover · AI and Machine Learning
az cognitiveservices account list-usage --name <foundry-resource> --resource-group <resource-group>
az cognitiveservices account · discover · AI and Machine Learning
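The deployment list command returns JSON that is easier to review when flattened. A minimal sketch, assuming `az` is on PATH and that each deployment follows the ARM shape with `properties.model`, `properties.provisioningState`, and `sku`; absent fields come back as `None` rather than failing.

```python
import json
import subprocess


def list_deployments(account, resource_group, run=subprocess.run):
    """Run the read-only deployment list command and parse its JSON output."""
    result = run(
        ["az", "cognitiveservices", "account", "deployment", "list",
         "--name", account, "--resource-group", resource_group, "--output", "json"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def summarize_deployments(deployments):
    """Flatten each deployment into the fields a reviewer usually records.

    Assumes properties.model.{name,version,format}, properties.provisioningState,
    and sku.{name,capacity}; missing values become None.
    """
    rows = []
    for d in deployments:
        props = d.get("properties") or {}
        model = props.get("model") or {}
        sku = d.get("sku") or {}
        rows.append({
            "deployment": d.get("name"),
            "model": model.get("name"),
            "version": model.get("version"),
            "format": model.get("format"),
            "state": props.get("provisioningState"),
            "sku": sku.get("name"),
            "capacity": sku.get("capacity"),
        })
    return sorted(rows, key=lambda r: r["deployment"] or "")


if __name__ == "__main__":
    for row in summarize_deployments(list_deployments("<foundry-resource>", "<resource-group>")):
        print(row)
```

For a quick terminal view, the same fields can be projected with a JMESPath query: `az cognitiveservices account deployment list --name <foundry-resource> --resource-group <resource-group> --query "[].{name:name, model:properties.model.name, version:properties.model.version, sku:sku.name, capacity:sku.capacity}" --output table`.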
Security
Security for Foundry Models starts with provider terms, data handling, endpoint identity, keyless authentication, model access, prompt logging, content filtering, and approval of model families. Review who can create it, change it, delete it, read diagnostics, approve connected resources, and use any credentials or identities involved. Prefer managed identity and Microsoft Entra ID where supported, keep secrets out of code, and scope roles to the smallest useful boundary. Capture Activity Log entries, role assignments, network settings, policy exemptions, and owner approvals before production changes. The goal is to prove that access, exposure, and data handling were intentional rather than accidental side effects of a quick deployment.
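One piece of access evidence is whether the Foundry resource still accepts API keys. A minimal sketch, assuming the JSON from `az cognitiveservices account show --name <foundry-resource> --resource-group <resource-group> --output json` exposes `properties.disableLocalAuth` and `properties.publicNetworkAccess`; a missing value is reported for review rather than treated as safe.

```python
def access_findings(account):
    """Return review findings for one account's parsed `account show` JSON."""
    props = account.get("properties") or {}
    findings = []
    # Keyless (Microsoft Entra ID) access is only enforced when local auth is disabled.
    if props.get("disableLocalAuth") is not True:
        findings.append("key-based (local) authentication is not disabled")
    # Public access is not always wrong, but it should be an intentional decision.
    if props.get("publicNetworkAccess") != "Disabled":
        findings.append("public network access is enabled or not reported")
    return findings
```

An empty list is not an approval on its own; it only shows that these two settings match a keyless, private-access design.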
Cost
Cost for Foundry Models is driven by token usage, deployment type, capacity, regional selection, evaluation runs, embedding volume, duplicate experiments, and unused model deployments. The expensive mistakes are not limited to Azure consumption; they also include broad changes, support time, overprovisioned capacity, and emergency cleanup after weak design evidence. Review whether the workload truly needs the selected deployment type, capacity, region, and diagnostic retention. Use tags, budgets, alerts, and recurring cleanup reviews so teams can explain why the current design exists and remove stale deployments without breaking dependencies.
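For pay-per-token deployments, a monthly estimate is simple arithmetic over request volume and average token counts. The prices in this sketch are hypothetical placeholders; take real rates from the Azure pricing page for the chosen model, region, and deployment type, and note that provisioned capacity is billed differently.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    """Hypothetical per-million-token prices; replace with current published rates."""
    input_per_million: float
    output_per_million: float


def monthly_token_cost(requests_per_day, input_tokens, output_tokens, pricing, days=30):
    """Estimate monthly pay-per-token cost for one deployment.

    input_tokens and output_tokens are average tokens per request.
    """
    monthly_input = requests_per_day * input_tokens * days
    monthly_output = requests_per_day * output_tokens * days
    return (
        monthly_input / 1_000_000 * pricing.input_per_million
        + monthly_output / 1_000_000 * pricing.output_per_million
    )


if __name__ == "__main__":
    placeholder = Pricing(input_per_million=1.00, output_per_million=4.00)
    print(monthly_token_cost(10_000, 800, 200, placeholder))  # 480.0
```

With the placeholder prices, 10,000 requests per day averaging 800 input and 200 output tokens gives 240 million input tokens (240.0) plus 60 million output tokens (240.0), or 480.0 per 30-day month. Repeating the estimate per candidate model gives the documented cost comparison the case studies describe.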
Reliability
Reliability for Foundry Models depends on regional availability, deployment readiness, quota exhaustion, endpoint health, model retirement, rate limits, and fallback model choices. A resource can appear healthy while the business workflow fails because a deployment, identity, quota, or downstream service is misconfigured. Test common failure modes such as throttling and model version retirement, along with disabled states, retries, rollback paths, and maintenance behavior, before relying on the design. Keep runbooks for first-response checks, owner escalation, and safe rollback. During incidents, compare platform metrics, deployment history, configuration changes, and application traces from the same time window before changing production settings.
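Fallback model choices only help if the application actually routes to them. A minimal sketch of ordered fallback, assuming a caller-supplied `send(deployment, prompt)` function that raises the two hypothetical exception types below; the deployment names are placeholders.

```python
class ThrottledError(Exception):
    """Raised by the caller-supplied client when a deployment returns HTTP 429."""


class DeploymentUnavailableError(Exception):
    """Raised for errors that suggest the deployment cannot serve (e.g. 404 or 5xx)."""


def call_with_fallback(send, prompt, deployments):
    """Try each deployment in order and return (deployment, response).

    Only throttling and availability errors trigger fallback; any other
    exception propagates, so bad requests are not retried elsewhere.
    """
    errors = {}
    for deployment in deployments:
        try:
            return deployment, send(deployment, prompt)
        except (ThrottledError, DeploymentUnavailableError) as exc:
            errors[deployment] = exc
    raise RuntimeError(f"all deployments failed: {errors}")


if __name__ == "__main__":
    def fake_send(deployment, prompt):
        if deployment == "prod-primary":
            raise ThrottledError("429")
        return f"answer from {deployment}"

    print(call_with_fallback(fake_send, "When is the permit office open?", ["prod-primary", "prod-fallback"]))
```

Returning the deployment name alongside the response lets the caller log which model actually answered, and the fallback deployment should be evaluated as carefully as the primary because its behavior may differ.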
Performance
Performance for Foundry Models depends on model latency, token volume, context size, deployment region, endpoint load, rate limits, streaming behavior, and client retry patterns. Measure both platform-side metrics and application-side completion metrics, because a fast platform-side response does not always mean users received the right result. Test with realistic prompt and context sizes, regions, concurrency, authentication paths, and downstream limits. When performance regresses, compare configuration changes, quota and rate limits, client logs, diagnostic data, and workload timing before adding capacity or blaming one service. The best tuning decisions come from evidence tied to the exact environment.
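Client retry patterns can either absorb rate limits or amplify them. A minimal sketch of retrying throttled calls with capped exponential backoff and full jitter, assuming a hypothetical `send()` that returns `(status_code, retry_after_seconds_or_None, body)`; when the service supplies a retry-after value in seconds, the wait is never shorter than it.

```python
import random
import time


def next_delay(attempt, base=1.0, cap=30.0, retry_after=None, rng=random.random):
    """Full-jitter backoff: random wait in [0, min(cap, base * 2**attempt)).

    A server-provided retry_after (seconds) sets a floor on the wait.
    """
    ceiling = min(cap, base * (2 ** attempt))
    delay = rng() * ceiling
    if retry_after is not None:
        delay = max(delay, float(retry_after))
    return delay


def call_with_retries(send, max_attempts=4, sleep=time.sleep, **delay_kwargs):
    """Retry only on HTTP 429; return (status_code, body) from the last attempt."""
    for attempt in range(max_attempts):
        status, retry_after, body = send()
        if status != 429:
            return status, body
        if attempt == max_attempts - 1:
            break
        sleep(next_delay(attempt, retry_after=retry_after, **delay_kwargs))
    return status, body
```

Capping attempts and honoring the service hint keeps retries from turning a quota limit into a longer outage; the cap and base values here are illustrative, not recommended settings.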
Operations
Operations for Foundry Models require a model inventory, deployment naming standards, quota requests, owner tags, evaluation records, release notes, incident runbooks, and controlled provider changes. Before a change, capture read-only CLI output, portal evidence when useful, owner tags, expected behavior, and a rollback path. During incidents, avoid changing several settings at once; compare metrics, logs, deployment operations, identity evidence, network state, and downstream health first. Keep release notes clear enough for support teams to verify current behavior quickly. Good operational practice makes the term observable, reviewable, and recoverable instead of tribal knowledge.
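Owner tags are only useful if every resource carries them. A minimal sketch that reports missing tags from the parsed JSON of `az cognitiveservices account list --resource-group <resource-group> --output json`; the required tag names are a hypothetical policy, and tag keys are compared case-insensitively.

```python
REQUIRED_TAGS = ("owner", "environment", "costCenter")  # hypothetical tag policy


def missing_tags(accounts, required=REQUIRED_TAGS):
    """Map account name to the required tags that are absent or blank."""
    report = {}
    for account in accounts:
        tags = account.get("tags") or {}
        lowered = {str(k).lower(): v for k, v in tags.items()}
        absent = [t for t in required if not str(lowered.get(t.lower(), "")).strip()]
        if absent:
            report[account.get("name")] = absent
    return report
```

An empty report means every listed Foundry resource has an accountable owner and environment on record, which is the starting point for the inventory and cleanup reviews described above.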
Common mistakes
Treating Foundry Models as a simple label instead of checking the live scope, owner, dependencies, and current configuration.
Running a mutating command in the wrong tenant, subscription, resource group, Foundry resource, project, or deployment context.
Assuming a successful command means users saw the correct result without checking logs, metrics, application behavior, and rollback evidence.