Foundry Agent

A Foundry Agent is an AI agent built or hosted in Microsoft Foundry Agent Service with instructions, model selection, tools, knowledge, and runtime execution evidence. Teams use it to automate business tasks that require model reasoning, tool calls, enterprise knowledge, workflow steps, retrieval, approvals, and observable runs inside a governed Azure AI platform. It is not a magic autonomous worker, a security principal by itself, a replacement for process controls, or a safe production feature without tool permissions, evaluation, monitoring, and human escalation.

Aliases: Microsoft Foundry agent, Foundry Agent Service agent, AI agent in Foundry, agent service agent
Difficulty: intermediate
CLI mappings: 6
Last verified: 2026-05-14

Microsoft Learn

A Foundry Agent is an AI agent built or hosted in Microsoft Foundry Agent Service with instructions, model selection, tools, knowledge, and runtime execution evidence.

Microsoft Learn: What is Microsoft Foundry Agent Service? (2026-05-14)

Technical context

Technically, the Foundry Agent is configured or observed through Foundry Agent Service, Foundry projects, agent instructions, model deployments, built-in tools, custom tools, Foundry IQ knowledge bases, threads, runs, traces, evaluations, connection resources, identities, and application SDKs. It depends on model availability, project configuration, tool permissions, identity access, knowledge source quality, prompt instructions, safety evaluation, quota, network path, tracing, application integration, and human-in-the-loop escalation design. Operators inspect it through the Azure portal, ARM or Bicep, Azure CLI, SDK or REST calls, Azure Monitor, diagnostic logs, and application telemetry.

Why it matters

Foundry Agent matters because it is the production object that turns generative AI capability into an application workflow with tools, data, ownership, and operational evidence. Without clear vocabulary, teams may over-permission tools, skip evaluations, expose sensitive knowledge, ignore tool failures, lose traceability, or deploy agents that cannot be stopped or explained during incidents. It also affects security, reliability, operations, cost, and performance because one configuration choice can change who can act, what fails, how quickly work completes, what evidence exists, and how much the platform costs. Good glossary discipline helps teams ask who owns it, what depends on it, which metric proves health, and what rollback path exists before a release.

Where you see it

Signals, screens, and Azure surfaces where this term usually becomes operational.

Signal 01

A Foundry project contains an agent with instructions, selected model, enabled tools, knowledge connections, evaluation results, trace settings, and application integration details. Review scope, owners, metrics, and rollback evidence.

Signal 02

Telemetry shows agent runs, tool calls, retrieval steps, token usage, failures, latency, and safety or evaluation outcomes for the same workflow. Review scope, owners, metrics, and rollback evidence.

Signal 03

Security review records mention tool permissions, managed identity scope, knowledge source access, prompt injection risks, human approval, and disablement procedures. Review scope, owners, metrics, and rollback evidence.

When this becomes relevant

Specific situations where this term helps solve real Azure design, operations, migration, security, reliability, cost, or governance problems.

  • Build a grounded support, operations, or business-process agent with approved tools and measurable run evidence.
  • Review instructions, tools, identities, retrieval sources, and evaluation results before allowing an agent into production.
  • Troubleshoot slow, unsafe, or incorrect agent behavior by correlating traces, run history, model usage, and tool failures.
  • Support incident response by correlating Azure configuration, diagnostic logs, metrics, deployment history, and application traces.

Real-world case studies

Different enterprise-style examples that show the term being used to hit measurable objectives.

Case study 01

Foundry Agent in action for industrial maintenance

Scenario, objectives, solution, measured impact, and takeaway.

Scenario

Cobalt Field Services, an industrial maintenance organization, needed to solve a production challenge: dispatchers spent hours triaging equipment tickets and searching manuals before assigning technicians. The architecture team used Foundry Agent to make the design measurable, governable, and easier to support.

Business/Technical Objectives

  • Reduce ticket triage time
  • Ground answers in approved manuals
  • Require human approval for dispatch changes
  • Track every tool call

Solution Using Foundry Agent

Architects created a Foundry Agent in a governed project, connected it to Foundry IQ knowledge, enabled a work-order lookup tool, and required approval before assignment updates. Traces captured retrieval, tool calls, and final recommendations. Before cutover, engineers captured read-only configuration, validated identity and network access, compared expected behavior with Azure Monitor or service logs, and stored rollback instructions in the change record. Operators received a runbook with first-response checks, known failure modes, owner contacts, and escalation paths. The team also reviewed owner tags, diagnostic coverage, alert routing, and incident communication paths so support could confirm the workflow without changing production state.
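The approval requirement and tool-call tracing described in this solution can be sketched in plain Python. This is a minimal illustration, not the Foundry Agent Service API: GatedToolRunner, ToolCallTrace, the approver callback, and the tool names used with it are all hypothetical. The point it shows is that a write tool runs only after an approver says yes, and that every attempt, executed or blocked, leaves a trace.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToolCallTrace:
    """One recorded tool-call attempt (hypothetical trace shape)."""
    tool: str
    arguments: dict
    outcome: str  # "executed" or "blocked"


@dataclass
class GatedToolRunner:
    """Runs tools by name; write tools require approval first."""
    tools: dict[str, Callable[..., object]]
    write_tools: set[str]
    approver: Callable[[str, dict], bool]  # returns True when a human approves
    traces: list[ToolCallTrace] = field(default_factory=list)

    def call(self, tool: str, **arguments):
        # Block write tools that were not approved, and record the attempt.
        if tool in self.write_tools and not self.approver(tool, arguments):
            self.traces.append(ToolCallTrace(tool, arguments, "blocked"))
            return None
        result = self.tools[tool](**arguments)
        self.traces.append(ToolCallTrace(tool, arguments, "executed"))
        return result
```

In this sketch a read-only lookup tool would be registered in tools only, while an assignment-update tool would also appear in write_tools, so a rejected update returns None and still shows up in traces for incident review.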

Results & Business Impact

  • Average triage time fell by 46 percent
  • Manual citations appeared in agent responses
  • No dispatch change happened without approval
  • Tool-call traces supported incident review

Key Takeaway for Glossary Readers

A Foundry Agent is useful when autonomy is bounded by tools, grounding, and accountable run evidence.

Case study 02

Foundry Agent in action for insurance

Scenario, objectives, solution, measured impact, and takeaway.

Scenario

Vantage Claims, an insurance organization, needed to solve a production challenge: claim handlers needed a faster way to summarize documents and prepare next-step recommendations without bypassing compliance checks. The architecture team used Foundry Agent to make the design measurable, governable, and easier to support.

Business/Technical Objectives

  • Summarize claim packets
  • Protect regulated data
  • Evaluate response quality
  • Escalate uncertain decisions

Solution Using Foundry Agent

The team configured a Foundry Agent with limited document retrieval, no direct payment authority, and evaluation tests for hallucination and sensitive-data handling. Application Insights tracked latency and handler feedback. Before cutover, engineers captured read-only configuration, validated identity and network access, compared expected behavior with Azure Monitor or service logs, and stored rollback instructions in the change record. Operators received a runbook with first-response checks, known failure modes, owner contacts, and escalation paths. The team also reviewed owner tags, diagnostic coverage, alert routing, and incident communication paths so support could confirm the workflow without changing production state.
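An evaluation gate like the one in this solution can be expressed as a simple threshold check. The sketch below is illustrative only: the function name release_blockers, the metric names, and the threshold values are assumptions, not Foundry evaluation output. A missing score is treated as a failure so that a skipped evaluation cannot pass the gate by accident.

```python
def release_blockers(scores: dict[str, float], thresholds: dict[str, float]) -> list[str]:
    """Return metric names whose score is missing or below its minimum."""
    blockers = []
    for metric, minimum in thresholds.items():
        score = scores.get(metric)
        # A missing score blocks the release just like a low score.
        if score is None or score < minimum:
            blockers.append(metric)
    return blockers


# Hypothetical thresholds for a release candidate.
EXAMPLE_THRESHOLDS = {"groundedness": 0.9, "sensitive_data_handling": 1.0}
```

A release pipeline would promote a candidate only when release_blockers returns an empty list, and would attach the returned names to the change record otherwise.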

Results & Business Impact

  • Packet review time dropped by 32 percent
  • Payment decisions stayed with humans
  • Evaluation failures blocked release candidates
  • Feedback improved prompt instructions

Key Takeaway for Glossary Readers

Foundry agents should accelerate decisions without silently owning decisions they are not allowed to make.

Case study 03

Foundry Agent in action for public infrastructure

Scenario, objectives, solution, measured impact, and takeaway.

Scenario

MetroWater Utility, a public infrastructure organization, needed to solve a production challenge: operators needed outage guidance during storms when runbooks, sensor data, and crew schedules lived in separate systems. The architecture team used Foundry Agent to make the design measurable, governable, and easier to support.

Business/Technical Objectives

  • Combine approved runbooks and status data
  • Keep operators in control
  • Provide auditable recommendations
  • Reduce incident coordination delay

Solution Using Foundry Agent

Engineers built a Foundry Agent with retrieval over runbooks, a read-only outage-status tool, and a crew-availability connector. The agent generated recommended actions while the incident commander approved execution. Before cutover, engineers captured read-only configuration, validated identity and network access, compared expected behavior with Azure Monitor or service logs, and stored rollback instructions in the change record. Operators received a runbook with first-response checks, known failure modes, owner contacts, and escalation paths. The team also reviewed owner tags, diagnostic coverage, alert routing, and incident communication paths so support could confirm the workflow without changing production state.

Results & Business Impact

  • Coordination delay fell by 28 percent
  • Recommendations included source references
  • No write tools were granted in phase one
  • Post-incident review used traces and run IDs

Key Takeaway for Glossary Readers

The best Foundry Agent designs separate advice, evidence, and authority clearly.

Why use Azure CLI for this?

Azure CLI helps validate Foundry Agent because it captures reproducible evidence for scope, configuration, permissions, runtime state, diagnostics, and related resources before a production change.

CLI use cases

  • List or show Azure resources and related configuration for Foundry Agent.
  • Capture read-only evidence before changing identity, networking, triggers, capacity, policy, deployment, or automation settings.
  • Compare Azure metrics, logs, run history, deployment operations, and application evidence during production incidents.

Before you run CLI

  • Confirm the tenant, subscription, resource group, resource names, environment, and time window are the intended scope.
  • Run read-only list, show, metrics, operation, or query commands before any create, update, delete, start, stop, policy, or deployment change.
  • Get approval for mutating commands because configuration changes can expose data, break workflows, increase cost, or alter compliance evidence.
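The read-only-first rule above can be enforced with a small pre-flight check in a wrapper script. The sketch below is a heuristic under stated assumptions: it treats the last positional token before the first option as the verb, and it uses a deliberately short allowlist (list, show, query). The helper names az_verb and needs_approval are hypothetical, and anything the allowlist does not recognize is treated as needing approval.

```python
import shlex

# Conservative allowlist: verbs assumed to be read-only in this sketch.
READ_ONLY_VERBS = {"list", "show", "query"}


def az_verb(command: str) -> str:
    """Return the last positional token before the first option flag."""
    tokens = shlex.split(command)
    if not tokens or tokens[0] != "az":
        raise ValueError("expected an Azure CLI command starting with 'az'")
    positional = []
    for token in tokens[1:]:
        if token.startswith("-"):
            break
        positional.append(token)
    if not positional:
        raise ValueError("no command group or verb found")
    return positional[-1]


def needs_approval(command: str) -> bool:
    """True unless the verb is on the read-only allowlist."""
    return az_verb(command) not in READ_ONLY_VERBS
```

A read-only verb can still return sensitive values, so this check decides only whether a command may change state, not whether its output is safe to share.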

What output tells you

  • Resource IDs, enabled state, configuration values, identity settings, network posture, and ownership metadata show the current design.
  • Metrics, logs, run history, or deployment operations show whether the platform behaved as expected during the reviewed time window.
  • Application and downstream evidence shows whether the issue is Azure configuration, permissions, client behavior, data readiness, or business processing.

Mapped Azure CLI commands

Some evidence is visible only in service logs, SDK behavior, deployment output, SQL metadata, portal configuration, or application telemetry; Azure CLI still validates surrounding resources and operational scope.

Architecture context

A Foundry Agent is an application-facing runtime component in Microsoft Foundry Agent Service that combines instructions, model deployment, tools, knowledge, threads, and execution traces. Architecturally, I place the agent behind a product API or workflow boundary rather than letting every team create ungoverned assistants. The design includes project ownership, model selection, tool permissions, managed identity, connected knowledge, prompt and instruction management, evaluation, content safety, and observability. Agents can call tools and retrieve data, so the security model must follow the action, not just the chat session. I review how runs are logged, how citations are produced, how failures surface, and what rollback looks like when an agent version behaves badly.

Security

Security for the Foundry Agent starts with knowing who can create agents, change instructions, attach tools, approve knowledge sources, assign identities, call the agent, view traces, and access outputs that may contain sensitive enterprise data. Review agent ID, project, model, instructions, tools, knowledge sources, identities, quotas, evaluation results, trace settings, run history, safety filters, owner, and rollback or disable mechanism before approving production changes. Prefer managed identity and Microsoft Entra ID where the service supports it, keep secrets in approved vaults, scope roles narrowly, and protect diagnostics that may reveal sensitive names, payloads, or operational patterns. During audits, capture Activity Log entries, role assignments, network settings, diagnostic settings, and owner approvals so teams can prove access and behavior were intentional.
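The pre-approval review list in this section lends itself to a mechanical completeness check. The following sketch assumes a review record is a plain dictionary whose keys mirror the items named above; the key names and the function missing_review_fields are illustrative, not a Foundry schema. An empty list is accepted as an explicit answer (for example, an agent with no tools), while an absent key, None, or an empty string counts as missing.

```python
REQUIRED_REVIEW_FIELDS = (
    "agent_id", "project", "model", "instructions", "tools",
    "knowledge_sources", "identities", "quotas", "evaluation_results",
    "trace_settings", "run_history", "safety_filters", "owner",
    "rollback_or_disable",
)


def missing_review_fields(record: dict) -> list[str]:
    """Return required fields that are absent, None, or an empty string."""
    missing = []
    for name in REQUIRED_REVIEW_FIELDS:
        value = record.get(name)
        if value is None or value == "":
            missing.append(name)
    return missing
```

A change-approval step could refuse to proceed while missing_review_fields returns anything, which makes "no owner" or "no disable mechanism" visible before production rather than during an incident.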

Cost

Cost for the Foundry Agent is driven by model tokens, tool executions, retrieval calls, traces, evaluations, hosted agent runtime, knowledge indexing, duplicate experiments, failed retries, and operator review time after ambiguous or unsafe responses. The expensive mistake is not only Azure consumption; it is also duplicate processing, failed retries, audit cleanup, manual investigations, and unnecessary capacity caused by weak design evidence. Review whether the workload truly needs the selected tier, frequency, retention, diagnostics, network path, and automation pattern. Use tags, budgets, alerts, and recurring reviews so teams can explain why the current design exists and remove stale resources safely. This keeps Foundry Agent review specific across architecture, security, operations, and incident response.
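To make the cost drivers concrete, the sketch below estimates the cost of one agent run from token counts, tool executions, and retries. Every price is a caller-supplied placeholder, not an Azure rate, and the model is simplified: it assumes each retry repeats the full run, and it ignores retrieval, tracing, evaluation, and indexing charges. RunUsage and estimate_run_cost are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunUsage:
    input_tokens: int
    output_tokens: int
    tool_calls: int
    retries: int = 0  # failed attempts that were repeated in full


def estimate_run_cost(
    usage: RunUsage,
    price_in_per_1k: float,
    price_out_per_1k: float,
    price_per_tool_call: float,
) -> float:
    """Estimate run cost; all prices are caller-supplied placeholders."""
    token_cost = (
        (usage.input_tokens / 1000) * price_in_per_1k
        + (usage.output_tokens / 1000) * price_out_per_1k
    )
    tool_cost = usage.tool_calls * price_per_tool_call
    attempts = 1 + usage.retries  # simplification: each retry repeats the run
    return attempts * (token_cost + tool_cost)
```

Even with placeholder prices, the attempts multiplier shows why failed retries appear in the list of cost drivers: one retry doubles the spend for the same business outcome.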

Reliability

Reliability for the Foundry Agent depends on model endpoint health, tool availability, knowledge retrieval quality, quota, retry behavior, run state handling, timeout settings, trace collection, fallback path, and human review when automation cannot complete safely. A healthy Azure resource can still fail the business workflow if downstream services, identities, triggers, clients, or data contracts are wrong. Test retries, failover assumptions, disabled states, stale configuration, private DNS problems, timeout behavior, and duplicate processing before relying on the design. Keep runbooks for first-response checks, known limits, owner escalation, and rollback so support teams can recover without guessing.
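Retry behavior and the human fallback path can be tested together with a small wrapper. This is a sketch under assumptions: TransientError and EscalateToHuman are hypothetical exception types, the operation is any callable that wraps a tool or model call, and the sleep function is injectable so tests do not wait. Only errors marked transient are retried; anything else propagates immediately.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


class EscalateToHuman(Exception):
    """Raised when automation gives up and a person must take over."""


def call_with_retry(
    operation: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry transient failures with exponential backoff, then escalate."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return operation()
        except TransientError as error:
            last_error = error
            if attempt < attempts - 1:
                # Delays grow as base_delay * 1, 2, 4, ...
                sleep(base_delay * (2 ** attempt))
    raise EscalateToHuman(f"gave up after {attempts} attempts") from last_error
```

The bounded attempt count matters as much as the backoff: unbounded retries against a write tool are one way the duplicate processing mentioned above happens.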

Performance

Performance for the Foundry Agent depends on model latency, token volume, tool-call sequence, retrieval complexity, connection latency, agent run polling, concurrent sessions, grounding quality, and whether the application waits for long-running tools synchronously. Measure platform-side metrics and application-side completion metrics because fast service response does not always mean the business task finished. Use realistic data sizes, concurrency, filter patterns, region placement, authentication paths, and downstream limits in tests. When performance regresses, compare configuration changes, resource limits, client logs, diagnostic data, and workload timing before adding capacity or blaming one Azure service.
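Run polling is one place where the application-side wait can differ from platform-side latency. The sketch below polls a status callback until a terminal state or a deadline, whichever comes first. The state names, the wait_for_run function, and the get_status callback are assumptions for illustration; a real client would map them to whatever the SDK or REST API in use actually returns.

```python
import time
from typing import Callable

# Hypothetical terminal state names; adjust to the API actually in use.
TERMINAL_STATES = {"completed", "failed", "cancelled"}


def wait_for_run(
    get_status: Callable[[], str],
    timeout_s: float,
    poll_interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until a terminal state is seen or the deadline passes."""
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            return "timed_out"  # caller decides: retry, cancel, or escalate
        sleep(poll_interval_s)
```

Recording the elapsed time around wait_for_run gives the application-side completion metric, which can then be compared with the platform-side latency for the same run.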

Operations

Operations for the Foundry Agent require named owners, documented resource IDs, expected behavior, diagnostic settings, and first-response checks. Before a change, capture read-only CLI output, portal screenshots when useful, deployment history, and relevant application configuration. During incidents, avoid changing several settings at once. Compare service metrics, logs, run history, identity evidence, network state, and downstream health in the same time window. Keep release notes clear enough for support teams to verify current behavior quickly.
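Comparing evidence in the same time window is easier when every source is reduced to one timeline. The sketch below assumes each piece of evidence has already been exported as a timestamp, a source label, and a detail string; the Evidence type and both helper functions are hypothetical, and all timestamps are assumed to share one time zone.

```python
from datetime import datetime, timedelta
from typing import Iterable, NamedTuple


class Evidence(NamedTuple):
    timestamp: datetime
    source: str  # e.g. "metrics", "logs", "run_history", "deployments"
    detail: str


def window_around(center: datetime, minutes: int) -> tuple[datetime, datetime]:
    """Return a [start, end) window of +/- minutes around an incident time."""
    delta = timedelta(minutes=minutes)
    return center - delta, center + delta


def evidence_in_window(
    items: Iterable[Evidence], start: datetime, end: datetime
) -> list[Evidence]:
    """Keep items with start <= timestamp < end, ordered by time."""
    selected = [item for item in items if start <= item.timestamp < end]
    return sorted(selected, key=lambda item: item.timestamp)
```

Reading the merged timeline in order often shows whether a deployment or configuration change preceded the first failing run, which is harder to see when each source is reviewed separately.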

Common mistakes

  • Treating Foundry Agent as a label instead of checking the exact resource scope, live configuration, owner, and dependencies.
  • Changing several settings at once without saving read-only evidence, rollback instructions, and the expected metric change.
  • Assuming the Azure resource succeeded means the end-to-end business workflow completed correctly and safely.