Foundry Local

Foundry Local is an on-device AI runtime and SDK that lets applications run curated generative AI models locally instead of sending every prompt to a cloud endpoint. Teams use it to embed private, low-latency AI features in apps that need offline behavior, data locality, or predictable response time. In Azure reviews, it matters when someone must approve access, troubleshoot behavior, estimate cost, or explain why the configuration exists. Treat it as a design choice tied to owners, users, evidence, and rollback.

Aliases: Microsoft Foundry Local, Foundry Local CLI, on-device AI inference, local AI runtime
Difficulty: intermediate
CLI mappings: 5
Last verified: 2026-05-14

Microsoft Learn

Foundry Local is an on-device AI runtime and SDK for shipping applications that run curated generative AI models locally on user hardware.

Microsoft Learn: What is Foundry Local? (2026-05-14)

Technical context

Technically, Foundry Local is best understood through its building blocks: the Foundry Local Core API, language SDKs, a local model catalog, a local model cache, ONNX Runtime, execution providers, an optional CLI, an optional REST endpoint, and application packaging. Important settings include model alias, model license, hardware target, cache location, execution provider, SDK version, first-run download behavior, diagnostic logging, and unload path. Operators inspect it with Foundry Local CLI output, SDK logs, application traces, local model cache state, endpoint status, the package manifest, and the device hardware inventory.
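The settings listed above can be tracked as one review record per model. The following is a minimal sketch, assuming a hypothetical `LocalModelReviewRecord` class whose field names simply mirror that list; it is not part of any Foundry Local SDK, and every value shown is a placeholder.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class LocalModelReviewRecord:
    """Hypothetical review record; not part of any Foundry Local SDK."""
    model_alias: str
    model_license: str
    hardware_target: str
    cache_location: str
    execution_provider: str
    sdk_version: str
    downloads_on_first_run: bool
    diagnostic_logging: bool
    unload_path: str


def missing_settings(record: LocalModelReviewRecord) -> list[str]:
    """Return the names of text settings that were left blank."""
    blanks = []
    for field in fields(record):
        value = getattr(record, field.name)
        if isinstance(value, str) and not value.strip():
            blanks.append(field.name)
    return blanks


# All values below are placeholders for illustration.
record = LocalModelReviewRecord(
    model_alias="example-chat-model",
    model_license="",  # not yet captured, so the review should flag it
    hardware_target="cpu",
    cache_location="/var/cache/example-app/models",
    execution_provider="cpu",
    sdk_version="0.0.0",
    downloads_on_first_run=True,
    diagnostic_logging=False,
    unload_path="unload the model when the app closes",
)
print(missing_settings(record))
```

A record like this gives reviewers one place to see which settings still lack evidence before release.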

Why it matters

Foundry Local matters because it changes how teams design, approve, troubleshoot, and explain an Azure workload. If the concept is misunderstood, teams may grant the wrong access, hide an unhealthy dependency, overbuild capacity, miss audit evidence, or create a user-facing failure that looks like an application bug. It affects security, reliability, operations, cost, and performance because one setting can influence who reaches the workload, how traffic behaves, what gets logged, how much capacity is consumed, and how quickly support can recover. A strong definition helps architects and operators ask the practical questions before the change reaches production. Always tie the review to one subscription, environment, owner, and measurable business outcome.

Where you see it

Signals, screens, and Azure surfaces where this term usually becomes operational.

Signal 01

A desktop or edge application package includes Foundry Local SDK references, model aliases, cache settings, license notes, and device readiness checks that reviewers confirm before release.

Signal 02

Support logs for a user device show local model downloads, execution provider selection, prompt latency, cache hits, unload events, and hardware fallback decisions.

Signal 03

Security review notes state that prompts and outputs remain on device, while model files, diagnostics, application permissions, and optional telemetry still need controls and production evidence.
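One of the device readiness checks mentioned in Signal 01 could confirm that the cache volume has room for a model download. This is a rough sketch using only the Python standard library; the function name, the `headroom` factor, and the sizes are assumptions for illustration, not documented Foundry Local requirements.

```python
import shutil
from dataclasses import dataclass, field


@dataclass
class ReadinessResult:
    ready: bool
    reasons: list[str] = field(default_factory=list)


def check_cache_space(cache_dir, model_size_bytes, headroom=1.5, free_bytes=None):
    """Check whether the cache volume can hold a model download.

    headroom is an assumed safety factor, not a documented requirement.
    free_bytes can be injected for tests; otherwise the real volume is queried.
    """
    if free_bytes is None:
        free_bytes = shutil.disk_usage(cache_dir).free
    needed = int(model_size_bytes * headroom)
    if free_bytes < needed:
        reason = f"cache volume needs {needed} bytes free, found {free_bytes}"
        return ReadinessResult(False, [reason])
    return ReadinessResult(True)


# Example with injected numbers: a 4 GB model and 5 GB of free space.
result = check_cache_space(".", 4_000_000_000, free_bytes=5_000_000_000)
print(result.ready, result.reasons)
```

Returning reasons alongside the boolean lets the application show the user why a local model is unavailable instead of failing silently.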

When this becomes relevant

Specific situations where this term helps solve real Azure design, operations, migration, security, reliability, cost, or governance problems.

Design and review Foundry Local for a production Azure workload before traffic, data, or model behavior depends on it.
Troubleshoot Foundry Local by comparing live configuration, logs, metrics, ownership, and downstream service health.
Document Foundry Local in architecture, security, cost, and support runbooks so teams share the same operating language.
Use Foundry Local during release planning to confirm prerequisites, access, rollback, monitoring, and customer-impact assumptions.

Real-world case studies

Different enterprise-style examples that show the term being used to hit measurable objectives.

Case study 01

Foundry Local in action for mobile healthcare

Scenario, objectives, solution, measured impact, and takeaway.

Scenario

Northlake Clinics, a mobile healthcare organization, needed to solve a concrete production challenge: clinicians needed AI-assisted intake summaries in rural vans where connectivity was inconsistent and patient notes could not leave the device. Leaders wanted a practical Azure design that support, security, and business owners could understand.

Business/Technical Objectives

Keep intake text local
Cut summary time by 40 percent
Support offline appointments
Avoid cloud token spend

Solution Using Foundry Local

The team used Foundry Local as the control point for the change. The architects embedded Foundry Local in the intake tablet application, selected a small approved chat model, documented the model license, and packaged a first-run readiness check for device hardware. The app used local prompts, local outputs, and an operator-approved summary workflow. Support teams used Foundry Local CLI checks to confirm model availability, cache health, and execution provider selection before releasing the build. Application Insights collected only non-sensitive operational telemetry from the wrapper application, while patient text remained on the device. Before release, engineers captured read-only evidence, confirmed owners and access, checked diagnostics or local logs, and documented rollback steps. Operations monitored the first production window with metrics that matched the stated objectives, not just generic resource health.

Results & Business Impact

Summary time dropped 44 percent
Offline visits continued normally
No prompts crossed the network
Support calls fell 18 percent

Key Takeaway for Glossary Readers

Foundry Local is valuable when privacy, offline use, and low latency matter more than centralized cloud inference.

Case study 02

Foundry Local in action for industrial manufacturing

Scenario

HarborWorks Manufacturing, an industrial manufacturing organization, needed to solve a concrete production challenge: quality inspectors wanted on-device defect explanations beside production lines where network latency slowed cloud model calls. Leaders wanted a practical Azure design that support, security, and business owners could understand.

Business/Technical Objectives

Reduce inspection delay
Keep images on plant devices
Standardize defect language
Limit infrastructure changes

Solution Using Foundry Local

The team used Foundry Local as the control point for the change. Engineers used Foundry Local inside an inspection workstation app that already captured defect images. The model produced short defect explanations and suggested next inspection steps, but final disposition stayed with the human inspector. The rollout included model cache validation, GPU fallback tests, license evidence, and a safe unload path. Operations compared device logs, prompt timing, and inspection throughput during pilot shifts before expanding to more workstations. Before release, engineers captured read-only evidence, confirmed owners and access, checked diagnostics or local logs, and documented rollback steps. Operations monitored the first production window with metrics that matched the stated objectives, not just generic resource health. The change record linked configuration evidence to measurable outcomes so later audits and incident reviews could reconstruct the decision quickly.

Results & Business Impact

Average guidance latency fell 62 percent
Image data stayed onsite
Defect labels became more consistent
No new inference servers were required

Key Takeaway for Glossary Readers

Foundry Local can bring AI close to the user without turning every plant workflow into a cloud dependency.

Case study 03

Foundry Local in action for legal services

Scenario

CedarLegal Group, a legal services organization, needed to solve a concrete production challenge: attorneys needed quick draft summaries for confidential notes while traveling, but policy prohibited sending draft strategy text to external services. Leaders wanted a practical Azure design that support, security, and business owners could understand.

Business/Technical Objectives

Protect confidential notes
Work during travel
Provide draft-only assistance
Document model approval

Solution Using Foundry Local

The team used Foundry Local as the control point for the change. The technology team integrated Foundry Local into a secure laptop drafting tool. Approved models were cached before travel, and the app displayed clear warnings that summaries were drafts requiring attorney review. Security teams reviewed model licenses, local cache permissions, device encryption, and optional diagnostics. Operations created a checklist for confirming the Foundry Local CLI version, model alias, and successful local run before trial teams used the tool. Before release, engineers captured read-only evidence, confirmed owners and access, checked diagnostics or local logs, and documented rollback steps. Operations monitored the first production window with metrics that matched the stated objectives, not just generic resource health. The change record linked configuration evidence to measurable outcomes so later audits and incident reviews could reconstruct the decision quickly.

Results & Business Impact

Travel drafting delays fell 35 percent
Confidential prompts stayed local
Model approval evidence passed audit
Attorneys kept final review authority

Key Takeaway for Glossary Readers

Foundry Local works best when local inference is paired with human accountability and clear device controls.

Why use the Foundry Local CLI for this?

CLI checks make Foundry Local review repeatable because they capture scoped evidence for configuration, ownership, dependencies, health, and change impact before operators modify production.

CLI use cases

List or show the Azure or local resources related to Foundry Local before selecting a target for deeper review.
Capture read-only evidence for Foundry Local during release approval, incident response, access review, or cost investigation.
Compare configuration, metrics, logs, and dependent resources for Foundry Local across environments before approving a mutating command.

Before you run CLI

Confirm tenant, subscription, resource group, profile, endpoint, project, device, or local model scope before trusting command output.
Run list and show commands first, then save evidence before create, update, purge, restart, delete, scale, or access changes.
Check whether the command affects customer traffic, local user devices, cached content, model behavior, cost, or compliance evidence.
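The "list and show first, then save evidence" habit above can be scripted. The sketch below wraps two read-only commands from this entry's mapped list and records their output with a timestamp; the function names and the result layout are assumptions, and the demonstration uses a stand-in runner so it works without the CLI installed.

```python
import subprocess
from datetime import datetime, timezone

# Read-only commands taken from this entry's mapped command list.
READ_ONLY_COMMANDS = [
    ["foundry", "--version"],
    ["foundry", "model", "list"],
]


def run_command(argv):
    """Run one command and record its result without raising on failure."""
    try:
        done = subprocess.run(argv, capture_output=True, text=True, timeout=60)
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Covers a missing CLI binary or a command that hangs.
        return {"exit_code": None, "stdout": "", "stderr": str(exc)}
    return {"exit_code": done.returncode, "stdout": done.stdout, "stderr": done.stderr}


def capture_evidence(commands=READ_ONLY_COMMANDS, runner=run_command):
    """Collect timestamped output for each read-only command."""
    return {
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "results": [{"command": " ".join(argv), **runner(argv)} for argv in commands],
    }


# Demonstration with a stand-in runner, so no CLI needs to be installed.
def fake_runner(argv):
    return {"exit_code": 0, "stdout": "placeholder output", "stderr": ""}


evidence = capture_evidence(runner=fake_runner)
print([entry["command"] for entry in evidence["results"]])
```

On a device with the CLI installed, calling `capture_evidence()` with the default runner would record real output; saving the returned dictionary alongside the change record is left to the caller.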

What output tells you

Names, resource IDs, locations, SKUs, enabled states, and parent relationships show whether you are inspecting the intended target.
Settings, identities, routes, deployments, endpoints, origins, cache paths, or model metadata explain how requests or workloads behave today.
Timestamps, metrics, usage, health state, and logs help separate Azure configuration issues from application, device, or downstream failures.

Mapped Foundry Local CLI commands

Foundry Local operational checks

foundry --version

foundry model list

foundry model info <model-alias> --license

foundry model run <model-alias>

foundry service restart
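A support script might sort the five mapped commands into those that only read state and those that change it. The grouping below is one reading of the commands, not an official classification: it assumes `model run` loads a model (and may trigger the first-run download this entry mentions) and that `service restart` interrupts the local service. The function name and labels are hypothetical.

```python
# Prefixes that follow the "foundry" executable name.
READ_ONLY_PREFIXES = [("--version",), ("model", "list"), ("model", "info")]
STATE_CHANGING_PREFIXES = [("model", "run"), ("service", "restart")]


def classify(argv):
    """Label a command line as read-only, state-changing, or unknown."""
    if not argv or argv[0] != "foundry":
        return "unknown"
    rest = tuple(argv[1:])
    if any(rest[: len(prefix)] == prefix for prefix in READ_ONLY_PREFIXES):
        return "read-only"
    if any(rest[: len(prefix)] == prefix for prefix in STATE_CHANGING_PREFIXES):
        return "state-changing"
    # Anything not explicitly listed is treated as unknown, not as safe.
    return "unknown"


print(classify(["foundry", "model", "info", "example-alias", "--license"]))
print(classify(["foundry", "service", "restart"]))
```

Treating unlisted commands as unknown keeps the script conservative: an operator has to decide explicitly before anything outside the reviewed list runs.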

Architecture context

Security

Security for Foundry Local starts with local prompts, outputs, model files, cache folders, application permissions, optional diagnostics, and user device access. Review who can create it, change it, delete it, read diagnostics, approve connected resources, and use any credentials or identities involved. Prefer managed identity and Microsoft Entra ID where supported, keep secrets out of code, and scope roles to the smallest useful boundary. Capture Activity Log entries, role assignments, network settings, policy exemptions, and owner approvals before production changes. The goal is to prove that access, exposure, and data handling were intentional rather than accidental side effects of a quick deployment.
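One way to keep optional diagnostics from carrying prompt or output text off the device is an allowlist applied before any telemetry leaves the application. This is an illustrative sketch only; the field names and the `scrub_event` helper are hypothetical and not tied to any particular telemetry SDK.

```python
# Hypothetical operational fields that are considered safe to send.
ALLOWED_FIELDS = frozenset(
    {"event", "model_alias", "execution_provider", "latency_ms", "cache_hit"}
)


def scrub_event(event):
    """Keep only allowlisted fields; anything unlisted is dropped."""
    return {key: value for key, value in event.items() if key in ALLOWED_FIELDS}


raw = {
    "event": "summary_completed",
    "model_alias": "example-chat-model",
    "latency_ms": 840,
    "prompt": "confidential text",
    "output": "confidential summary",
}
print(sorted(scrub_event(raw)))
```

An allowlist fails closed: a new field added later stays on the device until someone deliberately approves it.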

Cost

Cost for Foundry Local is shaped by savings from reduced token charges and eliminated endpoints, offset by device testing, model distribution, help-desk effort, local storage, and support for hardware variation. The expensive mistake is not only Azure consumption; it can also be duplicate experiments, broad changes, support time, overprovisioned capacity, or emergency cleanup after weak design evidence. Review whether the workload truly needs the selected tier, retention, diagnostics, network path, cache behavior, or automation pattern. Use tags, budgets, alerts, and recurring cleanup reviews so teams can explain why the current design exists and remove stale resources without breaking dependencies.

Reliability

Reliability for Foundry Local depends on first-run downloads, cache reuse, hardware detection, driver changes, offline mode, SDK version drift, and application fallback behavior. A resource can appear healthy while the business workflow fails because a route, dependency, identity, cache, quota, or downstream service is wrong. Test common failure modes, disabled states, retries, rollback paths, and maintenance behavior before relying on the design. Keep runbooks for first-response checks, owner escalation, and safe rollback. During incidents, compare platform metrics, deployment history, configuration changes, and application traces from the same time window before changing production settings.
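Application fallback behavior can be as simple as degrading to a non-AI result when local inference fails. The sketch below assumes a hypothetical `local_summarizer` callable supplied by the application; it does not call any Foundry Local API, and the truncation fallback is only one possible choice.

```python
def summarize_with_fallback(text, local_summarizer, max_chars=80):
    """Try local inference first, then fall back to plain truncation."""
    try:
        summary = local_summarizer(text)
        return {"source": "local-model", "summary": summary}
    except Exception as exc:
        # Broad on purpose: any local failure should degrade, not crash the app.
        return {
            "source": "non-ai-fallback",
            "summary": text[:max_chars],
            "error": str(exc),
        }


def broken_summarizer(text):
    """Stand-in for a local model that is missing from the cache."""
    raise RuntimeError("model not in cache")


result = summarize_with_fallback("Inspection note: scratch on panel 3.", broken_summarizer)
print(result["source"])
```

Recording which path produced the result gives support teams a signal to compare against cache state and hardware detection during an incident.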

Performance

Performance for Foundry Local depends on model size, CPU, GPU, or NPU selection, prompt length, cache warmth, local memory, startup time, and concurrent app requests. Measure platform-side metrics and application-side completion metrics because a fast control-plane response does not always mean users received the right result. Test with realistic data sizes, regions, concurrency, authentication paths, route choices, cache state, and downstream limits. When performance regresses, compare configuration changes, resource limits, client logs, diagnostic data, and workload timing before adding capacity or blaming one service. The best tuning decisions come from evidence tied to the exact environment.
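Because startup time and cache warmth affect the first request differently from later ones, it helps to report them separately. The sketch below times any callable and splits the first call from the warm calls; the function names are hypothetical, and the demonstration uses a scripted clock so the numbers are deterministic.

```python
import statistics
import time


def time_calls(call, prompts, clock=time.perf_counter):
    """Return the elapsed seconds for each prompt, in order."""
    durations = []
    for prompt in prompts:
        start = clock()
        call(prompt)
        durations.append(clock() - start)
    return durations


def summarize_latency(durations):
    """Separate the first (cold) call from the remaining warm calls."""
    if not durations:
        raise ValueError("at least one measurement is required")
    warm = durations[1:]
    return {
        "first_call_s": durations[0],
        "median_warm_s": statistics.median(warm) if warm else None,
        "max_s": max(durations),
    }


# Demonstration with a scripted clock instead of a real model call.
ticks = iter([0.0, 5.0, 5.0, 6.0, 6.0, 7.0, 7.0, 9.0])
durations = time_calls(lambda prompt: None, ["a", "b", "c", "d"], clock=lambda: next(ticks))
print(summarize_latency(durations))
```

In a real test, `call` would wrap the application's own inference path and the default clock would be used, with prompts chosen to match realistic lengths.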

Operations

Operations for Foundry Local require installer versioning, support scripts, device readiness checks, prompt tests, model license capture, and rollback to cloud or non-AI behavior. Before a change, capture read-only CLI output, portal evidence when useful, owner tags, expected behavior, and a rollback path. During incidents, avoid changing several settings at once; compare metrics, logs, deployment operations, identity evidence, network state, and downstream health first. Keep release notes clear enough for support teams to verify current behavior quickly. Good operational practice turns the term into something observable, reviewable, and recoverable instead of tribal knowledge.

Common mistakes

Treating Foundry Local as a simple label instead of checking the live scope, owner, dependencies, and current configuration.
Running a mutating command in the wrong subscription, profile, resource group, project, endpoint, origin group, or local device context.
Assuming a successful command means users saw the correct result without checking logs, metrics, application behavior, and rollback evidence.

Operator quick checks

Verify scope, owner tags, enabled state, identity, network path, diagnostics, and linked resources before changing production behavior.
Check service metrics, logs, deployment operations, usage, health, cache state, or local runtime status in the same time window.
Confirm downstream dependencies, permissions, route behavior, rollback steps, and customer-impact assumptions match the approved design.

Questions to ask

Who owns Foundry Local, and where are the approved resource IDs, settings, access model, and rollback details documented?
Which upstream and downstream services depend on this configuration, and what metric proves each dependency is healthy right now?
What customer, compliance, cost, or incident impact appears if this setting is wrong, stale, disabled, purged, overloaded, or exposed?
