AI and Machine Learning Azure OpenAI field-manual-complete field-manual-complete field-manual-complete

Tool call

A tool call is the moment an AI model says, “I need the application to run this specific function with these arguments.” The model does not magically access your database, ticketing system, or shipping API by itself. Your app describes available tools, receives the model’s requested call, executes the trusted code or service, and sends the result back. Good tool-call design makes AI answers more useful because the model can work with fresh facts, transactions, calculations, and private business systems instead of guessing. This keeps useful action separate from model imagination. This keeps useful action separate from model imagination.

Aliases
function call, AI tool call, model tool call, function-calling request, tool invocation
Difficulty
intermediate
CLI mappings
5
Last verified
2026-05-28

Microsoft Learn

A tool call is a model-generated request for the application to invoke a named function or tool with structured arguments. In Azure OpenAI and Foundry agent workflows, tool calls let the model use external APIs, databases, search results, or business logic before producing a final answer.

Microsoft Learn: How to use function calling with Azure OpenAI in Microsoft Foundry Models2026-05-28

Technical context

In Azure AI architecture, tool calls sit in the orchestration layer between the model deployment and application-owned services. The model receives tool definitions, chooses a tool, emits structured arguments, and waits for the app or agent runtime to execute the tool. The actual security boundary remains in your application, managed identity, API gateway, database permissions, and validation code. Tool calls interact with prompt design, JSON schemas, function calling, Foundry agents, Azure Functions, Logic Apps, API Management, Key Vault, telemetry, and content safety checks.

Why it matters

Tool calls matter because they separate reasoning from action. Without them, a model may answer from stale prompt context or hallucinate operational steps it cannot actually perform. With them, the application can fetch order status, create a ticket, calculate eligibility, search an index, or call a controlled workflow while keeping business rules in code. The risk is that a badly designed tool can expose data, perform unintended actions, or trust malformed arguments. For learners, tool calls explain why production AI is not only prompt writing. It is an integration pattern where schema design, validation, permissions, logging, and rollback determine whether AI is safe enough to act. It also makes failures reviewable instead of mysterious. It also prevents unsafe automation shortcuts.

Where you see it

Signals, screens, and Azure surfaces where this term usually becomes operational.

Signal 01

In chat completion or agent traces, model responses show tool name, arguments, tool call identifier, finish reason, validation expectations, and follow-up execution results for operator review.

Signal 02

In application code, tool schemas appear as JSON function definitions, handlers, validation logic, approval checks, identity selection, API calls, and reviewable execution contracts for releases.

Signal 03

In monitoring, dependency telemetry shows the tool endpoint, duration, failure state, executing identity, user request, conversation ID, and incident triage evidence during production support reviews.

When this becomes relevant

Specific situations where this term helps solve real Azure design, operations, migration, security, reliability, cost, or governance problems.

  • Retrieve real-time order, account, inventory, or ticket data so the model answers from trusted systems instead of stale prompt context.
  • Run controlled business calculations, such as eligibility, pricing, or scheduling, where deterministic code must own the final decision.
  • Create tickets, drafts, workflow requests, or queue messages with validation and approval instead of letting the model invent action steps.
  • Connect RAG answers to specialized lookup tools when one search index cannot cover every source or access boundary.
  • Audit AI actions by logging tool name, arguments, identity, result status, and correlation IDs for every model-requested operation.

Real-world case studies

Different enterprise-style examples that show the term being used to hit measurable objectives.

Case study 01

Equipment rental assistant books available machinery safely

Scenario, objectives, solution, measured impact, and takeaway.

Scenario

A construction equipment rental firm wanted a chat assistant to reserve lifts and generators, but early prompts sometimes promised units that branch inventory did not actually have.

Business/Technical Objectives
  • Return availability from the live rental system before giving a booking answer.
  • Prevent the model from placing reservations outside customer credit and safety rules.
  • Reduce call-center transfers for simple reservation checks.
  • Capture an audit trail for every AI-assisted booking attempt.
Solution Using Tool call

The engineering team created separate read tools for branch inventory, customer eligibility, and rental-rate calculation, plus a mutating reservation tool that required an idempotency key. The model could request tools, but application code validated customer identity, branch permissions, rental dates, and equipment category before executing anything. Azure Functions hosted the tool handlers, Key Vault stored API credentials, and managed identity limited access to the rental API. CLI checks verified the Function app settings, identity principal, model deployment, and diagnostics before launch. Failed tool validations were returned as structured results so the model could explain the limitation instead of inventing alternatives.

Results & Business Impact
  • Reservation accuracy improved from 82% in pilot transcripts to 98.6% after live inventory tool checks.
  • Simple availability calls transferred to humans fell 41% in six weeks.
  • Duplicate reservation attempts were eliminated during retries by using idempotency keys.
  • Audit reviews could trace each booking from prompt to tool call to system response within two minutes.
Key Takeaway for Glossary Readers

Tool calls let an AI assistant act on real systems while application code keeps authority over validation, permissions, and business rules.

Case study 02

City utility triages outage reports with controlled actions

Scenario, objectives, solution, measured impact, and takeaway.

Scenario

A municipal utility used an AI assistant for outage questions, but residents needed status updates and crew dispatch requests that depended on live grid and ticketing systems.

Business/Technical Objectives
  • Fetch current outage status without exposing internal grid records.
  • Create duplicate-safe trouble tickets for validated service addresses.
  • Keep emergency routing rules outside the model prompt.
  • Give dispatch supervisors searchable evidence for AI-created tickets.
Solution Using Tool call

Architects defined three tools: outage-status lookup, service-address validation, and trouble-ticket creation. The first two were read-only and available in most conversations. The ticket-creation tool required a confirmed address, active account match, and explicit user consent captured by the application. API Management enforced request limits, Azure Functions executed tool logic, and the ticketing API accepted an idempotency key. The model received concise tool results, not raw grid data. Operators used CLI to confirm managed identity assignments, diagnostic settings, and deployment names across staging and production before the storm-season launch. Operators also reviewed denied requests after storm drills. Operators reviewed failed calls daily during the pilot. Release notes captured the review criteria. Supervisors also reviewed exception summaries daily during storms. Operators also rehearsed manual override steps.

Results & Business Impact
  • Average resident status-check time dropped from 6.5 minutes by phone to 48 seconds in chat.
  • Duplicate outage tickets fell 63% compared with the old web form during storms.
  • Dispatch supervisors gained a complete correlation trail for 100% of AI-created tickets.
  • No internal feeder or crew-location data appeared in assistant responses during red-team testing.
Key Takeaway for Glossary Readers

A well-bounded tool call can improve service speed without handing sensitive operations directly to the model.

Case study 03

Freight scheduler uses tools to avoid impossible appointments

Scenario, objectives, solution, measured impact, and takeaway.

Scenario

A freight brokerage built an AI scheduler for pickup appointments, but free-text scheduling failed when docks, driver hours, and carrier rules conflicted.

Business/Technical Objectives
  • Check dock calendars and carrier constraints before offering appointment times.
  • Stop the assistant from confirming appointments that violate driver-hour rules.
  • Reduce manual coordinator review for routine rescheduling.
  • Measure tool latency separately from model latency during peak operations.
Solution Using Tool call

The team exposed narrow tools for dock-slot search, carrier-rule lookup, transit-time calculation, and appointment hold creation. The model could suggest a slot only after the app returned valid options from the first three tools. Holds expired automatically unless a coordinator or customer confirmed the appointment. Tool results were trimmed to the top five feasible windows to control tokens and latency. Application Insights captured tool timing, validation failures, and downstream API errors under a shared correlation ID. Azure CLI checks verified Function configuration, Key Vault references, and diagnostic routing before the assistant was enabled for large shippers.

Results & Business Impact
  • Manual review of routine reschedules dropped 52% while exception cases still routed to coordinators.
  • Invalid appointment confirmations fell from 7.4% of pilot attempts to 0.3%.
  • P95 end-to-end scheduling latency improved from 14 seconds to 5.1 seconds after tool result trimming.
  • Operations separated model-selection issues from dock-calendar API failures during two production incidents.
Key Takeaway for Glossary Readers

Tool calls become powerful when each tool solves one real integration problem and returns just enough structured evidence for the model to continue.

Why use Azure CLI for this?

Azure CLI is not the tool-call executor, but it is valuable around the Azure resources that make tool calls safe and observable. As an experienced Azure engineer, I use CLI to confirm model deployments, application identities, Key Vault access, API endpoints, diagnostics, and hosting configuration before blaming the model for a failed function. Tool-call incidents often cross Azure OpenAI, App Service or Functions, API Management, storage, and logs. CLI gives a repeatable way to inventory those dependencies, compare environments, and export evidence. Portal clicks miss drift; commands show exactly which identity, endpoint, and setting production used. That saves time when teams argue over model versus infrastructure ownership. Evidence ends ambiguity. That saves time during outages. That matters when tool failures look like model failures.

CLI use cases

  • Verify the model deployment used by the app before reproducing tool-call behavior.
  • Confirm the hosting app has a managed identity before it calls downstream Azure services.
  • Inspect Function or App Service settings that hold tool endpoint names, feature flags, or safe configuration values.
  • List Key Vault secret names without exposing secret values when reviewing tool dependencies.
  • Confirm diagnostics are enabled for the model resource, application host, and downstream services involved in tool execution.

Before you run CLI

  • Select the tenant and subscription that host both the AI resource and the application executing the tools.
  • Use least-privilege read access for inspection; avoid listing secret values unless a break-glass procedure explicitly authorizes it.
  • Know whether the tool is read-only or mutating before replaying a request in production.
  • Capture resource IDs, deployment names, identity principal IDs, and correlation IDs so evidence links across services.
  • Check whether diagnostic logs contain sensitive arguments and choose output redaction before sharing evidence.

What output tells you

  • Deployment output confirms which model and version received the tool definitions and generated the tool call.
  • Identity output shows the principal that application code uses to access Key Vault, APIs, storage, databases, or queues.
  • Application settings reveal whether production points to the intended tool endpoints and schema-version flags.
  • Diagnostic settings show whether logs and metrics are routed somewhere operators can query during incidents.
  • Key Vault metadata confirms expected secrets exist and are enabled without printing secret values into terminals or tickets.

Mapped Azure CLI commands

Tool call CLI commands

adjacent
az cognitiveservices account deployment show --name <account> --resource-group <resource-group> --deployment-name <deployment> --output json
az cognitiveservices account deploymentdiscoverAI and Machine Learning
az webapp identity show --name <app-name> --resource-group <resource-group> --output json
az webapp identitydiscoverAI and Machine Learning
az functionapp config appsettings list --name <function-app> --resource-group <resource-group> --output table
az functionapp config appsettingsdiscoverWeb
az keyvault secret list --vault-name <vault-name> --query "[].{name:name,enabled:attributes.enabled}" --output table
az keyvault secretdiscoverAI and Machine Learning
az monitor diagnostic-settings list --resource <resource-id> --output json
az monitor diagnostic-settingsdiscoverAI and Machine Learning

Architecture context

Architecturally, a tool call is an integration contract, not a chat decoration. I design each tool as a narrow operation with a clear name, strict schema, validated arguments, least-privilege identity, timeout, retry policy, and audit trail. Read-only tools are safer than write tools, and write tools often need human approval, idempotency keys, or compensating actions. The model should never be treated as the authority to bypass business rules. Strong designs keep the model responsible for choosing and formatting the request, while application code enforces authorization, performs the action, records telemetry, and summarizes the result back to the model. This separation is what makes agentic patterns supportable. Keep boundaries explicit. Tool ownership should be reviewed before production traffic. I also version tool contracts beside application releases.

Security

Security impact is direct because a tool call can bridge natural-language input to real systems. Treat the model’s arguments as untrusted input, even when they look valid JSON. Validate schemas, authorize the user separately from the model, constrain tool scope, protect secrets in Key Vault, and prefer managed identity over embedded keys. Prompt injection can attempt to steer the model toward unauthorized tools or malicious arguments, so tool descriptions should be precise and policies should block sensitive actions. Logs must capture tool name, caller, arguments after redaction, result status, and correlation IDs without leaking secrets or regulated data. The handler, not the model, should be the security boundary. Audit every handler. Keep approvals explicit. Approval flows should cover dangerous tools.

Cost

Cost impact is indirect but visible. A tool call usually consumes model tokens before and after the external action, and the handler may call Azure Functions, Logic Apps, databases, APIs, storage, or search indexes that have their own charges. Poor schemas can cause repeated calls, large tool results, or loops that inflate both model and downstream cost. Teams should cache stable lookups, return compact results, cap retries, and track per-tool usage. Budgets should include both model consumption and the Azure services that execute the action. Review ownership, idle usage, scale assumptions, chargeback signals, and retention behavior before expanding capacity. Review ownership, idle usage, scale assumptions, chargeback signals, and retention behavior before expanding capacity.

Reliability

Reliability depends on making tool calls deterministic, observable, and recoverable. The model may choose a tool, but the application must validate arguments, handle retries, enforce timeouts, and decide what to do when the downstream system is unavailable. A failed tool should not always fail the whole conversation; sometimes the assistant can ask for clarification or present a safe fallback. Reliable designs isolate handlers, log tool-call identifiers, preserve idempotency, and test degraded paths. Synthetic tests should cover normal, slow, denied, and unavailable tool paths. Test normal operation, degraded behavior, rollback steps, and dependency failure before production changes. Test normal operation, degraded behavior, rollback steps, and dependency failure before production changes.

Performance

Performance is affected because each tool call adds a round trip outside model generation. The user waits for model reasoning, handler execution, downstream service latency, result serialization, and the follow-up model response. Slow tools make an assistant feel slow even when the model is fast. Good designs keep schemas small, validate arguments early, set timeout budgets, cache stable data, and avoid unnecessary chained calls. Monitoring should break out model latency, tool latency, dependency latency, and retry time so teams tune the real bottleneck. Benchmark realistic load, dependency latency, concurrency limits, diagnostic query speed, retry behavior, and rollback impact before declaring the design ready.

Operations

Operators support tool calls by tracking schemas, handler versions, dependency health, identities, approvals, and model behavior changes. They inspect traces that link a user request to a tool name, arguments, execution result, latency, and downstream error. Release runbooks should include schema diffs, safe sample prompts, denied-action examples, rollback steps, and owner contacts. During incidents, operators need to know whether failure came from the model choice, argument validation, the handler, network access, authentication, or the external service. Store repeatable command output, owner contacts, rollback evidence, normal examples, and post-change verification steps with the operational runbook. Store repeatable command output, owner contacts, rollback evidence, normal examples, and post-change verification steps with the operational runbook.

Common mistakes

  • Trusting model-generated arguments without server-side validation and user authorization.
  • Giving one broad tool permission to perform many unrelated actions instead of creating narrow, auditable tools.
  • Changing a tool schema without testing whether the model still chooses and fills it correctly.
  • Returning huge tool results that exhaust context window and slow the final model response.
  • Logging raw tool arguments that contain personal data, secrets, or regulated business records.