AI service diagnostic settings

AI service diagnostic settings is the switchboard that decides which logs and metrics an Azure AI resource exports and where those records are stored for troubleshooting, audit, and monitoring. Teams use it to capture request activity, route logs to Log Analytics, archive evidence, feed Event Hubs, and investigate failures without relying only on portal charts. It appears in the Monitoring > Diagnostic settings blade, az monitor diagnostic-settings commands, Log Analytics tables, Storage archives, and Event Hub streams. Before making design, operations, or troubleshooting decisions, identify who owns the setting, which resource it applies to, and what evidence shows its current state.

Aliases: Azure AI diagnostic settings, Foundry Tools diagnostic logging, Cognitive Services diagnostic settings, AI resource logs
Difficulty: intermediate
CLI mappings: 3
Last verified: 2026-05-09

Microsoft Learn

AI service diagnostic settings are Azure Monitor settings on an Azure AI or Foundry Tools resource that route resource logs and platform metrics to destinations such as Log Analytics, Storage, or Event Hubs for investigation and retention.

Microsoft Learn: Enable diagnostic logging for Foundry Tools (2026-05-09)

Technical context

AI service diagnostic settings sits in the observability and evidence-retention layer for Microsoft.CognitiveServices/accounts resources. It works with Azure Monitor diagnostic settings, resource logs, platform metrics, Log Analytics workspaces, Storage accounts, Event Hubs, and Azure Policy. The useful scope is the individual AI service resource, because that is where configuration, permissions, telemetry, and ownership meet. Before changing a setting, operators should identify the control-plane configuration, the data-plane behavior it records, and the monitoring evidence that confirms it. Those signals give an engineer something concrete to inspect during troubleshooting, reviews, and release validation.
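
A minimal sketch of how that per-resource scope could be pinned down in a script, assuming the standard Azure Resource Manager ID layout for Microsoft.CognitiveServices/accounts. The function names and the sample values are hypothetical; the resulting string is the kind of value passed as <ai-resource-id> to the CLI commands in this entry.

```python
def ai_resource_id(subscription_id, resource_group, account_name):
    """Build the ARM resource ID for a Microsoft.CognitiveServices/accounts resource."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.CognitiveServices/accounts/{account_name}"
    )


def parse_ai_resource_id(resource_id):
    """Split an account-level resource ID into the parts needed to confirm scope."""
    parts = resource_id.strip("/").split("/")
    # ARM IDs alternate key/value segments:
    # subscriptions/<id>/resourceGroups/<rg>/providers/<namespace>/<type>/<name>
    if len(parts) != 8 or parts[0].lower() != "subscriptions":
        raise ValueError(f"not an account-level resource ID: {resource_id!r}")
    provider_type = f"{parts[5]}/{parts[6]}".lower()
    if provider_type != "microsoft.cognitiveservices/accounts":
        raise ValueError(f"unexpected resource type: {provider_type}")
    return {
        "subscription": parts[1],
        "resource_group": parts[3],
        "account": parts[7],
    }
```

Parsing the ID before running a command is one way to confirm the subscription and resource group match the environment you intend to inspect.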

Why it matters

AI service diagnostic settings matters because it determines what evidence exists when real users are affected. When teams understand it, they can capture request activity, route logs to Log Analytics, archive evidence, feed Event Hubs, and investigate failures with less guesswork than portal charts alone allow. When they ignore it, the usual result is unclear ownership, slow incident response, and configuration that behaves differently across environments. Strong Azure teams include this term in design reviews, release checklists, and operational runbooks. They also tie it to measurable signals such as enabled categories, destination IDs, retention plan, workspace table names, ingestion latency, and policy compliance, so a change can be approved, rejected, or rolled back based on facts.

Where you see it

Signals, screens, and Azure surfaces where this term usually becomes operational.

Signal 01

the Monitoring > Diagnostic settings blade, az monitor diagnostic-settings commands, Log Analytics tables, Storage archives, and Event Hub streams

Signal 02

Azure portal, CLI output, IaC templates, monitoring dashboards, and incident runbooks

When this becomes relevant

Specific situations where this term helps solve real Azure design, operations, migration, security, reliability, cost, or governance problems.

capture request activity, route logs to Log Analytics, archive evidence, feed Event Hubs, and investigate failures without relying only on portal charts
standardize production configuration
collect evidence during audits and incidents

Real-world case studies

Different enterprise-style examples that show the term being used to hit measurable objectives.

Case study 01

AI service diagnostic settings in action

Scenario, objectives, solution, measured impact, and takeaway.

Scenario

Summit Dental Group, a multi-site dental provider, had a platform team that needed evidence for failed speech transcription calls across private clinics. The team used AI service diagnostic settings as the operating focus so the change could be measured, governed, and production-safe.

Business/Technical Objectives

send logs to Log Analytics within 15 minutes
retain audit evidence for 180 days
reduce support escalations for API errors
separate production and test log destinations

Solution Using AI service diagnostic settings

Architects designed AI service diagnostic settings into the workflow as the formal operating boundary for AI resource logging. They integrated it with monitoring, tagging, and change control, then validated the design with a small pilot before expanding it to production. The team documented the CLI checks, approval owner, expected telemetry, and cleanup steps so future releases could repeat the pattern without rediscovery.

Results & Business Impact

The pilot reached production in 6 business days with no rollback or customer-visible interruption
Runbook-based checks reduced handoff questions by 31 percent during the next maintenance window
The team cut investigation time by 68 percent because telemetry pointed to the affected boundary quickly
Leadership received measurable proof that the design met its objective without expanding manual operations

Key Takeaway for Glossary Readers

AI service diagnostic settings is valuable because it turns an Azure concept into an operational decision that teams can secure, measure, automate, and improve.

Case study 02

AI service diagnostic settings in action

Scenario

ForgeLine Robotics, a manufacturing automation company, had a platform team that lost visibility into computer vision API failures during night-shift inspections. The team used AI service diagnostic settings as the operating focus so the change could be measured, governed, and production-safe.

Business/Technical Objectives

capture error patterns by resource
archive logs for vendor review
alert on rising failed requests
connect diagnostic data to incident tickets

Solution Using AI service diagnostic settings

Architects designed AI service diagnostic settings into the workflow as the formal operating boundary for vision service diagnostics. They integrated it with monitoring, tagging, and change control, then validated the design with a small pilot before expanding it to production. The team documented the CLI checks, approval owner, expected telemetry, and cleanup steps so future releases could repeat the pattern without rediscovery.

Results & Business Impact

The pilot reached production in 6 business days with no rollback or customer-visible interruption
Runbook-based checks reduced handoff questions by 42 percent during the next maintenance window
The team cut investigation time by 41 percent because telemetry pointed to the affected boundary quickly
Leadership received measurable proof that the design met its objective without expanding manual operations

Key Takeaway for Glossary Readers

AI service diagnostic settings is valuable because it turns an Azure concept into an operational decision that teams can secure, measure, automate, and improve. The release team also kept the evidence reusable for the next review.

Case study 03

AI service diagnostic settings in action

Scenario

Pine Harbor County, a public-sector records office, had a platform team that had to prove document processing activity after a network compromise review. The team used AI service diagnostic settings as the operating focus so the change could be measured, governed, and production-safe.

Business/Technical Objectives

enable resource logs for all AI resources
route events to a central workspace
meet incident reconstruction requirements
standardize diagnostic settings with policy

Solution Using AI service diagnostic settings

Engineers moved the document AI audit trail out of ad hoc portal changes and into a repeatable operating pattern centered on AI service diagnostic settings. They defined the production scope, tested the setting in lower environments, and connected the result to Azure Monitor, access review, and deployment evidence. The release checklist required an owner, expected state, validation command, and exception path before any production change was approved.

Results & Business Impact

Release preparation was shortened by 26 percent because the team reused the same evidence checklist
Configuration drift findings fell by 46 percent after owners compared expected state with runtime output
Support escalation time dropped to about 28 minutes because first responders knew which signal to inspect
The production change passed security review without emergency exceptions or undocumented owner overrides

Key Takeaway for Glossary Readers

AI service diagnostic settings is valuable because it turns an Azure concept into an operational decision that teams can secure, measure, automate, and improve.

Why use Azure CLI for this?

CLI is a strong fit because diagnostic settings are repetitive, evidence-driven, and easy to misconfigure manually across many AI resources.

CLI use cases

Inspect the Azure resources related to AI service diagnostic settings before a change.
Export repeatable evidence for enabled categories, destination IDs, retention plan, workspace table names, ingestion latency, and policy compliance.
Compare production and nonproduction configuration without relying on portal screenshots.
Automate routine checks in deployment pipelines or incident runbooks.
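
The production versus nonproduction comparison above can be sketched as a small diff over two diagnostic settings. This assumes each setting is a dictionary shaped like one entry of the JSON printed by az monitor diagnostic-settings list (a logs array with category or categoryGroup plus enabled, and a workspaceId field); the function names are hypothetical.

```python
def enabled_categories(setting):
    """Return the enabled log categories (or category groups) of one setting."""
    names = set()
    for entry in setting.get("logs") or []:
        name = entry.get("category") or entry.get("categoryGroup")
        if entry.get("enabled") and name:
            names.add(name)
    return names


def compare_settings(prod, nonprod):
    """Report category differences and whether both settings share a workspace."""
    prod_cats = enabled_categories(prod)
    nonprod_cats = enabled_categories(nonprod)
    # Resource IDs are case-insensitive, so compare them in lower case.
    prod_ws = (prod.get("workspaceId") or "").lower()
    nonprod_ws = (nonprod.get("workspaceId") or "").lower()
    return {
        "only_in_prod": sorted(prod_cats - nonprod_cats),
        "only_in_nonprod": sorted(nonprod_cats - prod_cats),
        "shared_workspace": bool(prod_ws) and prod_ws == nonprod_ws,
    }
```

A shared_workspace value of True would flag the case where production and test logs land in the same destination.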

Before you run CLI

Confirm the correct tenant, subscription, resource group, and environment before running commands.
Use least-privileged access and avoid exposing keys, tokens, or prompt data in shell history.
Decide whether the command is read-only, configuration-changing, or potentially disruptive.
Set output to json or table intentionally so the result can be reviewed or saved as evidence.

What output tells you

Resource identity and scope show whether you are inspecting the intended AI service resource.
Configuration values reveal the current state of AI service diagnostic settings before you change it.
Operational signals such as enabled categories, destination IDs, retention plan, workspace table names, ingestion latency, and policy compliance help confirm whether the design is healthy.
Errors usually point to the wrong subscription, insufficient RBAC, a disabled provider, missing extension, stale credentials, or network restrictions.
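
As an illustration of reading that output, the sketch below summarizes the JSON from az monitor diagnostic-settings list into destinations and enabled log categories. It assumes the output is either a JSON array of settings or an object with a value array, and that destinations appear under workspaceId, storageAccountId, and eventHubAuthorizationRuleId; treat those field names as assumptions to check against your CLI version. The helper name is hypothetical.

```python
import json

# Assumed destination fields in each setting, mapped to readable labels.
DESTINATION_FIELDS = {
    "workspaceId": "Log Analytics",
    "storageAccountId": "Storage",
    "eventHubAuthorizationRuleId": "Event Hubs",
}


def summarize_settings(raw_json):
    """Summarize each diagnostic setting: name, destinations, enabled logs."""
    data = json.loads(raw_json)
    # Accept both a bare array and an object that wraps the array in "value".
    settings = data.get("value", []) if isinstance(data, dict) else data
    summary = []
    for setting in settings:
        destinations = {
            label: setting[field]
            for field, label in DESTINATION_FIELDS.items()
            if setting.get(field)
        }
        enabled_logs = sorted(
            entry.get("category") or entry.get("categoryGroup")
            for entry in setting.get("logs") or []
            if entry.get("enabled")
            and (entry.get("category") or entry.get("categoryGroup"))
        )
        summary.append(
            {
                "name": setting.get("name"),
                "destinations": destinations,
                "enabled_logs": enabled_logs,
            }
        )
    return summary
```

An empty result means no diagnostic setting exists on the resource, and an entry with no destinations or no enabled logs is a candidate for review.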

Mapped Azure CLI commands

Inspect and operate AI service diagnostic settings

az monitor diagnostic-settings list --resource <ai-resource-id>

az monitor diagnostic-settings | discover | AI and Machine Learning

az monitor diagnostic-settings create --resource <ai-resource-id> --name ai-logs --workspace <workspace-id> --logs @logs.json

az monitor diagnostic-settings | remove | Monitoring and Observability
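
The create command above reads its log categories from logs.json. A minimal sketch of generating that file's content, assuming the --logs argument takes a JSON array of objects with category and enabled keys. The category names in the usage note are assumptions; confirm the names your resource supports with az monitor diagnostic-settings categories list --resource <ai-resource-id> before relying on them. The function name is hypothetical.

```python
import json


def build_logs_payload(categories):
    """Return JSON text for --logs: one enabled entry per category name."""
    entries = [{"category": name, "enabled": True} for name in categories]
    return json.dumps(entries, indent=2)
```

Writing the result of build_logs_payload(["Audit", "RequestResponse"]) to logs.json would give the create command a file to load through @logs.json, and keeping the list in code makes the enabled categories reviewable in source control.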

az monitor metrics list --resource <ai-resource-id> --output table

az monitor metrics | discover | Monitoring and Observability
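
For saved evidence, the metrics command can be run with --output json instead of table and the result reduced to one number per metric. The sketch below assumes the JSON has a value array whose items carry name.value and a timeseries list of data points with a total field; that shape and the helper name are assumptions, so inspect a real response before depending on it.

```python
def totals_by_metric(metrics_response):
    """Sum the 'total' data points of each metric in a metrics list response."""
    totals = {}
    for metric in metrics_response.get("value") or []:
        name = (metric.get("name") or {}).get("value")
        if not name:
            continue
        total = 0.0
        for series in metric.get("timeseries") or []:
            for point in series.get("data") or []:
                # Intervals without traffic may report null instead of a number.
                total += point.get("total") or 0.0
        totals[name] = total
    return totals
```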

Architecture context

Security

Security for AI service diagnostic settings starts with where the logs go and who can read them. Teams should route logs to approved destinations, protect prompt or request metadata, restrict workspace readers, and avoid leaking sensitive AI interactions through broad log access. Access should follow least privilege, be reviewed regularly, and be separated between production and nonproduction. Logging and ownership matter as much as initial configuration, because incidents often begin with a small setting nobody can explain. Before approving a change, verify who can read the setting and its destinations, who can modify them, what data could be exposed, and whether Azure Policy, RBAC, private networking, or Key Vault should enforce the safer pattern.

Cost

Cost impact for AI service diagnostic settings may be direct or indirect, but it should still be explicit. The main cost concern is that log ingestion, retention, Storage archive, and Event Hub throughput can become meaningful when high-volume AI resources emit detailed logs. FinOps review should include the Azure resource that creates charges, the usage signal that predicts growth, and the person who owns the budget. Teams should check whether a change alters the enabled categories, retention, logging volume, or Event Hub throughput. Even when the feature itself is free, the resources it enables can create meaningful monthly spend.
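
A rough way to make that cost explicit is a back-of-the-envelope estimate of monthly log volume. The sketch below is arithmetic only: the request rate, average record size, and price per GB are hypothetical inputs you would replace with your own measurements and your current price sheet, and it uses decimal gigabytes.

```python
def estimate_monthly_ingestion(requests_per_day, avg_record_bytes, price_per_gb, days=30):
    """Estimate monthly log volume (decimal GB) and ingestion cost.

    All inputs are caller-supplied assumptions; no real price is built in.
    """
    monthly_gb = requests_per_day * avg_record_bytes * days / 1_000_000_000
    return {
        "monthly_gb": monthly_gb,
        "monthly_cost": monthly_gb * price_per_gb,
    }
```

For example, one million logged requests per day at about 2,000 bytes per record works out to 60 GB over 30 days, which is the usage signal a FinOps review would track as traffic grows.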

Reliability

Reliability for AI service diagnostic settings depends on whether the design keeps working during spikes, failures, upgrades, and routine change. The main reliability contribution is that diagnostic settings give responders evidence when AI calls fail, throttling appears, keys rotate, or private endpoint DNS changes break clients. A good implementation includes documented defaults, health checks, rollback paths, and monitoring that shows whether logs are still arriving. Teams should test the setting under realistic load or failure conditions, not only in a quiet portal review. They should also understand which dependencies can break it, including region choice, identity, DNS, quota, telemetry ingestion, or downstream service health.

Performance

Performance for AI service diagnostic settings is about visibility rather than speed. Diagnostic settings usually do not improve AI latency, but they expose slow requests, throttling, and endpoint errors that hurt runtime performance. Teams should measure behavior with realistic inputs, dependency paths, and failure modes rather than assuming the default setting is enough. Useful checks include latency, throughput, DNS behavior, token volume, and ingestion delay. Although the setting is mostly configuration, it still affects operational performance by making diagnosis faster and reducing avoidable deployment mistakes.

Operations

Operationally, AI service diagnostic settings should be handled through a repeatable runbook rather than memory. Teams need to standardize logging categories, validate ingestion, review policy compliance, query errors, and clean stale settings after resource moves or renames. The runbook should show where to inspect the setting, what a healthy value looks like, which command or portal page provides evidence, and who approves changes. Operators should keep screenshots out of the critical path when CLI, SDK, or IaC output can provide better proof. For every production change, capture the before state, expected after state, validation command, owner, and rollback note. That makes handoffs cleaner when a different engineer responds at night.
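
The per-change evidence described above can be sketched as a small record type. The class name, field names, and completeness check are illustrative assumptions, not part of any Azure tooling.

```python
from dataclasses import asdict, dataclass


@dataclass
class ChangeRecord:
    """Evidence captured for one production change to a diagnostic setting."""

    resource_id: str
    before_state: str
    expected_after_state: str
    validation_command: str
    owner: str
    rollback_note: str

    def missing_fields(self):
        """Return the names of fields that are still empty or blank."""
        return [
            name for name, value in asdict(self).items() if not str(value).strip()
        ]
```

A pipeline step could refuse to proceed while missing_fields() returns anything, which keeps the owner and rollback note from being filled in after the fact.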

Common mistakes

Treating AI service diagnostic settings as a portal label instead of an operational setting with ownership and evidence.
Changing production before checking subscription, region, identity, networking, and rollback impact.
Skipping monitoring or log validation, which leaves teams blind during incidents.
Using broad permissions or copied secrets when a narrower identity or Key Vault pattern would be safer.

Operator quick checks

Find the owning resource and confirm it matches the intended AI service resource.
Check the current value or status for enabled categories, destination IDs, retention plan, workspace table names, ingestion latency, and policy compliance.
Verify that metrics, resource logs, or Log Analytics queries can prove the behavior after the change.
Confirm the rollback path and the person responsible for approval.
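
One way to prove that logs are arriving after a change is a Log Analytics query that counts recent records per resource and category. The sketch below only builds the query text. It assumes the logs land in the AzureDiagnostics table with ResourceProvider set to MICROSOFT.COGNITIVESERVICES, which may not hold if your setting uses resource-specific tables, and the function name is hypothetical.

```python
def ingestion_check_query(lookback_hours=1):
    """Build a KQL query that counts recent AI service log records."""
    if (
        isinstance(lookback_hours, bool)
        or not isinstance(lookback_hours, int)
        or lookback_hours < 1
    ):
        raise ValueError("lookback_hours must be a positive integer")
    return "\n".join(
        [
            "AzureDiagnostics",
            '| where ResourceProvider == "MICROSOFT.COGNITIVESERVICES"',
            f"| where TimeGenerated > ago({lookback_hours}h)",
            "| summarize Records = count(), LastRecord = max(TimeGenerated) by Resource, Category",
        ]
    )
```

If the query returns no rows for a resource that is serving traffic, the setting, its destination, or ingestion is the first place to look.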

Questions to ask

What boundary does AI service diagnostic settings affect: identity, network, workload, data plane, quota, or observability?
What breaks if this value is wrong, missing, exhausted, rotated, or upgraded?
Which command, metric, log, or policy result proves the current state?
Who owns the production decision, and what evidence should be captured before and after the change?
