Enrichment cache

Enrichment cache is an Azure AI Search cache stored in Azure Storage that preserves document cracking and skill outputs for reuse across indexer runs. In plain English, teams use it to avoid full reprocessing when skillsets or downstream projections change, especially for expensive OCR, image analysis, and AI enrichment pipelines. It matters because it ties the feature people discuss in meetings to the concrete objects operators must check: the indexer's cache setting, the storage account that holds the cached output, and the identity or connection string used to reach it.

Aliases: AI Search enrichment cache, incremental enrichment cache, skillset cache
Difficulty: intermediate
CLI mappings: 4
Last verified: 2026-05-14

Microsoft Learn

An enrichment cache stores outputs from document cracking and skill execution in Azure Storage so AI enrichment pipelines can reuse existing processed content.

Microsoft Learn: Configure enrichment caching in Azure AI Search, 2026-05-14

Technical context

Technically, Enrichment cache appears in Azure AI Search skillsets, indexer configurations, cache storage accounts, debug sessions, knowledge store projections, and enrichment pipeline run history. Configuration usually centers on cache storage connection, skillset definition, indexer settings, change detection, reset behavior, storage container, and projection dependencies. It depends on Azure AI Search, skillsets, indexers, data sources, Azure Storage, built-in skills, custom skills, and diagnostic logging, so scope matters before any command, portal change, or deployment update. Operators validate live state by collecting cache container, skillset version, indexer status, document cracking outputs, skill outputs, reset history, storage usage, and failed enrichment records.
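The configuration described above can be sketched as an indexer definition that carries a cache block. This is a minimal sketch, assuming the `cache` property with `storageConnectionString` and `enableReprocessing` as described in the Microsoft Learn article on enrichment caching. The data source, index, and skillset names are hypothetical placeholders, `--resource` assumes role-based access is enabled on the search service, and the API version that accepts `cache` should be confirmed for your service because the feature has been exposed through preview API versions.

```shell
# Placeholder values; replace before use.
SEARCH_SERVICE="<search-service>"
INDEXER_NAME="<indexer-name>"
API_VERSION="2025-09-01"  # assumption: confirm which API version accepts "cache" on your service

# Write an indexer definition with the enrichment cache enabled.
# The data source, index, and skillset names are hypothetical placeholders.
write_indexer_body() {
  cat > "$1" <<JSON
{
  "name": "${INDEXER_NAME}",
  "dataSourceName": "<data-source-name>",
  "targetIndexName": "<index-name>",
  "skillsetName": "<skillset-name>",
  "cache": {
    "storageConnectionString": "<storage-connection-string>",
    "enableReprocessing": true
  }
}
JSON
}

# Mutating call: run only after approval. --body @file loads the JSON from disk.
# --resource requests a token for the search data plane (assumes RBAC is enabled).
put_indexer() {
  az rest --method put \
    --url "https://${SEARCH_SERVICE}.search.windows.net/indexers/${INDEXER_NAME}?api-version=${API_VERSION}" \
    --resource "https://search.azure.com" \
    --body "@$1"
}
```

A typical sequence would be `write_indexer_body indexer.json`, a review of the file against source control, and then `put_indexer indexer.json` once the change is approved.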

Why it matters

Enrichment cache matters because teams that misunderstand it troubleshoot the wrong layer, for example by rebuilding an index when the real problem is stale cached skill output, or approve skillset changes without knowing which skills will rerun. It affects search indexing cost control, AI enrichment iteration, document processing reliability, and faster development of knowledge extraction pipelines, so architecture, security, operations, finance, and application teams all need the same vocabulary. When the term is documented well, operators know which indexer, storage account, identity, metric, log, and rollback evidence to inspect before they reindex, deploy, grant access, or escalate. That shared evidence shortens incidents, improves audits, and makes production changes safer.

Where you see it

Signals, screens, and Azure surfaces where this term usually becomes operational.

Signal 01

In an Azure AI Search skillset pipeline, enrichment cache appears when skill outputs are reused instead of fully recomputed after downstream changes.

Signal 02

In Azure Storage, it appears as cached cracked documents and skill outputs that increase storage usage but reduce repeated AI processing.

Signal 03

In indexer troubleshooting, it appears when reset choices, skillset edits, cache invalidation, or stale enrichments explain unexpected pipeline behavior.

When this becomes relevant

Specific situations where this term helps solve real Azure design, operations, migration, security, reliability, cost, or governance problems.

  • Reuse OCR and AI skill outputs while changing downstream index mappings.
  • Reduce cost during iterative Azure AI Search enrichment development.
  • Troubleshoot whether stale or invalidated cache content affected an indexer run.

Real-world case studies

Different enterprise-style examples that show the term being used to hit measurable objectives.

Case study 01

Enrichment cache in action for public records

Scenario, objectives, solution, measured impact, and takeaway.

Scenario

Maple Archives, a public records organization, needed to iterate on OCR enrichment for scanned land records without paying for full reprocessing on every change.

Business/Technical Objectives
  • Reduce repeated OCR processing cost
  • Shorten enrichment development cycles
  • Preserve source document lineage
  • Improve search quality before launch
Solution Using Enrichment cache

The search team enabled enrichment caching for the Azure AI Search skillset that cracked scanned PDFs and applied OCR, key phrase extraction, and custom normalization. Cached outputs were stored in an approved storage account with restricted access. Developers adjusted downstream mappings and projections without rerunning every expensive skill. Indexer run history, cache storage usage, and quality samples were reviewed before the public search portal launched. The team validated Enrichment cache in a lower environment, captured before-and-after evidence, and promoted the change through controlled release gates. Runbooks were updated so support engineers could find the correct scope, identity, dependency, telemetry signal, and approval record without relying on the original implementer. The final design connected governance with daily engineering work, making the change understandable to security, operations, finance, and application stakeholders.

Results & Business Impact
  • Development reprocessing cost dropped by 52 percent
  • Indexer test cycles decreased from hours to minutes
  • Source document lineage stayed attached to each enrichment
  • Launch search quality passed records-office review
Key Takeaway for Glossary Readers

Enrichment cache saves money and time when AI enrichment changes happen after expensive document processing.

Case study 02

Enrichment cache in action for insurance

Scenario, objectives, solution, measured impact, and takeaway.

Scenario

SummitMed Claims, an insurance organization, wanted to add new search fields to claim-document enrichment without rerunning image analysis on every document.

Business/Technical Objectives
  • Add fields to the claim search index
  • Avoid repeated image-analysis charges
  • Keep sensitive claim outputs protected
  • Maintain indexer recovery evidence
Solution Using Enrichment cache

The architecture team configured enrichment cache for the claim-document skillset and stored cached outputs in a private storage account. When business analysts requested new searchable fields, engineers reused cached document cracking and image-analysis outputs while adjusting output mappings. Access to the cache storage was limited to the search service identity and platform administrators. Runbooks explained when to reset the indexer, when to reset documents, and when to preserve cached outputs.

Results & Business Impact
  • Image-analysis reprocessing was avoided for most documents
  • Field-change turnaround dropped by 40 percent
  • Cache storage access passed security review
  • Indexer reset decisions were documented in change tickets
Key Takeaway for Glossary Readers

Enrichment cache makes iterative search improvements safer when enrichment output is sensitive and expensive.

Case study 03

Enrichment cache in action for manufacturing

Scenario, objectives, solution, measured impact, and takeaway.

Scenario

HarborTech Manufacturing, a manufacturing organization, needed to troubleshoot stale extracted values in a parts knowledge index after a custom skill update.

Business/Technical Objectives
  • Identify whether cached enrichments were stale
  • Fix extraction without a full rebuild
  • Preserve production search availability
  • Document reset decisions
Solution Using Enrichment cache

Engineers compared skillset JSON, indexer status, cache storage timestamps, and sample enriched documents. They found that downstream mapping changes reused cached output correctly, but the custom skill change required selected document reset. The team reset only affected documents, monitored indexer status, and validated extracted part numbers against known samples. The incident report included cache behavior, reset scope, and the operational rule for future custom skill updates.

Results & Business Impact
  • Incorrect part numbers were fixed without a full rebuild
  • Search availability remained within target
  • Troubleshooting time dropped after cache evidence was documented
  • Future skill changes received a clear reset checklist
Key Takeaway for Glossary Readers

Enrichment cache is useful only when operators understand what it preserves and when reset is required.

Why use Azure CLI for this?

CLI checks for Enrichment cache turn portal assumptions into repeatable evidence. Start with read-only commands that show scope, identity, deployment, endpoint, storage, policy, status, metrics, or access relationships. Attach sanitized output to incidents and change tickets. Run mutating, security-impacting, or cost-impacting commands only after approval because the wrong tenant, subscription, resource, deployment, or policy can affect customers and downstream teams.

CLI use cases

  • Confirm the live Azure scope and current configuration for Enrichment cache before a production change.
  • Collect troubleshooting evidence for incidents involving Enrichment cache without relying on screenshots alone.
  • Compare CLI or REST output with source-controlled intent, dashboards, graph connections, and owner runbooks.

Before you run CLI

  • Run az account show and confirm the tenant, subscription, resource group, and environment before collecting evidence or changing anything.
  • Use read-only commands first, save sanitized JSON output, and compare live state with source control, tickets, and approved architecture notes.
  • Confirm owner, data classification, private connectivity, identity, monitoring destination, cost center, and rollback path before production changes.
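The context check in the first item can be sketched as two small helpers. This is an illustrative sketch only: the function names are hypothetical, and the expected subscription ID is assumed to come from the change ticket.

```shell
# Read-only: print the active tenant, subscription ID, and subscription name
# so the output can be attached to the ticket.
show_context() {
  az account show \
    --query "{tenant:tenantId, subscriptionId:id, subscriptionName:name}" \
    --output json
}

# Return success only when the active subscription matches the expected ID.
# The expected ID is passed in by the operator (hypothetical input).
confirm_subscription() {
  local expected="$1" actual
  actual="$(az account show --query id --output tsv)"
  if [ "$actual" = "$expected" ]; then
    echo "OK: subscription $actual"
  else
    echo "MISMATCH: active=$actual expected=$expected" >&2
    return 1
  fi
}
```

Calling `confirm_subscription <subscription-id> || exit 1` at the top of an evidence script stops the run before any command touches the wrong subscription.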

What output tells you

  • Whether the exact resource, deployment, identity, endpoint, cache, scope, domain, or access package exists in the expected Azure boundary.
  • Which configuration values, linked dependencies, identity settings, networking controls, status fields, and timestamps explain current production behavior.
  • Whether the next action is a safe read, a configuration fix, a rollout adjustment, an access review, a cost review, or escalation to service owners.

Mapped Azure CLI commands

Enrichment cache operational checks

direct
az search service show --name <search-service> --resource-group <resource-group>
az search service | discover | AI and Machine Learning
az storage account show --name <storage-account> --resource-group <resource-group>
az storage account | discover | Storage
az rest --method get --url "https://<search-service>.search.windows.net/skillsets/<skillset-name>?api-version=2025-09-01"
az rest | discover | AI and Machine Learning
az rest --method get --url "https://<search-service>.search.windows.net/indexers/<indexer-name>/status?api-version=2025-09-01"
az rest | discover | AI and Machine Learning
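The REST checks above can be narrowed with `--query` so the evidence shows only the fields that matter for the cache. This is a hedged sketch: the helper names are hypothetical, `--resource "https://search.azure.com"` assumes role-based access is enabled on the search service (otherwise an `api-key` header would be needed instead), and the `cache` block may come back empty if no cache is configured or if the chosen API version does not expose it.

```shell
SEARCH_SERVICE="<search-service>"
API_VERSION="2025-09-01"  # taken from the commands above; confirm for your service

# Build a data-plane URL for the search service.
search_url() {
  echo "https://${SEARCH_SERVICE}.search.windows.net/$1?api-version=${API_VERSION}"
}

# Read-only: show only the cache block of an indexer definition.
show_indexer_cache() {
  az rest --method get --url "$(search_url "indexers/$1")" \
    --resource "https://search.azure.com" --query "cache"
}

# Read-only: list the skills in a skillset by name and type.
list_skills() {
  az rest --method get --url "$(search_url "skillsets/$1")" \
    --resource "https://search.azure.com" \
    --query "skills[].{name:name, type:\"@odata.type\"}" --output table
}
```

Comparing the `list_skills` output with the skillset JSON in source control is one way to spot an unreviewed skill edit before deciding whether cached output can still be trusted.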

Architecture context

Enrichment cache belongs to AI and Machine Learning architecture where identity, monitoring, cost ownership, reliability, and support need shared evidence.

Security

Security for Enrichment cache starts with least privilege and clear evidence about who can configure, view, operate, or misuse it. Review storage account access, managed identity or connection strings, sensitive enriched output, private endpoints, admin key access, and retention policy before production approval. A common mistake is assuming that a successful deployment, healthy metric, or working application proves the configuration is safe. Use managed identity where possible, protect secrets and keys, prefer private connectivity for sensitive paths, restrict logs that contain business data, and keep exceptions ticketed and time-bounded. For regulated workloads, connect the term to classification, retention, break-glass access, and incident-response procedures.
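The storage and identity review can be sketched with read-only commands. This is an illustrative sketch with hypothetical helper names. It assumes the cache lives in a single storage account and that the properties queried here are present in your CLI version's output.

```shell
STORAGE_ACCOUNT="<storage-account>"
RESOURCE_GROUP="<resource-group>"
SEARCH_SERVICE="<search-service>"

# Read-only: network and key-access posture of the cache storage account.
storage_posture() {
  az storage account show --name "$STORAGE_ACCOUNT" --resource-group "$RESOURCE_GROUP" \
    --query "{publicNetworkAccess:publicNetworkAccess, defaultAction:networkRuleSet.defaultAction, allowSharedKeyAccess:allowSharedKeyAccess, allowBlobPublicAccess:allowBlobPublicAccess, minimumTlsVersion:minimumTlsVersion}" \
    --output json
}

# Read-only: managed identity assigned to the search service, if any.
search_identity() {
  az search service show --name "$SEARCH_SERVICE" --resource-group "$RESOURCE_GROUP" \
    --query "identity" --output json
}

# Read-only: principals that hold roles directly on the storage account scope.
storage_role_assignments() {
  local scope
  scope="$(az storage account show --name "$STORAGE_ACCOUNT" --resource-group "$RESOURCE_GROUP" --query id --output tsv)"
  az role assignment list --scope "$scope" \
    --query "[].{principal:principalName, role:roleDefinitionName}" --output table
}
```

If `search_identity` returns nothing while the indexer still reaches the cache, the connection is probably key-based, which is worth recording in the security review.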

Cost

Cost for Enrichment cache includes more than the visible Azure meter. Review AI skill execution cost, OCR reprocessing, storage charges, duplicate cache copies, indexer retries, diagnostic ingestion, and unnecessary full rebuilds. Weak design often creates hidden spend through repeated processing, failed retries, over-provisioned capacity, unused assignments, support labor, audit cleanup, or extra storage. Tag ownership, environment, application, and cost center so charges can be explained. Compare actual use with purchased capacity, retention, token volume, request count, and operational value. Do not scale or rebuild blindly before checking configuration mistakes, retry loops, stale data, access errors, and monitoring evidence. This keeps architecture, security, support, and finance teams working from the same production evidence.
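The trade-off between repeated skill execution and cached reuse can be illustrated with simple arithmetic. The figures below are hypothetical inputs, not Azure prices, and the sketch deliberately ignores the storage charges for the cache itself, which would need to be added for a real comparison.

```shell
# Skill cost for repeated full runs without a cache.
# Arguments: documents, price per 1,000 documents, number of runs.
cost_without_cache() {
  awk -v d="$1" -v p="$2" -v r="$3" 'BEGIN { printf "%.2f\n", d / 1000 * p * r }'
}

# Skill cost when the first run is full and later runs reprocess only a fraction.
# Arguments: documents, price per 1,000 documents, number of runs, reprocessed fraction.
cost_with_cache() {
  awk -v d="$1" -v p="$2" -v r="$3" -v f="$4" \
    'BEGIN { full = d / 1000 * p; printf "%.2f\n", full + (r - 1) * full * f }'
}

# Hypothetical inputs: 10,000 documents, 1.50 per 1,000 documents, 5 runs,
# 5 percent of documents reprocessed on each later run.
cost_without_cache 10000 1.50 5       # prints 75.00
cost_with_cache 10000 1.50 5 0.05     # prints 18.00
```

The gap narrows as the reprocessed fraction grows, so a skillset change that invalidates most cached output can cost nearly as much as a full rebuild.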

Reliability

Reliability for Enrichment cache depends on known limits, tested dependencies, and recovery procedures that operators can run without guessing. Review cache availability, indexer reset decisions, skillset change impact, storage durability, partial failure handling, and recovery after data-source changes before depending on it for a customer-facing workflow. The important question is how it behaves during retries, scale events, region issues, model changes, key rotation, index rebuilds, approval delays, or operator mistakes. Capture baseline metrics, expected states, and failure modes before change. Alert on symptoms that prove user impact, not just configuration drift, and keep rollback steps visible in the runbook.
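The reset decisions mentioned above differ in scope, and a runbook can make that explicit. This is a sketch under stated assumptions: the helper names are hypothetical, the `resetdocs` operation and its `documentKeys` body may require a preview API version on your service, and both calls are mutating and belong behind an approval step.

```shell
SEARCH_SERVICE="<search-service>"
API_VERSION="2025-09-01"  # assumption: resetdocs may require a preview API version

# Build the request body for a selective reset from a list of document keys.
# Keys are inserted as-is, so this sketch assumes they contain no quotes or backslashes.
build_resetdocs_body() {
  local keys="" key
  for key in "$@"; do
    keys="${keys:+$keys,}\"$key\""
  done
  printf '{"documentKeys":[%s]}\n' "$keys"
}

# Mutating: flag only the listed documents for reprocessing on the next run.
reset_documents() {
  local indexer="$1"; shift
  az rest --method post \
    --url "https://${SEARCH_SERVICE}.search.windows.net/indexers/${indexer}/resetdocs?api-version=${API_VERSION}" \
    --resource "https://search.azure.com" \
    --body "$(build_resetdocs_body "$@")"
}

# Mutating: reset the indexer's change tracking state (widest scope).
reset_indexer() {
  az rest --method post \
    --url "https://${SEARCH_SERVICE}.search.windows.net/indexers/$1/reset?api-version=${API_VERSION}" \
    --resource "https://search.azure.com"
}
```

Preferring `reset_documents` when only a known set of documents is affected keeps the blast radius small and leaves the rest of the cached output untouched.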

Performance

Performance for Enrichment cache depends on workload shape, platform limits, dependency health, and how evidence is interpreted. Review indexer run duration, cache hit rate, storage latency, skill execution time, document volume, custom skill response time, and reset scope before blaming the service or adding capacity. Look for saturation, throttling, queueing, cold starts, slow dependencies, stale indexes, oversized payloads, weak filters, or inefficient application behavior. Measure before and after any change and keep baselines for normal, peak, and incident conditions. For shared services, identify noisy neighbors and per-resource limits. Performance tuning should not create new security gaps, reliability risk, or unexpected cost.
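Run duration and item counts can be pulled from the indexer status response to build a before-and-after baseline. This is a minimal sketch with a hypothetical helper name. It assumes the status response exposes an `executionHistory` array with the fields queried below, and it treats shorter runs over the same document set only as a rough proxy for cache reuse, not as a measured hit rate.

```shell
SEARCH_SERVICE="<search-service>"
API_VERSION="2025-09-01"  # taken from the mapped commands; confirm for your service

# Read-only: list recent runs with start and end times so durations can be compared.
run_history() {
  az rest --method get \
    --url "https://${SEARCH_SERVICE}.search.windows.net/indexers/$1/status?api-version=${API_VERSION}" \
    --resource "https://search.azure.com" \
    --query "executionHistory[].{status:status, start:startTime, end:endTime, processed:itemsProcessed, failed:itemsFailed}" \
    --output table
}
```

Saving this table before and after a skillset or mapping change gives the ticket a like-for-like comparison of run length and failures.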

Operations

Operations for Enrichment cache should be repeatable enough that a different engineer can collect the same evidence and reach the same conclusion. Cover cache lifecycle reviews, indexer runbooks, storage monitoring, skillset change notes, debug sessions, reset approvals, and enrichment quality checks during change management, incident response, onboarding, and access reviews. Start with read-only checks, confirm tenant and subscription context, and attach sanitized CLI, REST, log, or metric output to the ticket. Keep names, tags, owners, dashboards, runbooks, and graph connections current. After every change, verify expected behavior and record any exception so future operators know what breaks first.
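Storage monitoring for the cache can start with a read-only inventory of its containers. This is a hedged sketch: the helper names are hypothetical, the `ms-az-search-indexercache` container prefix is an assumption that should be verified against your storage account, and `--auth-mode login` assumes the operator holds a data-plane read role on the account.

```shell
STORAGE_ACCOUNT="<storage-account>"
CACHE_PREFIX="ms-az-search-indexercache"  # assumption: verify the prefix in your account

# Read-only: list containers that look like enrichment cache containers.
list_cache_containers() {
  az storage container list --account-name "$STORAGE_ACCOUNT" --auth-mode login \
    --prefix "$CACHE_PREFIX" --query "[].name" --output tsv
}

# Read-only: blob count and total bytes for one container (can be slow on large caches).
cache_container_usage() {
  az storage blob list --account-name "$STORAGE_ACCOUNT" --auth-mode login \
    --container-name "$1" --num-results "*" \
    --query "{blobs:length(@), bytes:sum([].properties.contentLength)}" --output json
}
```

Recording the container list alongside the indexer names makes it easier to spot orphaned caches left behind by deleted indexers.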

Common mistakes

  • Assuming a matching display name proves the correct resource when tenants, subscriptions, regions, and service instances can contain similar names.
  • Running mutating commands before capturing read-only evidence, owner approval, monitoring baselines, rollback steps, and expected post-change signals.
  • Treating a portal screenshot as complete proof when CLI output, REST responses, Activity Logs, metrics, and source-controlled definitions are more repeatable.