Search index projection
A search index projection tells Azure AI Search how to turn enriched source content into index documents. Instead of pushing one large file into one search record, a projection can create many smaller child records, such as chunks from a PDF, each carrying parent metadata. This is especially useful for RAG because the search index needs retrievable passages, stable keys, and context fields, not just the original blob or database row. The result is a search shape built for retrieval rather than storage.
Azure AI Search index projection, indexer projection, projected search index, RAG chunk projection, skillset index projection, index projections, projection selector, projected index, chunk projection
Difficulty
fundamentals
CLI mappings
5
Last verified
2026-05-23
Microsoft Learn
A search index projection defines how enriched content from an Azure AI Search skillset is projected into one or more indexes. It is commonly used to create chunked child documents with parent metadata for retrieval-augmented generation. It is configured as part of the enrichment pipeline.
In Azure architecture, index projections sit inside the indexer and skillset enrichment pipeline. The data source supplies raw content, skills extract or transform it, and the projection maps enriched nodes into fields in a target index. It is a data-plane ingestion design feature, not a search service scaling setting. Projections connect blob storage, skillsets, knowledge enrichment, index schemas, chunking strategies, parent-child metadata, and downstream retrieval patterns used by applications or AI orchestration layers. Operators should capture the skillset definition, indexer status, and projected document counts before and after production changes so that any difference can be traced to the change.
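As a rough illustration of the mapping described above, the sketch below builds an indexProjections fragment as a Python dictionary and checks that each selector carries the properties a projection needs. The index name, field names, and enrichment paths are hypothetical placeholders, and the shape should be confirmed against the REST API version in use before it is sent to a service.

```python
import json

# Hypothetical names: "contracts-chunks", "parent_id", and the enrichment
# paths below are placeholders, not values from a real skillset.
index_projections = {
    "selectors": [
        {
            "targetIndexName": "contracts-chunks",
            "parentKeyFieldName": "parent_id",
            # One child document is produced per node under this path.
            "sourceContext": "/document/pages/*",
            "mappings": [
                {"name": "chunk", "source": "/document/pages/*"},
                {"name": "title", "source": "/document/metadata_storage_name"},
                {"name": "source_path", "source": "/document/metadata_storage_path"},
            ],
        }
    ],
    # Index only the child chunks, not the parent document itself.
    "parameters": {"projectionMode": "skipIndexingParentDocuments"},
}

REQUIRED_SELECTOR_KEYS = {
    "targetIndexName",
    "parentKeyFieldName",
    "sourceContext",
    "mappings",
}


def missing_selector_keys(projections: dict) -> list:
    """Return one message per required property missing from a selector."""
    problems = []
    for i, selector in enumerate(projections.get("selectors", [])):
        for key in sorted(REQUIRED_SELECTOR_KEYS - set(selector)):
            problems.append(f"selector[{i}] is missing {key}")
    return problems


if __name__ == "__main__":
    print(json.dumps({"indexProjections": index_projections}, indent=2))
    print(missing_selector_keys(index_projections))
```

Keeping the fragment in a reviewable structure like this makes it easier to version the mapping together with the skillset and the target schema.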
Why it matters
Search index projections matter because RAG quality often fails at the ingestion shape, not at the model. If a whole document becomes one record, retrieval returns too much noise. If chunks lose title, source, security, or parent identifiers, users cannot trust or cite the answer. A projection gives teams a controlled way to create searchable child documents while preserving the context needed for filters, citations, and troubleshooting. It also reduces brittle custom ingestion code because the mapping is part of the Azure AI Search enrichment pipeline. That mapping is often the hidden difference between cited answers and vague summaries. It also gives reviewers a concrete artifact to test after release.
Where you see it
Signals, screens, and Azure surfaces where this term usually becomes operational.
Signal 01
In a skillset definition, indexProjections or projection selectors map enriched document paths to target index fields and parent metadata.
Signal 02
In indexer execution history, projection mistakes appear as enrichment warnings, failed field mappings, null keys, or unexpected document counts after a run.
Signal 03
In the target index, projected child documents show chunk text, parent IDs, source paths, titles, ACL fields, and citation metadata returned by queries.
Signal 04
In RAG answer traces, retrieved chunk IDs and source links reveal whether projections preserved enough parent context for grounding and citations.
When this becomes relevant
Specific situations where this term helps solve real Azure design, operations, migration, security, reliability, cost, or governance problems.
Create chunk-level RAG documents from PDFs while preserving title, source path, tenant, and citation metadata.
Project enriched OCR or entity extraction output into a target index without writing a custom ingestion service.
Maintain parent-child relationships so search results return precise passages while applications still link to the original file.
Troubleshoot duplicated or missing chunks by comparing source documents, projection mappings, and target index counts.
Control cost by projecting only fields needed for retrieval, filters, citations, and downstream model grounding.
Real-world case studies
Different enterprise-style examples that show the term being used to hit measurable objectives.
Case study 01
Legal operations turns contracts into cited RAG chunks
Scenario, objectives, solution, measured impact, and takeaway.
📌Scenario
A corporate legal team wanted AI search across supplier contracts, but full-document retrieval returned long excerpts without section-level citations.
🎯Business/Technical Objectives
Create chunk-level retrieval for contract questions.
Preserve contract ID, clause title, jurisdiction, and confidentiality level.
Reduce hallucinated or uncited legal assistant answers.
Avoid building a custom chunking ingestion service.
✅Solution Using Search index projection
The search architect configured an enrichment pipeline that split contract PDFs into clause-sized chunks and used a Search index projection to map each chunk into a target index. Every projected document carried parent contract ID, supplier name, clause heading, jurisdiction, confidentiality label, and source path. The team stored only approved clause text as retrievable content, while filters used confidentiality and legal entity fields. Indexer status checks compared expected clauses against projected child documents, and RAG traces recorded chunk IDs for citation review. A small pilot set was rebuilt repeatedly until keys remained stable across reruns.
📈Results & Business Impact
Cited answers in legal-assistant testing improved from 58 percent to 91 percent.
Average contract research time fell from 37 minutes to 16 minutes.
Confidential clause leakage findings were reduced to zero in prelaunch review.
The team avoided roughly eight weeks of custom ingestion development.
💡Key Takeaway for Glossary Readers
Search index projection is valuable when the answer needs a precise chunk but the business still needs parent-document context.
Case study 02
Energy inspections preserve source context from image-heavy reports
📌Scenario
An energy utility indexed field inspection reports with photos and OCR text, but technicians could not tell which asset or inspection generated a retrieved passage.
🎯Business/Technical Objectives
Project OCR-enriched passages into searchable child records.
Keep asset ID, site, inspection date, and image reference with every chunk.
Improve retrieval for maintenance-planning questions.
Reduce duplicate passages after repeated inspections.
✅Solution Using Search index projection
The data team used a skillset to extract text from inspection PDFs and photo captions, then defined a Search index projection into a maintenance index. The projection mapped each enriched passage to fields for asset ID, substation, inspection date, defect category, image URI, and parent report ID. Key generation combined report ID and passage ordinal so reruns updated records instead of creating duplicates. Operators reviewed indexer warnings for failed OCR and compared projected chunk counts against known reports. The maintenance planner then searched defect descriptions while filtering by asset class and region.
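The key strategy in this case study, combining report ID and passage ordinal, can be sketched as a small deterministic function. This is only an illustration of the idea, assuming a custom step that assigns keys; the function name and the zero-padded ordinal format are hypothetical, and when the service generates child keys itself this logic is not needed.

```python
import base64

# Characters Azure AI Search accepts in a document key:
# letters, digits, underscore, dash, and equal sign.
ALLOWED_KEY_CHARS = set(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789_-="
)


def chunk_key(report_id: str, ordinal: int) -> str:
    """Derive a stable child key from the parent report ID and passage ordinal.

    The same inputs always give the same key, so a rerun overwrites the
    existing record instead of adding a duplicate.
    """
    if ordinal < 0:
        raise ValueError("ordinal must be non-negative")
    raw = f"{report_id}|{ordinal:06d}".encode("utf-8")
    # URL-safe base64 emits only characters from the allowed key set.
    return base64.urlsafe_b64encode(raw).decode("ascii")


if __name__ == "__main__":
    first_run = [chunk_key("RPT-2024-0042", i) for i in range(3)]
    second_run = [chunk_key("RPT-2024-0042", i) for i in range(3)]
    print(first_run == second_run)
```

The trade-off is that inserting a passage in the middle of a report shifts every later ordinal, so a change to chunking still calls for a planned rebuild.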
📈Results & Business Impact
Duplicate projected passages fell by 73 percent after stable parent-based keys were introduced.
Maintenance planners found relevant inspection evidence 44 percent faster in acceptance testing.
OCR failure triage time dropped from two days to three hours using indexer status evidence.
Every retrieved passage linked back to the original report and photo reference.
💡Key Takeaway for Glossary Readers
A projection keeps enriched child content operationally useful by carrying the asset and source metadata technicians need.
Case study 03
Education publisher maps chapters into lesson-level search
📌Scenario
An education publisher stored textbooks as large files, making curriculum search return broad chapters instead of lesson-level explanations for teachers.
🎯Business/Technical Objectives
Create lesson-sized search records from textbook chapters.
Preserve grade, subject, standard, book, and page metadata.
Support teacher-facing search and AI lesson planning.
Keep rebuilds predictable during annual curriculum updates.
✅Solution Using Search index projection
The platform team split textbook chapters into lesson sections during enrichment and used a Search index projection to populate a lesson index. Each child document contained section text, standard code, grade, subject, book edition, page range, and parent chapter ID. The team kept long teacher notes out of retrievable fields and stored edition status for filters. During annual updates, the indexer projected new edition chunks into a parallel index, where saved teacher queries compared old and new results. Projection mappings were versioned with the skillset and target schema.
📈Results & Business Impact
Teacher search satisfaction rose from 3.1 to 4.4 out of 5 in pilot feedback.
AI lesson-planning prompt size dropped 31 percent due to focused lesson chunks.
Annual curriculum search validation time fell from three weeks to nine days.
Obsolete edition content stopped appearing after status filters were projected onto each chunk.
💡Key Takeaway for Glossary Readers
Search index projection helps teams align educational content retrieval with how users actually teach and plan.
Why use Azure CLI for this?
For index projections, Azure CLI is the control-room tool around a data-plane enrichment design. I use it to confirm the search service, identity, data-source access, diagnostics, and endpoint before testing skillset or indexer JSON. The actual projection definition is usually managed with REST, SDK, or IaC, so az rest becomes the useful bridge. CLI-based evidence is valuable because projection bugs often look like model failures when the real cause is the wrong service, a stale skillset, a disabled identity, or missing diagnostics breaking the projected index before retrieval ever begins. Collecting that evidence keeps the investigation anchored to the actual service and enrichment assets, and it makes handoff between platform and application teams cleaner.
CLI use cases
Confirm the target service and endpoint before fetching skillset, indexer, and index definitions for projection review.
Use az rest to inspect the skillset projection section and verify mappings to the intended index fields.
Retrieve indexer status after a run to find projection-related warnings, failed mappings, or skipped documents.
Fetch sample projected documents and confirm parent IDs, chunk text, source links, and ACL metadata are present.
Export service diagnostics and managed identity settings when projection failures trace back to source access.
Before you run CLI
Confirm tenant, subscription, resource group, search service, skillset, indexer, target index, region, and API version.
Use read-only GET requests first; projection changes can reshape documents and require a full index rebuild.
Check permissions for search administration, source storage access, managed identity, private endpoints, and output redaction.
Estimate cost risk before rerunning enrichment across large files, vectorized chunks, or skillsets with billable skills.
What output tells you
Skillset output shows projection selectors, source context paths, target index names, parent key fields, and field mappings.
Indexer status shows last run time, execution status, item failures, warnings, document counts, and whether enrichment completed.
Index schema output confirms projected fields exist and have the correct searchable, filterable, retrievable, and vector attributes.
Sample document output proves each chunk contains source metadata, parent identifiers, text length, security fields, and citation values.
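To show how the indexer status output might be read in practice, the sketch below reduces a status response to the handful of values an operator usually checks first. The payload is a made-up sample that follows the general shape of an indexer status response (status, lastResult, itemsProcessed, itemsFailed, errors, warnings); the helper name and the document keys are hypothetical.

```python
def summarize_indexer_status(status: dict) -> dict:
    """Pull the fields an operator checks first out of an indexer status payload."""
    last = status.get("lastResult") or {}
    errors = last.get("errors") or []
    warnings = last.get("warnings") or []
    items_failed = last.get("itemsFailed", 0)
    return {
        "indexer_status": status.get("status"),
        "last_run_status": last.get("status"),
        "ended": last.get("endTime"),
        "items_processed": last.get("itemsProcessed", 0),
        "items_failed": items_failed,
        "error_count": len(errors),
        "warning_count": len(warnings),
        # Flag anything short of a clean, successful run.
        "needs_attention": (
            last.get("status") != "success" or items_failed > 0 or bool(errors)
        ),
    }


# Made-up sample payload for illustration only.
sample_status = {
    "status": "running",
    "lastResult": {
        "status": "success",
        "endTime": "2026-05-20T02:14:00Z",
        "itemsProcessed": 120,
        "itemsFailed": 2,
        "errors": [
            {"key": "doc-17", "errorMessage": "Could not map field"},
            {"key": "doc-58", "errorMessage": "Could not map field"},
        ],
        "warnings": [{"key": "doc-03", "message": "Text was truncated"}],
    },
}

if __name__ == "__main__":
    for name, value in summarize_indexer_status(sample_status).items():
        print(f"{name}: {value}")
```

A run can report success overall while still failing individual items, which is why the sketch flags failed items separately from the run status.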
Mapped Azure CLI commands
Search index projection operational checks
direct
az search service show --name <search-service> --resource-group <resource-group>
az rest --method get --url "https://<search-service>.search.windows.net/skillsets/<skillset-name>?api-version=2024-07-01" --headers "api-key=<admin-key>"
az rest --method get --url "https://<search-service>.search.windows.net/indexers/<indexer-name>/status?api-version=2024-07-01" --headers "api-key=<admin-key>"
az rest --method get --url "https://<search-service>.search.windows.net/indexes/<target-index>?api-version=2024-07-01" --headers "api-key=<admin-key>"
az rest --method post --url "https://<search-service>.search.windows.net/indexes/<target-index>/docs/search?api-version=2024-07-01" --headers "api-key=<query-or-admin-key>" --body @sample-projected-query.json
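The last command reads its request body from sample-projected-query.json. One possible way to produce that file is sketched below. The field names (chunk_id, parent_id, title, source_path, chunk) are hypothetical and must match the target index, and the filter assumes the parent key field is marked filterable.

```python
import json


def build_sample_query(parent_id: str, top: int = 5) -> dict:
    """Build a search request body that returns the chunks of one parent document."""
    # OData string literals escape a single quote by doubling it.
    escaped = parent_id.replace("'", "''")
    return {
        "search": "*",
        "filter": f"parent_id eq '{escaped}'",
        "select": "chunk_id,parent_id,title,source_path,chunk",
        "top": top,
        "count": True,
    }


def write_query_file(path: str, parent_id: str) -> None:
    """Write the request body to disk so az rest can load it with --body @file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(build_sample_query(parent_id), handle, indent=2)


if __name__ == "__main__":
    # "<parent-key>" is a placeholder for a real parent document key.
    write_query_file("sample-projected-query.json", "<parent-key>")
```

Requesting the count alongside a small page of results gives both the number of projected chunks for that parent and a sample to inspect.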
Architecture context
As an architect, I use index projections when the source document shape and the retrieval shape are different. Blob files, PDFs, forms, and knowledge articles often need extraction, splitting, enrichment, and then projection into a purpose-built index. The projection must align with the target schema, chunking skill, parent key, security metadata, and update pattern. It is especially important when the application needs citations, source links, or parent-level filters. Done well, projections make indexing repeatable. Done poorly, they create orphan chunks, duplicated passages, and grounding content without provenance. Because ingestion, search, and application teams often share responsibility for the pipeline, ownership of the projection mapping should be agreed before production traffic depends on it.
Security
Security impact is direct when projections copy parent metadata, ACLs, tenant IDs, or classification labels into child documents. If that metadata is omitted, a RAG app can retrieve a chunk that no longer carries the security context of its source file. The projection should preserve fields needed for filters and citations while avoiding unnecessary sensitive content in retrievable fields. Admin keys, data-source credentials, managed identity, and private connectivity still protect the pipeline boundary. The design question is simple: can every projected chunk still prove who may see it? Reviewers should test at least one restricted source document end to end.
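The question of whether each chunk can still prove who may see it usually comes down to a query-time filter over metadata the projection carried onto the chunk. A minimal sketch follows, assuming each projected chunk has a filterable collection field of group identifiers; the field name group_ids is hypothetical.

```python
def acl_filter(group_ids, field: str = "group_ids") -> str:
    """Build an OData filter that keeps chunks visible to at least one caller group."""
    ids = list(group_ids)
    if not ids:
        # A caller with no groups must never fall through to an unfiltered query.
        raise ValueError("caller has no groups; refusing to build a filter")
    for gid in ids:
        # Reject values that could break out of the quoted, comma-delimited list.
        if not gid or "'" in gid or "," in gid:
            raise ValueError(f"unsafe group identifier: {gid!r}")
    joined = ",".join(ids)
    return f"{field}/any(g: search.in(g, '{joined}', ','))"


if __name__ == "__main__":
    print(acl_filter(["grp-finance", "grp-legal"]))
```

If the projection drops the group field from child documents, this filter matches nothing, or the application is tempted to skip it, which is the failure the paragraph above warns about.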
Cost
A projection has no independent meter, but it changes storage, indexing, enrichment, and AI costs. Splitting documents into many child records increases index document count, storage, and rebuild time. Enrichment skills, vectorization, and semantic ranking can add cost when applied to every chunk. The upside is lower prompt cost and better answer quality because retrieval returns focused passages instead of entire files. FinOps review should track average chunks per source, vector dimensions, storage growth, rebuild frequency, and whether projected fields are truly needed by the application. A small projection mistake can multiply storage and enrichment work across every source file.
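The multiplication effect can be made concrete with a back-of-the-envelope estimate. The sketch below counts only raw vector bytes, assuming 4-byte floating-point components, and ignores text fields and index overhead; the corpus size, chunk count, and dimension count are illustrative assumptions, not figures from this document.

```python
def estimate_vector_bytes(
    source_docs: int,
    avg_chunks_per_doc: int,
    dimensions: int,
    bytes_per_dimension: int = 4,
) -> int:
    """Estimate raw vector storage for the projected chunks of a corpus."""
    return source_docs * avg_chunks_per_doc * dimensions * bytes_per_dimension


if __name__ == "__main__":
    # Illustrative inputs: 10,000 files, 40 chunks each, 1,536-dimension vectors.
    baseline = estimate_vector_bytes(10_000, 40, 1536)
    # Halving the chunk size roughly doubles the chunk count per file.
    smaller_chunks = estimate_vector_bytes(10_000, 80, 1536)
    print(f"baseline:       {baseline / 1024**3:.2f} GiB")
    print(f"smaller chunks: {smaller_chunks / 1024**3:.2f} GiB")
```

With these inputs the baseline is 2,457,600,000 bytes, about 2.29 GiB, and doubling the chunk count doubles it, which is why average chunks per source belongs in the FinOps review.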
Reliability
Reliability depends on whether projections produce stable, repeatable documents during reindexing. Bad parent keys, changing chunk IDs, or incomplete projection mappings can create duplicates, orphan child records, or missing passages after an indexer run. Operators should validate the number of projected chunks, parent identifiers, and projection selectors after every pipeline change. Keep sample source files for regression testing, and plan rebuilds carefully because projection changes usually affect document identity and retrieval behavior. Users experience projection failures as missing citations or inconsistent AI answers. Run the regression tests before every major skillset or chunking change, and record baseline chunk counts and keys so that a rebuild can be verified against them.
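One way to test for stable, repeatable documents is to compare the child keys of a known source file before and after a rerun. The sketch below assumes the keys have already been collected from the target index, for example with a filtered query on the parent identifier; the function name and the sample keys are hypothetical.

```python
def diff_runs(before_keys, after_keys) -> dict:
    """Compare the child keys of one parent document across two indexer runs."""
    before, after = set(before_keys), set(after_keys)
    return {
        # Present before, gone now: missing passages.
        "missing": sorted(before - after),
        # New keys for unchanged content: likely duplicates or orphan candidates.
        "unexpected": sorted(after - before),
        "stable": len(before & after),
    }


if __name__ == "__main__":
    baseline = ["rpt1_000", "rpt1_001", "rpt1_002"]
    rerun = ["rpt1_000", "rpt1_001", "rpt1_002b"]
    print(diff_runs(baseline, rerun))
```

For an unchanged source file, an empty missing list and an empty unexpected list are the evidence that keys stayed stable across the run.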
Performance
Performance impact appears in both ingestion and retrieval. During indexing, projections add mapping work and can multiply document volume. During search, projected chunks usually improve retrieval precision because smaller records match more specific user questions. Too many tiny chunks can increase result noise, while oversized chunks can inflate prompts and weaken grounding. The right projection balances chunk size, parent context, vector fields, filters, and citation needs. Measure indexing duration, chunk count, query latency, first-page relevance, and answer quality before declaring the projection successful. Teams should test projected retrieval with real prompts, not only indexer completion status. Performance review should include p95 query latency and worst-case user journeys.
Operations
Operators inspect index projections through skillset JSON, indexer status, target index fields, document counts, and sample projected documents. They troubleshoot by reviewing enrichment errors, failed field mappings, null parent IDs, unexpected chunk counts, and stale documents after source changes. The practical runbook compares a known source file with its projected child records: count, key, title, path, ACL metadata, and text length. CLI helps verify the search service and run REST calls, while detailed projection editing is usually REST, SDK, or infrastructure-as-code work. Operators should archive sample source files and expected projected documents for regression checks. Each support ticket should name an owner, attach the evidence collected, and state the next action.
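The runbook comparison of count, key, title, path, ACL metadata, and text length can be sketched as a single check over the projected records of one known source file. The field names and the 4,000-character limit are hypothetical and should be replaced with the target index schema and the configured chunk size.

```python
# Hypothetical field names; align them with the target index schema.
REQUIRED_FIELDS = ("parent_id", "title", "source_path", "acl_groups", "chunk")


def check_projected_chunks(chunks, expected_count: int, max_chars: int = 4000) -> list:
    """Return human-readable findings for the projected chunks of one source file."""
    findings = []
    if len(chunks) != expected_count:
        findings.append(f"expected {expected_count} chunks, found {len(chunks)}")
    seen = set()
    for position, chunk in enumerate(chunks):
        key = chunk.get("chunk_id")
        label = key or f"chunk at position {position}"
        if not key:
            findings.append(f"{label}: null key")
        elif key in seen:
            findings.append(f"{label}: duplicate key")
        else:
            seen.add(key)
        for field in REQUIRED_FIELDS:
            if not chunk.get(field):
                findings.append(f"{label}: empty {field}")
        if len(chunk.get("chunk") or "") > max_chars:
            findings.append(f"{label}: text longer than {max_chars} characters")
    return findings


if __name__ == "__main__":
    sample = [
        {"chunk_id": "k1", "parent_id": "p1", "title": "Report",
         "source_path": "/reports/a.pdf", "acl_groups": ["grp-ops"], "chunk": "Text"},
        {"chunk_id": "k2", "parent_id": None, "title": "Report",
         "source_path": "/reports/a.pdf", "acl_groups": [], "chunk": "More text"},
    ]
    for finding in check_projected_chunks(sample, expected_count=3):
        print(finding)
```

An empty findings list for an archived sample file is a reasonable baseline to store alongside the expected projected documents.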
Common mistakes
Projecting chunk text without parent IDs or source paths, leaving RAG answers without trustworthy citations.
Changing chunking or key generation and creating duplicate child documents instead of stable updates.
Forgetting tenant, ACL, or classification metadata so child chunks cannot be safely filtered later.
Assuming projection problems are model-quality issues before checking indexer warnings and target document counts.