
Search data source

A search data source tells an Azure AI Search indexer where to pull content from. It might point to Azure Blob Storage, Azure SQL Database, Cosmos DB, or another supported source. The data source is not the search index itself; it is the connection description the indexer uses before it maps content into searchable documents. Getting it right means the search service can crawl the right container, table, view, or collection with the right credentials and change-tracking behavior.


Aliases: Azure AI Search data source, indexer data source, SearchIndexerDataSourceConnection, data source connection, Azure Search data source object
Difficulty: fundamentals
CLI mappings: 5
Last verified: 2026-05-23

Microsoft Learn

A search data source is an Azure AI Search indexer object that describes where external content comes from. It specifies the supported source type, connection information, credentials or identity, container or query, and sometimes change or deletion detection used by indexers.

Microsoft Learn: Indexer overview in Azure AI Search (2026-05-23)

Technical context

In Azure architecture, a search data source sits in the Azure AI Search data plane as part of the indexer pipeline. It connects the search service to external data, while the index defines the target schema and the indexer defines the crawl job. The data source can include connection strings, managed identity settings, containers, SQL queries, and change or deletion detection policies. Its security and networking depend on the external store, the search service identity, private connectivity, firewall rules, and indexer execution environment.
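As a rough illustration of the pieces listed above, the sketch below assembles a Blob Storage data source definition as a plain Python dictionary. The property names follow the data source JSON shape as commonly documented for the Azure AI Search REST API, but treat them as assumptions to verify against the API version you use. The function name, the `IsDeleted` metadata property, and the sample resource ID are hypothetical.

```python
import json


def build_blob_data_source(name, storage_resource_id, container, folder=None):
    """Return a data source definition as a dict. No network calls are made."""
    definition = {
        "name": name,
        "type": "azureblob",
        # Assumption: a ResourceId-style connection string asks the search
        # service to authenticate with its managed identity instead of a key.
        "credentials": {"connectionString": f"ResourceId={storage_resource_id};"},
        "container": {"name": container},
        # Assumption: soft-delete detection keyed on a blob metadata property
        # named IsDeleted (hypothetical name chosen for this sketch).
        "dataDeletionDetectionPolicy": {
            "@odata.type": "#Microsoft.Azure.Search.SoftDeleteColumnDeletionDetectionPolicy",
            "softDeleteColumnName": "IsDeleted",
            "softDeleteMarkerValue": "true",
        },
    }
    if folder:
        # For blob sources the optional query narrows indexing to a virtual folder.
        definition["container"]["query"] = folder
    return definition


if __name__ == "__main__":
    example = build_blob_data_source(
        "drawings-ds",
        "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
        "/providers/Microsoft.Storage/storageAccounts/<storage-account>",
        "approved-drawings",
    )
    print(json.dumps(example, indent=2))
```

Keeping the definition as data makes it easy to review in a pull request before anything is sent to the search service.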

Why it matters

Search data sources matter because ingestion failures usually start at the connection boundary. A wrong container, expired secret, missing managed identity role, blocked firewall, or unsupported change policy can leave the search index stale while the application still appears healthy. The data source also defines what the search service is allowed to read, so it affects security, freshness, and compliance. Good teams treat data source objects as production integration points, not wizard leftovers. They document ownership, credentials, change tracking, and recovery steps because search relevance is useless when ingestion silently stops. The data source is the first place to check before blaming ranking or application code.

Where you see it

Signals, screens, and Azure surfaces where this term usually becomes operational.

Signal 01

During ingestion setup reviews, the Data sources tab in the Azure portal shows each object's name, source type, and connection target, which is enough to confirm what an indexer would read.

Signal 02

In indexer JSON, the dataSourceName property links the crawl job to the data source object that supplies external content and credentials, which makes it the first property to check when troubleshooting failed ingestion.

Signal 03

In indexer execution history, authentication failures, firewall blocks, unsupported change tracking, or missing containers often point back to the data source configuration.

Signal 04

In migration inventories, data source definitions reveal storage accounts, databases, containers, collections, and tables that must be reachable by the new search service.

When this becomes relevant

Specific situations where this term helps solve real Azure design, operations, migration, security, reliability, cost, or governance problems.

Connect an indexer to Blob Storage, Azure SQL, or Cosmos DB without writing a custom ingestion worker.
Migrate search ingestion by inventorying data source objects and rebuilding equivalent connections in a new service or region.
Switch from connection strings to managed identity for indexer access while keeping the same target index design.
Troubleshoot stale search results by proving whether the indexer can still read the configured source path.
Scope indexing to the right table, view, container, or collection so sensitive or irrelevant source data is not indexed.

Real-world case studies

Different enterprise-style examples that show the term being used to hit measurable objectives.

Case study 01

Construction platform moves drawings from custom polling to indexers

Scenario, objectives, solution, measured impact, and takeaway.

Scenario

A construction SaaS platform used a custom worker to poll Blob Storage for project drawings, but missed updates caused crews to search outdated plans.

Business/Technical Objectives

Index changed drawings within 30 minutes of upload.
Remove the custom polling worker and its queue retry failures.
Limit indexing to approved project containers only.
Keep project-level security filters unchanged in the application.

Solution Using Search data source

The team created Azure AI Search data source objects for approved Blob Storage containers and connected them to indexers that populated the drawing index. Each data source used a managed identity with scoped storage access rather than a shared account key. The indexer schedule replaced the polling worker, while field mappings carried project ID, revision, and approval status into the index. Azure CLI confirmed the search service identity, private endpoint, diagnostic settings, and storage role assignments before REST calls exported the data source definitions. Operators monitored indexer run history and compared uploaded drawing timestamps with searchable document timestamps.

Results & Business Impact

Median time from drawing upload to searchable result fell from four hours to 18 minutes.
The custom polling worker and retry queue were retired, saving about $1,100 monthly.
Search incidents caused by missed plan revisions dropped from 17 per month to two.
Storage access was reduced from account-wide keys to scoped managed identity roles.

Key Takeaway for Glossary Readers

A search data source makes ingestion a managed connection that can be secured, monitored, and reused instead of hidden in polling code.

Case study 02

University research portal consolidates grant databases


Scenario

A university research office indexed grant opportunities from three Azure SQL databases, but each department used different connection strings and stale views.

Business/Technical Objectives

Standardize ingestion sources for all grant search indexes.
Reduce stale grant records visible to faculty.
Move database access from passwords to managed identity.
Document source ownership for audits and department handoffs.

Solution Using Search data source

The architecture team created separate search data source objects for each Azure SQL database and pointed them at curated views owned by the research office. They enabled managed identity access, granted the search service only the required database roles, and configured indexers with consistent field mappings into a shared grant index. Azure CLI verified the search service identity and resource context, while database administrators confirmed firewall and role assignments. The team used REST to export the data source definitions into the release repository, excluding secrets. Runbooks listed each source view owner and the expected update frequency.

Results & Business Impact

Indexed grant records that lagged behind their source view dropped by 83 percent.
Password-based database connections for search ingestion were eliminated.
Faculty complaints about missing opportunities fell from 38 to nine per semester.
Audit preparation for source ownership dropped from one week to one business day.

Key Takeaway for Glossary Readers

Search data sources give teams a clear ingestion inventory when search spans several upstream systems.

Case study 03

Energy utility rebuilds search ingestion after regional migration


Scenario

An energy utility moved its outage knowledge search to a new region, but the first migration copied indexes without the data source objects that kept them fresh.

Business/Technical Objectives

Recreate every data source and indexer dependency in the target region.
Avoid indexing restricted outage documents outside approved repositories.
Verify that updates, deletes, and new articles reached the rebuilt index.
Shorten recovery time if the migration cutover failed.

Solution Using Search data source

The migration team inventoried existing Azure AI Search data sources, including Blob Storage containers for field manuals and an Azure SQL view for outage procedures. They rebuilt equivalent data source objects in the new search service, using private connectivity and managed identity assignments approved by the security team. Indexers were created only after source firewalls and role assignments were validated. Azure CLI captured the new service identity, region, private endpoint connections, and diagnostic settings. REST exports compared old and new data source names, source types, containers, and change policies. A cutover checklist tested create, update, and delete propagation before applications switched endpoints.

Results & Business Impact

All 12 data source dependencies were recreated before the production cutover.
Post-cutover freshness checks passed for new, updated, and deleted documents within 45 minutes.
Restricted repositories were excluded from the target configuration, closing a migration security risk.
Rollback time for ingestion configuration fell from an estimated six hours to 55 minutes.

Key Takeaway for Glossary Readers

Search migrations are not complete until data source objects and their upstream access paths move with the index.

Why use Azure CLI for this?

For search data sources, Azure CLI is useful for the surrounding evidence, not for managing the data source object itself: the az search commands mapped below cover the search service and its keys, while data sources are inspected through REST or an SDK. I use CLI to confirm the search service, identity, networking, diagnostics, and resource ownership before inspecting the data source. That matters because ingestion failures often come from Azure boundaries: wrong subscription, missing role assignment, blocked private endpoint, stale secret, or source firewall. A portal screenshot rarely tells the whole story. CLI lets operators script the inventory, export service context, and then run controlled REST checks against the data source definition.

CLI use cases

Show the search service identity and endpoint before assigning source-system read permissions for indexer access.
List private endpoint connections and public network access before troubleshooting blocked indexer reads.
Retrieve an admin key or token for a controlled REST export of data source definitions.
Compare data source JSON across environments to find drift in source type, container, credentials, or change detection.
Export diagnostic settings so indexer authentication and network failures are captured during ingestion troubleshooting.
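The drift comparison in the use cases above can be sketched in a few lines of Python. This is a minimal example that assumes both data source definitions have already been exported and loaded as dictionaries; the helper names and the ignored-key list are illustrative, not part of any Azure tool.

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts into {'a.b': value} pairs for easy comparison."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


# Keys that are expected to differ between services and carry no config meaning.
IGNORED_PATHS = {"@odata.etag", "@odata.context"}


def data_source_drift(left, right):
    """Return {path: (left_value, right_value)} for every differing property.

    A property missing on one side is reported with None for that side.
    """
    a, b = flatten(left), flatten(right)
    drift = {}
    for path in sorted(set(a) | set(b)):
        if path in IGNORED_PATHS:
            continue
        if a.get(path) != b.get(path):
            drift[path] = (a.get(path), b.get(path))
    return drift
```

Run it against sanitized exports only, so that connection strings never reach the diff output.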

Before you run CLI

Confirm tenant, subscription, resource group, search service, source resource, region, identity, and whether the target is production.
Verify Microsoft.Search provider registration and the source provider permissions needed to read storage, database, or Cosmos DB configuration.
Avoid printing connection strings or admin keys into logs; use secure output handling and sanitize exported data source JSON.
Check whether changing a data source could trigger full reindexing, source-system load, request-unit consumption, or compliance review.
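One way to follow the sanitizing advice above is to redact credentials before an exported definition is written to a repository or log. The sketch below assumes the export is a dictionary with a `credentials.connectionString` property, which matches the data source JSON shape as I understand it; the function name and placeholder text are hypothetical.

```python
import copy


def sanitize_data_source(definition):
    """Return a copy of the definition that is safer to store or print."""
    clean = copy.deepcopy(definition)  # never mutate the caller's object
    credentials = clean.get("credentials")
    if isinstance(credentials, dict) and credentials.get("connectionString"):
        # Redact the whole value: even identity-based strings can reveal
        # storage account names or database paths.
        credentials["connectionString"] = "<redacted>"
    # The etag changes on every update and only adds noise to stored exports.
    clean.pop("@odata.etag", None)
    return clean
```

Redacting the entire connection string is a deliberately blunt choice. If reviewers need to see the target account, keep that in a separate, access-controlled inventory.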

What output tells you

Search service output confirms identity, endpoint, location, SKU, private networking, and public network access before source connectivity work begins.
Data source JSON shows source type, container or query, credentials mode, change detection policy, deletion detection policy, and object name.
Indexer status output shows whether the data source was reachable and which errors blocked crawling, document cracking, or field mapping.
Source resource output confirms firewall, role assignment, private endpoint, or connection settings that must allow the search service to read data.
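To make the indexer status reading above concrete, here is a small sketch that condenses a status payload into the fields an operator usually checks first. It assumes the payload has the `status` and `lastResult` properties (with `itemsProcessed`, `itemsFailed`, `errorMessage`, and `errors`) that the indexer status response is documented to return; confirm the field names against your API version. The function name is hypothetical.

```python
def summarize_indexer_status(status):
    """Condense an indexer status payload (already parsed from JSON)."""
    last = status.get("lastResult") or {}  # absent when the indexer never ran
    errors = last.get("errors") or []
    last_status = last.get("status")
    failed = last.get("itemsFailed") or 0
    return {
        "indexer_state": status.get("status"),
        "last_run_status": last_status,
        "items_processed": last.get("itemsProcessed") or 0,
        "items_failed": failed,
        "error_message": last.get("errorMessage"),
        # Only the first few item-level errors, to keep the summary short.
        "first_item_errors": [e.get("errorMessage") for e in errors[:3]],
        # Flag anything that is not a clean or still-running execution,
        # including an indexer with no recorded run at all.
        "needs_attention": last_status not in ("success", "inProgress") or failed > 0,
    }
```

A run-level `error_message` about credentials or network access generally points at the data source or its upstream store, while item-level errors more often point at document content or field mappings.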

Mapped Azure CLI commands

Azure AI Search operations

direct

az search service list --resource-group <resource-group>

az search service | discover | AI and Machine Learning

az search service show --name <search-service> --resource-group <resource-group>

az search service | discover | AI and Machine Learning

az search service create --name <search-service> --resource-group <resource-group> --sku basic

az search service | provision | AI and Machine Learning

az search admin-key show --service-name <search-service> --resource-group <resource-group>

az search admin-key | discover | AI and Machine Learning

az search service delete --name <search-service> --resource-group <resource-group>

az search service | remove | AI and Machine Learning
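The commands above stop at the service boundary, so the data source definitions themselves come from a REST call. The sketch below only builds that request using the Python standard library. It assumes the `/datasources` collection endpoint, the `api-key` header, and API version `2024-07-01`; check all three against the REST reference for your service. The function name is hypothetical.

```python
import urllib.request

# Assumption: a stable API version; replace with the version your team has validated.
API_VERSION = "2024-07-01"


def build_list_data_sources_request(service_name, admin_key):
    """Build (but do not send) a GET request that lists data source definitions."""
    url = (
        f"https://{service_name}.search.windows.net/datasources"
        f"?api-version={API_VERSION}"
    )
    return urllib.request.Request(
        url,
        headers={"api-key": admin_key, "Accept": "application/json"},
        method="GET",
    )
```

Sending the request is a single `urllib.request.urlopen(request)` call. Pass in the admin key retrieved with `az search admin-key show` through a secure variable rather than printing it, and sanitize the response before storing it.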

Architecture context

Architecturally, a search data source is the upstream edge of an Azure AI Search indexing design. I map it beside the source system, identity path, network path, indexer schedule, field mappings, skillsets, and target index. One data source can be reused by multiple indexers, but that convenience must be balanced against blast radius and ownership. The best designs prefer managed identity where supported, private network paths where required, and explicit change tracking for databases. When teams migrate search platforms, data source definitions become migration inventory because they reveal which external systems the search service really depends on. A data source deserves the same review as any other production integration.

Security

Security impact is direct because the data source controls how Azure AI Search reaches external data. Secrets in connection strings, managed identity role assignments, database permissions, storage firewall rules, private endpoints, and trusted service settings all determine exposure. A data source should grant only the read access the indexer needs, not broad write or account-wide permissions. Sensitive content should be filtered or scoped before indexing, because indexed documents may be easier to query than the original store. Operators must also protect data source definitions because they can reveal storage account names, database paths, or connection patterns. Security owners should approve both the source scope and credential path.
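A quick review aid for the credential path described above is to classify each data source by how its connection string authenticates. The sketch below is a heuristic only: it assumes identity-based connection strings carry a `ResourceId` segment and secret-based ones carry segments such as `AccountKey` or `Password`. The function name and the marker list are assumptions, and exported definitions often omit the connection string entirely, in which case the result is `unknown`.

```python
# Assumption: segment names that indicate an embedded secret.
SECRET_MARKERS = {"accountkey", "password", "pwd", "sharedaccesssignature"}


def credential_mode(definition):
    """Classify a data source as 'managed-identity', 'secret', or 'unknown'."""
    credentials = definition.get("credentials") or {}
    connection_string = (credentials.get("connectionString") or "").strip()
    # Collect the lowercase key of every 'Key=Value' segment.
    keys = {
        segment.split("=", 1)[0].strip().lower()
        for segment in connection_string.split(";")
        if "=" in segment
    }
    if keys & SECRET_MARKERS:
        return "secret"
    if "resourceid" in keys:
        return "managed-identity"
    return "unknown"
```

Treat `unknown` as a prompt to check the source by hand, not as a pass.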

Cost

A search data source is not billed as a separate object, but it drives several cost paths. Indexer runs can read large volumes from storage, databases, or Cosmos DB, which can increase transaction, compute, or request-unit consumption on the source side. Frequent schedules, full reloads, and poor change tracking create unnecessary work. Private networking, diagnostics, and parallel rebuilds can add cost during migrations. The positive side is that a well-scoped data source reduces custom ingestion code and avoids always-on workers that poll external systems just to keep a search index current. FinOps owners should review source reads, schedules, and rebuild patterns together.

Reliability

Reliability impact is direct for freshness and ingestion continuity. If a search data source points to the wrong path, loses credentials, or cannot pass network controls, the indexer may fail or keep serving stale content. Change detection and deletion detection are reliability features because they help the index follow source updates without full reloads. Operators should monitor indexer execution history, failed item counts, high-water marks, and source availability. Recovery plans should include reauthenticating the source, rerunning indexers, rebuilding affected indexes, and confirming the application is not using stale search results. Freshness tests should be part of every source-side change window.

Performance

Performance impact appears in indexing speed, freshness, and source-system pressure. A data source that reads too broadly can slow indexer runs, load databases, consume storage transactions, or delay updates reaching users. Network path, identity checks, document size, field mappings, and change tracking all influence throughput. Search query latency is usually affected later, after documents land in the index, but stale or partial ingestion hurts perceived performance because users cannot find current content. Measure indexer duration, failed item counts, source throttling, and time from source update to searchable document. Indexing tests should measure source load as well as search-side progress.
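The "time from source update to searchable document" measurement can be computed from paired timestamps. The sketch below assumes each sample is a pair of ISO 8601 strings with an explicit UTC offset, collected however your team records them; the function name and sample format are hypothetical.

```python
from datetime import datetime
from statistics import median


def freshness_lag_minutes(samples):
    """Return median and max lag in minutes.

    samples: iterable of (source_updated_at, searchable_at) ISO 8601 strings.
    """
    lags = []
    for source_updated_at, searchable_at in samples:
        delta = datetime.fromisoformat(searchable_at) - datetime.fromisoformat(source_updated_at)
        lags.append(delta.total_seconds() / 60)
    return {"median": median(lags), "max": max(lags)}


if __name__ == "__main__":
    print(freshness_lag_minutes([
        ("2026-05-01T10:00:00+00:00", "2026-05-01T10:12:00+00:00"),
        ("2026-05-01T10:05:00+00:00", "2026-05-01T10:23:00+00:00"),
        ("2026-05-01T10:10:00+00:00", "2026-05-01T10:55:00+00:00"),
    ]))
```

Tracking the maximum alongside the median helps surface the occasional document that waits for a later indexer run.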

Operations

Operators manage search data sources by inventorying source names, source types, connection methods, identity assignments, network dependencies, and the indexers that consume them. They review data source definitions before migrations, rotate secrets, validate managed identity access, and check indexer run history after source-system changes. Azure CLI helps confirm the search service resource, identity, private endpoint, and diagnostics, while REST or SDK calls inspect the data source object itself. Good runbooks also name the upstream source owner and the expected recovery contact, and they record change-tracking settings and the sample documents used to prove ingestion still works.

Common mistakes

Creating a data source through the import wizard and never documenting which indexers depend on it.
Granting broad account-level secrets when a managed identity or narrower source permission would be safer.
Assuming query freshness problems are ranking issues before checking whether the indexer can still read the data source.
Migrating indexes but forgetting to recreate data source objects, change tracking policies, and source firewall allowances.

Operator quick checks

List every indexer that references the data source before editing or deleting it.
Verify the source resource allows the search service identity or configured connection string to read the intended path.
Check the latest indexer run for failed items, authentication errors, throttling, and stale high-water marks.
Confirm the data source scope excludes sensitive containers, tables, collections, or views that should not be searchable.
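The first quick check, listing every indexer that references a data source, reduces to a filter on the `dataSourceName` property once the indexer definitions have been exported. A minimal sketch, assuming the indexers are already loaded as a list of dictionaries; the function name is hypothetical.

```python
def indexers_using(data_source_name, indexers):
    """Return the sorted names of indexers whose dataSourceName matches exactly."""
    return sorted(
        indexer["name"]
        for indexer in indexers
        if indexer.get("dataSourceName") == data_source_name
    )
```

An empty result before a delete is reassuring. A non-empty one is the list of owners to notify first.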

Questions to ask

What source system, path, and owner does this data source represent?
Which identity or secret lets the search service read it, and how is that access rotated or reviewed?
What breaks if the data source is unavailable, renamed, blocked by firewall, or changed by the source team?
How do we prove after a change that new, updated, and deleted source records are reflected in the index?
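For the last question, one simple proof is to compare document keys on both sides after an indexer run. The sketch below assumes you can obtain three key collections by your own means: keys currently live in the source, keys deliberately deleted for the test, and keys currently returned by the index. The function and field names are hypothetical.

```python
def propagation_gaps(source_keys, deleted_keys, index_keys):
    """Report keys that failed to propagate in either direction."""
    source, deleted, indexed = set(source_keys), set(deleted_keys), set(index_keys)
    return {
        # Live source records the index has not picked up yet.
        "missing_from_index": sorted(source - deleted - indexed),
        # Deleted source records the index is still serving.
        "still_in_index_after_delete": sorted(deleted & indexed),
    }
```

Both lists being empty shows that creates and deletes propagated. Confirming updates additionally needs a field-level comparison on a known sample document.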
