Analytics Synapse SQL and data virtualization premium

External data source

An External data source is a database object that defines the location and connection information used by SQL engines to access external data. Teams use it to point serverless SQL, dedicated SQL pools, or PolyBase-style queries to data stored outside the database, such as Azure Storage or Data Lake paths. It is not the external table schema, the file format definition, the storage account itself, or proof that the credential can read every file under the path.

Aliases
CREATE EXTERNAL DATA SOURCE, PolyBase external data source, Synapse external data source
Difficulty
intermediate
CLI mappings
6
Last verified
2026-05-14

Microsoft Learn

An External data source is a database object that defines the location and connection information used by SQL engines to access external data.

Microsoft Learn: CREATE EXTERNAL DATA SOURCE Transact-SQL2026-05-14

Technical context

Technically, the External data source is configured or observed through T-SQL CREATE EXTERNAL DATA SOURCE statements, database scoped credentials, storage URLs, ADLS Gen2 endpoints, Synapse SQL metadata, external tables, query errors, permissions, and workspace diagnostics. It depends on storage account access, managed identity or credential configuration, firewall rules, private endpoints, data lake paths, database permissions, external file format objects, external table definitions, and SQL engine support. Operators inspect it through the Azure portal, ARM or Bicep, Azure CLI, SDK or REST calls, Azure Monitor, diagnostic logs, and application telemetry.

Why it matters

External data source matters because it is the named connectivity layer that lets SQL query engines find external files or remote data without embedding locations throughout every query. Without clear vocabulary, teams may point to the wrong endpoint, use weak SAS tokens, expose broad storage paths, break external tables, or confuse SQL metadata success with actual file-level access. It also affects security, reliability, operations, cost, and performance because one configuration choice can change who can act, what fails, how quickly work completes, what evidence exists, and how much the platform costs. Good glossary discipline helps teams ask who owns it, what depends on it, which metric proves health, and what rollback path exists before a release.

Where you see it

Signals, screens, and Azure surfaces where this term usually becomes operational.

Signal 01

SQL metadata contains a CREATE EXTERNAL DATA SOURCE object with a storage or data virtualization location and optional database scoped credential. Review scope, owners, metrics, and rollback evidence.

Signal 02

External table queries fail with storage access, path, or credential errors even though the SQL table definition exists. Review scope, owners, metrics, and rollback evidence.

Signal 03

Architecture diagrams show Synapse or SQL querying ADLS Gen2 through named external data sources, external file formats, and external tables. Review scope, owners, metrics, and rollback evidence.

When this becomes relevant

Specific situations where this term helps solve real Azure design, operations, migration, security, reliability, cost, or governance problems.

  • Configure SQL access to external lake data without hard-coding storage paths in every query.
  • Troubleshoot external table failures caused by identity, firewall, endpoint, or data source location mistakes.
  • Review data virtualization security before exposing external files through SQL schemas.
  • Support incident response by correlating Azure configuration, diagnostic logs, metrics, deployment history, and application traces.

Real-world case studies

Different enterprise-style examples that show the term being used to hit measurable objectives.

Case study 01

External data source in action for financial services

Scenario, objectives, solution, measured impact, and takeaway.

Scenario

ClearStone Banking, a financial services organization, needed to solve a production challenge: risk analysts needed SQL access to curated lake files, but ad hoc OPENROWSET paths made permissions and query costs hard to control. The architecture team used External data source to make the design measurable, governable, and easier to support.

Business/Technical Objectives
  • Standardize lake access through named objects
  • Limit analysts to curated zones
  • Reduce broad scans
  • Create auditable SQL metadata
Solution Using External data source

Data engineers created external data sources for approved ADLS Gen2 curated paths, paired them with database scoped credentials, and referenced them from external tables. Storage RBAC, firewall rules, and Synapse diagnostics were reviewed before analysts received access. Before cutover, engineers captured read-only configuration, validated identity and network access, compared expected behavior with Azure Monitor or service logs, and stored rollback instructions in the change record. Operators received a runbook with first-response checks, known failure modes, owner contacts, and escalation paths. The team also reviewed owner tags, diagnostic coverage, alert routing, and incident communication paths so support could confirm the workflow without changing production state.

Results & Business Impact
  • Unapproved path usage fell to zero
  • Query scan volume dropped 31 percent
  • Auditors could review data source definitions
  • Analysts used stable table names instead of raw paths
Key Takeaway for Glossary Readers

External data sources make lake access governable when identity and path boundaries are designed together.

Case study 02

External data source in action for manufacturing

Scenario, objectives, solution, measured impact, and takeaway.

Scenario

GreenVale Foods, a manufacturing organization, needed to solve a production challenge: factory quality reports failed intermittently because SQL queries referenced retired storage endpoints after a lake migration. The architecture team used External data source to make the design measurable, governable, and easier to support.

Business/Technical Objectives
  • Replace retired endpoints
  • Protect production reporting
  • Validate identity access
  • Improve migration evidence
Solution Using External data source

The team updated external data source objects to new DFS endpoints, verified managed identity access, and tested dependent external tables before switching reports. CLI checks confirmed storage accounts, private endpoint state, and Synapse workspace diagnostics. Before cutover, engineers captured read-only configuration, validated identity and network access, compared expected behavior with Azure Monitor or service logs, and stored rollback instructions in the change record. Operators received a runbook with first-response checks, known failure modes, owner contacts, and escalation paths. The team also reviewed owner tags, diagnostic coverage, alert routing, and incident communication paths so support could confirm the workflow without changing production state. The team also reviewed owner tags, diagnostic coverage, alert routing, and incident communication paths so support could confirm the workflow without changing production state.

Results & Business Impact
  • Report failures stopped after cutover
  • Endpoint changes were visible in one metadata location
  • Identity issues were caught before release
  • Migration rollback was documented in the change record
Key Takeaway for Glossary Readers

A named external data source reduces migration risk because endpoint changes are centralized and testable.

Case study 03

External data source in action for public sector

Scenario, objectives, solution, measured impact, and takeaway.

Scenario

CivicData Office, a public sector organization, needed to solve a production challenge: open-data files needed to be queried from serverless SQL while separating public datasets from restricted operational files. The architecture team used External data source to make the design measurable, governable, and easier to support.

Business/Technical Objectives
  • Separate public and restricted paths
  • Keep query access auditable
  • Avoid shared SAS sprawl
  • Support predictable open-data releases
Solution Using External data source

Architects created separate external data sources for public and restricted containers, used managed identities where possible, and attached external tables only to approved schemas. Diagnostic logs and role assignments were captured for release evidence. Before cutover, engineers captured read-only configuration, validated identity and network access, compared expected behavior with Azure Monitor or service logs, and stored rollback instructions in the change record. Operators received a runbook with first-response checks, known failure modes, owner contacts, and escalation paths. The team also reviewed owner tags, diagnostic coverage, alert routing, and incident communication paths so support could confirm the workflow without changing production state. The team also reviewed owner tags, diagnostic coverage, alert routing, and incident communication paths so support could confirm the workflow without changing production state.

Results & Business Impact
  • Public release queries stayed within approved containers
  • SAS token use was reduced substantially
  • Restricted files were not exposed through public schemas
  • Release checks became repeatable
Key Takeaway for Glossary Readers

External data sources help teams put a clear boundary between SQL metadata and real storage access.

Why use Azure CLI for this?

Azure CLI helps validate External data source because it captures reproducible evidence for scope, configuration, permissions, runtime state, diagnostics, and related resources before a production change.

CLI use cases

  • List or show Azure resources and related configuration for External data source.
  • Capture read-only evidence before changing identity, networking, triggers, capacity, policy, deployment, or automation settings.
  • Compare Azure metrics, logs, run history, deployment operations, and application evidence during production incidents.

Before you run CLI

  • Confirm the tenant, subscription, resource group, resource names, environment, and time window are the intended scope.
  • Run read-only list, show, metrics, operation, or query commands before any create, update, delete, start, stop, policy, or deployment change.
  • Get approval for mutating commands because configuration changes can expose data, break workflows, increase cost, or alter compliance evidence.

What output tells you

  • Resource IDs, enabled state, configuration values, identity settings, network posture, and ownership metadata show the current design.
  • Metrics, logs, run history, or deployment operations show whether the platform behaved as expected during the reviewed time window.
  • Application and downstream evidence shows whether the issue is Azure configuration, permissions, client behavior, data readiness, or business processing.

Mapped Azure CLI commands

Some evidence is visible only in service logs, SDK behavior, deployment output, SQL metadata, portal configuration, or application telemetry; Azure CLI still validates surrounding resources and operational scope.

Architecture context

An external data source sits between query metadata and the storage or service that actually holds the data, commonly in Synapse SQL, SQL Server-compatible external table patterns, and PolyBase-style access. Architecturally, I treat it as a boundary object: it defines where queries are allowed to reach, which credential or managed identity is used, and which network path must work. It is not the data itself and it is not a performance guarantee. Good designs pair the data source with private endpoints, scoped credentials, external file formats, and clear lake folder conventions. When this object is wrong, users often see query failures that look like SQL problems but are really identity, DNS, firewall, or storage path issues.

Security

Security for the External data source starts with knowing who can create database scoped credentials, view source definitions, grant external table access, manage storage roles, configure SAS tokens, and query sensitive files through SQL objects. Review data source location, credential name, storage endpoint type, identity permissions, firewall state, private DNS, external table references, query failures, and who owns the data lake path before approving production changes. Prefer managed identity and Microsoft Entra ID where the service supports it, keep secrets in approved vaults, scope roles narrowly, and protect diagnostics that may reveal sensitive names, payloads, or operational patterns. During audits, capture Activity Log entries, role assignments, network settings, diagnostic settings, and owner approvals so teams can prove access and behavior were intentional.

Cost

Cost for the External data source is driven by serverless query scans, failed retries, broad external tables, diagnostic logs, private networking, reprocessing after path mistakes, and support time spent tracing credentials and storage permissions. The expensive mistake is not only Azure consumption; it is also duplicate processing, failed retries, audit cleanup, manual investigations, and unnecessary capacity caused by weak design evidence. Review whether the workload truly needs the selected tier, frequency, retention, diagnostics, network path, and automation pattern. Use tags, budgets, alerts, and recurring reviews so teams can explain why the current design exists and remove stale resources safely. This keeps External data source review specific across architecture, security, operations, and incident response.

Reliability

Reliability for the External data source depends on stable storage endpoints, valid credentials, identity role assignments, private network access, correct paths, compatible SQL engine behavior, and diagnostics that connect query failures to storage access. A healthy Azure resource can still fail the business workflow if downstream services, identities, triggers, clients, or data contracts are wrong. Test retries, failover assumptions, disabled states, stale configuration, private DNS problems, timeout behavior, and duplicate processing before relying on the design. Keep runbooks for first-response checks, known limits, owner escalation, and rollback so support teams can recover without guessing. This keeps External data source review specific across architecture, security, operations, and incident response.

Performance

Performance for the External data source depends on storage latency, file layout, partition pruning, external table design, credential path, network route, SQL engine capacity, predicate pushdown where supported, and query scan volume. Measure platform-side metrics and application-side completion metrics because fast service response does not always mean the business task finished. Use realistic data sizes, concurrency, filter patterns, region placement, authentication paths, and downstream limits in tests. When performance regresses, compare configuration changes, resource limits, client logs, diagnostic data, and workload timing before adding capacity or blaming one Azure service. This keeps External data source review specific across architecture, security, operations, and incident response.

Operations

Operations for the External data source require named owners, documented resource IDs, expected behavior, diagnostic settings, and first-response checks. Before a change, capture read-only CLI output, portal screenshots when useful, deployment history, and relevant application configuration. During incidents, avoid changing several settings at once. Compare service metrics, logs, run history, identity evidence, network state, and downstream health in the same time window. Keep release notes clear enough for support teams to verify current behavior quickly. This keeps External data source review specific across architecture, security, operations, and incident response. This keeps External data source review specific across architecture, security, operations, and incident response.

Common mistakes

  • Treating External data source as a label instead of checking the exact resource scope, live configuration, owner, and dependencies.
  • Changing several settings at once without saving read-only evidence, rollback instructions, and the expected metric change.
  • Assuming the Azure resource succeeded means the end-to-end business workflow completed correctly and safely.