Change data capture in Data Factory

Change data capture (CDC) in Data Factory moves only changed source data instead of reprocessing a full dataset on every integration run. In everyday Azure work, it helps teams lower ETL cost, reduce latency, and keep lake or warehouse targets synchronized with operational systems that produce inserts, updates, and deletes. You usually see it around Data Factory authoring, Change Data Capture resources, mapping data flows, linked services, pipeline monitors, triggers, Delta Lake targets, and SQL source systems.

Aliases: CDC
Difficulty: intermediate
CLI mappings: 10
Last verified: 2026-05-12

Microsoft Learn

A Data Factory capability for processing changed data from supported sources. Microsoft Learn documents it under "Change data capture in Azure Data Factory and Azure Synapse Analytics." Operators should still confirm scope, configuration, dependencies, and production impact in their own environment, and use the linked source for exact Azure behavior.

Microsoft Learn: Change data capture in Azure Data Factory and Azure Synapse Analytics (2026-05-12)

Technical context

Technically, Change data capture in Data Factory appears as a CDC factory resource, pipeline pattern, or data flow configuration that tracks deltas from supported sources and writes them to configured targets. Engineers verify it through source and sink definitions, checkpoint or subscriber state, run history, trigger schedules, schema evolution settings, data flow logs, and target row counts. Important configuration includes the source and destination connectors, keys, transformations, schema drift behavior, integration runtime, schedule, monitoring, restart strategy, and private networking. Reviews should tie that configuration to the monitoring, identity, networking, automation, and cost records kept for the same factory.
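The idea of tracking deltas against checkpoint state can be illustrated with a minimal high-watermark sketch. It is not how the Data Factory CDC resource is implemented internally; the `modified_at` column, the row shape, and the `select_changes` helper are all hypothetical names chosen for illustration.

```python
from datetime import datetime, timezone


def select_changes(rows, checkpoint):
    """Return rows modified after the checkpoint, plus the new checkpoint."""
    changed = [r for r in rows if r["modified_at"] > checkpoint]
    # Advance the checkpoint only to the newest row actually read, so an
    # empty run leaves it where it was.
    new_checkpoint = max((r["modified_at"] for r in changed), default=checkpoint)
    return changed, new_checkpoint


# Hypothetical source rows with a last-modified timestamp column.
rows = [
    {"id": 1, "modified_at": datetime(2026, 5, 1, 10, 0, tzinfo=timezone.utc)},
    {"id": 2, "modified_at": datetime(2026, 5, 1, 10, 5, tzinfo=timezone.utc)},
    {"id": 3, "modified_at": datetime(2026, 5, 1, 10, 9, tzinfo=timezone.utc)},
]
checkpoint = datetime(2026, 5, 1, 10, 2, tzinfo=timezone.utc)

changed, checkpoint = select_changes(rows, checkpoint)
print([r["id"] for r in changed], checkpoint.isoformat())
```

A plain timestamp watermark like this cannot see hard deletes, which is one reason delete handling has to be checked separately for each source connector.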

Why it matters

Change data capture in Data Factory matters because a weak CDC design can duplicate rows, miss deletes, overload sources, break schema evolution, or make downstream analytics stale while dashboards appear healthy. The business impact is rarely abstract: users see overloaded source systems, missing or duplicated data, confusing reports, or unexpected cost when the design is misunderstood. A solid glossary entry gives architects and operators the same language for design reviews, support handoffs, and audit evidence. It also helps teams decide what to check first, which metric or log proves the current state, who owns remediation, and when a change should be rolled back instead of patched live.

Where you see it

Signals, screens, and Azure surfaces where this term usually becomes operational.

Signal 01

In the Azure portal, Change data capture in Data Factory appears near Data Factory CDC resources, where operators confirm scope, owner, diagnostics, access, and production state before release or incident response.

Signal 02

In CLI, ARM, SDK, REST, or Bicep output, Change data capture in Data Factory appears as source and sink mappings, giving teams repeatable release and audit evidence during deployments and incidents.

Signal 03

In logs, metrics, tickets, or reviews, Change data capture in Data Factory appears beside watermarks, changed rows, latency, and pipeline health, linking symptoms to security, reliability, cost, and performance decisions quickly.
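As a rough illustration of how a watermark becomes a latency signal, the sketch below derives lag from the last processed watermark and compares it with a threshold. The 15-minute default and both helper names are assumptions for the example, not values taken from Azure.

```python
from datetime import datetime, timedelta, timezone


def cdc_lag(last_watermark, now):
    """Time elapsed between the newest change processed and the check time."""
    return now - last_watermark


def is_stale(last_watermark, now, max_lag=timedelta(minutes=15)):
    """True when the target is further behind the source than max_lag allows."""
    return cdc_lag(last_watermark, now) > max_lag


# Fixed timestamps keep the example deterministic.
now = datetime(2026, 5, 1, 12, 0, tzinfo=timezone.utc)
fresh = datetime(2026, 5, 1, 11, 52, tzinfo=timezone.utc)
behind = datetime(2026, 5, 1, 11, 30, tzinfo=timezone.utc)

print(cdc_lag(fresh, now), is_stale(fresh, now))
print(cdc_lag(behind, now), is_stale(behind, now))
```

A check like this flags a capture that reports successful runs but has stopped advancing its watermark.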

When this becomes relevant

Specific situations where this term helps solve real Azure design, operations, migration, security, reliability, cost, or governance problems.

  • Plan how data moves from source systems into curated reporting or AI datasets.
  • Troubleshoot failed pipeline runs, permissions, integration runtimes, or data movement bottlenecks.
  • Separate batch, streaming, lake, warehouse, and notebook responsibilities.
  • Document data ownership, lineage, and operational recovery expectations.

Real-world case studies

Different enterprise-style examples that show the term being used to hit measurable objectives.

Case study 01

Near real-time sales lake

Scenario, objectives, solution, measured impact, and takeaway.

Scenario

Trailhead Outfitters, a retail chain, needed near real-time sales deltas from Azure SQL into a Delta Lake reporting layer.

Business/Technical Objectives
  • Refresh sales analytics within 15 minutes
  • Avoid full table reloads during store hours
  • Handle schema additions safely
  • Cut integration compute cost by 30 percent
Solution Using Change data capture in Data Factory

Data engineers configured a Data Factory change data capture resource to read changed sales rows, apply lightweight transformations, and write them to ADLS Gen2 Delta Lake. They used managed identity for source and sink access, private endpoints for database connectivity, and monitoring alerts for CDC lag and failed schema evolution. A baseline run loaded current sales data, then incremental runs processed only changed records. The team documented keys, checkpoint behavior, target merge logic, and recovery steps. Power BI dashboards consumed curated tables rather than querying the operational database directly. The runbook also captured dashboard links, owner contacts, rollback criteria, and monthly review steps so operators could verify the design without tribal knowledge during incidents or release windows. Evidence from each change was saved with the deployment record for audit, support, and future capacity planning.
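The target merge logic mentioned above can be sketched as a keyed upsert that tolerates replays. This is an assumed illustration, not the case study's actual implementation: the event shape (`op`, `key`, `seq`, `row`) and the tombstone approach are hypothetical.

```python
def apply_changes(target, changes):
    """Apply insert/update/delete events to a dict keyed by primary key."""
    for change in sorted(changes, key=lambda c: c["seq"]):
        key = change["key"]
        current = target.get(key)
        # Skip events at or below the version already applied, so a replayed
        # batch is a no-op instead of a source of duplicates.
        if current is not None and change["seq"] <= current["seq"]:
            continue
        if change["op"] == "delete":
            # Keep a tombstone so a late, older upsert cannot resurrect the row.
            target[key] = {"seq": change["seq"], "deleted": True, "row": None}
        else:
            target[key] = {"seq": change["seq"], "deleted": False, "row": change["row"]}
    return target


def live_rows(target):
    """Rows that are still present after deletes are honoured."""
    return {k: v["row"] for k, v in target.items() if not v["deleted"]}


# Baseline load followed by an incremental batch that is applied twice.
target = apply_changes({}, [{"op": "insert", "key": 1, "seq": 1, "row": {"amount": 10}}])
batch = [
    {"op": "update", "key": 1, "seq": 2, "row": {"amount": 12}},
    {"op": "insert", "key": 2, "seq": 3, "row": {"amount": 5}},
    {"op": "delete", "key": 2, "seq": 4, "row": None},
]
apply_changes(target, batch)
apply_changes(target, batch)  # replay of the same batch
print(live_rows(target))
```

Ordering by a monotonically increasing sequence value is the design choice that makes replay safe here; without one, a merge needs another way to decide which version of a row wins.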

Results & Business Impact
  • Sales dashboards refreshed in under 11 minutes
  • Full reload jobs were eliminated during business hours
  • Integration compute cost dropped by 36 percent
  • Schema additions were tested without breaking the target lake
Key Takeaway for Glossary Readers

CDC in Data Factory is useful when delta tracking, identity, schema handling, and target freshness are designed together.

Case study 02

SAP procurement deltas

Scenario

Woodgrove Manufacturing, a global manufacturer, needed SAP procurement changes replicated to Azure for supplier-risk analytics.

Business/Technical Objectives
  • Capture SAP deltas without nightly full extracts
  • Reduce source system load during month end
  • Support lakehouse reporting by region
  • Document recovery for failed CDC windows
Solution Using Change data capture in Data Factory

The integration team used Azure Data Factory SAP CDC capabilities with a self-hosted integration runtime close to the SAP landscape. Mapping data flows extracted procurement deltas, parameterized source objects, and wrote results to ADLS Gen2 with region and business-date partitioning. The pipeline used tumbling window triggers, unique checkpoint keys, and Azure Monitor alerts for lag, connector failures, and runtime health. Security reviewed SAP credentials, private connectivity, and sink permissions. The runbook described how to replay a failed window without duplicating purchase order changes. The rollout plan included before-and-after measurements, escalation contacts, exception rules, and a control review with finance, security, and application owners. That evidence made the improvement repeatable and gave support teams a clear baseline when later incidents raised similar symptoms.
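Replaying a failed window without duplicating changes can be illustrated with fixed-size windows and a per-window ledger. This is a sketch under assumptions: it does not reproduce how the SAP CDC connector stores checkpoints, and `purchase_orders`, `checkpoint_key`, and `run_window` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone


def tumbling_windows(start, end, size):
    """Yield contiguous, non-overlapping (window_start, window_end) pairs."""
    cursor = start
    while cursor < end:
        yield cursor, min(cursor + size, end)
        cursor += size


def checkpoint_key(source_object, window_start):
    """One unique key per source object and window."""
    return f"{source_object}|{window_start.isoformat()}"


def run_window(ledger, source_object, window, load):
    """Load a window once; a replay of a completed window is skipped."""
    key = checkpoint_key(source_object, window[0])
    if ledger.get(key) == "succeeded":
        return "skipped"
    load(window)  # raises on failure, leaving the ledger unmarked for retry
    ledger[key] = "succeeded"
    return "loaded"


day = datetime(2026, 5, 1, tzinfo=timezone.utc)
windows = list(tumbling_windows(day, day + timedelta(hours=18), timedelta(hours=6)))
ledger, loaded = {}, []
for window in windows:
    run_window(ledger, "purchase_orders", window, loaded.append)

replay_status = run_window(ledger, "purchase_orders", windows[1], loaded.append)
print(len(windows), replay_status, len(loaded))
```

Because the ledger is only marked after `load` returns, a window that fails part-way stays eligible for retry, while a window that completed cannot be applied twice.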

Results & Business Impact
  • Nightly full extracts were retired for selected objects
  • SAP extraction load fell by 58 percent at month end
  • Supplier-risk dashboards refreshed four times per day
  • Failed-window replay completed successfully in testing
Key Takeaway for Glossary Readers

Data Factory CDC helps SAP analytics when checkpointing, runtime placement, and replay rules are explicit.

Case study 03

Healthcare schema evolution

Scenario

BlueRiver Clinics, a healthcare provider, needed to capture patient appointment changes while allowing new scheduling fields to appear over time.

Business/Technical Objectives
  • Capture inserts and updates within 20 minutes
  • Support approved source schema changes
  • Protect patient data during movement
  • Provide operations evidence for failed loads
Solution Using Change data capture in Data Factory

Engineers created a Data Factory CDC resource that read changed appointment records from Azure SQL and wrote them to an encrypted ADLS Gen2 Delta target. Schema evolution was enabled only for approved fields, and a data quality step compared expected keys, row counts, and nullable changes before publishing. Managed identities, private endpoints, and restricted storage ACLs protected patient data. The monitor view and Log Analytics queries tracked lag, runtime errors, and target write failures. A runbook described how to pause capture, correct mappings, and resume without losing checkpoint state. Release evidence included configuration snapshots, CLI output, representative metrics, and ticket notes, giving operations a durable baseline for training, audits, and later optimization work. The team also defined when to scale back, rotate credentials, or open a vendor support case.
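A data quality gate of the kind described above might look like the following sketch. The schema representation, the `schema_gate` helper, and the column names are assumptions for illustration; the case study does not specify how the check was built.

```python
def schema_gate(current, incoming, approved_new_fields):
    """Compare two {column: {"type": str, "nullable": bool}} schemas.

    Returns a list of issues; an empty list means the change may be published.
    """
    issues = []
    for name, spec in incoming.items():
        if name not in current:
            if name not in approved_new_fields:
                issues.append(f"unapproved new column: {name}")
            continue
        old = current[name]
        if spec["type"] != old["type"]:
            issues.append(f"type change on {name}: {old['type']} -> {spec['type']}")
        if spec["nullable"] != old["nullable"]:
            issues.append(f"nullability change on {name}")
    for name in current:
        if name not in incoming:
            issues.append(f"dropped column: {name}")
    return issues


current = {
    "appointment_id": {"type": "int", "nullable": False},
    "starts_at": {"type": "datetime", "nullable": False},
}
incoming = {
    "appointment_id": {"type": "int", "nullable": False},
    "starts_at": {"type": "datetime", "nullable": True},
    "room_code": {"type": "string", "nullable": True},
    "notes": {"type": "string", "nullable": True},
}

for issue in schema_gate(current, incoming, approved_new_fields={"room_code"}):
    print(issue)
```

Here the approved `room_code` column passes, while the unapproved `notes` column and the nullability change on `starts_at` block publication until someone reviews them.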

Results & Business Impact
  • Appointment changes reached analytics in 14 minutes on average
  • Approved schema changes deployed without target table rebuilds
  • No patient data moved through public endpoints
  • Operations reduced failed-load triage from hours to 25 minutes
Key Takeaway for Glossary Readers

CDC in Data Factory can keep clinical analytics fresh only when schema governance and security controls match the data sensitivity.

Why use Azure CLI for this?

Use CLI, REST, SDK, or scripted queries for Change data capture in Data Factory because CDC evidence must tie pipeline state, source settings, runtime identity, and target freshness together during troubleshooting. Scripted output also avoids relying on portal-only evidence during reviews.

CLI use cases

  • List pipeline and data flow configuration before enabling a CDC production run.
  • Capture run history and target counts after a failed incremental load.
  • Verify linked service identity and network access before changing CDC sources.

Before you run CLI

  • Confirm the active tenant, subscription, resource group, data factory, and region before running commands.
  • Use least-privileged access and avoid storing secrets, connection strings, certificates, tokens, or personal data in command output.
  • Know whether the command is read-only, mutating, cost-impacting, security-impacting, or destructive before production use.

What output tells you

  • Output confirms whether the live Azure configuration exists at the expected scope and matches the approved design.
  • Returned IDs, settings, metrics, timestamps, or logs help separate configuration drift from application behavior.
  • Differences between expected and actual state create evidence for rollback, escalation, audit, or owner follow-up.
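As one way to turn run-history output into evidence, the sketch below summarizes saved JSON from a pipeline-run query. The field names (`value`, `runId`, `pipelineName`, `status`) are assumed from the shape of the Data Factory pipeline-run response and should be checked against real output; the sample payload and pipeline names are invented.

```python
import json
from collections import Counter


def summarize_runs(cli_json):
    """Count runs by status and list failed runs from saved CLI JSON output."""
    payload = json.loads(cli_json)
    # Accept either a bare list of runs or an object wrapping them in "value".
    runs = payload.get("value", []) if isinstance(payload, dict) else payload
    by_status = Counter(run.get("status", "Unknown") for run in runs)
    failed = [
        (run.get("pipelineName"), run.get("runId"))
        for run in runs
        if run.get("status") == "Failed"
    ]
    return by_status, failed


# Invented sample standing in for saved command output.
sample = json.dumps({"value": [
    {"runId": "run-001", "pipelineName": "cdc_sales", "status": "Succeeded"},
    {"runId": "run-002", "pipelineName": "cdc_sales", "status": "Failed"},
    {"runId": "run-003", "pipelineName": "cdc_orders", "status": "InProgress"},
]})

by_status, failed = summarize_runs(sample)
print(dict(by_status))
print(failed)
```

Keeping only status, pipeline name, and run ID in the summary also limits how much raw output ends up in tickets.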

Mapped Azure CLI commands

Data Factory operations

direct
az datafactory list --resource-group <resource-group>
    az datafactory · discover · Analytics
az datafactory show --factory-name <factory-name> --resource-group <resource-group>
    az datafactory · discover · Analytics
az datafactory pipeline list --factory-name <factory-name> --resource-group <resource-group>
    az datafactory pipeline · discover · Analytics
az datafactory pipeline-run query-by-factory --factory-name <factory-name> --resource-group <resource-group> --last-updated-after <start> --last-updated-before <end>
    az datafactory pipeline-run · discover · Analytics
az datafactory show --name <factory> --resource-group <resource-group>
    az datafactory · discover · Analytics
az datafactory create --name <factory> --resource-group <resource-group> --location <region>
    az datafactory · provision · Analytics
az datafactory pipeline list --factory-name <factory> --resource-group <resource-group>
    az datafactory pipeline · discover · Analytics
az datafactory pipeline-run query-by-factory --factory-name <factory> --resource-group <resource-group> --last-updated-after <utc> --last-updated-before <utc>
    az datafactory pipeline-run · discover · Analytics
az datafactory trigger list --factory-name <factory> --resource-group <resource-group>
    az datafactory trigger · discover · Analytics
az datafactory trigger start --factory-name <factory> --resource-group <resource-group> --name <trigger>
    az datafactory trigger · operate · Analytics
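The run-history query above needs a UTC time window, which is easy to get wrong by hand. The sketch below builds the argument list for that command from an end time and a lookback period. The helper name, factory name, and resource group are hypothetical, and the code only constructs the command; it does not run it.

```python
from datetime import datetime, timedelta, timezone


def run_query_command(factory, resource_group, end, lookback):
    """Build the argument list for a pipeline-run query over [end - lookback, end]."""
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    end_utc = end.astimezone(timezone.utc)
    start_utc = end_utc - lookback
    return [
        "az", "datafactory", "pipeline-run", "query-by-factory",
        "--factory-name", factory,
        "--resource-group", resource_group,
        "--last-updated-after", start_utc.strftime(fmt),
        "--last-updated-before", end_utc.strftime(fmt),
    ]


# Placeholder names; substitute the real factory and resource group.
command = run_query_command(
    factory="adf-example",
    resource_group="rg-example",
    end=datetime(2026, 5, 1, 12, 0, tzinfo=timezone.utc),
    lookback=timedelta(hours=6),
)
print(" ".join(command))
```

On a machine where the Azure CLI and its Data Factory commands are available, the list could be passed to `subprocess.run(command, capture_output=True, text=True, check=True)`. Passing arguments as a list avoids shell quoting problems.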

Architecture context

Change data capture in Data Factory is an integration architecture pattern for moving deltas instead of repeatedly copying full datasets. Architects use it when operational sources must feed a lake, warehouse, or reporting model with lower latency, lower source pressure, and clearer restart behavior. The design must account for source connector capabilities, keys, delete handling, schema drift, checkpoint state, trigger cadence, integration runtime placement, managed identity, private endpoints, and target write semantics. In Azure DevOps, CDC configuration should be promoted as part of the factory artifact set and validated with run history and row-count reconciliation. The risk is subtle: a pipeline can look healthy while silently duplicating rows, missing deletes, or lagging behind business events.
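Row-count reconciliation is more useful when it also compares keys, because matching totals can hide a missed delete offset by a duplicate. A minimal sketch follows, assuming the primary keys from source and target can be pulled into memory; the `reconcile` helper and its report fields are hypothetical.

```python
def reconcile(source_keys, target_keys):
    """Compare primary keys from source and target and report discrepancies."""
    target_list = list(target_keys)
    source_set, target_set = set(source_keys), set(target_list)
    seen, duplicates = set(), set()
    for key in target_list:
        if key in seen:
            duplicates.add(key)
        seen.add(key)
    return {
        "source_count": len(source_set),
        "target_count": len(target_list),
        # Present in source but absent in target: changes that never landed.
        "missing_in_target": sorted(source_set - target_set),
        # Present in target but gone from source: likely missed deletes.
        "unexpected_in_target": sorted(target_set - source_set),
        "duplicates_in_target": sorted(duplicates),
    }


report = reconcile(source_keys=[1, 2, 3, 5], target_keys=[1, 2, 2, 4, 5])
print(report)
```

In this example a simple count comparison would look almost healthy, yet key 3 never arrived, key 4 was never deleted, and key 2 was written twice.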

Security

Security for Change data capture in Data Factory starts with understanding which identities, secrets, certificates, endpoints, data stores, and management-plane permissions it touches. Review who can view, change, or use it, and confirm that production access follows least privilege. Check whether private networking, firewall rules, RBAC, Key Vault storage, managed identity, audit logs, and data classification apply. Operators should avoid exposing tokens, connection strings, certificate material, or sensitive business metadata in troubleshooting output. A secure design also documents emergency access, rotation responsibilities, and evidence retention so an incident response team can prove the current configuration without improvising access during an outage.

Cost

Cost for Change data capture in Data Factory comes from the resources, transactions, data movement, retention, compute, capacity, and operational labor it influences. Some costs are direct meters, while others appear as extra storage, higher throughput, duplicate processing, export jobs, monitoring ingestion, or engineering time. Review budgets, cost allocation, tags, usage metrics, and SKU limits before scaling or enabling new behavior. The safest approach is to define the owner, expected usage pattern, and alert thresholds up front, and to have finance and engineering agree on which metric proves usage and which scope owns remediation. That way cost conversations rest on evidence rather than opinion after the bill arrives.

Reliability

Reliability for Change data capture in Data Factory depends on whether the design behaves predictably during scale events, regional incidents, expired credentials, throttling, schema changes, or downstream failures. Identify the dependency chain, expected failure mode, and recovery target before production use. Monitor the signals that show backlog, lag, retries, health state, capacity saturation, authentication failures, or stale data. Test restore, rotation, failover, replay, or rollback paths where they apply. Operators need a runbook that states the first observable symptom, the safe rollback path, and the owner escalation route. It should also separate platform configuration problems from application defects and say which evidence is required before escalating to networking, identity, database, or product teams.

Performance

Performance for Change data capture in Data Factory is about how quickly and consistently the related workload can complete useful work. Measure the right signals: latency, throughput, backlog, CPU, memory, bytes scanned, file counts, retries, or throttled operations, depending on the source, runtime, and sink involved. Avoid tuning one setting in isolation when partitions, keys, network paths, identity calls, downstream services, or client behavior may be the real bottleneck. Performance reviews and load tests should compare the expected workload shape (throughput, latency, queue depth, and saturation signals) with live metrics and limits, and include a safe test plan before increasing capacity or changing production configuration.

Operations

Operationally, Change data capture in Data Factory needs ownership, naming, tagging, change records, and repeatable verification. Teams should know where it appears in the portal, which commands or queries prove state, which dashboards show health, and which settings are safe to change during business hours. Keep examples, approvals, and rollback notes with the service runbook rather than in personal notes. For production changes, capture the configuration before and after the work, including resource IDs, region, owner, timestamp, and a link to the approved change record and related deployment. Good operations turn the term into a checklist that first responders can follow under pressure.

Common mistakes

  • Treating CDC as automatic data correctness without validating keys, deletes, and schema evolution.
  • Using debug runs as proof that the published production pipeline state is correct.
  • Ignoring source system capacity when increasing CDC frequency for fresher analytics.