A data flow source is the read boundary of a mapping data flow. It anchors the transformation plan to a linked service, dataset or inline source, schema projection, parameters, and integration runtime. In design reviews, I look for whether the source can push down filters, whether partitioning is configured for large reads, and whether schema drift is intentional or just masking bad upstream change control. The source also defines the first security boundary because credentials, managed identities, firewalls, and private endpoints are exercised before any transformation runs. When pipelines miss their SLA, source configuration is often where latency, throttling, file enumeration, or unexpected row counts first appear.
Security
Security for a data flow source starts with identifying who can edit it, who can read its runtime evidence, and which identities, secrets, network paths, and data stores it touches. Review source credentials, managed identity permissions, private endpoints, firewall rules, least-privilege read access, masking needs, and who can preview sensitive source rows. Use managed identities where possible, restrict authoring access, protect linked-service credentials, and keep regulated data on private or otherwise approved network paths. Log changes and run outcomes in Azure Monitor so reviewers can prove what happened. During incidents, check whether RBAC, firewall, private endpoint, dataset, or source-control changes occurred before assuming the data flow itself is broken.
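The incident check described above can be sketched as a small filter over change records. This is a minimal illustration, not an Azure API: the `ChangeEvent` type, its fields, and the 24-hour lookback are all assumptions, and the events would have to be exported from whatever activity log or source-control history the team actually uses.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical record of one configuration change (not an Azure type)."""
    when: datetime
    kind: str      # e.g. "rbac", "firewall", "private-endpoint", "dataset", "source-control"
    summary: str


def changes_before_failure(events, failure_time, lookback=timedelta(hours=24)):
    """Return changes made within `lookback` before the failure, newest first."""
    window_start = failure_time - lookback
    in_window = [e for e in events if window_start <= e.when <= failure_time]
    return sorted(in_window, key=lambda e: e.when, reverse=True)
```

If the returned list is non-empty, those changes are worth ruling out before anyone edits the data flow definition.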
Cost
Cost for a data flow source comes from source scans, file enumeration, repeated previews, large wildcard paths, retries, unnecessary nonproduction reads, monitoring logs, and extra compute time caused by poor filtering. Watch for repeated debug sessions, oversized compute, high trigger frequency, retry loops, long log retention, storage transactions, and nonproduction copies. Small settings can become expensive when multiplied across environments, regions, schedules, or large files. Use tags, budgets, and run history to separate useful usage from noise. Before expanding scope, review cleanup tasks and estimate data volume, active runtime duration, monitoring retention, and support effort. After deployment, compare expected cost with actual metrics and remove unused paths or long-running sessions.
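To see how a small setting multiplies across schedules and environments, a rough estimate like the one below can help. It is only a sketch: it assumes compute is charged per vCore-hour, every parameter name is illustrative, and the rate in the usage note is a placeholder rather than a published price. It also ignores storage transactions, log retention, and cluster start-up time.

```python
def estimate_monthly_compute_cost(
    vcores: int,
    minutes_per_run: float,
    runs_per_day: int,
    rate_per_vcore_hour: float,  # placeholder; take the real rate from current pricing
    environments: int = 1,
    days: int = 30,
) -> float:
    """Rough monthly compute cost: vCores x active hours x rate."""
    active_hours = (minutes_per_run / 60) * runs_per_day * days * environments
    return vcores * active_hours * rate_per_vcore_hour
```

For example, an 8 vCore flow that runs 15 minutes every hour in three environments, at a placeholder rate of 0.25 per vCore-hour, comes to `estimate_monthly_compute_cost(8, 15, 24, 0.25, environments=3)`, which is 1080.0 per month in whatever currency the rate uses.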
Reliability
Reliability for a data flow source means the workload keeps producing trustworthy data when schemas drift, source systems throttle, clusters start slowly, or downstream services reject writes. Plan around input availability, schema projection, late-arriving files, retry policy, connector limits, source system throttling, and repeatable behavior when rerunning after failed reads. Keep retries, timeouts, idempotent reruns, and dependency owners visible in the runbook. Monitor user-visible freshness as well as Azure run status, because a technically successful run can still deliver partial or stale data. Test permission loss, missing files, regional service issues, and rollback steps before relying on the data flow for business reporting, and document who owns each tested rollback step.
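The point that a successful run can still deliver partial or stale data can be expressed as a check that looks at more than the run status. The sketch below is an assumption-laden illustration: the function name, thresholds, and inputs are hypothetical, the status string `"Succeeded"` is assumed to match what the monitoring export reports, and the newest record time and row count would come from the team's own run history or data-quality queries.

```python
from datetime import datetime, timedelta, timezone


def run_problems(
    status: str,
    newest_record_time: datetime,
    checked_at: datetime,
    max_staleness: timedelta,
    rows_read: int,
    min_expected_rows: int,
) -> list[str]:
    """List reasons a run should not be trusted; an empty list means no issue found."""
    problems = []
    if status != "Succeeded":  # assumed status label
        problems.append(f"run status is {status}")
    if checked_at - newest_record_time > max_staleness:
        problems.append("data is stale")
    if rows_read < min_expected_rows:
        problems.append("row count below expected minimum")
    return problems


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # A run that succeeded but whose newest record is 30 hours old.
    print(run_problems("Succeeded", now - timedelta(hours=30), now,
                       timedelta(hours=24), rows_read=1000, min_expected_rows=500))
```

Returning a list of reasons instead of a single boolean keeps the alert text specific enough for the runbook owner to act on.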
Performance
Performance for a data flow source depends on how quickly trustworthy data moves through the read path without overloading sources, compute, networks, or destinations. Pay attention to source partitioning, predicate pushdown, file format, folder pruning, connector throughput, schema inference, sample size, and the source region relative to compute and sink. Measure throughput, duration, queue time, rows processed, skew, throttling, and downstream freshness, not just whether the run succeeded. Tune gradually, because partitioning, source filters, sink batch behavior, compute size, and concurrency can improve one stage while hurting another. Compare debug behavior with triggered runs, then retest after schema, network, cluster, or dataset changes. Record the baseline before approving scale changes.
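Two of the measurements above, throughput and skew, are simple enough to compute from exported run metrics. The following is a minimal sketch; the function names are illustrative, and defining skew as the largest partition divided by the mean partition size is one common convention chosen here as an assumption, not something the monitoring output defines.

```python
from statistics import mean


def throughput_rows_per_sec(rows_processed: int, duration_seconds: float) -> float:
    """Average rows read per second over the whole source stage."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return rows_processed / duration_seconds


def partition_skew(rows_per_partition: list[int]) -> float:
    """Largest partition divided by the mean; 1.0 means perfectly even."""
    if not rows_per_partition:
        raise ValueError("rows_per_partition must not be empty")
    average = mean(rows_per_partition)
    if average == 0:
        return 1.0  # every partition is empty, so there is nothing to skew
    return max(rows_per_partition) / average
```

Recording both numbers for the baseline run makes it easier to tell whether a later partitioning change helped the read stage or only moved the bottleneck.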
Operations
Operations for a data flow source should be simple enough for a second engineer to reproduce without tribal knowledge. The runbook should cover source ownership, expected freshness, file naming, schema-change alerts, connection-test evidence, parameter values, escalation contacts, and a comparison between preview and scheduled reads. Keep naming, tags, dashboards, tickets, and source-controlled definitions aligned across dev, test, and production. Use read-only CLI checks for routine evidence, then require an approved change ticket for mutating runs or configuration changes. After rollout, compare actual run history, logs, cost, and data-quality signals with the expected result, and record the owner's follow-up actions before closing the change.
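Keeping definitions aligned across dev, test, and production can be checked with a read-only comparison of exported source settings. The sketch below assumes each environment's settings have already been loaded into a plain dictionary; the function name and the setting keys in the usage note are illustrative, not a fixed schema.

```python
def settings_drift(
    environments: dict[str, dict[str, object]],
    allowed_to_differ: frozenset[str] = frozenset(),
) -> dict[str, dict[str, object]]:
    """Return settings whose values differ between environments.

    Keys listed in `allowed_to_differ` (paths, account names, and similar)
    are skipped. A key missing from an environment is reported as None.
    """
    all_keys = set().union(*(settings.keys() for settings in environments.values()))
    drift = {}
    for key in sorted(all_keys - allowed_to_differ):
        values = {env: settings.get(key) for env, settings in environments.items()}
        # repr() lets unhashable values such as lists be compared in a set.
        if len({repr(value) for value in values.values()}) > 1:
            drift[key] = values
    return drift
```

For instance, with `{"dev": {"allowSchemaDrift": True, "folderPath": "raw/dev"}, "prod": {"allowSchemaDrift": False, "folderPath": "raw/prod"}}` and `allowed_to_differ=frozenset({"folderPath"})`, only the `allowSchemaDrift` mismatch is reported, which is the kind of difference that deserves a ticket.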