Get Metadata activity is a pipeline step in Azure Data Factory or Synapse that reads information about data instead of copying or transforming the data itself. Teams use it to check whether files exist, list folders, inspect file size or structure, and decide what later pipeline activities should do. In daily Azure work, it shows up when engineers build metadata-driven ingestion, validate landing-zone files, branch with If Condition, loop over child items, or troubleshoot why a copy activity found no input.
ADF Get Metadata, Synapse Get Metadata, metadata activity
Difficulty
beginner
CLI mappings
3
Last verified
2026-05-14
Microsoft Learn
Get Metadata activity is a pipeline step in Azure Data Factory or Synapse that reads information about data instead of copying or transforming the data itself. Microsoft Learn documents it under "Get Metadata activity in Azure Data Factory and Azure Synapse". Operators should still confirm scope, configuration, dependencies, and production impact in their own environment.
Technically, Get Metadata activity is configured or observed through pipeline activities, datasets, linked services, integration runtime, metadata field list, dynamic content, activity output, childItems arrays, and downstream expressions. Important settings include dataset type, path parameters, field list, timeout, retry policy, integration runtime, secure input or output flags, linked service authentication, and pipeline parameters. Operators inspect it with pipeline JSON, activity run output, Data Factory monitor blade, pipeline run logs, integration runtime status, dataset parameter values, and activity-run query results. The useful evidence is current configuration plus logs or metrics that prove the setting behaves as intended.
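As a rough illustration of the settings listed above, the sketch below assembles a Get Metadata activity definition as a Python dictionary in the shape pipeline JSON generally uses. The activity name, dataset name, dataset parameter names, and the timeout and retry values are hypothetical placeholders, not values from a real factory.

```python
import json


def build_get_metadata_activity(name, dataset_name, field_list, dataset_parameters=None):
    """Return a Get Metadata activity definition shaped like pipeline JSON."""
    dataset_ref = {"referenceName": dataset_name, "type": "DatasetReference"}
    if dataset_parameters:
        # Dataset parameters carry the path values supplied at run time.
        dataset_ref["parameters"] = dict(dataset_parameters)
    return {
        "name": name,
        "type": "GetMetadata",
        "dependsOn": [],
        "policy": {
            "timeout": "0.00:10:00",       # hypothetical: 10 minutes
            "retry": 2,                    # hypothetical retry count
            "retryIntervalInSeconds": 30,
            "secureInput": False,
            "secureOutput": False,
        },
        "typeProperties": {
            "dataset": dataset_ref,
            "fieldList": list(field_list),  # metadata fields to return
        },
    }


# Hypothetical names used only for illustration.
activity = build_get_metadata_activity(
    name="CheckLandingFile",
    dataset_name="LandingZoneFile",
    field_list=["exists", "size", "lastModified"],
    dataset_parameters={"folderPath": "landing/sales", "fileName": "sales.csv"},
)
print(json.dumps(activity, indent=2))
```

Comparing a dictionary like this against the output of a pipeline show command is one way to confirm that the field list and policy match what reviewers expect.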
Why it matters
Get Metadata activity matters because it often decides whether a pipeline proceeds, skips work, loops over files, or fails early. Weak metadata checks can hide missing data or create duplicate processing. When teams misunderstand the activity, they may change the wrong setting, grant storage access too broadly, or chase an application bug that is really pipeline configuration, such as a wrong dataset path parameter or an incomplete field list. Its behavior therefore affects the security, reliability, operations, cost, and performance of every downstream activity that depends on its output.
Where you see it
Signals, screens, and Azure surfaces where this term usually becomes operational.
Signal 01
Pipeline canvas shows a Get Metadata activity before If Condition, ForEach, Copy, or Execute Pipeline activities to validate source data before movement.
Signal 02
Activity output contains fields such as exists, itemName, itemType, size, lastModified, structure, columnCount, or childItems used by downstream expressions.
Signal 03
Monitoring views show metadata activity duration, retries, linked service errors, parameter values, and downstream branch decisions during ingestion incident review.
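To make the activity output fields above more concrete, the following sketch reads a sample output document and keeps only the child items that are files. The sample values are invented for illustration; real output contains only the fields requested in the field list.

```python
# Invented sample shaped like Get Metadata output for a folder dataset.
sample_output = {
    "exists": True,
    "itemName": "sales",
    "itemType": "Folder",
    "childItems": [
        {"name": "store-001.csv", "type": "File"},
        {"name": "archive", "type": "Folder"},
        {"name": "store-002.csv", "type": "File"},
    ],
}


def list_child_files(output):
    """Return names of child items whose type is File."""
    # If 'exists' was not requested it is absent from the output; in that
    # case this sketch assumes the item was found.
    if not output.get("exists", True):
        return []
    return [
        item["name"]
        for item in output.get("childItems", [])
        if item.get("type") == "File"
    ]


files = list_child_files(sample_output)
print(files)
```

Inside a pipeline, a comparable filter is usually written as an expression over each child item, for example `@equals(item().type, 'File')` in a Filter activity.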
When this becomes relevant
Specific situations where this term helps solve real Azure design, operations, migration, security, reliability, cost, or governance problems.
Plan or review a production pipeline that must confirm a file or folder exists before a Copy activity runs, with a clear owner and rollback evidence.
Troubleshoot a Get Metadata activity by comparing its activity output, dataset parameter values, linked service identity, integration runtime status, and downstream branch decisions.
Standardize Get Metadata patterns across environments so field lists, retry policies, and path parameters are visible to operators.
Real-world case studies
Different enterprise-style examples that show the term being used to hit measurable objectives.
Case study 01
Get Metadata activity in action for retail file validation
Scenario, objectives, solution, measured impact, and takeaway.
📌Scenario
Tailspin Retail received daily store sales files in ADLS Gen2 and needed to stop pipelines from processing missing or partial files.
🎯Business/Technical Objectives
Verify file existence before copy
Skip late stores without failing the whole batch
Capture evidence for missed uploads
Reduce duplicate support tickets
✅Solution Using Get Metadata activity
The Data Factory team placed a Get Metadata activity before each copy branch. Dataset parameters supplied the store folder and business date, while the activity checked existence, size, and last modified time. If Condition activities used the output to either copy the file, log a missing-file event, or queue a retry window. Monitoring captured activity output so support could distinguish source-system lateness from Data Factory failures. Architects kept the rollout evidence close to the change record: current configuration, expected behavior, approval owner, rollback trigger, and the monitoring signals needed during the first production window. Support engineers received a short operating note that explained what to check first, what not to change during triage, and when to escalate to the platform owner.
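The branching described in this solution could be sketched as an If Condition definition that reads the `exists` field from the preceding Get Metadata activity. This is a minimal sketch, not the team's actual pipeline: the activity names are hypothetical, and the inner Copy and logging activities are stubs rather than complete definitions.

```python
def build_exists_condition(metadata_activity, if_true, if_false, name="IfFileExists"):
    """Return an If Condition definition that branches on output.exists."""
    expression = f"@activity('{metadata_activity}').output.exists"
    return {
        "name": name,
        "type": "IfCondition",
        # Run only after the metadata check itself succeeded.
        "dependsOn": [
            {"activity": metadata_activity, "dependencyConditions": ["Succeeded"]}
        ],
        "typeProperties": {
            "expression": {"value": expression, "type": "Expression"},
            "ifTrueActivities": list(if_true),
            "ifFalseActivities": list(if_false),
        },
    }


# Hypothetical stubs: real Copy and logging activities need full definitions.
copy_stub = {"name": "CopyStoreFile", "type": "Copy"}
log_stub = {"name": "LogMissingFile", "type": "SqlServerStoredProcedure"}

condition = build_exists_condition("CheckStoreFile", [copy_stub], [log_stub])
print(condition["typeProperties"]["expression"]["value"])
```

Note that the Get Metadata field list must include `exists` for this expression to resolve; size and last-modified checks would need their own conditions.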
📈Results & Business Impact
False pipeline failures dropped by 48 percent
Support tickets for missing files decreased by 37 percent
Late-store reporting became visible by 7:30 a.m.
Duplicate copy attempts were reduced across reruns
💡Key Takeaway for Glossary Readers
Get Metadata activity is valuable when a pipeline needs evidence-driven decisions before moving data.
Case study 02
Get Metadata activity in action for energy sensor ingestion
Scenario, objectives, solution, measured impact, and takeaway.
📌Scenario
Contoso Energy loaded sensor folders from field gateways and needed one reusable pipeline for thousands of device paths.
🎯Business/Technical Objectives
Enumerate incoming device folders
Process only files with expected structure
Avoid hardcoded path lists
Improve failed-run triage
✅Solution Using Get Metadata activity
Engineers used Get Metadata to retrieve childItems from the landing directory and passed that list into a ForEach activity. A second metadata check inspected file structure before Copy activity loaded data into the curated zone. The linked service authenticated with a managed identity that could read only the approved storage paths. Activity-run output recorded which device folders were present, skipped, or rejected because the file shape did not match expectations. The implementation avoided broad changes by separating read-only discovery, lower-environment validation, production approval, and post-change monitoring into separate runbook steps. Security, reliability, cost, and performance reviewers used the same evidence package, so no team had to infer risk from an isolated deployment result. The rollback plan named the previous setting, expected recovery time, responsible owner, and the logs that would prove the service had returned to normal behavior.
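The enumeration and structure check in this solution could look roughly like the sketch below: a ForEach definition that iterates over `childItems`, plus a helper that compares a returned `structure` list with an expected column list. All activity names, the batch count, and the column names are hypothetical.

```python
def build_foreach_over_children(metadata_activity, inner_activities, batch_count=10):
    """Return a ForEach definition that loops over childItems."""
    items_expression = f"@activity('{metadata_activity}').output.childItems"
    return {
        "name": "ForEachDeviceFolder",
        "type": "ForEach",
        "dependsOn": [
            {"activity": metadata_activity, "dependencyConditions": ["Succeeded"]}
        ],
        "typeProperties": {
            "items": {"value": items_expression, "type": "Expression"},
            "isSequential": False,
            "batchCount": batch_count,  # hypothetical parallelism setting
            "activities": list(inner_activities),
        },
    }


def structure_matches(actual, expected):
    """Compare column names and types in order."""
    as_pairs = lambda cols: [(c["name"], c["type"]) for c in cols]
    return as_pairs(actual) == as_pairs(expected)


# Hypothetical expected file shape for a sensor reading file.
expected = [{"name": "deviceId", "type": "String"}, {"name": "reading", "type": "Double"}]
good = [{"name": "deviceId", "type": "String"}, {"name": "reading", "type": "Double"}]
bad = [{"name": "deviceId", "type": "String"}]

loop = build_foreach_over_children("ListDeviceFolders", [{"name": "CheckStructure", "type": "GetMetadata"}])
print(loop["typeProperties"]["items"]["value"])
print(structure_matches(good, expected), structure_matches(bad, expected))
```

Inside the loop, each inner activity would typically read the current folder or file name through `@item().name`.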
📈Results & Business Impact
One pipeline replaced 42 device-specific variants
Bad file structures were rejected before copy
Triage time fell from hours to minutes
Storage permissions remained limited to ingestion paths
💡Key Takeaway for Glossary Readers
Get Metadata supports metadata-driven ingestion when teams use its output as controlled pipeline input, not as informal troubleshooting text.
Case study 03
Get Metadata activity in action for healthcare referral intake
Scenario, objectives, solution, measured impact, and takeaway.
📌Scenario
A regional clinic network used Data Factory to ingest referral documents and wanted a safer way to process new upload folders.
🎯Business/Technical Objectives
Confirm expected folders before processing
Avoid exposing document contents in logs
Route empty folders to exception handling
Provide evidence for daily operations
✅Solution Using Get Metadata activity
The pipeline used Get Metadata to check folder existence and enumerate child items without reading document content into logs. Secure output was enabled where sensitive paths could appear. If Condition and ForEach activities routed empty folders to an exception table and valid files to copy processing. Operators queried activity runs each morning and reviewed missing-folder evidence with source application owners. Operations staff added dashboard links, saved CLI output, dependency notes, and ownership tags so the next incident review would start with facts instead of assumptions. The design was promoted gradually, with success criteria tied to customer-visible behavior, platform metrics, and service-health checks from the same time window. After release, the team retired stale exceptions and updated training notes so future projects used the same pattern without copying old risky configuration.
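The routing in this solution depends on telling apart a missing folder, an empty folder, and a folder that is ready. The sketch below shows that decision in Python over sample outputs, along with a policy fragment that hides activity output from monitoring. It assumes `exists` and `childItems` are both in the field list; the activity name in the expression is hypothetical.

```python
# Policy fragment: secureOutput keeps activity output out of monitoring logs.
SECURE_POLICY = {"secureInput": False, "secureOutput": True}

# Pipeline-side equivalent of the empty check (hypothetical activity name).
EMPTY_FOLDER_EXPRESSION = "@empty(activity('CheckReferralFolder').output.childItems)"


def classify_folder(output):
    """Return 'missing', 'empty', or 'ready' for a folder metadata output."""
    if not output.get("exists", False):
        return "missing"
    if len(output.get("childItems", [])) == 0:
        return "empty"
    return "ready"


print(classify_folder({"exists": False}))
print(classify_folder({"exists": True, "childItems": []}))
print(classify_folder({"exists": True, "childItems": [{"name": "ref-01.pdf", "type": "File"}]}))
```

One design consequence worth checking: with secure output enabled, the captured output is masked in monitoring, so the evidence operators review may need to come from the exception table rather than from activity-run output.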
📈Results & Business Impact
Empty-folder incidents were identified before copy
Sensitive document contents stayed out of pipeline logs
Morning review time decreased by 52 percent
Referral processing met daily SLA for four quarters
💡Key Takeaway for Glossary Readers
Get Metadata activity helps data teams inspect readiness signals while keeping actual data movement and validation in separate controlled steps.
Why use Azure CLI for this?
CLI checks make Get Metadata activity review repeatable because they capture scoped evidence for the current target before anyone changes production. Use read-only commands first to confirm subscription, resource group, identity, region, and dependency state. Mutating commands should run only after approval, rollback, cost impact, and customer impact are understood.
CLI use cases
Review pipeline JSON to verify selected metadata fields, dataset parameters, dependencies, and conditional branches before release.
Pull activity-run output during an ingestion incident to see whether Get Metadata found the expected file or folder.
Compare debug and scheduled runs to determine whether path parameters, integration runtime, or source permissions changed.
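For the last use case, one simple approach is to save the JSON for two pipeline runs and compare their parameter values. The sketch below assumes each saved run record exposes a `parameters` object, as pipeline-run output generally does; the run IDs and parameter names are invented for illustration.

```python
def diff_run_parameters(run_a, run_b):
    """Return {parameter: (value_in_a, value_in_b)} for values that differ."""
    params_a = run_a.get("parameters") or {}
    params_b = run_b.get("parameters") or {}
    differences = {}
    for key in sorted(set(params_a) | set(params_b)):
        if params_a.get(key) != params_b.get(key):
            differences[key] = (params_a.get(key), params_b.get(key))
    return differences


# Invented run records; real ones would come from saved CLI output.
debug_run = {
    "runId": "debug-run-id",
    "parameters": {"folderPath": "landing/sales", "businessDate": "2026-05-13"},
}
scheduled_run = {
    "runId": "scheduled-run-id",
    "parameters": {"folderPath": "landing/Sales", "businessDate": "2026-05-13"},
}

print(diff_run_parameters(debug_run, scheduled_run))
```

A difference as small as path casing can explain why one run found a file and the other did not, so it helps to compare values exactly rather than by eye.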
Before you run CLI
Confirm tenant, subscription, resource group, application, account, database, or factory scope before trusting command output.
Run list and show commands first, then save evidence before create, update, failover, deploy, delete, or permission changes.
Check whether the command affects customer traffic, credentials, data access, regional recovery, billing, compliance evidence, or production routing.
What output tells you
Names, resource IDs, locations, SKUs, enabled states, and parent relationships show whether you are inspecting the intended target.
Settings, identities, regions, roles, endpoints, parameters, or deployment properties explain how the workload behaves today.
Timestamps, metrics, health state, run logs, and deployment history help separate Azure configuration issues from application failures.
Mapped Azure CLI commands
Get Metadata activity operational checks
direct
az datafactory pipeline show --factory-name <factory-name> --resource-group <resource-group> --name <pipeline-name>
az datafactory pipeline
discover
Analytics
az datafactory pipeline-run show --factory-name <factory-name> --resource-group <resource-group> --run-id <run-id>
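Pipeline-run output alone does not include what each activity returned. The sketch below builds an activity-run query for one pipeline run and extracts the Get Metadata outputs from the JSON result. It assumes the `datafactory` CLI extension is installed and that its `activity-run query-by-pipeline-run` command returns activity runs under a `value` list; the factory, resource group, run ID, and sample payload are all hypothetical, and the command is only printed, not executed.

```python
import json


def build_activity_run_query(factory, resource_group, run_id, after_iso, before_iso):
    """Return the argument list for an activity-run query (not executed here)."""
    return [
        "az", "datafactory", "activity-run", "query-by-pipeline-run",
        "--factory-name", factory,
        "--resource-group", resource_group,
        "--run-id", run_id,
        "--last-updated-after", after_iso,
        "--last-updated-before", before_iso,
    ]


def get_metadata_outputs(cli_json_text):
    """Map activity name to output for GetMetadata activity runs."""
    payload = json.loads(cli_json_text)
    # Accept either a {"value": [...]} wrapper or a bare list.
    runs = payload.get("value", []) if isinstance(payload, dict) else payload
    return {
        run["activityName"]: run.get("output")
        for run in runs
        if run.get("activityType") == "GetMetadata"
    }


# Invented payload shaped like an activity-run query result.
sample = json.dumps({"value": [
    {"activityName": "CheckLandingFile", "activityType": "GetMetadata",
     "status": "Succeeded", "output": {"exists": False}},
    {"activityName": "CopyFile", "activityType": "Copy", "status": "Skipped"},
]})

command = build_activity_run_query(
    "my-factory", "my-rg", "my-run-id", "2026-05-13T00:00:00Z", "2026-05-14T00:00:00Z"
)
print(" ".join(command))
print(get_metadata_outputs(sample))
```

The argument list can be passed to `subprocess.run` once the scope has been confirmed; the query is read-only, so it fits the evidence-first approach described in this section.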
Security
Security for Get Metadata activity starts with linked service credentials, managed identity permissions, storage ACLs, secure input and output settings, parameter handling, diagnostic log access, and least-privilege access to metadata sources. Review who can create, update, delete, execute, read logs, approve dependencies, and manage credentials or identities. Prefer Microsoft Entra ID, managed identity, private networking, least privilege, customer-managed keys, and audited automation where the service supports them. Keep secrets out of code and avoid broad public exposure unless there is a documented exception. Capture role assignments, diagnostic settings, policy decisions, Activity Log entries, and owner approvals so access and data handling are intentional and reviewable.
Cost
Cost for Get Metadata activity is driven by pipeline activity runs, integration runtime usage, repeated folder enumeration, excessive retries, unnecessary debug runs, monitoring volume, and downstream work triggered by incorrect metadata decisions. The expensive mistake is not only Azure consumption; it can also be duplicate experiments, emergency support, overprovisioned capacity, unnecessary data transfer, or cleanup after weak design evidence. Review whether the workload truly needs the selected tier, retention, diagnostics, network path, scale rule, replication model, storage redundancy, or automation pattern. Use tags, budgets, alerts, and cleanup reviews so teams can explain why the design exists and remove stale resources safely.
Reliability
Reliability for Get Metadata activity depends on source availability, dataset parameters, folder paths, metadata field support, retry policy, integration runtime health, activity dependencies, conditional branches, and behavior when files arrive late. A resource can be present and still fail the business workflow if routing, identity, quota, storage, code, failover order, scale, or downstream health is wrong. Test failure modes, retries, deployment behavior, disabled states, rollback steps, and maintenance windows before relying on the design. During incidents, compare platform metrics, logs, deployment history, and application traces from the same time window before changing production. The goal is a recoverable configuration support teams can verify quickly.
Performance
Performance for Get Metadata activity depends on folder size, number of child items, metadata field selection, connector behavior, integration runtime location, storage latency, retry settings, and downstream loops over large file lists. Measure platform metrics and application-side completion times because a fast control-plane response does not prove users received the right result. Test with realistic regions, data sizes, concurrency, authentication paths, route choices, cache state, and downstream limits. When performance regresses, compare configuration changes, resource limits, client logs, diagnostic data, and workload timing before adding capacity or blaming one service. Tune with evidence from the exact environment and traffic pattern.
Operations
Operations for Get Metadata activity require pipeline monitoring, activity output capture, reusable metadata patterns, naming standards, parameter audits, failed-run triage, integration runtime checks, and runbooks for missing or late files. Before a change, capture read-only CLI output, portal evidence when useful, owner tags, expected behavior, and rollback steps. During incidents, avoid changing several settings at once; compare metrics, logs, deployment operations, identity evidence, network state, and downstream health first. Keep runbooks clear enough for support teams to verify current behavior quickly. Good operations make the term observable, reviewable, and recoverable during releases, audits, and incidents. Review owner, scope, evidence, dependencies, and rollback before production change.
Common mistakes
Treating Get Metadata activity as a simple label instead of checking live scope, owner, dependencies, and current configuration.
Running a mutating command for Get Metadata activity in the wrong subscription, resource group, tenant, region, or application context.
Assuming successful deployment proves Get Metadata activity works without checking logs, metrics, user behavior, recovery evidence, and rollback steps.