Get Metadata activity

Get Metadata activity is a pipeline step in Azure Data Factory or Synapse that reads information about data instead of copying or transforming the data itself. Teams use it to check whether files exist, list folders, inspect file size or structure, and decide what later pipeline activities should do. In daily Azure work, it shows up when engineers build metadata-driven ingestion, validate landing-zone files, branch with If Condition, loop over child items, or troubleshoot why a copy activity found no input.

Aliases: ADF Get Metadata, Synapse Get Metadata, metadata activity
Difficulty: beginner
CLI mappings: 3
Last verified: 2026-05-14

Microsoft Learn

Microsoft Learn documents this activity in the article "Get Metadata activity in Azure Data Factory and Azure Synapse." Treat that article as the reference for supported connectors and metadata fields, and confirm scope, configuration, dependencies, and production impact in your own environment before relying on it.

Microsoft Learn: Get Metadata activity in Azure Data Factory and Azure Synapse (2026-05-14)

Technical context

Technically, Get Metadata activity is configured or observed through pipeline activities, datasets, linked services, integration runtime, metadata field list, dynamic content, activity output, childItems arrays, and downstream expressions. Important settings include dataset type, path parameters, field list, timeout, retry policy, integration runtime, secure input or output flags, linked service authentication, and pipeline parameters. Operators inspect it with pipeline JSON, activity run output, Data Factory monitor blade, pipeline run logs, integration runtime status, dataset parameter values, and activity-run query results. The useful evidence is current configuration plus logs or metrics that prove the setting behaves as intended.
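The settings listed above map onto the activity's JSON definition inside a pipeline. The following is a minimal sketch that builds such a definition as a Python dictionary. The activity name, dataset name, dataset parameters, and policy values are hypothetical placeholders, and the helper function is illustrative rather than part of any SDK.

```python
import json


def build_get_metadata_activity(name, dataset_name, field_list, dataset_parameters=None):
    """Return a Get Metadata activity definition as a plain dict.

    All names and policy values passed in or set here are illustrative assumptions.
    """
    dataset_ref = {"referenceName": dataset_name, "type": "DatasetReference"}
    if dataset_parameters:
        # Path parameters are supplied per run through the dataset reference.
        dataset_ref["parameters"] = dict(dataset_parameters)
    return {
        "name": name,
        "type": "GetMetadata",
        "dependsOn": [],
        "policy": {
            "timeout": "0.00:10:00",        # d.hh:mm:ss, assumed value
            "retry": 2,                      # assumed value
            "retryIntervalInSeconds": 30,    # assumed value
            "secureInput": False,
            "secureOutput": False,
        },
        "typeProperties": {
            "dataset": dataset_ref,
            "fieldList": list(field_list),
        },
    }


activity = build_get_metadata_activity(
    name="CheckStoreFile",                   # hypothetical activity name
    dataset_name="LandingZoneFile",          # hypothetical dataset name
    field_list=["exists", "size", "lastModified"],
    dataset_parameters={"folder": "store-001", "businessDate": "2026-05-14"},
)
print(json.dumps(activity, indent=2))
```

Comparing a generated definition like this with the pipeline JSON returned by a read-only show command is one way to confirm that the field list and parameters match what reviewers expect.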

Why it matters

Get Metadata activity matters because it often decides whether a pipeline proceeds, skips work, loops over files, or fails early, so weak metadata checks can hide missing data or create duplicate processing. When teams misunderstand it, they may check the wrong path, grant storage access too broadly, or chase an application bug that is really pipeline configuration. It affects security, reliability, operations, cost, and performance because one metadata decision can change which identities read the source, which data moves, and how reruns behave under real pressure.

Where you see it

Signals, screens, and Azure surfaces where this term usually becomes operational.

Signal 01

Pipeline canvas shows a Get Metadata activity before If Condition, ForEach, Copy, or Execute Pipeline activities to validate source data before movement.

Signal 02

Activity output contains fields such as exists, itemName, itemType, size, lastModified, structure, columnCount, or childItems used by downstream expressions.

Signal 03

Monitoring views show metadata activity duration, retries, linked service errors, parameter values, and downstream branch decisions during ingestion incident review.
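To make the output fields named in these signals concrete, here is a small sketch that reads a saved activity output the way a downstream expression would. The sample values are invented for illustration, and the helper names are hypothetical. Fields appear in the output only when they were requested in the field list, so the code reads them defensively.

```python
# Invented sample of a Get Metadata activity output for a folder.
sample_output = {
    "exists": True,
    "itemName": "landing",
    "itemType": "Folder",
    "lastModified": "2026-05-14T05:42:10Z",
    "childItems": [
        {"name": "store-001.csv", "type": "File"},
        {"name": "archive", "type": "Folder"},
    ],
}


def item_exists(output):
    """Mirror of reading output.exists; a missing field is treated as False."""
    return bool(output.get("exists", False))


def child_file_names(output):
    """Names of child items whose type is File; empty when childItems is absent."""
    return [
        child["name"]
        for child in output.get("childItems", [])
        if child.get("type") == "File"
    ]


print(item_exists(sample_output))
print(child_file_names(sample_output))
```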

When this becomes relevant

Specific situations where this term helps solve real Azure design, operations, migration, security, reliability, cost, or governance problems.

  • Validate that landing-zone files or folders exist, and have the expected size or structure, before a production pipeline copies them.
  • Troubleshoot a pipeline that found no input by comparing the activity output with dataset parameters, linked service identity, integration runtime status, and downstream branch decisions.
  • Standardize metadata-driven ingestion across environments so the same field list, parameters, and branching logic are visible to operators.

Real-world case studies

Different enterprise-style examples that show the term being used to hit measurable objectives.

Case study 01

Get Metadata activity in action for retail file validation

Scenario, objectives, solution, measured impact, and takeaway.

Scenario

Tailspin Retail received daily store sales files in ADLS Gen2 and needed to stop pipelines from processing missing or partial files.

Business/Technical Objectives
  • Verify file existence before copy
  • Skip late stores without failing the whole batch
  • Capture evidence for missed uploads
  • Reduce duplicate support tickets
Solution Using Get Metadata activity

The Data Factory team placed a Get Metadata activity before each copy branch. Dataset parameters supplied the store folder and business date, while the activity checked existence, size, and last modified time. If Condition activities used the output to copy the file, log a missing-file event, or queue a retry window. Monitoring captured activity output so support could distinguish source-system lateness from Data Factory failures. Architects kept the rollout evidence close to the change record: current configuration, expected behavior, approval owner, rollback trigger, and the monitoring signals needed during the first production window. Support engineers received a short operating note that explained what to check first, what not to change during triage, and when to escalate to the platform owner.
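The three-way branch in this case study can be sketched outside the pipeline as a plain function. This is an illustration, not the team's actual expressions: the minimum size, the settle window, and the action names are assumptions introduced here to show how existence, size, and last modified time might combine.

```python
from datetime import datetime, timedelta, timezone


def decide_action(output, now, min_size_bytes=1, settle_minutes=5):
    """Return 'copy', 'log_missing', or 'retry_later' for one store file.

    Thresholds and action names are illustrative assumptions.
    """
    if not output.get("exists", False):
        return "log_missing"
    if output.get("size", 0) < min_size_bytes:
        # An empty file is treated as a partial upload.
        return "retry_later"
    last_modified = datetime.fromisoformat(output["lastModified"].replace("Z", "+00:00"))
    if now - last_modified < timedelta(minutes=settle_minutes):
        # A very recent write may mean the upload is still in progress.
        return "retry_later"
    return "copy"


now = datetime(2026, 5, 14, 6, 0, tzinfo=timezone.utc)
ready = {"exists": True, "size": 2048, "lastModified": "2026-05-14T05:30:00Z"}
fresh = {"exists": True, "size": 2048, "lastModified": "2026-05-14T05:58:00Z"}
missing = {"exists": False}

for label, output in [("ready", ready), ("fresh", fresh), ("missing", missing)]:
    print(label, decide_action(output, now))
```

Keeping the decision in one place like this also gives support a single rule to compare against captured activity output when a store reports a late file.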

Results & Business Impact
  • False pipeline failures dropped by 48 percent
  • Support tickets for missing files decreased by 37 percent
  • Late-store reporting became visible by 7:30 a.m.
  • Duplicate copy attempts were reduced across reruns
Key Takeaway for Glossary Readers

Get Metadata activity is valuable when a pipeline needs evidence-driven decisions before moving data.

Case study 02

Get Metadata activity in action for energy sensor ingestion

Scenario, objectives, solution, measured impact, and takeaway.

Scenario

Contoso Energy loaded sensor folders from field gateways and needed one reusable pipeline for thousands of device paths.

Business/Technical Objectives
  • Enumerate incoming device folders
  • Process only files with expected structure
  • Avoid hardcoded path lists
  • Improve failed-run triage
Solution Using Get Metadata activity

Engineers used Get Metadata to retrieve childItems from the landing directory and passed that list into a ForEach activity. A second metadata check inspected file structure before Copy activity loaded data into the curated zone. The linked service authenticated with managed identity, which was granted read access only to the approved storage paths. Activity-run output recorded which device folders were present, skipped, or rejected because the file shape did not match expectations. The implementation avoided broad changes by separating read-only discovery, lower-environment validation, production approval, and post-change monitoring into separate runbook steps. Security, reliability, cost, and performance reviewers used the same evidence package, so no team had to infer risk from an isolated deployment result. The rollback plan named the previous setting, expected recovery time, responsible owner, and the logs that would prove the service had returned to normal behavior.
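The enumerate-then-validate pattern described here can be sketched as follows. The expected column list, the type names, and the accepted, rejected, and skipped labels are assumptions for illustration; the real structure output depends on the dataset and connector.

```python
# Hypothetical expected shape for a sensor file.
EXPECTED_COLUMNS = [("deviceId", "String"), ("readingTime", "DateTime"), ("value", "Double")]


def structure_matches(metadata_output, expected=EXPECTED_COLUMNS):
    """True when the structure field lists exactly the expected columns, in order."""
    actual = [(col.get("name"), col.get("type")) for col in metadata_output.get("structure", [])]
    return actual == list(expected)


def triage_folders(child_items, structure_outputs):
    """Label each child folder using the second metadata check's output.

    structure_outputs maps folder name to that folder's metadata output.
    """
    result = {}
    for item in child_items:
        if item.get("type") != "Folder":
            continue  # only device folders are triaged
        name = item["name"]
        if name not in structure_outputs:
            result[name] = "skipped"
        elif structure_matches(structure_outputs[name]):
            result[name] = "accepted"
        else:
            result[name] = "rejected"
    return result


child_items = [
    {"name": "device-a", "type": "Folder"},
    {"name": "device-b", "type": "Folder"},
    {"name": "device-c", "type": "Folder"},
    {"name": "readme.txt", "type": "File"},
]
good = {"structure": [{"name": n, "type": t} for n, t in EXPECTED_COLUMNS]}
bad = {"structure": [{"name": "deviceId", "type": "String"}]}

print(triage_folders(child_items, {"device-a": good, "device-b": bad}))
```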

Results & Business Impact
  • One pipeline replaced 42 device-specific variants
  • Bad file structures were rejected before copy
  • Triage time fell from hours to minutes
  • Storage permissions remained limited to ingestion paths
Key Takeaway for Glossary Readers

Get Metadata supports metadata-driven ingestion when teams use its output as controlled pipeline input, not as informal troubleshooting text.

Case study 03

Get Metadata activity in action for healthcare referral intake

Scenario, objectives, solution, measured impact, and takeaway.

Scenario

A regional clinic network used Data Factory to ingest referral documents and wanted a safer way to process new upload folders.

Business/Technical Objectives
  • Confirm expected folders before processing
  • Avoid exposing document contents in logs
  • Route empty folders to exception handling
  • Provide evidence for daily operations
Solution Using Get Metadata activity

The pipeline used Get Metadata to check folder existence and enumerate child items without reading document content into logs. Secure output was enabled where sensitive paths could appear. If Condition and ForEach activities routed empty folders to an exception table and valid files to copy processing. Operators queried activity runs each morning and reviewed missing-folder evidence with source application owners. Operations staff added dashboard links, saved CLI output, dependency notes, and ownership tags so the next incident review would start with facts instead of assumptions. The design was promoted gradually, with success criteria tied to customer-visible behavior, platform metrics, and service-health checks from the same time window. After release, the team retired stale exceptions and updated training notes so future projects used the same pattern without copying old risky configuration.
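A rough sketch of the routing and logging idea in this case study follows. The route names and the shape of the evidence record are hypothetical. The point it illustrates is that the record carries counts and a route decision rather than file names or document content.

```python
def route_folder(output):
    """Return a route label for one upload folder based on its metadata output."""
    if not output.get("exists", False):
        return "exception:missing_folder"
    files = [c for c in output.get("childItems", []) if c.get("type") == "File"]
    if not files:
        return "exception:empty_folder"
    return "copy"


def evidence_record(folder_label, output):
    """Build a log-safe record: counts and decisions only, no child item names."""
    children = output.get("childItems", [])
    return {
        "folder": folder_label,  # an opaque label chosen by the caller, not a raw path
        "exists": bool(output.get("exists", False)),
        "fileCount": sum(1 for c in children if c.get("type") == "File"),
        "route": route_folder(output),
    }


empty = {"exists": True, "childItems": []}
filled = {"exists": True, "childItems": [{"name": "referral-123.pdf", "type": "File"}]}

print(evidence_record("clinic-01", empty))
print(evidence_record("clinic-02", filled))
```

This complements, rather than replaces, the secure output setting on the activity, which keeps the raw output out of monitoring views.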

Results & Business Impact
  • Empty-folder incidents were identified before copy
  • Sensitive document contents stayed out of pipeline logs
  • Morning review time decreased by 52 percent
  • Referral processing met daily SLA for four quarters
Key Takeaway for Glossary Readers

Get Metadata activity helps data teams inspect readiness signals while keeping actual data movement and validation in separate controlled steps.

Why use Azure CLI for this?

CLI checks make Get Metadata activity review repeatable because they capture scoped evidence for the current target before anyone changes production. Use read-only commands first to confirm subscription, resource group, identity, region, and dependency state. Mutating commands should run only after approval, rollback, cost impact, and customer impact are understood.

CLI use cases

  • Review pipeline JSON to verify selected metadata fields, dataset parameters, dependencies, and conditional branches before release.
  • Pull activity-run output during an ingestion incident to see whether Get Metadata found the expected file or folder.
  • Compare debug and scheduled runs to determine whether path parameters, integration runtime, or source permissions changed.

Before you run CLI

  • Confirm tenant, subscription, resource group, and factory name before trusting command output.
  • Run list and show commands first, then save evidence before create, update, failover, deploy, delete, or permission changes.
  • Check whether the command affects customer traffic, credentials, data access, regional recovery, billing, compliance evidence, or production routing.

What output tells you

  • Names, resource IDs, locations, SKUs, enabled states, and parent relationships show whether you are inspecting the intended target.
  • Settings, identities, regions, roles, endpoints, parameters, or deployment properties explain how the workload behaves today.
  • Timestamps, metrics, health state, run logs, and deployment history help separate Azure configuration issues from application failures.

Mapped Azure CLI commands

Get Metadata activity operational checks

direct
az datafactory pipeline show --factory-name <factory-name> --resource-group <resource-group> --name <pipeline-name>
az datafactory pipeline-run show --factory-name <factory-name> --resource-group <resource-group> --run-id <run-id>
az datafactory activity-run query-by-pipeline-run --factory-name <factory-name> --resource-group <resource-group> --run-id <run-id> --last-updated-after <utc-start> --last-updated-before <utc-end>
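Once the activity-run query output has been saved as JSON, a short script can pull out only the Get Metadata runs for incident review. The sketch below assumes each run record carries activityName, activityType, status, and output fields, and that the runs sit either in a top-level list or under a "value" key. Verify both assumptions against your own saved output. The sample payload is invented.

```python
import json

# Invented sample standing in for saved output of the activity-run query command.
saved_output = """
{
  "value": [
    {"activityName": "CheckStoreFile", "activityType": "GetMetadata",
     "status": "Succeeded", "output": {"exists": false}},
    {"activityName": "CopyStoreFile", "activityType": "Copy",
     "status": "Succeeded", "output": {"rowsCopied": 10}},
    {"activityName": "ListLanding", "activityType": "GetMetadata",
     "status": "Succeeded",
     "output": {"exists": true, "childItems": [{"name": "a.csv", "type": "File"}]}}
  ]
}
"""


def get_metadata_runs(query_result):
    """Summarize Get Metadata activity runs from a parsed query result."""
    runs = query_result.get("value", []) if isinstance(query_result, dict) else query_result
    summary = []
    for run in runs:
        if run.get("activityType") != "GetMetadata":
            continue
        output = run.get("output") or {}
        summary.append({
            "activity": run.get("activityName"),
            "status": run.get("status"),
            "exists": output.get("exists"),
            "childCount": len(output.get("childItems", [])),
        })
    return summary


summary = get_metadata_runs(json.loads(saved_output))
for row in summary:
    print(row)
```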

Security

Security for Get Metadata activity starts with linked service credentials, managed identity permissions, storage ACLs, secure input and output settings, parameter handling, diagnostic log access, and least-privilege access to metadata sources. Review who can create, update, delete, execute, read logs, approve dependencies, and manage credentials or identities. Prefer Microsoft Entra ID, managed identity, private networking, least privilege, customer-managed keys, and audited automation where the service supports them. Keep secrets out of code and avoid broad public exposure unless there is a documented exception. Capture role assignments, diagnostic settings, policy decisions, Activity Log entries, and owner approvals so access and data handling are intentional and reviewable.

Cost

Cost for Get Metadata activity is driven by pipeline activity runs, integration runtime usage, repeated folder enumeration, excessive retries, unnecessary debug runs, monitoring volume, and downstream work triggered by incorrect metadata decisions. The expensive mistake is not only Azure consumption; it can also be duplicate copy runs, emergency support, unnecessary data transfer, or cleanup after a wrong branch decision. Review whether each pipeline truly needs its current run frequency, retry count, field list, and diagnostic retention. Use tags, budgets, alerts, and cleanup reviews so teams can explain why the design exists and remove stale pipelines safely.

Reliability

Reliability for Get Metadata activity depends on source availability, dataset parameters, folder paths, metadata field support, retry policy, integration runtime health, activity dependencies, conditional branches, and behavior when files arrive late. A metadata check can succeed and still fail the business workflow if the path parameter, identity, field list, or downstream condition is wrong. Test failure modes, retries, late-file behavior, and rollback steps before relying on the design. During incidents, compare activity-run output, pipeline run logs, and deployment history from the same time window before changing production. The goal is a recoverable configuration support teams can verify quickly.

Performance

Performance for Get Metadata activity depends on folder size, number of child items, metadata field selection, connector behavior, integration runtime location, storage latency, retry settings, and downstream loops over large file lists. Measure activity duration and end-to-end pipeline completion times, because a fast metadata call does not prove that downstream loops finish on time. Test with realistic folder sizes, file counts, concurrency, authentication paths, and downstream limits. When performance regresses, compare configuration changes, integration runtime status, activity-run output, and workload timing before adding capacity or blaming one service. Tune with evidence from the exact environment and workload pattern.

Operations

Operations for Get Metadata activity require pipeline monitoring, activity output capture, reusable metadata patterns, naming standards, parameter audits, failed-run triage, integration runtime checks, and runbooks for missing or late files. Before a change, capture read-only CLI output, portal evidence when useful, owner tags, expected behavior, and rollback steps. During incidents, avoid changing several settings at once; compare metrics, logs, deployment operations, identity evidence, network state, and downstream health first. Keep runbooks clear enough for support teams to verify current behavior quickly. Good operations make the activity observable, reviewable, and recoverable during releases, audits, and incidents.

Common mistakes

  • Treating Get Metadata activity as a simple label instead of checking its live dataset, path parameters, field list, and downstream dependencies.
  • Running a mutating command in the wrong subscription, resource group, tenant, or factory while investigating a Get Metadata issue.
  • Assuming a successful deployment proves Get Metadata activity works without checking activity-run output, branch decisions, logs, and rollback steps.