
Data catalog

A data catalog is a governed inventory of data assets, metadata, business context, ownership, lineage, and access signals that helps people find and understand data. In plain English, it helps teams discover trusted data products and assets without relying on tribal knowledge or outdated spreadsheets. You see it when Microsoft Purview Unified Catalog organizes data products, when data stewards curate glossary terms and ownership, or when analytics teams request access to governed assets. The practical questions are who owns it, which Azure resource proves it, and what breaks if it is missing.

Aliases: Data catalog
Difficulty: intermediate
CLI mappings: 4
Last verified: 2026-05-13

Microsoft Learn

Microsoft Learn documents the data catalog under Microsoft Purview Unified Catalog. Operators still need to confirm scope, configuration, dependencies, and production impact against their own environment.

Microsoft Learn: Microsoft Purview Unified Catalog (2026-05-13)

Technical context

Technically, Data catalog is backed by Azure configuration, identities, dependencies, logs, and deployment records. Operators validate it by checking the live resource, related permissions, health signals, and approved design notes. Treat it as production configuration: capture resource IDs, test failure behavior, use least-privilege access, and keep rollback notes beside the change record. Keep owner, scope, evidence, and rollback visible.

Why it matters

Data catalog matters because it gives organizations a practical way to find, understand, and govern data before analysts, engineers, or AI applications use it. In enterprise Azure work, the weak spot is rarely the feature name; it is the gap between design intent and live state. When teams skip this topic, they can create audit findings, production outages, ambiguous ownership, hidden costs, or brittle integrations that show up during an incident. A good glossary entry turns the idea into an operating checklist: confirm scope, dependencies, monitoring, approved owners, and measurable outcomes before the change reaches production.

Where you see it

Signals, screens, and Azure surfaces where this term usually becomes operational.

Signal 01

During portal reviews, Data catalog shows up in configuration, identity, networking, monitoring, or governance pages that prove the current production state. Review owner and rollback notes.

Signal 02

In CLI or IaC reviews, it appears in az purview account command output, deployment templates, source registration records, and exported governance evidence, helping engineers compare intended configuration with live Azure state before release.

Signal 03

In monitoring and governance, it appears through scan history, data estate health, catalog usage, access-request status, lineage updates, and audit logs, where teams track drift, failures, usage, and policy exceptions tied to owners.

When this becomes relevant

Specific situations where this term helps solve real Azure design, operations, migration, security, reliability, cost, or governance problems.

  • Plan and review production use of Data catalog across subscriptions and environments.
  • Troubleshoot incidents where Data catalog affects access requests, scan freshness, lineage, governance, or compliance evidence.
  • Create audit-ready runbooks, dashboards, and change checks for Data catalog.

Real-world case studies

Different enterprise-style examples that show the term being used to hit measurable objectives.

Case study 01

Commercial data marketplace

Scenario, objectives, solution, measured impact, and takeaway.

Scenario

Riverbend Bank, a financial services organization, had hundreds of analytics datasets but analysts could not tell which customer, risk, and product tables were approved for reuse.

Business/Technical Objectives
  • Create a searchable catalog for high-value data products
  • Assign accountable data owners to every priority asset
  • Reduce access-request cycle time by thirty percent
  • Show lineage for critical customer reporting datasets
Solution Using Data catalog

Riverbend implemented Microsoft Purview Unified Catalog with governance domains for risk, retail banking, and finance. Data stewards registered priority sources, curated business glossary terms, grouped approved assets into data products, and connected access policies to governed request workflows. Lineage was captured for customer reporting pipelines, and Purview account details were included in Azure governance evidence. Analysts searched the catalog instead of relying on shared spreadsheets or private messages. The architecture decision record captured owners, rollback steps, monitoring queries, and acceptance criteria so the operating team could support the design after launch.

Results & Business Impact
  • Priority assets gained documented owners within six weeks
  • Access-request cycle time dropped by thirty-two percent
  • Customer reporting lineage became visible to auditors
  • Duplicate analyst datasets were reduced by twenty-five percent
Key Takeaway for Glossary Readers

A data catalog turns discovery into a governed workflow rather than an informal scavenger hunt.

Case study 02

Research data stewardship

Scenario, objectives, solution, measured impact, and takeaway.

Scenario

VanArsdel Therapeutics, a pharmaceuticals organization, needed scientists to find approved clinical, lab, and manufacturing datasets without exposing sensitive study metadata unnecessarily.

Business/Technical Objectives
  • Improve discovery of approved research data products
  • Separate sensitive study metadata from broad user access
  • Track lineage from lab files to analysis outputs
  • Raise data health scores for priority domains
Solution Using Data catalog

VanArsdel organized Purview Unified Catalog around research and manufacturing governance domains. Stewards curated data products, attached glossary terms, documented owners, and applied access policies to sensitive study assets. Data Map scans and lineage integrations showed how lab files moved into analytical tables. Security reviewed roles so catalog discovery did not become uncontrolled access, while health actions identified duplicate or poorly described assets that needed cleanup. The architecture decision record captured owners, rollback steps, monitoring queries, and acceptance criteria so the operating team could support the design after launch.

Results & Business Impact
  • Scientists found approved datasets forty percent faster
  • Sensitive study access followed documented policy decisions
  • Lineage coverage reached eighty percent for priority pipelines
  • Data health actions reduced duplicate catalog entries
Key Takeaway for Glossary Readers

Catalog quality depends on stewardship, not just scanning more data sources.

Case study 03

Merchandising analytics hub

Scenario, objectives, solution, measured impact, and takeaway.

Scenario

Wide World Importers, a wholesale distribution organization, wanted business users to locate trusted inventory and supplier datasets before building Power BI and AI forecasting projects.

Business/Technical Objectives
  • Publish trusted merchandising data products for self-service teams
  • Reduce duplicate reporting datasets by at least twenty percent
  • Connect glossary terms to source assets and dashboards
  • Create a review rhythm for stale catalog entries
Solution Using Data catalog

The data platform team used Purview Unified Catalog to group inventory, supplier, and sales assets into curated data products. Each product included owner names, business glossary terms, access rules, lineage, and quality notes. Catalog stewards reviewed stale assets monthly, while Azure activity evidence tracked Purview account changes. Business users could search by governance domain or data product and request access through a governed workflow. The architecture decision record captured owners, rollback steps, monitoring queries, and acceptance criteria so the operating team could support the design after launch.

Results & Business Impact
  • Duplicate reporting datasets dropped by twenty-eight percent
  • Power BI authors found approved sources in one search flow
  • Stale catalog entries were reviewed monthly
  • Forecasting teams reused the same governed inventory product
Key Takeaway for Glossary Readers

A useful data catalog gives business context, ownership, and access guidance in the same place.

Why use Azure CLI for this?

Use Azure CLI for Data catalog to collect repeatable evidence from live Azure resources instead of relying on one-off portal screenshots.

CLI use cases

  • Confirm the live Azure scope, resource owner, and configuration state for Data catalog before an approval.
  • Capture repeatable evidence for audits, incidents, architecture reviews, and release checklists involving Data catalog.
  • Compare portal settings, IaC intent, and command output so drift is found before it becomes a production issue.

Before you run CLI

  • Confirm the active tenant, subscription, resource group, resource names, and environment before trusting any command output.
  • Start with read-only commands; use mutating commands only when a change ticket, rollback plan, and owner approval exist.
  • Check whether the command touches identity, keys, networking, secrets, billing, or production traffic before running it.
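The first check in the list above can be scripted so it runs before any other command. This is a minimal sketch, not a required procedure: the function name confirm_subscription and the idea of passing the expected subscription ID from the change ticket are assumptions for illustration. It relies only on az account show, which reports the subscription the CLI is currently pointed at.

```shell
# Sketch: stop early if the CLI is pointed at the wrong subscription.
# confirm_subscription is a hypothetical helper name; the expected ID
# would come from the change ticket or runbook.
confirm_subscription() {
  expected="$1"
  # Read the ID of the currently active subscription.
  active="$(az account show --query id --output tsv)" || return 1
  if [ "$active" != "$expected" ]; then
    echo "Active subscription $active does not match expected $expected" >&2
    return 1
  fi
  echo "Subscription confirmed: $active"
}
```

A typical call would be confirm_subscription "<subscription-id>" at the top of a runbook script, so later read-only commands are not trusted against the wrong environment.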

What output tells you

  • It shows whether Data catalog is configured at the expected Azure scope and whether live settings match the approved design.
  • It exposes resource IDs, identities, permissions, component names, encryption settings, logs, or status values needed for troubleshooting.
  • It helps reviewers connect a portal decision to concrete evidence that can be saved in an incident, audit, or release record.
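To make the points above concrete, the sketch below narrows az purview account show to a handful of fields a reviewer might ask for. The function name show_purview_summary is hypothetical, and the property names in the query (provisioningState, publicNetworkAccess, identity.type) are assumptions about the shape of the purview extension's output; check them against unfiltered output in your own tenant before relying on them.

```shell
# Sketch: print a short, reviewable summary of one Purview account.
# The JMESPath property names are assumed; verify them against the
# full JSON returned by the purview CLI extension.
show_purview_summary() {
  account="$1"
  group="$2"
  az purview account show \
    --name "$account" \
    --resource-group "$group" \
    --query "{id:id, location:location, state:provisioningState, publicAccess:publicNetworkAccess, identity:identity.type}" \
    --output json
}
```

The resource ID in the result is the value most often reused elsewhere, for example as the scope of a role-assignment query or as a reference in an incident record.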

Mapped Azure CLI commands

Data catalog operational checks

direct

az purview account list --resource-group <resource-group>
  Command group: az purview account. Action: discover. Category: Management and Governance.

az purview account show --name <account-name> --resource-group <resource-group>
  Command group: az purview account. Action: discover. Category: Management and Governance.

az purview account create --name <account-name> --resource-group <resource-group> --location <region>
  Command group: az purview account. Action: provision. Category: Management and Governance.

az monitor activity-log list --resource-group <resource-group> --max-events 100
  Command group: az monitor activity-log. Action: discover. Category: Management and Governance.
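The three read-only commands mapped above can be combined into one evidence capture. The sketch below is one possible layout, not a prescribed one: the function name capture_purview_evidence, the default evidence folder, and the file names are all hypothetical. The create command is deliberately left out because it changes state.

```shell
# Sketch: save read-only output to a timestamped folder for the change record.
# Folder and file names are illustrative assumptions.
capture_purview_evidence() {
  account="$1"
  group="$2"
  out_dir="${3:-evidence}/$(date -u +%Y%m%dT%H%M%SZ)"
  mkdir -p "$out_dir" || return 1

  # Inventory of Purview accounts in the resource group.
  az purview account list --resource-group "$group" --output json \
    > "$out_dir/account-list.json" || return 1

  # Full configuration of the account under review.
  az purview account show --name "$account" --resource-group "$group" --output json \
    > "$out_dir/account-show.json" || return 1

  # Recent control-plane activity in the same resource group.
  az monitor activity-log list --resource-group "$group" --max-events 100 --output json \
    > "$out_dir/activity-log.json" || return 1

  # Print the folder so the caller can attach it to a ticket.
  echo "$out_dir"
}
```

Writing to a UTC-stamped folder keeps successive captures side by side, which makes it easier to compare the state before and after a change.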

Architecture context

Data catalog connects architecture decisions to identity, dependency, monitoring, cost, and operations evidence for production Azure environments.

Security

Security for Data catalog starts with controlling who can discover sensitive assets, request access, curate metadata, approve policies, and connect cataloged data to downstream tools. Apply the right Azure identity, RBAC, networking, secret, policy, and diagnostic controls for the workload. Verification should use live resource state, deployment records, and logs rather than informal screenshots. The main risk is twofold: a catalog that exposes too much metadata can reveal sensitive business context, while a poorly governed catalog can send users toward unsafe data copies. Document the failure path if the Purview account, scan credential, access policy, stewardship role, or source-system permission changes, because that is where security controls often become operational incidents.
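One way to start the RBAC part of that review is to list Azure role assignments at the scope of the Purview account. This is a sketch under assumptions: list_purview_role_assignments is a hypothetical helper, and it only covers Azure RBAC on the account resource. Roles granted inside Purview itself (for example stewardship or governance domain roles) are likely managed in the Purview portal and would not appear in this output.

```shell
# Sketch: list Azure RBAC assignments that apply to one Purview account.
# Covers Azure resource-level roles only, not roles granted inside Purview.
list_purview_role_assignments() {
  account="$1"
  group="$2"
  # Resolve the account's resource ID to use as the assignment scope.
  scope="$(az purview account show --name "$account" --resource-group "$group" \
    --query id --output tsv)" || return 1
  # Include assignments inherited from the resource group or subscription.
  az role assignment list --scope "$scope" --include-inherited \
    --query "[].{principal:principalName, role:roleDefinitionName, scope:scope}" \
    --output table
}
```

Including inherited assignments matters here because broad roles granted at subscription level are a common way for access to exceed what the design intended.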

Cost

Cost for Data catalog comes from Purview capacity, scanning activity, data governance labor, integration work, metadata curation, access workflows, and remediation of poor-quality assets. A configuration that looks free can still increase background usage, security reviews, monitoring volume, or support effort. Review pricing at the whole workflow level, not just the named feature. Good teams tag owners, compare environments, watch utilization, set budgets where possible, and retire unused components before small recurring charges become normalized platform waste. Cost reviews should include the dependency services that make the pattern work in production.
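The owner-tagging habit mentioned above can be checked from the CLI. The sketch below assumes a tag key named owner, which is a hypothetical convention rather than anything Azure requires, and the helper name check_owner_tag is likewise illustrative. Tag keys containing hyphens or spaces would need quoting in the query expression.

```shell
# Sketch: fail if a Purview account has no owner tag.
# The default tag key "owner" is an assumed naming convention.
check_owner_tag() {
  account="$1"
  group="$2"
  tag_key="${3:-owner}"
  value="$(az purview account show --name "$account" --resource-group "$group" \
    --query "tags.$tag_key" --output tsv)" || return 1
  # Treat an empty or None result as a missing tag.
  if [ -z "$value" ] || [ "$value" = "None" ]; then
    echo "Missing tag '$tag_key' on $account" >&2
    return 1
  fi
  echo "$tag_key=$value"
}
```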

Reliability

Reliability for Data catalog depends on regular scans, source connectivity, metadata freshness, search health, lineage capture, stewardship review cycles, and consistent owner assignment. Test both the happy path and the failure path: stale metadata, failed scans, missing owners, duplicate assets, broken lineage, unreviewed access requests, and disconnected source systems. Production owners should know which metric or log proves the behavior is healthy, what alert fires first, and who can approve an emergency change. The design should include environment parity, rollback notes, recovery expectations, and service-specific limits so support teams are not rebuilding context during an outage.
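For the control-plane side of that failure path, the activity log can be filtered to failed operations. This is a sketch with assumptions: list_failed_operations is a hypothetical helper, the seven-day default window is arbitrary, and the activity log records Azure resource operations only. Scan failures and stale metadata are expected to surface in Purview's own scan history and health views rather than here.

```shell
# Sketch: show failed control-plane operations in a resource group.
# The 7d default look-back window is an arbitrary illustrative choice.
list_failed_operations() {
  group="$1"
  offset="${2:-7d}"
  az monitor activity-log list \
    --resource-group "$group" \
    --offset "$offset" \
    --status Failed \
    --query "[].{time:eventTimestamp, operation:operationName.value, caller:caller}" \
    --output table
}
```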

Performance

Performance for Data catalog depends on catalog search responsiveness, scan duration, source size, metadata volume, lineage complexity, user query patterns, and access-request workflow delays. Measure it with production-shaped data and realistic failure modes, not a tiny test request. Check cold starts, retries, payload size, routing, scans, cache behavior, and logging overhead where they apply. Performance work should not weaken security or reliability; the best result is documented tuning that explains which metric improved, which tradeoff was accepted, and when the decision must be reviewed.

Operations

Operations for Data catalog should be repeatable enough that another engineer can verify the same state without guessing. Keep source inventory, scan schedules, data product owners, glossary stewardship, access policy records, lineage reports, and health actions connected to the change record. Review these records during deployments, access reviews, incident postmortems, cost reviews, and platform upgrades. Avoid one-off portal edits unless they are captured afterward in IaC or documented exception records. The operational goal is clear evidence: what exists, why it exists, how it is monitored, and when it should change.

Common mistakes

  • Treating Data catalog as a label instead of checking the exact resource, identity, dependency, and monitoring evidence.
  • Assuming development, test, and production are configured the same without comparing live output and deployment templates.
  • Running a mutating command during investigation before confirming blast radius, rollback steps, approval, and ownership.
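The second mistake above, assuming environments match, can be tested directly by comparing the same fields from two accounts. The sketch below is illustrative only: compare_purview_accounts is a hypothetical helper, and the compared property names (location, publicNetworkAccess, identity.type) are assumptions about the purview extension's output. It uses read-only commands, so it is safe to run during an investigation.

```shell
# Sketch: compare selected settings of two Purview accounts, e.g. dev and prod.
# Arguments: name_a group_a name_b group_b. Property names are assumed.
compare_purview_accounts() {
  query="{location:location, publicAccess:publicNetworkAccess, identity:identity.type}"
  a="$(az purview account show --name "$1" --resource-group "$2" \
    --query "$query" --output json)" || return 1
  b="$(az purview account show --name "$3" --resource-group "$4" \
    --query "$query" --output json)" || return 1
  if [ "$a" = "$b" ]; then
    echo "match"
  else
    # Print both sides so the reviewer can see what differs.
    echo "differs" >&2
    printf '%s: %s\n%s: %s\n' "$1" "$a" "$3" "$b" >&2
    return 1
  fi
}
```

A difference is not automatically a defect (regions often differ on purpose), so the output is best read alongside the deployment templates that state the intended values.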