Self-hosted integration runtime

A self-hosted integration runtime is an agent you install on a server or virtual machine that can reach private data sources. Azure Data Factory, Synapse, or Purview sends work to that runtime so it can copy data, dispatch activities, or scan systems that are not publicly reachable. It is useful for on-premises SQL Server, file shares on private networks, legacy databases, and locked-down virtual networks. You manage the host, updates, connectivity, and capacity. It is the private bridge between cloud orchestration and data sources that only the local network can reach.

Aliases: SHIR, self-hosted IR, Data Factory self-hosted runtime
Difficulty: intermediate
CLI mappings: 4
Last verified: 2026-05-23

Microsoft Learn

A self-hosted integration runtime is customer-managed compute for Azure Data Factory, Azure Synapse, or Microsoft Purview that connects to data sources inside private networks. It runs on a machine you control and enables secure data movement, dispatch, or scanning when cloud services cannot directly reach the source.

Microsoft Learn: Create and configure a self-hosted integration runtime (2026-05-23)

Technical context

In Azure architecture, a self-hosted integration runtime bridges Azure control-plane orchestration with data sources inside a customer network. The runtime node initiates outbound connections to Azure, receives encrypted work instructions, and then connects locally to source or sink systems. Its behavior depends on linked services, datasets, pipeline activities, managed identities, Key Vault secrets, firewalls, and private network routing. Capacity depends on the machine, network path, concurrent jobs, connector behavior, and whether multiple nodes are configured for high availability.

Why it matters

A self-hosted integration runtime matters because many real data estates are not fully cloud-native. Factories, laboratories, branch offices, partner networks, and regulated systems often keep databases behind private firewalls. Without a self-hosted runtime, teams either open risky inbound access, stage files manually, or delay migration. The runtime gives Azure orchestration a controlled way to reach private sources while keeping network ownership with the customer. It also creates an operational responsibility: the host must be patched, monitored, scaled, and recovered. A weak runtime design can become a data-movement bottleneck or a single point of failure. Teams need to treat it as production middleware with a named owner.

Where you see it

Signals, screens, and Azure surfaces where this term usually becomes operational.

Signal 01

You see self-hosted integration runtime in Data Factory Manage under Integration runtimes, where node status, version, keys, and activity indicate readiness for each registered node.

Signal 02

You see it in linked service and pipeline activity configuration, typically during deployment reviews, when a connector must use that runtime to reach private network sources.

Signal 03

You see it in failed pipeline runs when copy activities report unavailable runtime nodes, blocked ports, credential failures, or source connectivity errors during incident triage.

When this becomes relevant

Specific situations where this term helps solve real Azure design, operations, migration, security, reliability, cost, or governance problems.

Copy data from on-premises SQL Server, Oracle, SAP, or file shares into Azure without opening inbound firewall rules.
Scan private data sources with Microsoft Purview when sources are reachable only from an internal network or dedicated VM.
Run production Data Factory pipelines against private network sources while keeping credentials and source connectivity under customer control.
Scale out data movement nodes during migration waves where one runtime host cannot handle required copy throughput.
Separate dev, test, and production integration paths so experimental pipelines cannot accidentally use the production runtime host.

Real-world case studies

Different enterprise-style examples that show the term being used to hit measurable objectives.

Case study 01

Food distributor moves warehouse SQL data without opening inbound ports

Scenario, objectives, solution, measured impact, and takeaway.

Scenario

A food distributor needed nightly warehouse inventory from on-premises SQL Server into Azure Data Lake Storage. Network policy prohibited inbound firewall openings from Azure to warehouse systems.

Business/Technical Objectives

Move inventory data to the lake before morning replenishment planning.
Keep warehouse databases reachable only from the private network.
Avoid one runtime host becoming a nightly bottleneck.
Create clear evidence when copy failures occurred.

Solution Using Self-hosted integration runtime

The data platform team installed a self-hosted integration runtime on two hardened warehouse VMs that could reach SQL Server locally and initiate outbound connectivity to Azure Data Factory. Linked services used Key Vault-backed credentials, and pipelines copied data into ADLS Gen2 through private endpoints. Engineers tuned parallel copy settings after measuring source load and runtime CPU. Azure CLI queries exported pipeline-run and activity-run details into the operations dashboard. The network team documented allowed outbound destinations, while the warehouse team owned patch windows for the runtime hosts.

Results & Business Impact

Nightly inventory load time dropped from 5.8 hours to 2.1 hours.
No inbound firewall rules were opened to warehouse SQL Server.
Runtime failover testing proved copies continued when one node was rebooted.
Failure investigation time fell from three hours to 35 minutes using run IDs and host logs.

Key Takeaway for Glossary Readers

Self-hosted integration runtime lets Azure orchestrate private data movement without turning protected source systems into public endpoints.

Case study 02

Pharmaceutical lab scans private research stores with controlled runtime nodes

Scenario

A pharmaceutical lab wanted Microsoft Purview scans of research file shares and databases inside a restricted lab network. Direct cloud connectivity was not allowed because the network contained preclinical data.

Business/Technical Objectives

Enable catalog scanning without exposing lab data sources publicly.
Control which hosts could reach research stores.
Schedule scans around experiments that saturated local systems.
Document runtime ownership for internal compliance review.

Solution Using Self-hosted integration runtime

The governance team deployed a self-hosted integration runtime on lab-managed virtual machines inside the restricted network. The runtime was allowed to reach specific file shares and database endpoints, while outbound Azure connectivity was limited to approved services. Purview scans were scheduled during low laboratory activity, and credentials were stored in an approved vault. Operations monitored host health, scan duration, and failed connector events. Access to register or modify runtime nodes was limited to the data governance platform group, while lab system owners approved source-specific permissions.

Results & Business Impact

Purview catalog coverage expanded to 84 percent of prioritized lab stores.
No research source required public exposure or broad inbound firewall access.
Scan-related source load stayed below the lab’s 20 percent threshold.
Compliance reviewers received a documented owner, host list, and access boundary for every runtime node.

Key Takeaway for Glossary Readers

A self-hosted integration runtime is an architectural control point for private data discovery, not just a connector installation step.

Case study 03

Construction analytics team accelerates ERP migration with runtime scale-out

Scenario

A construction firm migrated project cost data from a legacy ERP database to Azure. Early Data Factory copies missed weekend migration windows because a single runtime node could not move enough data.

Business/Technical Objectives

Increase copy throughput without overloading the ERP system.
Finish migration waves inside approved weekend windows.
Separate production migration runtime from developer testing.
Build a rollback plan if runtime scale-out caused source pressure.

Solution Using Self-hosted integration runtime

Data engineers added two more self-hosted integration runtime nodes in the same network segment as the ERP database. They tuned copy parallelism, batch size, and schedules after measuring ERP CPU, database waits, and runtime network throughput. A separate development runtime was created so test pipelines could not compete with migration jobs. Azure CLI exports captured pipeline activity durations and failure reasons after each weekend wave. The operations runbook defined node restart order, source throttling limits, and rollback steps if ERP performance crossed agreed thresholds.

Results & Business Impact

Average migration-wave duration fell from 19 hours to 7.5 hours.
ERP CPU stayed below the agreed 65 percent ceiling during copies.
Developer test jobs stopped consuming production runtime capacity.
The final three migration waves completed without missed Monday reporting deadlines.

Key Takeaway for Glossary Readers

Self-hosted integration runtime performance is an architecture decision involving host capacity, source protection, and controlled pipeline concurrency.

Why use Azure CLI for this?

I use Azure CLI for self-hosted integration runtime work because the portal does not scale across factories, regions, and environments. CLI helps me list data factories, inspect linked runtimes, export pipeline activity, and validate whether a factory deployment references the expected integration runtime. After ten years of Azure data projects, I treat runtime configuration as infrastructure, not a click path. CLI is especially useful for comparing dev, test, and production, checking resource group ownership, gathering failed pipeline evidence, and proving that a change targeted the correct factory before touching private network dependencies. It also gives incident teams stable run IDs that match activity logs and host investigations.

CLI use cases

List data factories and confirm which environments should contain self-hosted integration runtime configuration.
Inspect pipeline runs and activity failures to identify whether errors come from runtime availability, credentials, or source connectivity.
Export factory configuration before changing linked services or deploying pipelines that depend on a private network runtime.
Compare dev, test, and production factories to catch accidental references to the wrong integration runtime name.
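
The environment comparison in the last use case can be sketched as a small shell helper. This is a minimal sketch, not a definitive implementation: it assumes the `datafactory` CLI extension is installed, that linked-service JSON exposes the runtime reference at `properties.connectVia.referenceName`, and that `--output tsv` prints `None` for linked services with no explicit runtime. The function names and the runtime name `shir-prod` are hypothetical.

```shell
#!/bin/sh
# Sketch: find linked services bound to an unexpected integration runtime.
# Assumption: the runtime reference lives at properties.connectVia.referenceName.

# runtime_refs FACTORY RESOURCE_GROUP
# Prints "linkedServiceName<TAB>runtimeName" lines, sorted for stable diffs.
runtime_refs() {
    az datafactory linked-service list \
        --factory-name "$1" \
        --resource-group "$2" \
        --query "[].[name, properties.connectVia.referenceName]" \
        --output tsv | sort
}

# unexpected_refs EXPECTED_RUNTIME
# Reads runtime_refs output on stdin and prints every linked service that
# names a runtime other than the expected one. Rows with no explicit
# runtime (empty or "None") are skipped.
unexpected_refs() {
    awk -F '\t' -v want="$1" \
        '$2 != "" && $2 != "None" && $2 != want { print $1 " -> " $2 }'
}
```

A typical check would be `runtime_refs <factory-name> <resource-group> | unexpected_refs shir-prod`; empty output means every linked service with an explicit runtime points at the expected one.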

Before you run CLI

Confirm tenant, subscription, resource group, factory name, region, and whether your command is read-only or changes data integration resources.
Check permissions for Data Factory, linked services, Key Vault, and any private source credentials before assuming a runtime problem.
Understand the host machine, network route, firewall rules, source system owner, and maintenance window before modifying production pipelines.
Use UTC time windows for pipeline-run queries and capture output as JSON so activity failures can be compared accurately.
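
The checks above can be sketched in shell: confirm the active tenant and subscription, then build an explicit UTC window before querying pipeline runs. This is a hedged sketch that assumes GNU `date` (for the `-d` option) and an installed `datafactory` CLI extension; the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: pre-flight context check and a UTC window for run queries.
# Assumption: GNU date is available; BSD/macOS date uses different flags.

# show_context: print the tenant and subscription the CLI will act on.
show_context() {
    az account show \
        --query "{tenant:tenantId, subscription:name, id:id}" \
        --output table
}

# utc_window HOURS: print "<start> <end>" as ISO 8601 UTC timestamps
# covering the last HOURS hours (default 24).
utc_window() {
    hours="${1:-24}"
    end="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
    start="$(date -u -d "$hours hours ago" +%Y-%m-%dT%H:%M:%SZ)"
    printf '%s %s\n' "$start" "$end"
}

# query_runs FACTORY RESOURCE_GROUP [HOURS]: read-only query that returns
# pipeline runs updated inside the window, as JSON.
query_runs() {
    window="$(utc_window "${3:-24}")"
    after="${window% *}"    # text before the space: window start
    before="${window#* }"   # text after the space: window end
    az datafactory pipeline-run query-by-factory \
        --factory-name "$1" \
        --resource-group "$2" \
        --last-updated-after "$after" \
        --last-updated-before "$before" \
        --output json
}
```

Running `show_context` first and then `query_runs <factory-name> <resource-group> 6 > runs.json` keeps the evidence as JSON with an unambiguous UTC window.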

What output tells you

Factory output confirms the resource group, region, identity, and tags for the orchestration resource that owns runtime configuration.
Pipeline-run and activity output shows whether failures came from copy activity, linked service resolution, credential access, or runtime connectivity.
Integration-runtime status, when available, indicates node health, runtime version, registration state, and whether jobs can be accepted.
Timestamps, run IDs, and activity names let operators correlate Azure orchestration with local host logs, network changes, and source-system events.
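
To get from a pipeline run ID to the failing activity, a drill-down like the following may help. It is a sketch under assumptions: the response is assumed to wrap results in a `value` array, and the field names (`activityName`, `activityType`, `status`, `error.message`) are assumed to follow the Data Factory activity-run shape. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch: list only the failed activities of one pipeline run.
# Assumption: results are under "value" and use the field names below.

# failed_activities FACTORY RESOURCE_GROUP RUN_ID AFTER_UTC BEFORE_UTC
failed_activities() {
    az datafactory activity-run query-by-pipeline-run \
        --factory-name "$1" \
        --resource-group "$2" \
        --run-id "$3" \
        --last-updated-after "$4" \
        --last-updated-before "$5" \
        --query "value[?status=='Failed'].{activity:activityName, type:activityType, start:activityRunStart, end:activityRunEnd, error:error.message}" \
        --output json
}
```

The start and end timestamps in the output are the values to match against local runtime host logs and firewall change records.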

Mapped Azure CLI commands

Inspect Data Factory runtimes and private pipeline failures

az datafactory integration-runtime list --factory-name <factory-name> --resource-group <resource-group>

az datafactory integration-runtime | discover | Analytics

az datafactory integration-runtime show --factory-name <factory-name> --resource-group <resource-group> --name <runtime-name>

az datafactory integration-runtime | discover | Analytics

az datafactory integration-runtime self-hosted create --factory-name <factory-name> --resource-group <resource-group> --name <runtime-name>

az datafactory integration-runtime self-hosted | provision | Analytics

az datafactory pipeline-run query-by-factory --factory-name <factory-name> --resource-group <resource-group> --last-updated-after <utc> --last-updated-before <utc>

az datafactory pipeline-run | discover | Analytics
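
The read-only commands above can be combined into one inventory pass per factory. The sketch below assumes the `datafactory` extension also provides `integration-runtime get-status`, which is not one of the mapped commands; the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: print the status JSON of every integration runtime in a factory.
# Assumption: "integration-runtime get-status" is available in the extension.

# runtime_names FACTORY RESOURCE_GROUP: one runtime name per line.
runtime_names() {
    az datafactory integration-runtime list \
        --factory-name "$1" \
        --resource-group "$2" \
        --query "[].name" \
        --output tsv
}

# runtime_status FACTORY RESOURCE_GROUP NAME: status JSON for one runtime.
runtime_status() {
    az datafactory integration-runtime get-status \
        --factory-name "$1" \
        --resource-group "$2" \
        --name "$3" \
        --output json
}

# report_all FACTORY RESOURCE_GROUP: a header line plus status per runtime.
report_all() {
    runtime_names "$1" "$2" | while IFS= read -r name; do
        [ -n "$name" ] || continue
        printf '== %s ==\n' "$name"
        # Redirect stdin so the inner command cannot consume the name list.
        runtime_status "$1" "$2" "$name" </dev/null
    done
}
```

Calling `report_all <factory-name> <resource-group>` for each environment gives a comparable snapshot without changing any resource.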

Architecture context

A seasoned Azure architect places a self-hosted integration runtime close to the systems it must reach, not simply wherever a spare VM exists. The design considers outbound connectivity to Azure, local database latency, firewall rules, service account permissions, Key Vault secrets, node count, patching, and recovery. For production data movement, I expect at least two nodes when availability matters, documented connector ownership, and monitoring of CPU, memory, queueing, and copy throughput. The runtime is part of the integration architecture: pipelines depend on it, source teams trust it, and security teams audit it because it can reach private data. Placement and ownership should be decided before pipelines become business-critical.

Security

Security impact is direct because the runtime can reach private data sources that Azure services cannot access directly. The host should run on a hardened machine, use least-privilege credentials, protect keys, and avoid broad local administrator access. Secrets should be stored in Key Vault or approved credential stores, not scattered across scripts. Network rules should allow only required outbound Azure connectivity and local source access. A compromised runtime host can become a bridge into sensitive databases. Teams should monitor host health, update the runtime, review linked service credentials, and restrict who can register or change nodes. Registration keys and local service accounts should be rotated through a controlled process.

Cost

The runtime resource itself is not the only cost to consider. Customer-managed compute, operating system licensing, monitoring, antivirus, backup, patching, and administrator time all contribute. Data movement can also create network egress or private connectivity costs depending on source, sink, and route. Underpowered hosts increase pipeline duration and operator labor; oversized hosts waste infrastructure budget. High availability adds node cost but may prevent expensive reporting or migration delays. FinOps review should compare runtime utilization, copy throughput, job concurrency, and maintenance effort against business criticality and data-movement schedules. Idle nodes should be reviewed, but removing the spare capacity that provides failover can delay reporting during incidents.

Reliability

Reliability impact is direct for pipelines that depend on private sources. If the self-hosted runtime host is offline, overloaded, unpatched, or blocked by a firewall, copy activities and scans fail even when Azure Data Factory is healthy. Production designs should use multiple nodes where needed, stable host sizing, monitored service status, and documented restart procedures. Network changes, password rotations, and certificate updates should be tested because untested changes often surface as pipeline failures. Reliable operation also includes queue monitoring, retry behavior, maintenance windows, and a recovery plan if the host VM or on-premises server is lost. Health alerts should fire before business reporting misses its service window.
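
A scheduled health probe is one way to alert before the service window is missed. The sketch below is not a definitive implementation. It assumes `integration-runtime get-status` exposes the runtime state at `properties.state`. Treating `Online` as healthy, `Limited` as degraded (some nodes unhealthy), and everything else as down is an illustrative policy. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch: classify a runtime's reported state for alerting.
# Assumption: the state is exposed at properties.state.

# classify_ir_state STATE: map a reported state to ok, degraded, or down.
classify_ir_state() {
    case "$1" in
        Online)  echo ok ;;
        Limited) echo degraded ;;
        *)       echo down ;;
    esac
}

# check_runtime FACTORY RESOURCE_GROUP NAME
# Prints "<name> <state> <verdict>" and returns non-zero unless the verdict
# is ok, so a scheduler or monitoring agent can alert on the exit code.
check_runtime() {
    state="$(az datafactory integration-runtime get-status \
        --factory-name "$1" \
        --resource-group "$2" \
        --name "$3" \
        --query "properties.state" \
        --output tsv)"
    verdict="$(classify_ir_state "$state")"
    printf '%s %s %s\n' "$3" "$state" "$verdict"
    [ "$verdict" = ok ]
}
```

Returning a failure for `Limited` is a deliberate choice here: a multi-node runtime that has lost a node still runs jobs, but it has also lost its failover margin.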

Performance

Performance impact is direct because data movement flows through the runtime host and its network path. Throughput depends on CPU, memory, disk, connector implementation, source throttling, sink capacity, latency, and parallel copy settings. A self-hosted runtime placed far from the source can turn a simple copy into a slow cross-network transfer. Operators should monitor copy duration, queue time, and host resource pressure during peak loads. Scaling out nodes can help, but only when sources, sinks, and pipelines can use parallelism safely. Performance tuning must respect database load, firewall limits, and maintenance windows. Baseline tests should be repeated after connector, network, or schema changes.

Operations

Operators manage a self-hosted integration runtime by checking node status, version, CPU, memory, concurrent jobs, service connectivity, and pipeline failures. Common work includes registering nodes, rotating credentials, updating the runtime, scaling out nodes, troubleshooting connector errors, and coordinating firewall changes with network teams. They also compare factory configuration across environments to ensure production pipelines are not accidentally using a development runtime. Good operations teams document the host owner, patch window, service account, linked services, and recovery steps. Incident review should separate Azure service health from local host, network, and source-system problems. Evidence should include Azure run IDs and local service logs together.

Common mistakes

Installing the runtime on one unmonitored server and treating it like a highly available production service.
Opening inbound firewall rules to private sources instead of using the runtime’s outbound connection pattern.
Using the same runtime for development experiments and production pipelines without isolation or change control.
Blaming Azure Data Factory for failures caused by local DNS, expired credentials, blocked ports, or an offline host service.
Forgetting to patch, update, back up, and capacity-plan the runtime host after the first successful copy.

Operator quick checks

Check runtime node status and version before troubleshooting pipeline code.
Run a small source connectivity test before starting a large migration copy.
Confirm linked services use the intended runtime name in the correct factory environment.
Review host CPU, memory, network, and service logs during slow or failed copy windows.

Questions to ask

Which private systems can this runtime reach, and who approved that access?
What happens to production pipelines if the host VM is patched, rebooted, or lost?
Is the runtime sized for peak copy throughput or only for the first test run?
How are credentials rotated without breaking linked services and scheduled pipelines?
Which monitoring tells us the runtime is unhealthy before business reports miss their deadline?
