Durable Functions

Durable Functions lets developers write workflows as code instead of building a custom state machine. A normal function handles one short job, but a durable workflow can coordinate steps, wait for timers, call activity functions, retry failures, fan work out, gather results, and remember progress. It is useful when a process lasts minutes, hours, or days and must survive restarts. The orchestrator describes the process, while Azure stores enough history to resume reliably. Treat orchestration history as operational state, not disposable logging.

Aliases: Azure Durable Functions, Durable orchestration, Durable workflow
Difficulty: intermediate
CLI mappings: 5
Last verified: 2026-05-30

Microsoft Learn

Microsoft Learn describes Durable Functions as an Azure Functions extension for building stateful workflows in a serverless environment. Developers write orchestrator, activity, and entity functions while the runtime manages checkpoints, state, retries, timers, recovery, and long-running operations across restarts and long delays.

Microsoft Learn: Durable Functions overview (2026-05-30)

Technical context

Technically, Durable Functions is an extension on top of Azure Functions and the Durable Task framework. It uses orchestrator functions for deterministic workflow logic, activity functions for units of work, entity functions for durable state, and a task hub backed by storage or a supported provider. It interacts with Function Apps, triggers, bindings, app settings, Application Insights, storage accounts, hosting plans, and language worker models. State, timers, and checkpoints connect function code with the backing task hub. Support teams need instance IDs to join code behavior with platform evidence.
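
The checkpoint-and-replay behavior described above can be illustrated with a small plain-Python sketch. It does not use the Durable Functions SDK. The names `run_orchestration`, `history`, `crash_after`, and the two activities are hypothetical stand-ins, and the list called `history` only plays the role that the task hub plays in a real app.

```python
from typing import Callable, Dict, Generator, List, Optional, Tuple

# Records which activities actually executed (for illustration only).
calls: List[str] = []

def extract(doc: str) -> str:
    # Hypothetical activity; a real one would call an external service.
    calls.append("extract")
    return doc.upper()

def score(text: str) -> int:
    # Hypothetical activity that depends on the previous step's result.
    calls.append("score")
    return len(text)

ACTIVITIES: Dict[str, Callable] = {"extract": extract, "score": score}

def orchestrator(doc: str) -> Generator[Tuple[str, object], object, int]:
    # Deterministic workflow logic: it only yields activity requests.
    text = yield ("extract", doc)
    result = yield ("score", text)
    return result

def run_orchestration(doc: str, history: List[object],
                      crash_after: int = -1) -> Optional[int]:
    """Replay recorded results first, then execute new activities.

    `history` holds one recorded result per completed activity.
    `crash_after` simulates a host restart before that step runs.
    """
    gen = orchestrator(doc)
    step = 0
    try:
        request = next(gen)
        while True:
            if step < len(history):
                result = history[step]      # replay: no re-execution
            else:
                if crash_after == step:
                    return None             # simulated host restart
                name, arg = request
                result = ACTIVITIES[name](arg)
                history.append(result)      # checkpoint the result
            step += 1
            request = gen.send(result)
    except StopIteration as done:
        return done.value
```

Calling `run_orchestration("abc", history, crash_after=1)` with an empty `history` list stops after the first activity. Calling it again with the same list resumes from the recorded result, so `extract` runs only once across both attempts.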

Why it matters

Durable Functions matters because long-running business processes are easy to start and hard to operate correctly. Without durable orchestration, teams often create fragile combinations of queues, timers, database flags, and manual retry scripts. That becomes painful when a step fails after several successful actions or waits overnight for approval. Durable Functions gives developers checkpoints, replay, retries, status tracking, and timers, so long-running work survives restarts without operators inventing manual tracking. Operators can answer which instance is running, where it stopped, what will retry, and whether a failure comes from code, a dependency, or state. Workflow state deserves clear ownership because it controls retries, compensation, and recovery.

Where you see it

Signals, screens, and Azure surfaces where this term usually becomes operational.

Signal 01

In Function App code inside repositories, durable orchestrator, activity, and entity functions appear beside the triggers and bindings that start or query workflow instances.

Signal 02

During investigations, Application Insights shows orchestration instance IDs, dependency calls, failures, retries, and custom traces that reveal how a workflow progressed or stalled.

Signal 03

During deployment reviews, app settings and storage show task hub names, storage connections, runtime versions, and extension configuration that explain which state backend is active.

When this becomes relevant

Specific situations where this term helps solve real Azure design, operations, migration, security, reliability, cost, or governance problems.

Coordinate multi-step approval, document, or onboarding workflows that must wait and resume.
Fan out many independent tasks, collect results, and continue after the batch completes.
Implement reliable retry and compensation around flaky partner APIs.
Track long-running customer processes through instance IDs that support staff can query.
Replace brittle queue-and-timer glue code with code-first orchestration.
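
The fan-out and fan-in item in the list above can be sketched with the Python standard library. This is an assumption-laden stand-in, not Durable Functions code: a thread pool replaces the runtime's activity scheduling, nothing is checkpointed, and `check_sample`, `fan_out_fan_in`, and `summarize` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List

def check_sample(sample_id: str) -> dict:
    # Hypothetical activity: a real one would run a quality check.
    # Here, IDs ending in "x" are treated as failing, purely for illustration.
    return {"sample": sample_id, "passed": not sample_id.endswith("x")}

def fan_out_fan_in(items: Iterable[str],
                   activity: Callable[[str], dict],
                   max_workers: int = 4) -> List[dict]:
    """Run one activity per item in parallel and gather results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the inputs.
        return list(pool.map(activity, items))

def summarize(results: List[dict]) -> dict:
    # Fan-in step: runs only after every parallel result has arrived.
    failed = [r["sample"] for r in results if not r["passed"]]
    return {"total": len(results), "failed": failed}
```

In a durable orchestration the same shape applies, with the added property that each completed activity result is persisted, so a restart does not repeat finished work.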

Real-world case studies

Different enterprise-style examples that show the term being used to hit measurable objectives.

Case study 01

Coordinating claims

Scenario, objectives, solution, measured impact, and takeaway.

Scenario

An insurance carrier processed property claims through email, spreadsheets, and short functions. Adjusters could not tell whether a claim waited on extraction, fraud review, or approval.

Business/Technical Objectives

Create one traceable workflow for each claim.
Retry analysis steps without duplicating approvals.
Reduce manual status-check emails.
Keep workflows alive while customers submit evidence.

Solution Using Durable Functions

Developers replaced scattered queue logic with Durable Functions. An HTTP starter created an orchestration instance for each claim. Activity functions called document extraction, image review, policy lookup, and fraud scoring services. Timers waited for missing customer documents, and human approval arrived as an external event. Retry policies handled partner API failures. The instance ID was stored with the claim record so support staff could query progress. Application Insights tracked activities, dependencies, and failures by workflow instance. The team documented owners, rollback steps, monitoring signals, and approval notes so support staff could explain the change during incidents. Support teams also received a short guide mapping claim IDs to orchestration instance IDs during escalations. Claims supervisors received retry status training.

Results & Business Impact

Claim status investigation time dropped from 38 minutes to under 6.
Duplicate payout-review tasks fell 89 percent.
Document waits no longer required manual reminder spreadsheets.
Adjuster satisfaction scores improved 23 points.

Key Takeaway for Glossary Readers

Durable Functions is valuable when a process must remember progress, wait safely, and explain state.

Case study 02

Recovering from carrier outages

Scenario

A freight broker coordinated pickup quotes across 18 carrier APIs. Temporary partner outages caused half-completed bookings, duplicate emails, and manual dispatcher cleanup.

Business/Technical Objectives

Coordinate quote, selection, booking, and notification steps.
Retry transient partner failures without duplicate bookings.
Escalate stalled shipments after defined waiting periods.
Cut after-hours engineering interventions by 50 percent.

Solution Using Durable Functions

The team built a Durable Functions orchestration for each shipment request. Activity functions called carrier quote APIs in parallel, normalized responses, selected eligible carriers, and attempted booking with idempotency keys. Retry policies handled rate limits and temporary outages, while timers escalated requests that waited too long. Compensation activities canceled partial reservations when later steps failed. Dispatchers received a dashboard link backed by orchestration status instead of asking engineers to read logs. Before-and-after command output was attached to the ticket, giving auditors a clear view of scope, permissions, and expected behavior. Dispatch supervisors received dashboard links so they could verify delayed updates without calling engineering. Carrier managers reviewed exception queues every morning during rollout.

Results & Business Impact

After-hours pages related to carrier outages dropped 63 percent.
Duplicate booking incidents decreased from 17 per month to 2.
Average quote completion time improved 31 percent.
Dispatcher escalation happened automatically after 12 minutes.

Key Takeaway for Glossary Readers

Durable Functions helps integration systems survive partial failure without a custom workflow engine.

Case study 03

Orchestrating sample processing

Scenario

A genomics lab processed sequencing samples with many independent quality checks and enrichment steps. Scientists lost hours when one step failed after dozens had completed.

Business/Technical Objectives

Fan out sample checks and gather results safely.
Provide scientists with clear status for each batch.
Resume processing after host restarts or dependency failures.
Reduce manual reruns that consumed expensive compute.

Solution Using Durable Functions

Engineers created a Durable Functions pipeline that started an orchestration for each sample batch. The orchestrator fanned out activity functions for quality checks, metadata validation, reference matching, and enrichment tasks, then gathered results for a final decision. Failed activities retried with backoff, while deterministic orchestration rules kept replay safe. Batch IDs and orchestration instance IDs were stored in the lab portal. Dashboards showed failed activities, backlog, duration, and dependency errors. The runbook included validation commands, rollback timing, and owner contacts so the handoff remained clear after the project team left. Researchers also received retry guidance so a failed activity did not restart the entire analysis unnecessarily.

Results & Business Impact

Manual reruns dropped 76 percent because completed activity results were preserved.
Average batch completion time improved from 9.4 hours to 5.8 hours.
Support emails from scientists fell 68 percent.
Repeated enrichment compute waste fell about $14,000 per quarter.

Key Takeaway for Glossary Readers

Durable Functions gives analytical pipelines durable coordination without forcing scientists to operate a custom scheduler.

Why use Azure CLI for this?

I use Azure CLI around Durable Functions because the workflow is code, but production evidence lives in Azure resources. CLI confirms the Function App, hosting plan, app settings, storage connection, runtime version, identity, deployment slot, and Application Insights link before troubleshooting orchestration behavior. It also helps restart apps, view logs, check keys, and validate deployment state in a repeatable way. There is no single command for every durable instance task, so engineers combine CLI with telemetry and storage inspection. The useful investigation joins function host data with instance-level telemetry, and scripts collect that platform evidence around orchestration failures faster than clicking through the portal.

CLI use cases

Show the Function App configuration hosting orchestrations, including runtime, identity, plan, and app settings.
List app settings to confirm task hub names, storage references, and extension configuration.
Restart a Function App during a controlled incident after checking workflow impact.
Inspect Application Insights configuration so failures and dependencies are visible.
Compare deployment slots before swapping durable workflow code with running instances.

Before you run CLI

Confirm tenant, subscription, resource group, Function App, deployment slot, plan, and storage account.
Check whether the workflow has running instances before restarts or swaps.
Verify permissions to read settings, logs, storage configuration, and telemetry without exposing secrets.
Plan cost and performance impact before increasing concurrency or telemetry volume.
Use output queries that avoid printing connection strings or keys.
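
As one way to follow the last item above, the sketch below masks sensitive values before app settings are pasted into a ticket. It assumes the JSON array printed by `az functionapp config appsettings list`, where each item carries `name` and `value` fields. The marker list and the function name `redact_settings` are hypothetical and should be adapted to local policy.

```python
import json
from typing import Dict, List

# Assumption: substrings that mark a setting name as sensitive.
SENSITIVE_MARKERS = ("connection", "key", "secret", "password", "storage")

def redact_settings(raw_json: str) -> List[Dict[str, str]]:
    """Return app settings with sensitive values masked.

    Expects a JSON array of objects that each have `name` and `value`.
    """
    redacted = []
    for item in json.loads(raw_json):
        name = item.get("name", "")
        value = item.get("value") or ""
        # Mask the value when the setting name looks sensitive.
        if any(marker in name.lower() for marker in SENSITIVE_MARKERS):
            value = "<redacted>"
        redacted.append({"name": name, "value": value})
    return redacted
```

Matching on setting names is a coarse filter. It can miss secrets stored under neutral names, so treat the output as safer to share, not as guaranteed clean.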

What output tells you

Runtime and extension settings show whether the host can run expected Durable Functions code.
App settings reveal task hub names, storage references, and environment-specific values.
Plan and worker fields show whether cold start or scale limits may influence latency.
Telemetry configuration tells whether failures, retries, dependencies, and custom traces are visible.

Mapped Azure CLI commands

Durable Functions operational commands

az functionapp show --resource-group <resource-group> --name <function-app>

az functionapp (discover, Compute)

az functionapp config appsettings list --resource-group <resource-group> --name <function-app>

az functionapp config appsettings (discover, Web)

az functionapp restart --resource-group <resource-group> --name <function-app>

az functionapp (operate, Compute)

az monitor app-insights component show --app <app-insights-name> --resource-group <resource-group>

az monitor app-insights component (discover, AI and Machine Learning)

az webapp deployment slot list --resource-group <resource-group> --name <function-app>

az webapp deployment slot (discover, Compute)

Architecture context

Architecturally, Durable Functions sits between simple event handlers and full workflow platforms. I choose it when developers need code-first orchestration, durable state, fan-out and fan-in, human interaction, or reliable retries without running a custom workflow engine. Logic Apps may be better for connector-heavy business automation, and AKS or Batch may suit specialized compute. A good durable architecture covers task hub naming, storage account choice and reliability, telemetry, idempotent activities, replay-aware coding, and a versioning strategy. Orchestrations should model durable business processes, not every small code path, because the workflow boundary becomes part of application architecture and support ownership. Capture this architecture evidence before approval.

Security

Security impact is direct because durable workflows often coordinate sensitive operations across systems. Orchestrations may reference customer records, approval states, payment steps, document IDs, or privileged automation. Secrets and connection strings should live in Key Vault or be replaced by managed identity, not copied into app settings or hardcoded in orchestration code. Managed identity and Key Vault also reduce exposure around storage and downstream calls. Operators must secure the Function App, storage account, Application Insights data, HTTP starter endpoints, and APIs that expose status. Orchestration history can contain sensitive business identifiers, so retention and data access need review.

Cost

Cost depends on hosting plan, execution count, duration, storage transactions, telemetry volume, and downstream calls. Durable workflows can be cheaper than always-on workers for intermittent processes, but high-volume orchestrations with chatty activities can create significant storage and logging cost. Fan-out patterns may multiply activity executions quickly, so high fan-out workflows need cost tests before production volume arrives. Long retention of history and verbose telemetry accumulate quietly and can surprise teams after launch. Review orchestration count, average duration, retries, storage transactions, plan type, and sampling. Purge policies should match audit needs without keeping noisy history forever. Capture this cost evidence before approval.

Reliability

Reliability is the main reason to use Durable Functions. The runtime checkpoints progress so workflows can continue after host restarts, transient dependency failures, or long waits. Reliability still depends on deterministic orchestrator code, idempotent activity functions, appropriate retry policies, healthy storage, and clear timeout behavior. Checkpointing is valuable only when activities are idempotent and retries are designed: a non-idempotent activity can repeat unsafe side effects, and a noisy task hub can create backlog. Monitor failed orchestrations, pending activities, throttling, duration, and poison patterns. Poisoned activities and stuck timers need dashboards and repair procedures. Disaster recovery must account for the backing storage that holds orchestration state. Capture this reliability evidence before approval.
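
The interaction between retries and idempotent activities can be shown in a short, self-contained sketch. It is plain Python, not the Durable Functions retry API. `TransientError`, `retry_with_backoff`, and `BookingService` are hypothetical names, and the service simulates the awkward case where the write succeeds but the response is lost.

```python
import time
from typing import Callable, Dict

class TransientError(Exception):
    """Stand-in for a retryable dependency failure."""

def retry_with_backoff(action: Callable[[], str], max_attempts: int = 3,
                       first_delay: float = 0.01, backoff: float = 2.0,
                       sleep: Callable[[float], None] = time.sleep) -> str:
    """Call `action`, retrying transient failures with growing delays."""
    delay = first_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return action()
        except TransientError:
            if attempt == max_attempts:
                raise                      # attempts exhausted: surface the error
            sleep(delay)
            delay *= backoff
    raise RuntimeError("max_attempts must be at least 1")

class BookingService:
    """Hypothetical downstream API that remembers idempotency keys."""

    def __init__(self, fail_times: int = 0):
        self.bookings: Dict[str, str] = {}
        self.fail_times = fail_times

    def book(self, idempotency_key: str, shipment: str) -> str:
        # A repeated key returns the earlier confirmation instead of booking again.
        if idempotency_key in self.bookings:
            return self.bookings[idempotency_key]
        confirmation = f"conf-{len(self.bookings) + 1}"
        self.bookings[idempotency_key] = confirmation
        if self.fail_times > 0:
            # The side effect happened, but the caller sees a failure.
            self.fail_times -= 1
            raise TransientError("timeout after write")
        return confirmation
```

With `BookingService(fail_times=1)`, the first attempt writes the booking and then fails, and the retry returns the stored confirmation, so one booking exists after two calls. Without the idempotency key check, the same retry would create a duplicate.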

Performance

Performance is shaped by orchestration pattern, activity duration, storage provider behavior, hosting plan, concurrency settings, polling, and dependency latency. Durable Functions can coordinate highly parallel fan-out work, but replay and checkpointing add overhead compared with an in-memory process. Consumption hosting may introduce cold starts, while Premium or Dedicated hosting improves predictability. Operators should monitor queue backlog, activity time, orchestration latency, storage throttling, and retries before tuning concurrency. Performance tuning starts with instance history, not random host scaling. Slow activities should be separated from orchestrator logic and measured independently. Capture this performance evidence before approval.

Operations

Operators support Durable Functions by tracking orchestration instance IDs, status endpoints, Function App health, app settings, storage dependencies, logs, and telemetry. Common jobs include finding stuck instances, explaining retries, restarting a Function App safely, validating task hub names after deployment, and correlating a business request with orchestration history. Deployment discipline matters because changing orchestrator code while instances are running can create versioning surprises. Operators should capture instance IDs, runtime version, task hub, and storage status. Runbooks should define when to terminate, rewind, purge, or restart instances. Replay-safe logging prevents confusing duplicate messages during investigations. Capture this operations evidence before approval.

Common mistakes

Writing non-deterministic orchestrator code that uses current time, random values, or direct I/O.
Making activity functions non-idempotent and repeating payments, emails, or updates on retry.
Using the same task hub name across environments or slots.
Deploying breaking orchestrator changes while old instances are still running.
Logging sensitive business data into orchestration history without retention review.
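
The first mistake in the list above, reading the current time inside an orchestrator, can be illustrated with a minimal sketch. The Durable Functions SDKs expose a replay-safe clock on the orchestration context for this reason. The `ReplayContext` class below is a hypothetical stand-in for that idea, not the SDK type, and it takes the wall-clock time as a parameter so the example stays testable.

```python
from datetime import datetime, timedelta
from typing import List

class ReplayContext:
    """Hypothetical stand-in for an orchestration context with a replay-safe clock."""

    def __init__(self, history: List[datetime]):
        self._history = history
        self._cursor = 0

    def current_utc_datetime(self, wall_clock: datetime) -> datetime:
        # First execution records the time; replays return the recorded value.
        if self._cursor < len(self._history):
            value = self._history[self._cursor]
        else:
            value = wall_clock
            self._history.append(value)
        self._cursor += 1
        return value

def approval_deadline(ctx: ReplayContext, wall_clock: datetime) -> datetime:
    # Deterministic: the deadline derives from recorded time,
    # so a replay hours later computes the same value.
    return ctx.current_utc_datetime(wall_clock) + timedelta(hours=24)
```

If the orchestrator called `datetime.now()` directly, a replay five hours later would compute a different deadline than the one the first execution acted on, which is the kind of divergence that deterministic orchestrator rules are meant to prevent.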

Operator quick checks

Confirm task hub name and storage account match the intended environment.
Check Application Insights for failed orchestrations, slow activities, dependency errors, and retries.
Review running instances before swapping slots, restarting, or deploying new orchestrator versions.
Verify activity functions can safely retry without duplicate external side effects.
Compare hosting capacity with orchestration volume, backlog, and latency targets.
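
The first quick check above can be partly automated once an operator has collected the task hub name and storage account for each environment or slot. The sketch below assumes that data has already been gathered by hand or by script. The function name, the labels, and the case-insensitive comparison are assumptions chosen to be conservative.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def find_task_hub_collisions(
    environments: Dict[str, Tuple[str, str]]
) -> List[List[str]]:
    """Group environments that share both a storage account and a task hub name.

    `environments` maps a label (for example "prod" or "prod/staging")
    to a (storage_account, task_hub_name) pair.
    """
    groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for label, (account, hub) in environments.items():
        # Assumption: compare case-insensitively to err on the side of flagging.
        groups[(account.lower(), hub.lower())].append(label)
    return [sorted(labels) for labels in groups.values() if len(labels) > 1]
```

An empty result means no two labels share the same account and hub pair. Any returned group deserves review before a slot swap or deployment, since apps that share a task hub also share orchestration state.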

Questions to ask

Which business process does this orchestration represent?
What happens if an activity succeeds externally but fails internally?
How are running instances protected during deployment?
Which dependencies must be healthy for continuation?
What metric proves the workflow is delayed, failed, retrying, or waiting?
