Bulk executor - Azure Glossary

Microsoft Learn

Bulk executor is the Azure Cosmos DB capability or library pattern for sending many create, update, patch, delete, or import operations efficiently instead of processing documents one at a time. Microsoft Learn documents it in the Azure Cosmos DB bulk executor library overview; operators should confirm scope, configuration, dependencies, and production impact before relying on it.

Microsoft Learn: Azure Cosmos DB bulk executor library overview (2026-05-12)

Technical context

Technically, bulk executor refers to Cosmos DB bulk operation support in SDKs and older bulk executor libraries. The mechanism groups many operations, uses partition key ranges, handles retries, and adapts concurrency so provisioned RU/s are used efficiently while throttling is respected. Supported behavior depends on API, SDK version, account type, operation, partitioning, and consistency settings. Operators inspect RU consumption, 429 responses, latency, item counts, partition distribution, client logs, and job checkpoints. Modern SDK bulk support is usually preferred over legacy library versions.
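The grouping step described above can be illustrated with a minimal sketch. It uses no Cosmos DB SDK calls: the field name `partitionKey` and the batch size of 100 are assumptions chosen for illustration, and real SDKs group by physical partition key range rather than by logical key value.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Iterator, List, Tuple


def batch_by_partition_key(
    operations: Iterable[Dict[str, Any]],
    pk_field: str = "partitionKey",  # hypothetical field name, adjust to your container
    max_batch_size: int = 100,       # illustrative limit, not a service constant
) -> Iterator[Tuple[Any, List[Dict[str, Any]]]]:
    """Yield (partition_key, batch) pairs with at most max_batch_size operations each."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")

    # Group operations so every batch targets a single partition key value.
    grouped: Dict[Any, List[Dict[str, Any]]] = defaultdict(list)
    for op in operations:
        grouped[op[pk_field]].append(op)

    # Split each group into fixed-size chunks.
    for pk, ops in grouped.items():
        for start in range(0, len(ops), max_batch_size):
            yield pk, ops[start:start + max_batch_size]
```

Each yielded batch could then be handed to whichever bulk or batch call the chosen SDK version provides.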

Why it matters

Bulk executor matters because large Cosmos DB data changes can either complete predictably or create expensive production pain. Without a bulk pattern, migration teams may run slow serial loops, underuse purchased throughput, overload a few logical partitions, or fail after partial writes without checkpoints. A well-designed bulk executor job can load terabytes, repair documents, or reprocess customer data while respecting RU limits and service health. It also gives operators evidence: operation counts, retries, throughput, failed items, and restart position. The business value is faster migration, shorter maintenance windows, less manual cleanup, and more confidence in data platform changes. That evidence keeps accountability clear.

Where you see it

Signals, screens, and Azure surfaces where this term usually becomes operational.

Signal 01

In Cosmos DB migration tools and SDK code, bulk executor appears as bulk operation configuration, concurrency settings, item batches, and partition-aware write loops.

Signal 02

In Azure Monitor metrics, bulk executor evidence appears as high RU consumption, 429 throttling, write latency, request counts, and partition hot spots.

Signal 03

In runbooks and job logs, it appears through processed counts, failed item reports, checkpoints, retry summaries, elapsed time, and target container details.

When this becomes relevant

Specific situations where this term helps solve real Azure design, operations, migration, security, reliability, cost, or governance problems.

Check Cosmos DB account, database, and container throughput before a bulk job.
Capture RU, throttling, and latency evidence while the job is running.
Verify item counts and failed-operation reports after migration or repair.

Real-world case studies

Different enterprise-style examples that show the term being used to hit measurable objectives.

Case study 01

Bulk executor for loyalty data migration

Scenario, objectives, solution, measured impact, and takeaway.

Scenario

Northstar Rewards, a travel loyalty provider, needed to migrate 450 million customer activity records into Azure Cosmos DB for NoSQL before a new mobile app launch.

Business/Technical Objectives

Complete migration within a 36-hour cutover window
Keep 429 throttling below an approved operating band
Preserve checkpoints for every partition range
Reconcile source and target counts before launch

Solution Using Bulk executor

The data platform team built a Cosmos DB bulk executor pipeline using modern SDK bulk operation support, partition-aware batching, adaptive concurrency, and checkpoint files stored in Blob Storage. They prevalidated partition keys, provisioned temporary RU headroom, and sent metrics to Azure Monitor for request rate, RU consumption, latency, and throttling. Failed documents were written to a quarantine container with reason codes. Operators could pause or resume the job without replaying completed partitions, and FinOps scheduled the throughput increase to revert automatically after reconciliation. The team reviewed results with application owners before closing the change record. Support notes documented rollback ownership and business approval.

Results & Business Impact

Migration completed in 29 hours, within the 36-hour limit
Source-to-target reconciliation reached 99.998 percent before launch
Temporary throughput was reduced on schedule, avoiding idle spend
Failed-item repair finished within one business day

Key Takeaway for Glossary Readers

Bulk executor patterns make large Cosmos DB data movement safer when throughput, partitions, checkpoints, and reconciliation are designed together.

Case study 02

Bulk executor for healthcare record repair

Scenario, objectives, solution, measured impact, and takeaway.

Scenario

MedAxis Clinics, a healthcare network, found inconsistent status fields across Cosmos DB documents after a legacy import and needed to repair records without downtime.

Business/Technical Objectives

Patch 62 million affected documents safely
Avoid starving patient portal traffic
Record every failed or skipped item
Complete repair before a compliance reporting deadline

Solution Using Bulk executor

Engineers created a bulk executor repair job that queried candidate documents, grouped operations by partition key, and issued patch operations with controlled concurrency. The job ran during lower-traffic windows and watched Cosmos DB metrics for RU usage, latency, 429 responses, and portal dependency timing. A dry run wrote expected updates to an audit container, and production execution wrote checkpoints every five minutes. Operators had a stop command if portal latency exceeded the threshold, and the team preserved failed-item records for privacy-reviewed manual correction. The team reviewed results and risks with application owners before closing the change record. Support notes documented rollback ownership and business approval, and evidence remained available for later audit review.

Results & Business Impact

Record repair finished in 11 hours with no portal outage
Portal P95 latency stayed within the approved threshold
Failed-item rate remained below 0.04 percent
Compliance reporting was delivered two days early

Key Takeaway for Glossary Readers

Bulk executor is valuable for controlled production data repair when the job respects live workload health and audit requirements.

Case study 03

Bulk executor for manufacturing telemetry backfill

Scenario, objectives, solution, measured impact, and takeaway.

Scenario

Pioneer Robotics, a manufacturing automation firm, needed to backfill factory telemetry into Cosmos DB after adding a new analytics model for predictive maintenance.

Business/Technical Objectives

Load six months of telemetry without delaying live ingestion
Use available RU capacity across all partition ranges
Keep analytics dashboards accurate during backfill
Document cost and performance impact for operations

Solution Using Bulk executor

The analytics team implemented a bulk executor process that read compressed telemetry batches from Data Lake, transformed them into Cosmos DB items, and used SDK bulk create operations with partition-aware batching. They separated live ingestion from backfill containers, then merged dashboards after validation. Azure Monitor tracked RU utilization, ingestion latency, 429s, and data freshness, while checkpoints captured source file offsets. Operators used a runbook to throttle the backfill when live telemetry lag increased and to resume once factory traffic returned to normal. The team reviewed results with application owners before closing the change record. Support notes documented rollback ownership and business approval.

Results & Business Impact

Backfill finished in 18 hours instead of an estimated 72 hours
Live ingestion lag stayed below the 10-minute target
RU utilization averaged 82 percent during the approved window
Predictive maintenance dashboards gained six months of history

Key Takeaway for Glossary Readers

Bulk executor helps teams use Cosmos DB throughput efficiently while keeping long-running backfills observable and recoverable.

Why use Azure CLI for this?

Use CLI, SDK logs, metrics queries, and exported job reports for bulk executor work because portal views alone rarely explain throughput, retries, and partial completion.

CLI use cases

Check Cosmos DB account, database, and container throughput before a bulk job.
Capture RU, throttling, and latency evidence while the job is running.
Verify item counts and failed-operation reports after migration or repair.

Before you run CLI

Confirm tenant, subscription, scope, resource group, region, and environment before collecting or changing production evidence.
Use least-privileged access and avoid exposing keys, tokens, personal data, billing details, or confidential topology in output.
Know whether the command is read-only, mutating, cost-impacting, or security-impacting before running it in production.

What output tells you

Output confirms whether the live configuration exists at the expected Azure scope and matches the approved design.
Returned properties, metrics, or logs help separate healthy service behavior from drift, missing configuration, or workload symptoms.
Differences between environments provide evidence for rollback, tuning, support escalation, audit review, or owner follow-up.

Mapped Azure CLI commands

Cosmos DB operations

direct

az cosmosdb list --resource-group <resource-group>

az cosmosdb | discover | Databases

az cosmosdb show --name <account-name> --resource-group <resource-group>

az cosmosdb | discover | Databases

az cosmosdb create --name <account-name> --resource-group <resource-group> --locations regionName=<region>

az cosmosdb | provision | Databases

az cosmosdb sql database list --account-name <account-name> --resource-group <resource-group>

az cosmosdb sql database | discover | Databases

az cosmosdb sql container list --account-name <account-name> --database-name <database-name> --resource-group <resource-group>

az cosmosdb sql container | discover | Databases

az cosmosdb delete --name <account-name> --resource-group <resource-group>

az cosmosdb | remove | Databases

Architecture context

Security

Security for bulk executor work starts with the identity or key that can change large amounts of data. Use least-privileged access, managed identity where supported by the application path, protected connection strings, and clear approval before bulk updates or deletes. Input files, exported documents, logs, and failed-item reports can contain customer data, so protect storage locations and retention. Review network controls, private endpoints, firewall rules, and who can run the job. A bulk executor script should never be an unreviewed operator laptop tool for production. Include change tickets, dry runs, backups or restore points, and separation between read-only validation and mutating execution.

Cost

Cost is significant because bulk executor work is designed to consume throughput aggressively. Provisioned RU/s, autoscale maximums, data storage, multi-region writes, diagnostics, and retry storms can all affect spend. A slow serial job wastes engineer time, while an untuned bulk job can push RU usage to expensive peaks. Plan the job around expected document count, item size, partition distribution, and acceptable duration. Temporarily increasing throughput can be cost-effective if it shortens a migration window, but only if it is reduced afterward. Measure consumed RU, failed retries, and post-job storage growth so FinOps can explain the bill. Review outcomes after each billing cycle.
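The planning arithmetic above can be sketched as a rough estimate. The RU charge per write is an assumption that should be measured from a sample of real documents, and the calculation assumes work is spread evenly across partitions; the function names and the 0.8 utilization default are illustrative, not taken from any Azure tool.

```python
import math


def estimate_bulk_job(
    doc_count: int,
    ru_per_write: float,            # assumed average charge, measure it on sample items
    provisioned_ru_per_sec: float,
    target_utilization: float = 0.8,  # illustrative headroom for live traffic and retries
) -> dict:
    """Estimate total RU and elapsed hours for a bulk write job."""
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    total_ru = doc_count * ru_per_write
    effective_ru_per_sec = provisioned_ru_per_sec * target_utilization
    seconds = total_ru / effective_ru_per_sec
    return {"total_ru": total_ru, "hours": seconds / 3600}


def ru_per_sec_needed(
    doc_count: int,
    ru_per_write: float,
    window_hours: float,
    target_utilization: float = 0.8,
) -> int:
    """Estimate the provisioned RU/s needed to finish within window_hours."""
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    total_ru = doc_count * ru_per_write
    return math.ceil(total_ru / (window_hours * 3600 * target_utilization))
```

Comparing the estimate with consumed RU after the job gives FinOps a concrete baseline for the next run.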

Reliability

Reliability depends on partition-aware design, checkpoints, retry handling, and safe recovery after partial completion. Bulk jobs should handle 429 throttling, transient failures, duplicate submissions, poison documents, and client restarts. Use idempotent operation design where possible, record processed counts, and separate validation from mutation. Test the job against realistic partition distribution and RU limits before production. Operators should know when to pause, resume, or roll back. Cosmos DB is highly available, but a client-side bulk process can still fail from bad data, hot partitions, network interruption, or exhausted throughput. Durable checkpoints reduce fear during long migrations. Test the recovery path regularly.
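A durable checkpoint can be as simple as a small file that records the next unprocessed batch. The sketch below is one possible approach under stated assumptions: the JSON layout, the function names, and the `apply_batch` callback are hypothetical, and the callback is assumed to be idempotent (for example an upsert) because a batch that fails midway is replayed on restart.

```python
import json
import os
import tempfile
from typing import Callable, List


def load_checkpoint(path: str) -> int:
    """Return the index of the next unprocessed batch (0 if no checkpoint exists)."""
    try:
        with open(path, "r", encoding="utf-8") as fh:
            return int(json.load(fh)["next_batch"])
    except FileNotFoundError:
        return 0


def save_checkpoint(path: str, next_batch: int) -> None:
    """Write the checkpoint via a temp file and rename so it is never half-written."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump({"next_batch": next_batch}, fh)
        fh.flush()
        os.fsync(fh.fileno())
    os.replace(tmp_path, path)  # atomic replace on the same filesystem


def run_with_checkpoints(
    batches: List[list],
    apply_batch: Callable[[list], None],  # hypothetical callback, should be idempotent
    checkpoint_path: str,
) -> int:
    """Process batches from the last checkpoint onward; return how many were processed."""
    start = load_checkpoint(checkpoint_path)
    processed = 0
    for index in range(start, len(batches)):
        apply_batch(batches[index])
        save_checkpoint(checkpoint_path, index + 1)  # record progress only after success
        processed += 1
    return processed
```

After a crash, rerunning `run_with_checkpoints` with the same path skips completed batches and resumes at the one that failed.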

Performance

Performance depends on partition key distribution, item size, network latency, SDK version, client CPU, concurrency, and available RU/s. Bulk executor patterns perform best when work is spread across partitions and the client adapts to throttling instead of fighting it. Watch for hot partitions, oversized documents, synchronous loops, excessive logging, and retry policies that amplify pressure. Operators should measure documents per second, RU per operation, latency percentiles, 429 rate, and total elapsed time. Tuning should improve predictable completion, not just peak speed. A job that finishes fast but starves production traffic is not a successful performance outcome. Document baseline measurements before tuning.
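The measurements listed above can be derived from per-operation results that the client records. This is a minimal sketch: the `OperationResult` record and its fields are hypothetical, and in practice the status code, request charge, and latency would come from SDK response diagnostics. The percentile uses the nearest-rank method.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class OperationResult:
    status_code: int       # HTTP-style status, 429 means throttled
    request_charge: float  # RU charged for the operation
    latency_ms: float


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty sequence."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(results: List[OperationResult], elapsed_seconds: float) -> Dict[str, float]:
    """Compute documents per second, RU per operation, P95 latency, and 429 rate."""
    if not results or elapsed_seconds <= 0:
        raise ValueError("need at least one result and a positive elapsed time")
    total = len(results)
    succeeded = sum(1 for r in results if 200 <= r.status_code < 300)
    throttled = sum(1 for r in results if r.status_code == 429)
    return {
        "docs_per_second": succeeded / elapsed_seconds,
        "ru_per_operation": sum(r.request_charge for r in results) / total,
        "p95_latency_ms": percentile([r.latency_ms for r in results], 95),
        "throttle_rate": throttled / total,
    }
```

Capturing the same summary before and after a tuning change makes it easier to tell whether the change improved predictable completion or only peak speed.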

Operations

Operationally, a bulk executor job needs a runbook, schedule, owner, expected duration, RU plan, and backout strategy. Capture source data location, target database and container, partition key assumptions, SDK version, concurrency settings, retry policy, and checkpoint file path. During execution, monitor RU consumption, request rate, latency, errors, 429s, and application impact. Pause competing workloads or increase throughput only through approved change windows. After completion, reconcile item counts, sample documents, failed operations, and cost. Good operations treat bulk execution as a controlled production change, not just a developer script that happens to use Cosmos DB. Keep owners and evidence current.
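Post-job reconciliation can be sketched as a comparison of per-key counts between source and target. The inputs here are plain iterables of partition key values, which is an assumption for illustration; a real job would obtain them from source exports and target queries, and would also sample document contents.

```python
from collections import Counter
from typing import Any, Dict, Hashable, Iterable


def reconcile_counts(
    source_keys: Iterable[Hashable],
    target_keys: Iterable[Hashable],
) -> Dict[str, Any]:
    """Compare per-key item counts and report totals, match ratio, and mismatches."""
    source = Counter(source_keys)
    target = Counter(target_keys)

    # Record (source_count, target_count) for every key whose counts differ.
    mismatches = {
        key: (source[key], target[key])
        for key in source.keys() | target.keys()
        if source[key] != target[key]
    }

    source_total = sum(source.values())
    matched = sum(min(source[key], target[key]) for key in source)
    return {
        "source_total": source_total,
        "target_total": sum(target.values()),
        "match_ratio": matched / source_total if source_total else 1.0,
        "mismatches": mismatches,
    }
```

The mismatch map gives operators a short list of keys to investigate instead of a single pass or fail total.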

Common mistakes

Running bulk operations without checkpoints, then guessing which documents were already changed.
Ignoring partition key distribution and blaming Cosmos DB for hot partition throttling.
Leaving temporary throughput increases enabled after the bulk job finishes.
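The partition distribution mistake above can be caught before a job runs with a simple skew check on the input. This sketch counts logical partition key values only, which is a proxy: throttling depends on physical partitions, and the 20 percent threshold and function name are illustrative assumptions.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def find_hot_partition_keys(
    partition_keys: Iterable[Hashable],
    max_share: float = 0.2,  # illustrative threshold, tune for your workload
) -> Dict[Hashable, float]:
    """Return keys whose share of all operations exceeds max_share."""
    counts = Counter(partition_keys)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {
        key: count / total
        for key, count in counts.items()
        if count / total > max_share
    }
```

A non-empty result suggests reviewing the partition key choice or pacing those keys separately before blaming the service for throttling.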