
Continuous backup

Continuous backup is the Azure Cosmos DB backup mode that keeps restore points available over time instead of relying only on periodic snapshots. It helps teams recover from accidental deletes, bad writes, or container changes by restoring data to a selected time within the configured retention tier. In plain English, it is your safety net for operational Cosmos DB data. You still need permissions, restore planning, and validation, but the service keeps backup data in the background without consuming provisioned request units from the live workload.

Aliases
Azure Cosmos DB continuous backup, Cosmos DB point-in-time restore, continuous backup mode
Difficulty
intermediate
CLI mappings
4
Last verified
2026-06-03

Microsoft Learn

Continuous backup in Azure Cosmos DB provides point-in-time restore capability for supported APIs, allowing recovery of accounts, databases, or containers within the configured retention tier.

Microsoft Learn: Continuous backup with point-in-time restore in Azure Cosmos DB (2026-06-03)

Technical context

Technically, continuous backup is configured on an Azure Cosmos DB account and supports point-in-time restore for selected APIs such as NoSQL, MongoDB, Gremlin, and Table. Restores can target another account, and same-account restore is available for deleted databases or containers in supported scenarios. Backups are maintained in the regions where the account exists, and restorable timestamps are used to choose a recovery point. Operators must understand the retention tier, region, consistency, permissions, unsupported configurations, and which settings are not restored with the data.

Why it matters

Continuous backup matters because database incidents rarely happen just before a scheduled backup. A developer may delete a container, an application bug may overwrite documents, or a migration may corrupt a subset of data. Point-in-time restore gives teams a practical way to recover a known good state without stopping the entire business. It also changes incident response: teams can investigate timestamps, restore into a safe account, compare data, and decide what to copy back. That reduces panic, shortens recovery, and creates stronger evidence for audits and post-incident reviews. Before treating it as a mature production control, confirm clear ownership, a tested restore path, and measurable recovery objectives.

Where you see it

Signals, screens, and Azure surfaces where this term usually becomes operational.

Signal 01

In the Azure Cosmos DB account Backup & Restore or Point In Time Restore panes, continuous backup appears with the retention tier and restore options.

Signal 02

In CLI or ARM output, it appears under backupPolicy with type Continuous, continuousModeProperties, retention tier, migration state, and restorable account metadata.
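As a rough illustration of reading that signal from the CLI, the sketch below checks the backupPolicy type on an account. It assumes the Azure CLI is installed and signed in. The function names are hypothetical, and the JMESPath path backupPolicy.type is inferred from the field names mentioned here, so confirm it against your own az cosmosdb show output.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical, and the query path is an
# assumption based on the backupPolicy fields described in this entry.

# Print the backup policy type reported for an account (for example "Continuous").
backup_policy_type() {
  az cosmosdb show \
    --name "$1" \
    --resource-group "$2" \
    --query "backupPolicy.type" \
    --output tsv
}

# Succeed only when the account reports continuous mode.
# Returns 2 if the lookup itself failed, 1 if the mode is something else.
require_continuous() {
  local policy
  policy="$(backup_policy_type "$1" "$2")" || return 2
  if [ "$policy" != "Continuous" ]; then
    echo "account $1 reports backup policy '$policy', not Continuous" >&2
    return 1
  fi
}
```

A change pipeline could call require_continuous <account> <resource-group> as a gate before a risky migration and stop if it returns non-zero.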

Signal 03

In incident runbooks, signals include latest restorable timestamp, delete event time, source region, target account name, restore permissions, and validation checklist status.
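A runbook might gather two of those signals with thin wrappers like the ones below. This is a sketch under assumptions: the function names are hypothetical, and the retrieve-latest-backup-time command applies to API for NoSQL containers. Check az cosmosdb restorable-database-account list --help and az cosmosdb sql retrieve-latest-backup-time --help for the parameters your CLI version supports.

```shell
#!/usr/bin/env bash
# Sketch only: wrapper names are hypothetical; verify command availability
# and parameters against your installed Azure CLI version.

# List restorable account records matching an account name.
list_restorable_accounts() {
  az cosmosdb restorable-database-account list \
    --account-name "$1" \
    --output json
}

# Ask for the latest restorable timestamp of one NoSQL container in a region.
# Arguments: account, resource group, database, container, region.
latest_restorable_time() {
  az cosmosdb sql retrieve-latest-backup-time \
    --account-name "$1" \
    --resource-group "$2" \
    --database-name "$3" \
    --container-name "$4" \
    --location "$5"
}
```

Saving both outputs alongside the incident ticket gives reviewers the evidence behind the chosen restore point.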

When this becomes relevant

Specific situations where this term helps solve real Azure design, operations, migration, security, reliability, cost, or governance problems.

  • Enable point-in-time restore for Cosmos DB workloads with tighter recovery needs.
  • Investigate accidental deletes, bad writes, or corruption using restorable account evidence.
  • Choose continuous mode when restore precision matters more than simpler backup settings.
  • Document restore authority, target account naming, networking, and validation queries.
  • Compare continuous backup with periodic backup during architecture and compliance reviews.

Real-world case studies

Different enterprise-style examples that show the term being used to hit measurable objectives.

Case study 01

Retail catalog rollback

Scenario, objectives, solution, measured impact, and takeaway.

Scenario

BrightTrail Market, a grocery delivery company, discovered that a pricing deployment overwrote thousands of Cosmos DB catalog items with incorrect promotional values.

Business/Technical Objectives
  • Restore the catalog to the minute before the bad deployment
  • Avoid interrupting checkout for unaffected regions
  • Validate recovered prices before cutover
  • Preserve incident evidence for finance and audit review
Solution Using Continuous backup

The database team used continuous backup to identify a UTC restore point just before the deployment. They restored the affected Cosmos DB account into a new account in the approved recovery resource group, then compared item counts, partition distribution, and price fields against deployment logs. Application traffic stayed on the existing account while analysts validated the restored catalog. After approval, a controlled repair job copied corrected documents back to production. The restored account used private endpoints, diagnostic settings, and restricted RBAC before analysts accessed any data. The team also documented owners, rollback steps, dashboards, and escalation paths so support staff could handle exceptions without redesigning the solution.

Results & Business Impact
  • Incorrect prices were corrected within 74 minutes
  • Checkout availability remained above 99.9 percent during recovery
  • Finance received a documented restore timestamp and comparison report
  • The temporary account was deleted after validation to prevent extra spend
Key Takeaway for Glossary Readers

Continuous backup gives teams a safe comparison copy so they can repair data precisely instead of rushing a risky full cutover.

Case study 02

Banking customer profile recovery

Scenario, objectives, solution, measured impact, and takeaway.

Scenario

Meridian Savings, a regional bank, had a batch process that accidentally cleared preference documents for a subset of digital banking customers.

Business/Technical Objectives
  • Recover affected profiles without rolling back valid transactions
  • Meet a two-hour internal recovery objective
  • Restrict restored historical data to approved responders
  • Create a repeatable playbook for future data incidents
Solution Using Continuous backup

The operations team used Cosmos DB logs and application telemetry to find the last safe timestamp. Continuous backup restored the account to a new isolated account, where a comparison script identified only the customer preference documents changed by the failed batch. The team used managed identities and a private endpoint for the restore environment, with access limited to the incident group. Correct preference records were copied back through a controlled data repair pipeline, and the restored account remained locked until audit evidence was exported.

Results & Business Impact
  • Customer preferences were repaired in 96 minutes
  • No valid post-incident transactions were overwritten
  • Access review showed only four responders could read restored data
  • The new playbook reduced future restore drill time by 42 percent
Key Takeaway for Glossary Readers

Continuous backup supports selective recovery when the business needs historical data without undoing every legitimate write after the incident.

Case study 03

Healthcare container restore drill

Scenario, objectives, solution, measured impact, and takeaway.

Scenario

Pine Valley Health operated a patient engagement platform and wanted proof that accidental deletion of a Cosmos DB container could be recovered during an on-call shift.

Business/Technical Objectives
  • Demonstrate same-account recovery for a deleted nonproduction container
  • Confirm required permissions before production rollout
  • Measure restore and validation time with realistic data
  • Document which settings must be reapplied after restore
Solution Using Continuous backup

The platform team enabled continuous backup on a nonproduction Cosmos DB account that mirrored production design. During a scheduled drill, an engineer deleted a test container and responders used the same-account restore workflow to recover it. They captured the deletion event time, selected the restore point, monitored the recovery, and validated partition keys, item counts, indexing behavior, and application reads. The drill also showed that networking and access policies required separate verification. Findings were added to the production runbook with screenshots, CLI commands, and approval steps.

Results & Business Impact
  • The container was restored and validated in 38 minutes
  • Missing permission assignments were found before production adoption
  • The runbook listed six settings requiring post-restore checks
  • On-call engineers gained a tested recovery path for accidental deletion
Key Takeaway for Glossary Readers

Continuous backup is strongest when teams practice the restore path before a real deletion forces them to learn under pressure.

Why use Azure CLI for this?

Use the Azure CLI for continuous backup because restore decisions need repeatable evidence about backup policy, restorable timestamps, source account, target account, region, and recovery parameters.

CLI use cases

  • Check whether an Azure Cosmos DB account uses continuous or periodic backup.
  • Migrate an eligible account to continuous backup mode under change control.
  • Start or verify a point-in-time restore during an accidental delete or bad-write incident.

Before you run CLI

  • Confirm the exact UTC timestamp before the destructive or corrupting change occurred.
  • Verify restore permissions and target resource group approval before creating a restored account.
  • Check limitations for API type, shared throughput, analytical store, networking, and settings that are not restored.
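The first item in this checklist is easy to get wrong when the incident time comes from a local clock. The sketch below converts a local wall-clock time into a UTC timestamp. It assumes GNU date (the -d option) and a correctly set TZ on the machine where it runs. The function name is hypothetical, and you should confirm the timestamp format your CLI version accepts with az cosmosdb restore --help.

```shell
#!/usr/bin/env bash
# Sketch only: assumes GNU date; the function name is hypothetical.

# Convert a local time string (interpreted in the current TZ) to UTC,
# printed as YYYY-MM-DDTHH:MM:SSZ.
to_utc_timestamp() {
  local local_time="$1" epoch
  # Parse in the local zone first, so the offset is applied exactly once.
  epoch="$(date -d "$local_time" +%s)" || return 1
  date -u -d "@${epoch}" +%Y-%m-%dT%H:%M:%SZ
}
```

For example, with the shell's TZ set to a zone five hours behind UTC, to_utc_timestamp "2024-03-01 09:30:00" prints 2024-03-01T14:30:00Z.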

What output tells you

  • Account output shows backup policy type, continuous tier, locations, and provisioning state.
  • Restorable account or restore output identifies source account, target account, timestamp, and location.
  • Post-restore output confirms whether the recovered account exists but does not prove application-level data correctness.

Mapped Azure CLI commands

Cosmos DB continuous backup operations

direct
az cosmosdb show --name <account> --resource-group <resource-group>
az cosmosdb · discover · Databases
az cosmosdb update --name <account> --resource-group <resource-group> --backup-policy-type Continuous --continuous-tier Continuous7Days
az cosmosdb · configure · Databases
az cosmosdb restore --account-name <source-account> --target-database-account-name <target-account> --resource-group <resource-group> --location <region> --restore-timestamp <utc-timestamp>
az cosmosdb · protect · Databases
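One way to make the restore command above safer to rehearse is to wrap it with basic guards and a dry-run default. This is a minimal sketch, not a definitive implementation: the function name and the DRY_RUN variable are hypothetical, it covers only restore into a new account, and it accepts only timestamps shaped like 2024-03-01T14:30:00Z as a deliberately strict convention.

```shell
#!/usr/bin/env bash
# Sketch only: restore_account and DRY_RUN are hypothetical names.
# Arguments: source account, target account, resource group, region, UTC timestamp.
restore_account() {
  local src="$1" target="$2" resource_group="$3" region="$4" timestamp="$5"

  # Reject anything that is not an explicit UTC timestamp (strict house rule).
  case "$timestamp" in
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]T[0-9][0-9]:[0-9][0-9]:[0-9][0-9]Z) ;;
    *)
      echo "restore timestamp must be UTC, for example 2024-03-01T14:30:00Z" >&2
      return 1
      ;;
  esac

  # This sketch only handles restore into a differently named account.
  if [ "$src" = "$target" ]; then
    echo "target account must differ from source account" >&2
    return 1
  fi

  # Build the command as positional parameters so quoting is preserved.
  set -- az cosmosdb restore \
    --account-name "$src" \
    --target-database-account-name "$target" \
    --resource-group "$resource_group" \
    --location "$region" \
    --restore-timestamp "$timestamp"

  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"   # print the command only; nothing is created
  else
    "$@"
  fi
}
```

With DRY_RUN unset, the function only prints the command, which can be pasted into the change ticket for approval. Setting DRY_RUN=0 runs it.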

Architecture context

Continuous backup is a recovery architecture decision for Azure Cosmos DB, not a checkbox added after launch. It deserves attention when the workload has accidental delete risk, high write volume, migration activity, or business processes that cannot tolerate losing recent data. It belongs with account API choice, region strategy, retention tier, restore testing, role permissions, and incident runbooks. Teams should know which account, database, and container states can be restored, which timestamps are available, and whether the restore creates a new account or supports the required same-account scenario. Operators need evidence before risky changes: backup mode, restorable resources, latest restorable timestamp, and target region. Continuous backup gives recovery precision, but only if restore procedures are rehearsed.

Security

Security for continuous backup is about controlling who can restore, where restored data lands, and what sensitive data becomes available during recovery. Restore permissions should be isolated from routine database administration because a restored account can expose historical customer or regulated records. Use Azure RBAC, privileged access workflows, private networking, and approved target resource groups. Restored accounts need the same identity, firewall, private endpoint, diagnostic, and key-management scrutiny as production accounts. Remember that some settings are not restored automatically, so security baselines must be reapplied before data is used. Review exceptions regularly, document approved data flows, and make sure support staff understand what they may safely inspect.

Cost

Cost for continuous backup includes the selected backup tier, retained backup storage, restored account resources, validation environments, and operational labor during drills or incidents. The setting may be cheaper than prolonged outage time, but it should not be enabled blindly without understanding retention requirements. Restoring into a new account can create additional throughput, storage, networking, and monitoring charges until cleanup is complete. Track which accounts require continuous backup because of recovery objectives, regulatory expectations, or high change risk. Delete temporary restore accounts after validation to avoid silent spend. Compare the bill with actual business value, operational effort, and risk reduction instead of judging only the unit price.
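Cleanup of temporary restore accounts can be scripted with a guard so that only accounts following an agreed naming convention are deleted. The sketch below assumes a hypothetical "restore-" name prefix and a hypothetical function name. Adapt both to your own convention before use.

```shell
#!/usr/bin/env bash
# Sketch only: the "restore-" prefix and the function name are hypothetical.

# Delete a temporary restored account, but only if its name starts with the
# agreed prefix. Arguments: account, resource group, optional prefix.
cleanup_restored_account() {
  local account="$1" resource_group="$2" prefix="${3:-restore-}"
  case "$account" in
    "$prefix"*) ;;
    *)
      echo "refusing to delete '$account': name lacks prefix '$prefix'" >&2
      return 1
      ;;
  esac
  # --yes skips the interactive confirmation prompt.
  az cosmosdb delete --name "$account" --resource-group "$resource_group" --yes
}
```

The prefix check is a cheap safeguard against passing a production account name by mistake. It does not replace resource locks or RBAC restrictions.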

Reliability

Reliability for continuous backup depends on knowing the restore path before the emergency. Teams should test restores, document how to find the latest restorable timestamp, and confirm whether recovery should happen into a new account or the same account. Multi-region accounts require attention to backup region and consistency behavior, especially when the latest write state matters. Runbooks should cover accidental delete, bad deployment, regional issue, and partial data corruption. After restoration, validate item counts, partition keys, application permissions, and downstream integrations before declaring the service recovered. Practice the failure path, record recovery evidence, and keep human escalation available for cases automation cannot safely resolve.

Performance

Continuous backup is designed so that background backups do not consume provisioned request units or reduce database availability, but restore operations still need planning. Recovery can be slower when restoring from certain regions or when large datasets must be validated before use. Application performance may also change if restored accounts have different throughput, regions, indexing policies, or network paths. Measure restore duration, validation time, and cutover time separately. For critical workloads, rehearse recovery with realistic data sizes so the documented recovery time objective is based on evidence, not optimism, because clean lab tests often miss the bottlenecks that users actually feel.
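To keep restore, validation, and cutover timings separate during a drill, a small helper can time each phase on its own. This is a sketch with a hypothetical function name. It uses whole-second resolution from date, which is coarse but adequate for phases that run for minutes.

```shell
#!/usr/bin/env bash
# Sketch only: time_phase is a hypothetical helper name.

# Run a command, then report "<label> <seconds>s" on stderr so the command's
# own stdout stays clean. The command's exit status is passed through.
time_phase() {
  local label="$1" start end status=0
  shift
  start="$(date +%s)"
  "$@" || status=$?
  end="$(date +%s)"
  echo "${label} $((end - start))s" >&2
  return "$status"
}
```

A drill script might call time_phase restore followed by the restore command, then time_phase validation followed by the team's own validation script, and record each line in the drill report.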

Operations

Operationally, continuous backup needs a recovery drill, not just a portal setting. Operators should know the backup policy type, retention tier, restorable resources, target naming convention, restore permissions, and approval chain. Incident responders need a way to determine the bad-change timestamp from logs, change tickets, or event feeds. Dashboards should surface account configuration, backup mode, regions, and restore activity. After a restore, update connection strings or application settings carefully, reapply network and identity controls, and capture evidence showing why the selected timestamp was chosen. Keep rollback steps, dashboards, service owners, and escalation contacts current so support teams can act without guessing under pressure.

Common mistakes

  • Assuming firewall rules, private endpoints, RBAC assignments, and all account settings are restored automatically.
  • Choosing a local timestamp instead of the required UTC restore timestamp.
  • Leaving temporary restored accounts running after validation and creating avoidable cost.