Cosmos DB backup policy

Cosmos DB backup policy is the account-level decision that controls how Azure Cosmos DB keeps recoverable copies of your data. Periodic backup takes scheduled backups at a configurable interval and retention, while continuous backup supports point-in-time restore within the chosen retention tier for supported scenarios. In plain English, it defines what safety net you have when someone deletes data, a deployment writes bad records, or a container must be recovered. The policy is not a complete disaster plan by itself; teams still need restore permissions, testing, and runbooks.

Aliases: Azure Cosmos DB backup policy, Cosmos DB periodic backup, Cosmos DB continuous backup policy
Difficulty: intermediate
CLI mappings: 4

Microsoft Learn

Azure Cosmos DB backup policy defines how an account is backed up and restored, including periodic backup settings or continuous backup with point-in-time restore.

Microsoft Learn: Azure Cosmos DB backup and restore

Technical context

Technically, backup policy is configured on the Cosmos DB account and affects recovery options for databases, containers, and data within supported APIs. Periodic mode stores scheduled backups, while continuous mode maintains restorable timestamps for point-in-time restore. Migrating an account from periodic to continuous mode is one-way, so architecture review matters before changing production accounts. Operators must understand retention tier, region availability, account type, restored-account behavior, deleted-resource recovery, and which settings are not restored automatically. Backup policy also intersects with compliance, cost, incident response, and change management.
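The one-way nature of the mode change can be encoded as a small pre-change check. The following is a minimal sketch, not an Azure API: the `BackupMode` enum, the `review_mode_change` function, and its return labels are hypothetical names chosen for illustration, and the rule it encodes is only the periodic-to-continuous behavior described above.

```python
from enum import Enum


class BackupMode(Enum):
    """Backup modes named in this entry (hypothetical helper type)."""
    PERIODIC = "Periodic"
    CONTINUOUS = "Continuous"


def review_mode_change(current: BackupMode, target: BackupMode) -> str:
    """Classify a proposed backup-mode change for a change-review ticket."""
    if current is target:
        return "no-change"
    if current is BackupMode.PERIODIC and target is BackupMode.CONTINUOUS:
        # One-way migration: flag it so an architecture review happens first.
        return "one-way-needs-review"
    # Assumption: moving from continuous back to periodic is not available.
    return "not-supported"


if __name__ == "__main__":
    print(review_mode_change(BackupMode.PERIODIC, BackupMode.CONTINUOUS))
```

A check like this could sit in a deployment pipeline so that a template change to the backup mode is held for approval instead of being applied silently.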

Why it matters

Cosmos DB backup policy matters because the worst database incidents are often human mistakes rather than hardware failures. A developer can delete a container, a migration can overwrite documents, or a bad integration can corrupt records across many partitions. The chosen backup policy determines how much recovery flexibility the team has when that happens. Periodic backups may be enough for lower-risk workloads, while continuous backup can shorten recovery analysis when the exact bad-write time is known. The business value comes from matching policy, retention, permissions, and restore testing to real recovery objectives instead of assuming Azure backup exists somewhere in the background.

Where you see it

Signals, screens, and Azure surfaces where this term usually becomes operational.

Signal 01

In the Azure portal, it appears on Cosmos DB account backup settings, restore screens, retention selections, point-in-time restore workflows, and deleted-resource recovery experiences. Teams usually visit these screens during production reviews and while writing runbooks.

Signal 02

In CLI and IaC, it appears as account backup policy properties, periodic or continuous mode settings, restore commands, retention choices, and migration review notes. These usually surface in release reviews and support tickets.

Signal 03

In incident response, it appears beside affected containers, known-bad timestamps, restore approvals, recovered-account access, data comparison scripts, and post-incident evidence packages. These usually come together during incident triage and architecture reviews.

When this becomes relevant

Specific situations where this term helps solve real Azure design, operations, migration, security, reliability, cost, or governance problems.

Protect production containers from accidental delete or corrupt writes.
Meet recovery objectives for regulated application data.
Run point-in-time restore investigations in a separate account.
Document account-level backup posture during architecture reviews.

Real-world case studies

Different enterprise-style examples that show the term being used to hit measurable objectives.

Case study 01

Payroll data recovery

Scenario, objectives, solution, measured impact, and takeaway.

Scenario

BrightTrail Logistics discovered that a payroll import job overwrote employee tax documents in a Cosmos DB container.

Business/Technical Objectives

Recover affected documents from before the bad import
Limit payroll outage to less than four hours
Preserve audit evidence for finance and legal teams
Prevent restored data from being exposed publicly

Solution Using Cosmos DB backup policy

The platform team had configured continuous backup for the payroll Cosmos DB account and documented restore permissions before year-end processing. During the incident, engineers identified the bad import timestamp from application logs, used CLI discovery to confirm restorable resources, and restored the account to a separate private endpoint-protected environment. Payroll analysts compared restored documents with the live container and copied only affected records back through an approved repair script. Security reviewed restored-account access, and the team retained evidence showing who requested, approved, and validated the recovery. The final review documented service owners, rollback triggers, monitoring thresholds, and escalation contacts so production support could act without guessing.

Results & Business Impact

Affected documents were recovered within 2 hours and 18 minutes
Payroll processing resumed before the four-hour outage target
Audit evidence included timestamps, approvals, and validation queries
The restored account remained private and was deleted after sign-off

Key Takeaway for Glossary Readers

A Cosmos DB backup policy is valuable only when the restore path, access controls, and data validation process are ready before the incident.

Case study 02

Retail catalog rollback

Scenario

Fabrikam Home pushed a supplier feed that changed prices and availability for thousands of catalog items overnight.

Business/Technical Objectives

Identify the safe restore point within 30 minutes
Recover catalog items without rolling back unrelated orders
Keep the storefront online during data repair
Quantify the cost of continuous backup for executives

Solution Using Cosmos DB backup policy

The catalog account used continuous backup because pricing incidents had caused previous revenue leakage. Engineers used deployment logs and supplier-feed timestamps to choose a restore point, then restored the affected database into a temporary validation account. Product managers compared restored prices and availability against the current catalog and approved a targeted repair list. The storefront stayed online with a banner while the repair script updated only corrupted items. Cost reporting showed the monthly backup premium and temporary restore cost next to avoided order credits and manual correction effort. A follow-up runbook captured validation queries, owner approvals, cost guardrails, and support handoff steps for the next release. Operations also added a quarterly review to confirm metrics, access, backup assumptions, and application behavior still matched the original design.

Results & Business Impact

Safe timestamp selection completed in 22 minutes
Catalog repair finished without taking checkout offline
Order-credit exposure was reduced by an estimated 83 percent
Executives approved the backup policy after seeing incident cost avoidance

Key Takeaway for Glossary Readers

Backup policy decisions should be tied to real business recovery scenarios, not treated as invisible platform defaults.

Case study 03

Clinical audit retention review

Scenario

Ridgeway Oncology needed to prove that patient workflow data could be restored for audits without giving analysts direct production access.

Business/Technical Objectives

Validate restore readiness for regulated containers
Keep audit recovery evidence reproducible
Separate production access from audit investigation access
Document backup policy choice for compliance review

Solution Using Cosmos DB backup policy

Architects reviewed the Cosmos DB backup policy for patient workflow accounts and, after a risk assessment, moved the accounts holding high-criticality containers to continuous backup, since the policy applies at the account level. A quarterly exercise restored a nonproduction copy from an approved timestamp into a locked-down account. Analysts accessed only masked validation views, while database administrators retained control of restore operations. The runbook captured account settings, retention tier, restore command output, network configuration, and data comparison results. Compliance owners received a report showing how periodic backup remained acceptable for accounts holding low-risk reference data while continuous backup covered patient workflow records.

Results & Business Impact

Quarterly restore evidence became repeatable and audit-ready
Analysts no longer needed broad production database access
Compliance review time dropped from five days to two days
Backup costs were aligned to data criticality and retention needs

Key Takeaway for Glossary Readers

Cosmos DB backup policy helps regulated teams match recovery capability to data criticality while keeping restored data governed.

Why use Azure CLI for this?

Use CLI to inspect backup settings, confirm restorable resources, and document restore evidence before making recovery decisions under incident pressure.

CLI use cases

Show the account backup policy and related configuration.
Find restorable accounts, databases, or containers for point-in-time recovery.
Create restore evidence for incident, audit, or change-review records.

Before you run CLI

Confirm the account, subscription, affected database, container, and incident timestamp.
Use read-only discovery until the restore target and approval are clear.
Check whether the account uses periodic or continuous backup before choosing commands.
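One of the checks above, confirming that the incident timestamp is still recoverable, can be sketched in a few lines. This is an illustrative helper, not part of the Azure CLI: the function names are hypothetical, and the mapping of the tier names `Continuous7Days` and `Continuous30Days` to 7 and 30 days is an assumption to verify against the account's actual settings. The real restorable window also depends on when the account was created and when continuous backup was enabled, which this sketch does not model.

```python
from datetime import datetime, timedelta, timezone

# Assumed retention windows for the continuous tiers, in days.
RETENTION_DAYS = {"Continuous7Days": 7, "Continuous30Days": 30}


def parse_utc(timestamp: str) -> datetime:
    """Parse an ISO 8601 timestamp and normalize it to UTC."""
    parsed = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        raise ValueError("timestamp must include a UTC offset or 'Z'")
    return parsed.astimezone(timezone.utc)


def is_within_retention(incident: str, now: str, tier: str) -> bool:
    """Return True if the incident time falls inside the tier's window."""
    incident_at = parse_utc(incident)
    now_at = parse_utc(now)
    oldest = now_at - timedelta(days=RETENTION_DAYS[tier])
    return oldest <= incident_at <= now_at


if __name__ == "__main__":
    # Sample values only; substitute the real incident and current times.
    print(is_within_retention(
        "2024-03-10T08:00:00Z", "2024-03-12T08:00:00Z", "Continuous7Days"
    ))
```

Requiring an explicit offset or `Z` is deliberate: ambiguous local times are a common source of restoring to the wrong point.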

What output tells you

Account output shows backup mode, regions, consistency, and related account settings.
Restorable-resource output helps identify possible recovery timestamps and targets.
Restore command output provides evidence for the recovered resource and operation status.
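To turn account output into a short evidence record, a script can pull out only the backup fields. The sketch below assumes the JSON printed by `az cosmosdb show --output json` contains a `backupPolicy` object with a `type` field plus `periodicModeProperties` or `continuousModeProperties`; treat those property names as assumptions and compare them with real output before relying on the helper. The function name `summarize_backup_policy` is hypothetical.

```python
import json


def summarize_backup_policy(account_json: str) -> dict:
    """Extract backup mode and retention details from account JSON text."""
    account = json.loads(account_json)
    policy = account.get("backupPolicy") or {}
    mode = policy.get("type", "Unknown")
    summary = {"account": account.get("name"), "mode": mode}
    if mode == "Periodic":
        props = policy.get("periodicModeProperties") or {}
        summary["interval_minutes"] = props.get("backupIntervalInMinutes")
        summary["retention_hours"] = props.get("backupRetentionIntervalInHours")
    elif mode == "Continuous":
        props = policy.get("continuousModeProperties") or {}
        summary["tier"] = props.get("tier")
    return summary


if __name__ == "__main__":
    # Made-up sample shaped like the assumed CLI output.
    sample = json.dumps({
        "name": "example-account",
        "backupPolicy": {
            "type": "Continuous",
            "continuousModeProperties": {"tier": "Continuous7Days"},
        },
    })
    print(summarize_backup_policy(sample))
```

Missing fields come back as `None` or `Unknown` rather than raising, so an unexpected output shape shows up in the evidence record instead of stopping the script mid-incident.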

Mapped Azure CLI commands

Cosmos DB backup policy commands

az cosmosdb show --name <account> --resource-group <resource-group>

az cosmosdb restorable-database-account list --location <region>

az cosmosdb sql restorable-database list --instance-id <restorable-account-id> --location <region>

az cosmosdb restore --target-database-account-name <new-account> --account-name <source-account> --resource-group <resource-group> --restore-timestamp <timestamp> --location <region>
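Because a restore is the one command in this list that creates resources, it can help to assemble and review it before anyone runs it. The sketch below only builds the argument list and never calls Azure. The helper names are hypothetical, the sample values are made up, and including `--location` reflects the assumption that the restore command needs the source account's region.

```python
import shlex
from typing import List


def build_restore_command(
    source_account: str,
    target_account: str,
    resource_group: str,
    restore_timestamp: str,
    location: str,
) -> List[str]:
    """Build the argv for a point-in-time restore without running it."""
    values = {
        "source_account": source_account,
        "target_account": target_account,
        "resource_group": resource_group,
        "restore_timestamp": restore_timestamp,
        "location": location,
    }
    missing = [name for name, value in values.items() if not value.strip()]
    if missing:
        raise ValueError("missing values: " + ", ".join(missing))
    if source_account == target_account:
        # The restore lands in a new account, so the names must differ.
        raise ValueError("restore target must be a new account name")
    return [
        "az", "cosmosdb", "restore",
        "--target-database-account-name", target_account,
        "--account-name", source_account,
        "--resource-group", resource_group,
        "--restore-timestamp", restore_timestamp,
        "--location", location,
    ]


def render_for_runbook(argv: List[str]) -> str:
    """Render the argv as a shell-safe string for an approval record."""
    return shlex.join(argv)


if __name__ == "__main__":
    argv = build_restore_command(
        "payroll-prod", "payroll-restore", "rg-payroll",
        "2024-03-10T08:00:00Z", "westus",
    )
    print(render_for_runbook(argv))
```

Keeping the command as a list rather than a formatted string avoids quoting mistakes if an approved copy is later passed to a process runner.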

Architecture context

A Cosmos DB backup policy is the recovery contract for the account, and it should be decided before production data arrives. Periodic backup and continuous backup support different recovery behaviors, retention expectations, and operational runbooks. The architecture should align backup mode with change risk, delete protection, compliance retention, migration windows, and the organization’s recovery objectives. Teams must understand what can be restored, to where, by whom, and how long the restore takes in a realistic incident. Operators should capture account backup mode, retention tier, restorable resources, timestamps, and role assignments before destructive changes. A backup policy is not valuable because it exists; it is valuable when the team can execute a tested restore under pressure.

Security

Security for Cosmos DB backup policy includes who can view backup settings, initiate restores, access restored accounts, and read recovered data. Restore operations can expose historical sensitive records, so they need the same privacy, retention, and approval controls as production data. Use RBAC separation, private networking for restored resources, diagnostic logs, key rotation, and documented approval paths. Teams should prevent casual administrators from changing backup mode without review, especially when the migration is irreversible. Security reviews should verify that restored accounts do not open public access, that secrets are handled safely, and that evidence exists for who requested and used recovered data.

Cost

Cost for Cosmos DB backup policy comes from backup retention, storage, restored accounts, replicated regions, monitoring, and recovery exercises. Continuous backup can provide stronger recovery options, but longer retention tiers and restored environments may add expense. Periodic backup may be cheaper or simpler for low-criticality workloads, but the business cost of longer recovery or more data loss may be higher than the Azure bill. Cost reviews should compare policy choice against workload criticality, compliance requirements, recovery objectives, and incident probability. Teams should also budget for restore drills, temporary validation accounts, and engineering time required to reconcile restored data. Tie every cost decision to measured workload behavior and named budget ownership.

Reliability

Reliability for Cosmos DB backup policy is measured by whether the organization can actually recover data within the required time and data-loss tolerance. The policy must align with RPO, RTO, regulatory retention, account regions, API support, and operational permissions. Continuous backup improves point-in-time flexibility, but it still requires identifying a safe restore timestamp and validating the restored data. Periodic backup needs clear understanding of interval and retention. Reliable teams rehearse deleted-container recovery, bad-write recovery, and restore-to-new-account workflows. They also document what configuration, networking, or application settings must be recreated after restore. Include realistic failure drills, owner escalation, and recovery evidence in the runbook.
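For periodic mode, the link between interval, retention, and recovery objectives can be checked with simple arithmetic. The sketch below is a rough planning aid under two stated assumptions: the worst-case data loss is about one backup interval, and an incident must be noticed before its last good backup ages out of retention. The function and parameter names are hypothetical, and the sample numbers are illustrative.

```python
def check_periodic_policy(
    interval_minutes: int,
    retention_hours: int,
    rpo_minutes: int,
    detection_hours: int,
) -> dict:
    """Compare periodic backup settings with recovery objectives."""
    # Assumption: data written since the last backup could be lost, so
    # the worst-case loss is roughly one full interval.
    worst_case_loss_minutes = interval_minutes
    return {
        "worst_case_loss_minutes": worst_case_loss_minutes,
        "meets_rpo": worst_case_loss_minutes <= rpo_minutes,
        # Backups older than the retention window are gone, so slow
        # detection can leave no clean copy to restore from.
        "covers_detection_window": retention_hours >= detection_hours,
    }


if __name__ == "__main__":
    # Sample values: 4-hour interval, 8-hour retention, 1-hour RPO,
    # and incidents typically noticed within 24 hours.
    print(check_periodic_policy(240, 8, 60, 24))
```

A result where either flag is false is a prompt to revisit the interval, the retention, or the choice of backup mode, not a verdict on its own.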

Performance

Performance impact from Cosmos DB backup policy is usually less visible than query or throughput tuning, but recovery performance matters during incidents. Teams need to know how quickly they can identify a restore point, create or access the restored resource, validate data, and copy records back if required. Restore workflows can be slowed by missing permissions, unclear timestamps, network restrictions, or untested application reconnection steps. Performance planning should include recovery drills with realistic data volume, not just portal screenshots. Measure time to decision, time to restored account, validation duration, and any downstream processing needed after recovery. Compare those measurements against documented baselines and user-facing service targets.
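The drill measurements named above are easy to record consistently if each drill logs four timestamps. The sketch below is one hypothetical way to do that: the `RestoreDrill` class, its field names, and the sample times are illustrative and not taken from any Azure tool.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class RestoreDrill:
    """Timestamps captured during one restore drill (hypothetical record)."""
    incident_declared: datetime
    restore_point_chosen: datetime
    restored_account_ready: datetime
    validation_finished: datetime

    def phase_minutes(self) -> dict:
        """Return the duration of each phase and the total, in minutes."""
        def minutes(start: datetime, end: datetime) -> float:
            return (end - start).total_seconds() / 60

        return {
            "time_to_decision": minutes(
                self.incident_declared, self.restore_point_chosen),
            "time_to_restored_account": minutes(
                self.restore_point_chosen, self.restored_account_ready),
            "validation": minutes(
                self.restored_account_ready, self.validation_finished),
            "total": minutes(
                self.incident_declared, self.validation_finished),
        }


def within_rto(drill: RestoreDrill, rto_minutes: float) -> bool:
    """Check whether the whole drill finished inside the RTO target."""
    return drill.phase_minutes()["total"] <= rto_minutes


if __name__ == "__main__":
    drill = RestoreDrill(
        incident_declared=datetime(2024, 3, 10, 9, 0),
        restore_point_chosen=datetime(2024, 3, 10, 9, 22),
        restored_account_ready=datetime(2024, 3, 10, 10, 40),
        validation_finished=datetime(2024, 3, 10, 11, 18),
    )
    print(drill.phase_minutes(), within_rto(drill, 240))
```

Splitting the total into phases shows whether a slow drill was caused by choosing the restore point, waiting for the restored account, or validating data, which are fixed in different ways.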

Operations

Operationally, Cosmos DB backup policy belongs in every account review, migration checklist, and incident runbook. Teams should record backup mode, retention, restore contacts, allowed restore regions, test schedule, and the exact commands or portal steps used during recovery. Runbooks should distinguish between recovering an accidentally deleted resource and investigating bad writes in a restored copy. During incidents, operators need timestamps, affected containers, responsible application owners, and approval for restored-account access. After recovery, teams should capture evidence, compare data, rotate credentials if needed, and update lessons learned so backup policy remains operational, not theoretical. Keep ownership, dashboard links, deployment history, and support escalation steps current.

Common mistakes

Assuming backup policy is the same as a tested restore plan.
Changing backup mode without understanding irreversible migration implications.
Restoring data without securing the restored account and access path.

Operator quick checks

What backup mode and retention does this account use?
When was the last successful restore drill for this workload?
Who is allowed to approve and access recovered data?

Questions to ask

Is the incident an accidental delete, bad write, or broader account failure?
What exact restore timestamp is considered safe?
Will recovery require copying data back or redirecting applications?
