Redis passive geo-replication

Redis passive geo-replication is a disaster recovery pattern for Azure Cache for Redis in which one Premium cache acts as the primary and a second Premium cache in a different region receives replicated data. Applications normally write to the primary. The secondary stays read-only until operators unlink or promote it during a regional incident. Passive geo-replication is not active-active replication: it provides neither automatic failover nor multi-region writes. It gives teams a prepared regional copy, but they still need runbooks, DNS or configuration changes, and recovery testing.

Aliases: passive Redis geo-replication, Azure Cache Redis geo replication, Redis server link, read-only Redis secondary
Difficulty: advanced
CLI mappings: 5
Last verified: 2026-05-21

Microsoft Learn

Redis passive geo-replication links two Premium Azure Cache for Redis instances, usually across Azure regions. The primary accepts reads and writes, replicated data flows to a read-only secondary, and failover is an operator-driven disaster recovery action rather than automatic traffic switching.

Microsoft Learn: Configure passive geo-replication for Premium Azure Cache for Redis instances (2026-05-21)

Technical context

In Azure architecture, passive geo-replication spans the cache data plane and Azure control plane. The control plane creates two compatible Premium cache instances and links them with server-link operations. The data plane asynchronously replicates Redis data from primary to secondary. The pattern interacts with region strategy, virtual networking, private endpoints, application configuration, persistence, metrics, and recovery objectives. Azure exposes link state, replication health, and replication lag metrics, while operators decide when to break the link and redirect clients during disaster recovery.

Why it matters

Passive geo-replication matters because many applications discover too late that their cache is part of recovery, not just performance. If a primary region fails and the cache holds sessions, rate-limit state, personalization, or warm search results, rebuilding from empty Redis may slow recovery. A passive replica can reduce data-loss exposure, but it does not remove architectural responsibility. Teams must know which clients can tolerate read-only secondary behavior, how stale the replica may be, who can initiate failover, and how traffic will move. The value is preparation: a tested secondary cache is much better than improvising cache creation during a regional outage.

Where you see it

Signals, screens, and Azure surfaces where this term usually becomes operational.

Signal 01

In the Azure Cache for Redis replication or geo-replication blade, linked Premium caches clearly show primary and secondary roles with link status and regional information.

Signal 02

In Azure CLI server-link output, linkedServerName, replicationRole, provisioningState, and resource IDs identify whether the current cache participates in passive geo-replication and in which role, which makes the output a useful checkpoint during disaster recovery rehearsals.

Signal 03

In Azure Monitor, geo-replication health and data sync offset metrics show whether the secondary is current enough to trust as a recovery target during readiness reviews.

When this becomes relevant

Specific situations where this term helps solve real Azure design, operations, migration, security, reliability, cost, or governance problems.

  • Keep a warm secondary cache for cross-region disaster recovery when the primary Azure Cache for Redis region becomes unavailable.
  • Rehearse manual failover by unlinking the secondary, redirecting applications, and validating private networking and DNS paths.
  • Protect registration, filing, or booking events from cold-cache database overload after regional recovery.
  • Meet continuity evidence requirements by exporting server-link state, replication role, and sync-health metrics during DR exercises.
  • Plan migration away from passive Premium links when workloads need active-active writes, modules, or Azure Managed Redis architecture.

Real-world case studies

Different enterprise-style examples that show the term being used to hit measurable objectives.

Case study 01

Tax filing SaaS prepares cache recovery before filing deadline

Scenario, objectives, solution, measured impact, and takeaway.

Scenario

A tax filing SaaS provider used Redis for authenticated sessions, form hints, and recent validation results. Regional downtime during filing week would have created a costly support surge.

Business/Technical Objectives
  • Keep a ready secondary cache in another Azure region.
  • Validate an operator-controlled failover path before deadline week.
  • Limit cache-related recovery time to under 45 minutes.
  • Document acceptable replication lag for cached form hints.
Solution Using Redis passive geo-replication

The engineering team linked two compatible Premium Azure Cache for Redis instances with passive geo-replication. The primary region handled normal reads and writes, while the secondary remained read-only and monitored. Azure CLI commands listed server links, captured roles, and exported resource IDs into the DR runbook. Private endpoints, firewall rules, and application configuration slots were prepared in both regions. During monthly exercises, operators checked geo-replication health and sync offset, broke the link in a test environment, redirected the application setting, and validated user sessions. Durable tax records stayed in the primary database platform, so Redis only accelerated recovery and user experience.

Results & Business Impact
  • The final DR exercise restored cache-backed user flows in 31 minutes.
  • Support-risk modeling showed a 52 percent reduction in expected filing-week incident volume.
  • Replication lag stayed under the agreed five-minute cache RPO during load tests.
  • Auditors accepted the documented evidence for two-region cache recovery readiness.
Key Takeaway for Glossary Readers

Passive geo-replication is useful when cache recovery is rehearsed as part of business continuity, not treated as automatic magic.

Case study 02

University enrollment portal avoids cold-cache collapse during registration

Scenario

A university enrollment portal cached course availability, waitlist hints, and student session data. Registration day created extreme traffic that the database could not absorb after a cold cache rebuild.

Business/Technical Objectives
  • Maintain a warm Redis copy outside the primary campus region.
  • Avoid database overload if the primary cache region failed.
  • Rehearse client redirection with identity and network controls in place.
  • Keep authoritative enrollment decisions outside Redis.
Solution Using Redis passive geo-replication

Architects implemented passive geo-replication between Premium Redis caches in two Azure regions. The secondary cache received replicated course and session data but was not used for normal writes. Azure CLI server-link commands became part of the registration readiness checklist, alongside private endpoint validation, DNS review, and application configuration export. Azure Monitor tracked geo-replication health, data sync offset, used memory, and backend database load. Enrollment transactions remained in the student information system and service bus workflows. If failover was needed, the operations team would unlink the secondary, update application configuration, warm remaining prefixes, and confirm that course availability reconciliation completed.

Results & Business Impact
  • Registration-day recovery testing kept database CPU below 68 percent after simulated regional cache loss.
  • Warm-cache availability reduced estimated recovery time from two hours to 37 minutes.
  • No enrollment transaction relied on Redis as the source of truth.
  • The readiness checklist found a missing secondary-region private DNS link before the real event.
Key Takeaway for Glossary Readers

A passive Redis replica can prevent database shock during recovery when the application still keeps authoritative decisions in durable systems.

Case study 03

Marine logistics platform tests secondary-region cache activation

Scenario

A marine logistics platform served vessel schedules and berth-planning dashboards from Redis-backed APIs. A regional outage would have disrupted port operations across several time zones.

Business/Technical Objectives
  • Provide a tested cache standby for vessel schedule dashboards.
  • Keep failover commands executable by the on-call platform team.
  • Measure replication freshness during heavy schedule update bursts.
  • Avoid exposing secondary Redis access outside approved private networks.
Solution Using Redis passive geo-replication

The platform group created a passive geo-replication link between two Premium Redis caches and tagged them as one disaster recovery pair. Azure CLI exported server-link state, primary and secondary IDs, and geo-replication metrics into weekly readiness reports. Private endpoints and firewall rules were mirrored across regions, and only the on-call platform role could mutate the link. Application settings were prepared to point APIs to the secondary after an incident commander approved failover. Load tests generated bursty vessel schedule updates while Azure Monitor tracked sync offset and cache hit behavior. Post-test runbooks covered failback and the decision about which cache held valid data.

Results & Business Impact
  • Burst testing showed replicated schedule data stayed within the approved ten-minute RPO.
  • On-call engineers completed the secondary activation exercise in 24 minutes.
  • Network review found no public path to the secondary Redis endpoint.
  • Dashboard availability during a simulated regional loss improved from 61 percent to 94 percent.
Key Takeaway for Glossary Readers

Passive geo-replication earns its cost when teams can prove the secondary is fresh, reachable, protected, and operationally promotable.

Why use Azure CLI for this?

Azure CLI is essential for passive geo-replication because disaster recovery work needs repeatable commands, not portal memory. I use CLI to list server links, confirm primary and secondary roles, collect replication health metrics, and document resource IDs before an outage. During a failover exercise, CLI commands can be rehearsed exactly, including deleting the link and exporting before-and-after state. It also helps governance because only approved operators should mutate replication links. The CLI history, JSON output, and runbook steps give incident commanders evidence when a regional incident compresses decision time and makes manual portal navigation risky.

CLI use cases

  • List existing server links to verify which Premium caches are paired for passive geo-replication.
  • Create a server link during a controlled DR setup using explicit primary and secondary resource IDs.
  • Show link status and roles before a failover test or regional readiness review.
  • Delete or break a link as part of a documented failover or failback procedure.
  • Collect geo-replication health and sync metrics for DR evidence and executive readiness reports.

Before you run CLI

  • Confirm both caches are Premium, the secondary is the same size as or larger than the primary, the regions match your region strategy, and both caches are covered by the same recovery runbook.
  • Verify the exact primary and secondary resource IDs; linking the wrong cache can put production recovery data at risk.
  • Use Reader permission for inspection and tightly controlled write permissions for server-link create or delete operations.
  • Check private endpoints, firewall rules, DNS, and application configuration paths in both regions before a failover exercise.
  • Understand that deleting or changing links affects disaster recovery posture and may require downtime, data validation, or rollback steps.
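
The first two checks in the list above can be sketched as a small preflight function. This is a minimal illustration, not an official validation tool: the dictionaries are hypothetical and only loosely shaped like trimmed `az redis show` output, and the rule that the secondary must be at least as large as the primary is an assumption about what "compatible in size" means.

```python
from typing import Dict, List


def preflight_problems(primary: Dict, secondary: Dict) -> List[str]:
    """Return human-readable problems that should block a server-link change."""
    problems = []
    for label, cache in (("primary", primary), ("secondary", secondary)):
        if cache["sku"]["name"] != "Premium":
            problems.append(f"{label} cache is {cache['sku']['name']}, not Premium")
    if primary["id"].lower() == secondary["id"].lower():
        problems.append("primary and secondary resource IDs are identical")
    # Assumption: "compatible in size" means the secondary is at least as large.
    if secondary["sku"]["capacity"] < primary["sku"]["capacity"]:
        problems.append("secondary capacity is smaller than primary capacity")
    return problems


# Hypothetical resource IDs and sizes, for illustration only.
BASE = "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/rg-dr"
primary = {
    "id": BASE + "/providers/Microsoft.Cache/redis/cache-east",
    "sku": {"name": "Premium", "capacity": 2},
}
secondary = {
    "id": BASE + "/providers/Microsoft.Cache/redis/cache-west",
    "sku": {"name": "Premium", "capacity": 2},
}

for problem in preflight_problems(primary, secondary) or ["no blocking problems found"]:
    print(problem)
```

Returning a list of problems rather than a single boolean lets the runbook record every failed check at once instead of stopping at the first one.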

What output tells you

  • replicationRole shows whether the cache is acting as primary or secondary in the passive geo-replication relationship.
  • linkedServerName and resource IDs identify the paired cache, which is critical before any failover or unlink operation.
  • provisioningState indicates whether the link operation is complete, still updating, or failed and needs operator action.
  • Geo-replication health metrics show whether the secondary is a trustworthy recovery target or only a stale standby.
  • Data sync offset or lag metrics indicate how much replicated data may be behind the primary during a recovery decision.
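
As a rough sketch of how the fields above might be read in a script, the following parses a server-link listing and flags links that are not fully provisioned. The sample JSON is hypothetical, and the exact key names in real CLI output are an assumption: the helper accepts either `replicationRole` (the label used above) or `serverRole`, so verify the keys against your own `--output json` results before relying on it.

```python
import json
from typing import Dict, List

# Assumption: the role key may differ between CLI versions, so try both names.
ROLE_KEYS = ("replicationRole", "serverRole")


def link_role(link: Dict) -> str:
    """Return the role reported for a linked server, or 'Unknown'."""
    for key in ROLE_KEYS:
        if key in link:
            return link[key]
    return "Unknown"


def summarize_links(raw_json: str) -> List[str]:
    """Turn a JSON array of server links into one status line per link."""
    lines = []
    for link in json.loads(raw_json):
        state = link.get("provisioningState", "Unknown")
        status = "ready" if state == "Succeeded" else "needs attention"
        lines.append(
            f"{link.get('name', '?')}: role={link_role(link)}, state={state}, {status}"
        )
    return lines


# Hypothetical sample shaped like a server-link listing.
SAMPLE = """[
  {"name": "cache-west", "replicationRole": "Secondary", "provisioningState": "Succeeded"},
  {"name": "cache-north", "serverRole": "Secondary", "provisioningState": "Failed"}
]"""

for line in summarize_links(SAMPLE):
    print(line)
```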

Mapped Azure CLI commands

Redis passive geo-replication CLI Commands

direct
az redis server-link list --name <primary-cache-name> --resource-group <resource-group>
az redis server-link · discover · Databases
az redis server-link show --name <primary-cache-name> --resource-group <resource-group> --linked-server-name <secondary-cache-name>
az redis server-link · discover · Databases
az redis server-link create --name <primary-cache-name> --resource-group <resource-group> --server-to-link <secondary-resource-id> --replication-role Secondary
az redis server-link · provision · Databases
az redis server-link delete --name <cache-name> --resource-group <resource-group> --linked-server-name <linked-cache-name>
az redis server-link · remove · Databases
az monitor metrics list --resource <redis-resource-id> --metric GeoReplicationHealthy GeoReplicationDataSyncOffset --start-time <utc-start> --end-time <utc-end>
az monitor metrics · discover · Databases
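
One way to keep these commands repeatable is to build them as argument lists in a runbook script rather than retyping them. The sketch below covers only the list, show, and delete server-link commands mapped above; the function name, the `approved` flag, and the cache names are hypothetical, and nothing here calls Azure.

```python
import shlex
from typing import List, Optional

READ_ONLY_ACTIONS = {"list", "show"}
MUTATING_ACTIONS = {"delete"}


def server_link_argv(
    action: str,
    cache: str,
    resource_group: str,
    linked: Optional[str] = None,
    approved: bool = False,
) -> List[str]:
    """Build an argument list for `az redis server-link <action>`."""
    if action not in READ_ONLY_ACTIONS | MUTATING_ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    if action != "list" and not linked:
        raise ValueError(f"{action} requires the linked server name")
    if action in MUTATING_ACTIONS and not approved:
        # Hypothetical guard: mutating the link needs explicit approval.
        raise PermissionError(f"{action} is not approved")
    argv = ["az", "redis", "server-link", action,
            "--name", cache, "--resource-group", resource_group]
    if action != "list":
        argv += ["--linked-server-name", linked]
    return argv + ["--output", "json"]


# Print the commands for the runbook instead of executing them.
print(shlex.join(server_link_argv("list", "cache-east", "rg-dr")))
print(shlex.join(server_link_argv("show", "cache-east", "rg-dr", linked="cache-west")))
```

Keeping the delete path behind an explicit approval argument mirrors the guidance that read-only inspection and link mutation should have different permission levels.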

Architecture context

A solid passive geo-replication design starts with region pairing, application routing, and data semantics. Both caches need compatible tier, size, networking, and operational ownership. Applications should not assume the secondary is writable until the disaster recovery procedure changes the topology. I expect runbooks to cover link creation, health checks, planned failover exercises, DNS or configuration flips, and post-failback cleanup. Private endpoints and firewall rules must be designed in both regions, not added during the incident. Passive geo-replication also belongs in RPO discussions because replication is asynchronous; the last accepted write may not be present when the secondary becomes active.
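
The RPO point can be illustrated with a toy model of asynchronous replication. This is a simplified in-memory sketch, not how Azure Cache for Redis is implemented: the class, its methods, and the session keys are all hypothetical, and it only shows why a write accepted by the primary can be missing after failover.

```python
from collections import deque


class AsyncReplicaPair:
    """Toy primary/secondary pair with a queue of not-yet-replicated writes."""

    def __init__(self):
        self.primary = {}
        self.secondary = {}
        self.in_flight = deque()

    def write(self, key, value):
        # The primary acknowledges the write before the secondary has it.
        self.primary[key] = value
        self.in_flight.append((key, value))

    def replicate(self, max_items):
        # Apply up to max_items queued writes to the secondary, in order.
        for _ in range(min(max_items, len(self.in_flight))):
            key, value = self.in_flight.popleft()
            self.secondary[key] = value

    def fail_over(self):
        # Writes still in flight never reach the secondary.
        unreplicated = [key for key, _ in self.in_flight]
        self.in_flight.clear()
        return unreplicated


pair = AsyncReplicaPair()
for n in (1, 2, 3):
    pair.write(f"session:{n}", f"user-{n}")
pair.replicate(max_items=2)   # the third write is still queued
missing = pair.fail_over()    # the primary region is lost at this point
print(missing)                # ['session:3']
print(sorted(pair.secondary)) # ['session:1', 'session:2']
```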

Security

Security scope doubles when passive geo-replication is enabled because two regional cache endpoints, two network perimeters, and two sets of access paths must be protected. Limit who can create, delete, or alter server links because those actions affect disaster recovery posture. Use private endpoints or strict firewall rules in both regions, rotate access keys carefully, and avoid exposing the secondary endpoint to clients that should not read replicated data. Replication may carry sensitive cached values across regions, so data residency, compliance, and encryption requirements must be reviewed. Log control-plane changes, include secondary-cache access in security monitoring, and audit both routinely.

Cost

Passive geo-replication has a direct cost because it requires a second Premium cache instance, often in another region, plus possible private endpoints, monitoring, logging, and cross-region operational effort. The secondary may be underused during normal operations, but it exists to buy recovery confidence. Cost decisions should compare the standby cache against expected downtime cost, cache rebuild time, and backend surge risk after regional failure. Over-sizing the secondary wastes money; under-sizing it produces a DR plan that fails under load. FinOps owners should tag both caches as one recovery pair so chargeback reflects the business service protected, and should revisit the comparison during yearly planning reviews.
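
The comparison between standby cost and expected downtime cost can be made concrete with simple arithmetic. The sketch below is one possible framing, and every figure in it is a hypothetical placeholder rather than Azure pricing or measured data.

```python
def standby_net_benefit(
    monthly_standby_cost: float,
    outage_probability_per_year: float,
    hours_without_standby: float,
    hours_with_standby: float,
    downtime_cost_per_hour: float,
) -> float:
    """Expected yearly loss avoided by the standby, minus its yearly cost."""
    annual_standby_cost = monthly_standby_cost * 12
    hours_saved = hours_without_standby - hours_with_standby
    expected_avoided_loss = (
        outage_probability_per_year * hours_saved * downtime_cost_per_hour
    )
    return expected_avoided_loss - annual_standby_cost


# Hypothetical inputs: replace with your own pricing and incident estimates.
net = standby_net_benefit(
    monthly_standby_cost=800.0,
    outage_probability_per_year=0.25,
    hours_without_standby=2.0,
    hours_with_standby=0.5,
    downtime_cost_per_hour=100_000.0,
)
print(f"Net expected yearly benefit: {net:,.0f}")  # Net expected yearly benefit: 27,900
```

A negative result does not automatically rule out the standby, since the model ignores backend surge risk and compliance requirements, but it makes the trade-off explicit for planning reviews.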

Reliability

Reliability is the purpose of passive geo-replication, but it is not automatic resilience. The secondary can reduce recovery time and data-loss exposure, yet operators must test failover and application redirection. Replication lag, link health, regional network issues, and cache size mismatch can change actual recovery behavior. If applications store state that cannot be lost, Redis replication alone is not enough; durable systems still need to own authoritative data. Runbooks should define RPO, RTO, validation checks, rollback, and failback. Alerts on link health and replication offset are important because an unhealthy secondary creates false confidence. Practice under load before relying on it.
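
A readiness check along these lines can be sketched as follows. It is a rough approximation under stated assumptions: the sync offset is treated as bytes, the write rate is assumed steady, and dividing one by the other is only an estimate of how far behind the secondary is. The sample values and the 300-second RPO are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    healthy: bool       # link health reading for one interval
    offset_bytes: int   # assumed unit of the data sync offset reading


def estimated_lag_seconds(offset_bytes: int, write_rate_bytes_per_s: float) -> float:
    """Rough lag estimate: bytes behind divided by a steady write rate."""
    if write_rate_bytes_per_s <= 0:
        raise ValueError("write rate must be positive")
    return offset_bytes / write_rate_bytes_per_s


def secondary_meets_rpo(
    samples: List[Sample], write_rate_bytes_per_s: float, rpo_seconds: float
) -> bool:
    """True only if every sample is healthy and within the RPO estimate."""
    if not samples:
        return False  # no evidence means no confidence
    return all(
        s.healthy
        and estimated_lag_seconds(s.offset_bytes, write_rate_bytes_per_s) <= rpo_seconds
        for s in samples
    )


# Hypothetical readings collected during a load test.
readings = [Sample(True, 1_000_000), Sample(True, 30_000_000)]
print(secondary_meets_rpo(readings, write_rate_bytes_per_s=200_000, rpo_seconds=300))
```

Treating an empty sample list as a failure is deliberate: a secondary with no recent health evidence is exactly the false-confidence case described above.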

Performance

Passive geo-replication affects performance indirectly. Under normal conditions, the primary still serves application traffic, while replication adds background work and may be sensitive to data volume and write rate. During failover, application performance depends on how quickly clients move to the secondary, how warm the replicated data is, and whether the secondary has enough capacity. Cross-region latency matters for operations and monitoring, not ordinary primary reads. Test failover under realistic load because a secondary that looks healthy at rest may not handle peak traffic after promotion, especially if connection limits or memory differ. Measure client reconnect behavior during each exercise.

Operations

Operators handle passive geo-replication through inventory, link validation, monitoring, rehearsed failover, and post-incident cleanup. Azure CLI can list server links, show linked cache names, and create or delete links with explicit resource IDs. Azure Monitor should track geo-replication health, replication lag or offset, used memory, and connection behavior in both regions. Runbooks must state who approves unlinking, how application traffic moves, and what evidence proves the secondary is safe to use. After recovery, operators need a plan for rebuilding or relinking the old primary without overwriting valid data. They also preserve command transcripts so post-incident reviews can reconstruct decisions clearly.
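
The approval and transcript requirements can be sketched as a small recorder that refuses mutating steps without a named approver. The step names, the approver label, and the log format are hypothetical; a real runbook would also capture command output.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, List, Optional


def utc_now() -> str:
    """Timestamp used for transcript entries."""
    return datetime.now(timezone.utc).isoformat(timespec="seconds")


@dataclass
class Step:
    name: str
    mutating: bool = False


@dataclass
class Transcript:
    clock: Callable[[], str] = utc_now
    entries: List[str] = field(default_factory=list)

    def record(self, step: Step, approver: Optional[str] = None) -> bool:
        """Log the step; block mutating steps that have no approver."""
        if step.mutating and approver is None:
            self.entries.append(f"{self.clock()} BLOCKED {step.name}: no approver")
            return False
        who = f" approved by {approver}" if approver else ""
        self.entries.append(f"{self.clock()} DONE {step.name}{who}")
        return True


log = Transcript()
log.record(Step("list server links"))
log.record(Step("delete server link", mutating=True))
log.record(Step("delete server link", mutating=True), approver="incident-commander")
for entry in log.entries:
    print(entry)
```

Blocked attempts are written to the transcript as well, so a post-incident review can see what was tried and not only what succeeded.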

Common mistakes

  • Assuming passive geo-replication provides automatic failover and writable multi-region Redis behavior.
  • Creating a secondary cache without matching networking, private endpoint, firewall, and client configuration readiness.
  • Ignoring replication lag and then promising an RPO that the cache relationship cannot actually meet.
  • Letting untested failover commands live only in a document instead of rehearsing them with Azure CLI.
  • Relinking or rebuilding after failover without validating which cache contains the most current valid data.