  • Sentinel quorum problems - a 2-Sentinel deployment with only 1 healthy instance has no majority and can't promote anything; you discover that min-replicas-to-write isn't tuned correctly only when production fails over.
  • Replica promoted with stale data - a replica that had fallen behind was promoted on failover; because min-replicas-max-lag wasn't enforced, the primary kept accepting writes the replica never received, and the most recent writes were silently dropped.
  • Replica failover taking 30s+ - Sentinel takes 10s to detect, 5s to vote, 15s to reconfigure replicas; app sees 30+ seconds of timeouts during what should be a graceful failover.
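
The quorum failure in the first bullet comes down to simple arithmetic: Sentinel needs `quorum` instances to agree that the primary is objectively down, and a majority of all known Sentinels to authorize the failover itself. A minimal Python sketch of that check follows; the function name `can_fail_over` is ours for illustration and is not part of any Valkey tooling.

```python
def can_fail_over(total_sentinels: int, healthy_sentinels: int, quorum: int) -> bool:
    """Return True if the healthy Sentinels can both detect and authorize a failover."""
    # A strict majority of ALL configured Sentinels, not just the reachable ones.
    majority = total_sentinels // 2 + 1
    # Quorum gates the "objectively down" decision;
    # majority gates the leader election that performs the promotion.
    return healthy_sentinels >= quorum and healthy_sentinels >= majority


print(can_fail_over(total_sentinels=2, healthy_sentinels=1, quorum=1))  # False
print(can_fail_over(total_sentinels=3, healthy_sentinels=2, quorum=2))  # True
print(can_fail_over(total_sentinels=5, healthy_sentinels=3, quorum=3))  # True
```

With 2 Sentinels the majority is 2, so losing either one blocks promotion even when the quorum is set to 1. That is why 3 or 5 instances are the usual starting point.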

JusDB HA consultants own the failover playbook + 15-minute incident SLA. Book an HA architecture review →

Single-primary + Sentinel - not multi-shard

Valkey High Availability

In short: Valkey high availability (single-primary + replicas + Sentinel) involves quorum-based Sentinel deployment across AZs, sub-second replica lag monitoring, split-brain prevention via min-replicas-to-write and failover-timeout tuning, and automated failover in 15-30 seconds - plus cross-region async replicas and RDB snapshots for disaster recovery beyond local HA.

Production Sentinel quorum design, replica lag monitoring, split-brain prevention, and 15-30 second automated failover SLAs. For horizontal multi-shard scaling, see Valkey Cluster.

Production HA capabilities

Sentinel Quorum Design
3 or 5 Sentinel deployment, quorum/majority sizing, sentinel monitor configuration, anti-affinity across AZs.
Replication Lag Monitoring
Sub-second lag tracking, master_link_status alerting, replication-offset deltas, lag SLO enforcement.
Split-Brain Prevention
min-replicas-to-write + min-replicas-max-lag tuning, partition-tolerance config, failover-timeout to prevent thrashing.
Automated Failover
down-after-milliseconds tuning, parallel-syncs config, failover SLA tracking, client-side reconnect strategy review.
Health & Status Observability
Prometheus exporter setup, Sentinel & primary dashboards, alert routing for replica-out / split-brain / failover events.
Cross-Region DR
Cross-region async replica placement, snapshot offsite, DR-runbook engineering, RPO/RTO target validation drills.
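
As a rough illustration of the replication-offset deltas mentioned under Replication Lag Monitoring, the sketch below parses the text of an `INFO replication` reply taken on the primary and reports how far each replica trails. It assumes the Redis-compatible field layout (`master_repl_offset`, `slaveN:ip=...,port=...,state=...,offset=...,lag=...`); the sample reply, the addresses, and the 10-second threshold are made up for the example, and the helper name `replica_lag` is hypothetical.

```python
def replica_lag(info_text: str) -> dict:
    """Map each replica (ip:port) to its byte offset delta and reported lag in seconds."""
    fields = {}
    for line in info_text.splitlines():
        line = line.strip()
        # Skip blank lines and section headers such as "# Replication".
        if line and not line.startswith("#") and ":" in line:
            key, _, value = line.partition(":")
            fields[key] = value

    primary_offset = int(fields["master_repl_offset"])
    result = {}
    for key, value in fields.items():
        # Replica entries are keyed slave0, slave1, ... in this layout.
        if key.startswith("slave") and key[5:].isdigit():
            attrs = dict(pair.split("=", 1) for pair in value.split(","))
            name = f"{attrs['ip']}:{attrs['port']}"
            result[name] = {
                "state": attrs["state"],
                "offset_delta": primary_offset - int(attrs["offset"]),
                "lag_seconds": int(attrs["lag"]),
            }
    return result


# Invented sample reply, for illustration only.
SAMPLE = """# Replication
role:master
connected_slaves:2
slave0:ip=10.0.1.11,port=6379,state=online,offset=104500,lag=0
slave1:ip=10.0.2.12,port=6379,state=online,offset=98000,lag=12
master_repl_offset:104500
"""

MAX_LAG_SECONDS = 10  # assumed alert threshold
for name, stats in replica_lag(SAMPLE).items():
    status = "ALERT" if stats["lag_seconds"] > MAX_LAG_SECONDS else "ok"
    print(name, stats["offset_delta"], stats["lag_seconds"], status)
```

In a real deployment the reply would come from a client library or exporter rather than a string literal, and the thresholds would follow the lag SLO agreed for the workload.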

A typical Valkey HA deployment

The shape we deploy by default unless something in the workload pushes us to cluster mode.

Topology
1 primary + 2 replicas (one node per AZ) - survives any single-node or AZ outage.
3 Sentinel instances co-located with application servers, spread across the same 3 AZs.
Cross-region async replica for DR (RPO ~30s, manual promotion).
Key parameters
down-after-milliseconds: 10000-15000 (10-15 seconds)
min-replicas-to-write: 1 (≥1 connected replica required)
min-replicas-max-lag: 10 (seconds)
failover-timeout: 180000 (3 minutes; prevents thrashing)
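
To show where these parameters live, here is a small Python sketch that renders them as Sentinel and server configuration lines. The first four go in the Sentinel config and are scoped to a monitored primary name; the two `min-replicas-*` settings go in the primary's own config. The primary name `valkey-primary`, the address, the quorum of 2, and `parallel-syncs 1` are assumptions for the example, and `render_ha_config` is a hypothetical helper.

```python
def render_ha_config(primary_name, primary_ip, primary_port, quorum,
                     down_after_ms=10000, failover_timeout_ms=180000,
                     parallel_syncs=1, min_replicas=1, max_lag_s=10):
    """Return (sentinel_conf, server_conf) text for a single-primary HA setup."""
    sentinel_conf = "\n".join([
        # quorum = Sentinels that must agree the primary is down
        f"sentinel monitor {primary_name} {primary_ip} {primary_port} {quorum}",
        f"sentinel down-after-milliseconds {primary_name} {down_after_ms}",
        f"sentinel failover-timeout {primary_name} {failover_timeout_ms}",
        # parallel_syncs=1 is an assumed value: one replica resyncs at a time
        f"sentinel parallel-syncs {primary_name} {parallel_syncs}",
    ])
    server_conf = "\n".join([
        # Refuse writes unless at least min_replicas replicas are within max_lag_s
        f"min-replicas-to-write {min_replicas}",
        f"min-replicas-max-lag {max_lag_s}",
    ])
    return sentinel_conf, server_conf


sentinel_conf, server_conf = render_ha_config("valkey-primary", "10.0.1.10", 6379, quorum=2)
print(sentinel_conf)
print(server_conf)
```

Keeping both fragments in one generator is just a convenience for the sketch; the point is that failure detection and write safety are tuned in two different files.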

Build your Valkey HA topology

Send us your current Valkey/Redis HA setup (or none at all). We'll come back with a sized topology proposal, Sentinel quorum recommendation, and DR runbook outline within 48 hours.
