Cluster mode — multi-shard topology
Valkey Cluster Services
Shard sizing, slot rebalancing, replica placement, and client-side cluster awareness for Valkey 8 in Cluster mode. For single-primary HA with Sentinel, see high availability; this page covers only horizontal sharding across the 16,384 hash slots.
What we operate in Valkey Cluster mode
Shard Sizing & Topology
Working-set-to-shard-count modeling, replica placement across AZs and racks, replica count per primary.
Slot Rebalancing
Online resharding planning, slot migration throttling, validating slot ownership convergence, post-reshard verification.
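As a rough illustration of the planning and verification steps, the sketch below computes a near-even target slot layout for a given shard count and checks that a set of ranges covers all 16,384 slots with no gaps or overlaps. The function names are hypothetical, and the layout is just one reasonable even split, not necessarily the one any particular tool would produce.

```python
TOTAL_SLOTS = 16384  # fixed number of hash slots in Cluster mode


def even_slot_ranges(shard_count):
    """Split slots 0..16383 into contiguous, near-equal ranges, one per shard."""
    if not 1 <= shard_count <= TOTAL_SLOTS:
        raise ValueError("shard_count must be between 1 and 16384")
    base, extra = divmod(TOTAL_SLOTS, shard_count)
    ranges = []
    start = 0
    for index in range(shard_count):
        # The first `extra` shards take one additional slot each.
        size = base + (1 if index < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges


def covers_all_slots(ranges):
    """Return True if the inclusive (start, end) ranges tile 0..16383 exactly."""
    expected = 0
    for start, end in sorted(ranges):
        if start != expected or end < start:
            return False  # gap, overlap, or inverted range
        expected = end + 1
    return expected == TOTAL_SLOTS


if __name__ == "__main__":
    for start, end in even_slot_ranges(6):
        print(f"{start}-{end}")
```

After a reshard, the same coverage check can be run against the ranges each node reports, as a simple sanity test that ownership has converged.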
Client-Side Cluster Awareness
Client SDK audit for MOVED/ASK handling, slot-cache refresh tuning, pipeline-and-cluster compatibility.
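To make the MOVED/ASK distinction concrete, here is a minimal sketch of how a client might parse a redirection error and decide whether to refresh its slot cache. The error format (`MOVED <slot> <host>:<port>`, likewise for `ASK`) follows the cluster protocol; the `Redirect` type and function names are illustrative assumptions, not any specific SDK's API.

```python
from typing import NamedTuple, Optional


class Redirect(NamedTuple):
    kind: str  # "MOVED" or "ASK"
    slot: int
    host: str
    port: int


def parse_redirect(error_message: str) -> Optional[Redirect]:
    """Parse 'MOVED 3999 127.0.0.1:6381' style errors; None if not a redirect."""
    parts = error_message.split()
    if len(parts) != 3 or parts[0] not in ("MOVED", "ASK"):
        return None
    # rpartition keeps any colons that belong to the host part.
    host, _, port = parts[2].rpartition(":")
    # This sketch rejects entries with a missing host or non-numeric fields.
    if not host or not port.isdigit() or not parts[1].isdigit():
        return None
    return Redirect(parts[0], int(parts[1]), host, int(port))


def should_refresh_slot_cache(redirect: Redirect) -> bool:
    """MOVED signals a lasting ownership change; ASK is a one-off redirect."""
    return redirect.kind == "MOVED"


if __name__ == "__main__":
    redirect = parse_redirect("MOVED 3999 127.0.0.1:6381")
    print(redirect, should_refresh_slot_cache(redirect))
```

On ASK, a client is expected to send `ASKING` followed by the original command to the indicated node, without updating its slot map. Pipelined commands may each receive a different redirect, which is one reason pipeline handling needs its own audit.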
Multi-Key Operations
Hash-tag design for atomic multi-key commands, MGET fan-out across shards, Lua-script-in-cluster constraints.
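The hash-tag rule is easier to reason about with the slot calculation in hand. The sketch below follows the cluster specification: the slot is CRC16 (XMODEM variant) of the key modulo 16384, and when the key contains a non-empty `{...}` section, only the text between the first `{` and the next `}` is hashed. The function name `key_hash_slot` is illustrative; Python's `binascii.crc_hqx` with an initial value of 0 is used here as the CRC16 implementation.

```python
import binascii

TOTAL_SLOTS = 16384


def key_hash_slot(key: bytes) -> int:
    """Return the cluster hash slot for a key, honoring hash tags."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        # Only a non-empty tag narrows the hashed portion of the key.
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    return binascii.crc_hqx(key, 0) % TOTAL_SLOTS


if __name__ == "__main__":
    followers = key_hash_slot(b"{user:1000}.followers")
    following = key_hash_slot(b"{user:1000}.following")
    # Same tag, same slot: multi-key commands on these keys stay on one shard.
    print(followers == following)
```

Keys that share a tag always land in the same slot, which is what makes multi-key commands and Lua scripts over those keys legal in Cluster mode. The cost is that a hot tag concentrates load on a single shard.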
Failure Mode Engineering
Failure-detection and quorum tuning (cluster-node-timeout, cluster-require-full-coverage), partial-cluster behavior, split-brain prevention.
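Replica promotion in Cluster mode requires votes from a majority of primaries, so a partition that holds fewer than that majority cannot elect new primaries. The sketch below works through that arithmetic and checks whether a given primary placement keeps a majority after losing any single AZ. The function names and AZ labels are hypothetical, and the check only models simultaneous loss of one AZ.

```python
from typing import Mapping


def failover_majority(primary_count: int) -> int:
    """Votes needed to authorize a failover: a strict majority of primaries."""
    if primary_count < 1:
        raise ValueError("primary_count must be at least 1")
    return primary_count // 2 + 1


def survives_az_loss(primaries_per_az: Mapping[str, int]) -> bool:
    """True if losing any one AZ still leaves a majority of primaries reachable."""
    total = sum(primaries_per_az.values())
    needed = failover_majority(total)
    return all(total - lost >= needed for lost in primaries_per_az.values())


if __name__ == "__main__":
    print(failover_majority(6))                                   # 4
    print(survives_az_loss({"az-a": 2, "az-b": 2, "az-c": 2}))    # True
    print(survives_az_loss({"az-a": 3, "az-b": 3}))               # False
```

A two-AZ layout with primaries split evenly fails the check, because neither side holds a majority after a partition. That is the usual argument for spreading primaries over at least three failure domains.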
Cross-Region Federation
Cross-region replicas, region-aware client routing, federation patterns for active-active workloads (when CRDTs aren't required).
Cluster sizing — the numbers behind the recommendations
Three real cluster shapes we've operated, with what each one optimizes for and what to watch out for.
6 shards × 2 replicas
Typical workload
Catalog cache, ~120 GB working set, 30k QPS reads
What it optimizes for
Simplest to operate; each shard still holds a copy of its data after losing its primary and one replica.
Watch out for
Placing every node in one AZ turns an AZ failure into a full-cluster outage; spread each shard's replicas across AZs.
12 shards × 1 replica
Typical workload
Session store, ~400 GB working set, sustained 80k writes/sec
What it optimizes for
Smaller blast radius per shard failure; cheaper per-GB than wider replication.
Watch out for
With one replica per shard, a simultaneous failure of a primary and its replica loses that shard's slot range.
24 shards × 3 replicas, multi-AZ
Typical workload
Real-time pricing, ~1 TB working set, RPO in the low double-digit milliseconds
What it optimizes for
Survives full-AZ outage; highest read fan-out capacity.
Watch out for
Higher operational surface; reshard windows take longer.
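The arithmetic behind the three shapes can be sketched as follows. This assumes "× N replicas" means N replicas per shard, treats 1 TB as 1000 GB, and assumes keys spread evenly across shards; the `ClusterShape` type and its field names are illustrative, and the per-shard figure is raw data only, with no memory headroom added.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterShape:
    name: str
    shards: int
    replicas_per_shard: int
    working_set_gb: float

    @property
    def total_nodes(self) -> int:
        # One primary plus its replicas, for every shard.
        return self.shards * (1 + self.replicas_per_shard)

    @property
    def data_per_shard_gb(self) -> float:
        # Assumes an even key distribution across shards.
        return self.working_set_gb / self.shards

    @property
    def copies_per_slot(self) -> int:
        # Nodes that must fail together before a slot range has no copy left.
        return 1 + self.replicas_per_shard


SHAPES = [
    ClusterShape("catalog cache", shards=6, replicas_per_shard=2, working_set_gb=120),
    ClusterShape("session store", shards=12, replicas_per_shard=1, working_set_gb=400),
    ClusterShape("real-time pricing", shards=24, replicas_per_shard=3, working_set_gb=1000),
]

if __name__ == "__main__":
    for shape in SHAPES:
        print(
            f"{shape.name}: {shape.total_nodes} nodes, "
            f"{shape.data_per_shard_gb:.1f} GB per shard, "
            f"{shape.copies_per_slot} copies per slot"
        )
```

Under these assumptions the shapes come to 18, 24, and 96 nodes, which is one way to read the "higher operational surface" trade-off of the largest layout.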