Running Redis at scale means eventually hitting the limits of a single node — memory ceilings, throughput bottlenecks, and the constant anxiety of a single point of failure. Redis Cluster solves all three by distributing data across multiple primary nodes, replicating each shard to one or more replicas, and automatically promoting replicas when a primary fails. The result is a horizontally scalable, fault-tolerant system that your application treats almost like a single Redis instance. This guide walks through every layer of Redis Cluster — from hash slot theory to production failover — so you can deploy and operate it with confidence.
- Redis Cluster shards data across nodes using 16,384 hash slots, each key mapping to exactly one slot.
- Use redis-cli --cluster create with --cluster-replicas to bootstrap a cluster in minutes.
- Clients must be cluster-aware to handle MOVED and ASK redirects correctly.
- Automatic failover promotes a replica within seconds once a majority of primaries marks a primary as failed; manual failover with CLUSTER FAILOVER is available for planned maintenance.
- Multi-key commands across different slots are rejected with a CROSSSLOT error; use hash tags {} to co-locate related keys.
- Monitor cluster health continuously with redis-cli --cluster check and CLUSTER INFO.
What Is Redis Cluster?
Redis Cluster is Redis's built-in solution for horizontal scaling and high availability. Unlike Redis Sentinel — which handles failover for a single primary/replica setup — Cluster partitions the keyspace itself, spreading data across multiple independent shards. Each shard consists of one primary node and optionally one or more replica nodes. The cluster requires no external coordinator or proxy: nodes communicate with each other over a dedicated gossip bus (default port: data port + 10000) to maintain membership state, detect failures, and coordinate failover elections.
A minimum production cluster requires six nodes: three primaries and three replicas. This gives you three independent failure domains and enough replicas for automatic failover. Fewer nodes are supported for development but should never be used in production — a two-primary cluster cannot achieve quorum for failover.
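The quorum arithmetic behind that warning can be sketched in a few lines. This is a minimal illustration, assuming a failover needs votes from a strict majority of all primaries; the helper names are hypothetical and not part of Redis.

```python
def majority(primaries: int) -> int:
    """Smallest number of primaries that forms a strict majority."""
    return primaries // 2 + 1


def can_fail_over(primaries: int, failed: int = 1) -> bool:
    """True if the surviving primaries can still authorize a failover."""
    return primaries - failed >= majority(primaries)


# Compare a two-primary cluster with three- and five-primary clusters
for count in (2, 3, 5):
    print(count, majority(count), can_fail_over(count))
```

With two primaries the majority is two, so losing either one leaves the survivor unable to authorize a promotion on its own.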
Redis Cluster does not support multiple databases (SELECT is disabled). All keys live in database 0. If your application relies on database isolation, redesign with key prefixes before migrating to Cluster.
Hash Slots: How Sharding Works
Redis Cluster divides the keyspace into exactly 16,384 hash slots (0–16383). When you write a key, Redis computes a CRC16 of the key (or the portion inside {} if a hash tag is present), then takes that value modulo 16,384 to determine the slot. Each slot is owned by exactly one primary node. The slot-to-node mapping is stored in the cluster configuration and shared via gossip, so every node knows the full topology at all times.
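To make the slot calculation concrete, here is a small sketch of it in Python. It assumes the CRC16 variant is XMODEM (polynomial 0x1021, initial value 0) and it hashes the whole key, so hash tags are deliberately left out; the function names are illustrative only.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16 with polynomial 0x1021 and initial value 0 (XMODEM variant)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """Hash slot for a key that contains no hash tag."""
    return crc16_xmodem(key.encode("utf-8")) % 16384


print(key_slot("user:1001"))
```

You can compare the result for any key against a live node with CLUSTER KEYSLOT.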
With a six-node cluster (three primaries), the default distribution assigns roughly 5,461 slots per primary:
- Node A: slots 0–5460
- Node B: slots 5461–10922
- Node C: slots 10923–16383
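The ranges above can be reproduced, and a slot mapped back to its owner, with a short sketch. The rounding scheme is an assumption chosen to match the distribution listed here, and the node names are just labels.

```python
from bisect import bisect_left

TOTAL_SLOTS = 16384


def even_ranges(primaries: int) -> list[tuple[int, int]]:
    """Split 0..16383 into contiguous, near-equal (first, last) ranges."""
    per_node = TOTAL_SLOTS / primaries
    ranges, first, cursor = [], 0, 0.0
    for index in range(primaries):
        last = round(cursor + per_node - 1)
        if index == primaries - 1:
            last = TOTAL_SLOTS - 1  # the last node absorbs any remainder
        ranges.append((first, last))
        first = last + 1
        cursor += per_node
    return ranges


def owner(slot: int, ranges: list[tuple[int, int]], names: list[str]) -> str:
    """Name of the node whose range contains the slot."""
    last_slots = [last for _, last in ranges]
    return names[bisect_left(last_slots, slot)]


ranges = even_ranges(3)
print(ranges)
print(owner(12182, ranges, ["Node A", "Node B", "Node C"]))
```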
Adding a new primary later triggers a resharding operation that migrates slots (and their keys) between nodes with zero downtime. During migration, a slot enters a transitional state that produces ASK redirects (explained in the client routing section below).
Hash Tags for Co-locating Keys
Multi-key commands like MGET, MSET, and Lua scripts require all keys to reside in the same slot. Hash tags solve this: Redis hashes only the substring inside the first pair of curly braces.
# Both keys hash to the slot for "user:1001", so they land on the same node
SET {user:1001}.profile "..."
SET {user:1001}.settings "..."
MGET {user:1001}.profile {user:1001}.settings # works in Cluster
Overusing a single hash tag concentrates too many keys on one slot, creating hot spots that defeat the purpose of sharding. Apply hash tags only to keys that genuinely need to be operated on together.
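The hash tag rule has a few edge cases that are easy to get wrong, so a sketch of the extraction step may help. It assumes the behavior described in the Redis Cluster specification: the tag runs from the first { to the first } after it, and an empty tag means the whole key is hashed. The function name is hypothetical.

```python
def hashed_part(key: str) -> str:
    """Return the substring of the key that is fed into the slot hash."""
    start = key.find("{")
    if start == -1:
        return key  # no opening brace, hash the whole key
    end = key.find("}", start + 1)
    if end == -1 or end == start + 1:
        return key  # no closing brace, or an empty tag "{}"
    return key[start + 1:end]


for key in ("{user:1001}.profile", "{user:1001}.settings", "foo{}{bar}"):
    print(key, "->", hashed_part(key))
```

Only the first tag counts, so a key such as foo{bar}{zap} is placed by "bar" alone.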
Setting Up a Redis Cluster
The fastest way to bootstrap a cluster is with redis-cli --cluster create. First, start six Redis instances with clustering enabled.
Example redis-7000.conf (repeat for ports 7001–7005, adjusting port and directories):
port 7000
cluster-enabled yes
cluster-config-file nodes-7000.conf
cluster-node-timeout 5000
appendonly yes
loglevel notice
logfile /var/log/redis/redis-7000.log
dir /var/lib/redis/7000
Start all six processes, then create the cluster:
redis-cli --cluster create \
127.0.0.1:7000 127.0.0.1:7001 127.0.0.1:7002 \
127.0.0.1:7003 127.0.0.1:7004 127.0.0.1:7005 \
--cluster-replicas 1
The --cluster-replicas 1 flag tells redis-cli to assign one replica per primary. With six nodes and one replica each, you get three primaries and three replicas. The CLI prints the proposed assignment and asks for confirmation before writing the cluster configuration.
On real hardware, ensure that each primary and its replica live on different physical hosts, and ideally in different availability zones. The anti-affinity heuristic in redis-cli tries to avoid placing a primary and its replica on the same IP, but verify the assignment in the CLI output before confirming.
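Once the cluster exists, the same placement check can be automated against CLUSTER NODES output. The sketch below assumes the space-separated line format of that command, where the first four fields are node ID, address, flags, and primary ID; the sample text uses made-up IDs and addresses.

```python
def ip_of(addr: str) -> str:
    """Strip the port from an ip:port address."""
    return addr.rsplit(":", 1)[0]


def colocated_replicas(cluster_nodes_output: str) -> list[tuple[str, str]]:
    """Return (replica_addr, primary_addr) pairs that share an IP."""
    addr_by_id, replicas = {}, []
    for line in cluster_nodes_output.strip().splitlines():
        node_id, addr, flags, primary_id = line.split()[:4]
        addr = addr.split("@")[0]  # drop the cluster bus port
        addr_by_id[node_id] = addr
        if "slave" in flags.split(","):
            replicas.append((addr, primary_id))
    clashes = []
    for replica_addr, primary_id in replicas:
        primary_addr = addr_by_id.get(primary_id)
        if primary_addr and ip_of(primary_addr) == ip_of(replica_addr):
            clashes.append((replica_addr, primary_addr))
    return clashes


# Hypothetical sample: replica ddd shares a host with its primary bbb
SAMPLE = """\
aaa 10.0.0.1:7000@17000 master - 0 1700000000000 1 connected 0-5460
bbb 10.0.0.2:7000@17000 master - 0 1700000000000 2 connected 5461-10922
ccc 10.0.0.2:7001@17001 slave aaa 0 1700000000000 1 connected
ddd 10.0.0.2:7002@17002 slave bbb 0 1700000000000 2 connected
"""
print(colocated_replicas(SAMPLE))
```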
After creation, verify the cluster is healthy:
redis-cli -p 7000 CLUSTER INFO
# cluster_state:ok
# cluster_slots_assigned:16384
# cluster_known_nodes:6
# cluster_size:3
Replication in Redis Cluster
Each primary replicates its data to its assigned replica(s) using the same asynchronous replication mechanism as standalone Redis. The replica continuously streams commands from the primary's replication log. Because replication is asynchronous, a small window of data loss is possible if a primary fails before the replica has synced the latest writes — this is a fundamental trade-off of Redis's design.
To inspect the current node roles and replication relationships:
redis-cli -p 7000 CLUSTER NODES
# Output columns: node-id, ip:port@bus-port, flags, primary-id, ping-sent, pong-recv, config-epoch, link-state, slots
# Example:
# a1b2c3d4... 127.0.0.1:7000@17000 master - 0 1700000000000 1 connected 0-5460
# e5f6a7b8... 127.0.0.1:7003@17003 slave a1b2c3d4... 0 1700000000000 4 connected
You can also add replicas to an existing primary at any time:
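# Without --cluster-master-id NODE_ID, redis-cli attaches the new replica to a primary with the fewest replicas
redis-cli --cluster add-node 127.0.0.1:7006 127.0.0.1:7000 --cluster-slave
Automatic Failover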
Redis Cluster's failover mechanism is driven by the cluster-node-timeout configuration. When a node gets no reply to its pings from a primary for longer than cluster-node-timeout milliseconds, it flags that primary as PFAIL (possible failure) in its own view. Nodes exchange these flags over gossip, and once a majority of primaries reports the node as unreachable, it is marked FAIL and a replica election begins.
During the election, replicas with a more up-to-date replication offset request votes sooner, so the most current replica usually wins and is promoted to primary. The entire process, from node failure to a promoted replica accepting writes, typically completes within one to two times the cluster-node-timeout value. With the default of 15000 ms (15 seconds), expect roughly 15–30 seconds of unavailability for the affected slots.
# Lower cluster-node-timeout for faster failover (at the cost of more false positives)
cluster-node-timeout 5000
Setting cluster-node-timeout too low (under 2000 ms) in environments with network jitter or high load can trigger unnecessary failovers. Tune carefully with load testing before deploying to production.
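Why the most up-to-date replica tends to win the election is easier to see with numbers. The sketch below follows the delay formula from the Redis Cluster specification (a fixed 500 ms, plus 0 to 500 ms of random jitter, plus 1000 ms per rank); the function names and sample offsets are illustrative assumptions.

```python
import random
from typing import Optional


def replica_ranks(offsets: dict[str, int]) -> dict[str, int]:
    """Rank = number of sibling replicas with a strictly higher offset."""
    return {
        name: sum(1 for other in offsets.values() if other > offset)
        for name, offset in offsets.items()
    }


def election_delay_ms(rank: int, jitter_ms: Optional[int] = None) -> int:
    """Delay before a replica of this rank starts asking for votes."""
    if jitter_ms is None:
        jitter_ms = random.randint(0, 500)
    return 500 + jitter_ms + rank * 1000


ranks = replica_ranks({"replica-1": 1048576, "replica-2": 1040000})
for name, rank in ranks.items():
    print(name, rank, election_delay_ms(rank))
```

Even with the worst jitter, a rank 0 replica starts at 1000 ms, ahead of the 1500 ms best case for rank 1.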
Manual Failover with CLUSTER FAILOVER
For planned maintenance — such as upgrading a primary node — initiate a graceful failover from the replica side. The replica will coordinate with the primary to hand off without data loss:
# Connect to the replica you want to promote
redis-cli -p 7003 CLUSTER FAILOVER
# If the primary is already unreachable, skip the handshake with FORCE
redis-cli -p 7003 CLUSTER FAILOVER FORCE
# Last resort when a majority of primaries cannot be reached
redis-cli -p 7003 CLUSTER FAILOVER TAKEOVER
CLUSTER FAILOVER (without options) pauses writes on the primary, waits for the replica to catch up with the primary's replication offset, then promotes it. CLUSTER FAILOVER FORCE skips the handshake with the primary but still needs votes from a majority of primaries. CLUSTER FAILOVER TAKEOVER skips the vote as well, making it the only option when the primary is down and the replica cannot be promoted automatically because cluster quorum cannot be achieved.
Client-Side Routing
Unlike a proxy-based setup, Redis Cluster requires clients to participate in routing. A cluster-aware client maintains a local slot map and sends commands directly to the node that owns the target slot. This eliminates proxy latency but requires client library support.
When a client sends a command to the wrong node — either because it has a stale slot map or during resharding — the node responds with a redirect:
- MOVED: The slot has permanently moved to another node. The client must update its slot map and resend the command to the indicated node.
- ASK: The slot is mid-migration. The client should send an ASKING command to the target node, then resend the original command, but should not update its slot map yet.
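The difference between the two redirects comes down to what the client does with its slot map. Here is a minimal sketch of that decision, assuming the error text has the form "MOVED slot host:port" or "ASK slot host:port"; the Redirect type and both helpers are hypothetical, not part of any client library.

```python
from typing import NamedTuple, Optional


class Redirect(NamedTuple):
    kind: str  # "MOVED" or "ASK"
    slot: int
    host: str
    port: int


def parse_redirect(error: str) -> Optional[Redirect]:
    """Parse a redirect error line, or return None for any other error."""
    parts = error.lstrip("-").split()
    if len(parts) != 3 or parts[0] not in ("MOVED", "ASK"):
        return None
    host, port = parts[2].rsplit(":", 1)
    return Redirect(parts[0], int(parts[1]), host, int(port))


def apply_redirect(slot_map: dict[int, str], redirect: Redirect) -> str:
    """Return the address to retry against; only MOVED updates the map."""
    target = f"{redirect.host}:{redirect.port}"
    if redirect.kind == "MOVED":
        slot_map[redirect.slot] = target
    # For ASK the caller sends ASKING to the target before retrying
    return target


slot_map = {3999: "127.0.0.1:7000"}
print(apply_redirect(slot_map, parse_redirect("-ASK 3999 127.0.0.1:7002")), slot_map)
print(apply_redirect(slot_map, parse_redirect("-MOVED 3999 127.0.0.1:7002")), slot_map)
```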
# Example MOVED response
-MOVED 3999 127.0.0.1:7002
# Example ASK response
-ASK 3999 127.0.0.1:7002
All mature Redis client libraries handle both redirects transparently. Use cluster-aware clients in your application stack:
- Python: redis-py with the RedisCluster class
- Node.js: ioredis in cluster mode
- Java: Lettuce or Jedis cluster client
- Go: go-redis with NewClusterClient
# Python example
from redis.cluster import ClusterNode, RedisCluster
rc = RedisCluster(
    # redis-py expects ClusterNode objects here, not dicts
    startup_nodes=[ClusterNode("127.0.0.1", 7000)],
    decode_responses=True,
)
rc.set("user:1001", "value")
Enable connection pooling and set reasonable socket_timeout and socket_connect_timeout values in your client. During failover, commands targeting affected slots will receive errors for the duration of the election. Implement retry logic with exponential backoff to ride through these windows gracefully.
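As a starting point for that retry logic, here is a library-agnostic sketch of exponential backoff with a little jitter. The helper name, defaults, and the choice of built-in exceptions are assumptions; with redis-py you would pass its own exception classes (for example redis.exceptions.ConnectionError and redis.exceptions.ClusterDownError) as retry_on.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_backoff(
    operation: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation, retrying on the given errors with doubling delays."""
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts, surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Up to 10% jitter so that clients do not retry in lockstep
            sleep(delay + random.uniform(0, delay / 10))
    raise ValueError("attempts must be at least 1")
```

A call would look like with_backoff(lambda: rc.get("user:1001"), retry_on=...). Size attempts and max_delay so the total wait covers your expected failover window.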
Monitoring Cluster Health
Ongoing monitoring is non-negotiable for a production cluster. Use these commands and tools as your foundation.
redis-cli --cluster check
redis-cli --cluster check 127.0.0.1:7000
# Healthy output includes:
# [OK] All nodes agree about slots configuration.
# [OK] All 16384 slots covered.
# M: a1b2c3d4... 127.0.0.1:7000
# slots:[0-5460] (5461 slots) master
# 1 additional replica(s)
Run this command from your monitoring pipeline after any topology change (node addition, removal, resharding). The output flags slot coverage gaps, misconfigured replicas, and nodes that disagree about the slot map.
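In a pipeline you will want a pass/fail answer rather than text to read. The sketch below scans captured check output for the two [OK] lines shown above and for any line starting with [ERR] or [WARNING]. Treating those prefixes as the complete set of problem markers is an assumption, and the broken sample is hypothetical.

```python
REQUIRED = (
    "[OK] All nodes agree about slots configuration.",
    "[OK] All 16384 slots covered.",
)


def check_problems(output: str) -> list[str]:
    """Return a list of problems found in the check output (empty if clean)."""
    problems = [
        line.strip()
        for line in output.splitlines()
        if line.lstrip().startswith(("[ERR]", "[WARNING]"))
    ]
    problems += [f"missing: {marker}" for marker in REQUIRED if marker not in output]
    return problems


HEALTHY = """\
[OK] All nodes agree about slots configuration.
[OK] All 16384 slots covered.
"""
# Hypothetical failure output for illustration
BROKEN = """\
[OK] All nodes agree about slots configuration.
[ERR] Not all 16384 slots are covered by nodes.
"""
print(check_problems(HEALTHY))
print(check_problems(BROKEN))
```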
CLUSTER INFO
redis-cli -p 7000 CLUSTER INFO
# Key fields to watch:
# cluster_state: ok | fail
# cluster_slots_ok: 16384 (should always equal total slots)
# cluster_known_nodes: 6
# cluster_stats_messages_sent / cluster_stats_messages_received (gossip health)
Alert on cluster_state:fail immediately. It means the node sees one or more slots as uncovered or failed, and with the default cluster-require-full-coverage yes the cluster stops accepting queries altogether, not just those for the affected slots.
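Turning those fields into alerts takes only a small parser. This sketch assumes the key:value line format shown above; the function names and the expected_nodes parameter are illustrative.

```python
def parse_cluster_info(text: str) -> dict[str, str]:
    """Parse key:value lines from CLUSTER INFO into a dict."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep:
            info[key] = value
    return info


def cluster_alerts(info: dict[str, str], expected_nodes: int) -> list[str]:
    """Return human-readable alerts for the fields worth watching."""
    alerts = []
    if info.get("cluster_state") != "ok":
        alerts.append(f"cluster_state is {info.get('cluster_state')}")
    if int(info.get("cluster_slots_ok", 0)) != 16384:
        alerts.append(f"only {info.get('cluster_slots_ok', 0)} slots ok")
    if int(info.get("cluster_known_nodes", 0)) != expected_nodes:
        alerts.append(f"{info.get('cluster_known_nodes', 0)} nodes known")
    return alerts


SAMPLE_INFO = "cluster_state:ok\r\ncluster_slots_ok:16384\r\ncluster_known_nodes:6\r\n"
print(cluster_alerts(parse_cluster_info(SAMPLE_INFO), expected_nodes=6))
```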
INFO replication (per node)
redis-cli -p 7003 INFO replication
# role:slave
# master_host:127.0.0.1
# master_port:7000
# master_link_status:up
# master_last_io_seconds_ago:0
# master_sync_in_progress:0
# slave_repl_offset:1048576
# master_repl_offset:1048576
Track master_last_io_seconds_ago and the gap between the primary's master_repl_offset and each replica's slave_repl_offset. A growing gap indicates replication lag and increases your potential data loss window in the event of a failover.
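One way to measure that gap is from the primary's side, since its INFO replication output lists each attached replica with its acknowledged offset. The sketch below assumes the slaveN:ip=...,port=...,state=...,offset=...,lag=... line format; the function name and the sample values are illustrative.

```python
def replica_lag_bytes(primary_info: str) -> dict[str, int]:
    """Map each replica address to how many bytes it trails the primary."""
    master_offset = 0
    replica_offsets = {}
    for line in primary_info.splitlines():
        key, sep, value = line.strip().partition(":")
        if not sep:
            continue
        if key == "master_repl_offset":
            master_offset = int(value)
        elif key.startswith("slave") and "offset=" in value:
            fields = dict(item.split("=", 1) for item in value.split(","))
            address = f"{fields['ip']}:{fields['port']}"
            replica_offsets[address] = int(fields["offset"])
    return {addr: master_offset - off for addr, off in replica_offsets.items()}


# Hypothetical output from the primary on port 7000
PRIMARY_INFO = """\
role:master
connected_slaves:1
slave0:ip=127.0.0.1,port=7003,state=online,offset=1048000,lag=0
master_repl_offset:1048576
"""
print(replica_lag_bytes(PRIMARY_INFO))
```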
Resetting a Node
To remove a node from the cluster and return it to a blank, unattached state (useful when decommissioning or re-provisioning):
# First, remove it from the cluster via another node; NODE_ID_OF_7005 is a placeholder for the ID shown by CLUSTER NODES
redis-cli --cluster del-node 127.0.0.1:7000 NODE_ID_OF_7005
# Then reset the node itself
redis-cli -p 7005 CLUSTER RESET HARD
CLUSTER RESET HARD makes the node forget every other node, drops its slot assignments, and generates a new node ID. On a replica it also flushes the dataset; a primary that still holds keys rejects the command until the keys are removed. Use with extreme caution: it is irreversible.
Never run CLUSTER RESET on a node that still holds live slots. Reshard its slots to other primaries first, then use redis-cli --cluster del-node to remove it before resetting it.
- Redis Cluster shards data across 16,384 hash slots distributed among primary nodes, with no external proxy required.
- Bootstrap with redis-cli --cluster create ... --cluster-replicas 1; minimum six nodes for production (three primaries, three replicas).
- Automatic failover is driven by cluster-node-timeout; tune it carefully based on your network characteristics and downtime tolerance.
- Use CLUSTER FAILOVER for graceful planned maintenance and CLUSTER FAILOVER TAKEOVER only as a last resort when the cluster cannot reach quorum.
- Clients must be cluster-aware to handle MOVED and ASK redirects; implement retry logic with backoff for failover windows.
- Multi-key commands require all keys in the same slot; use hash tags {} to co-locate, but avoid hot-spot concentrations.
- Monitor continuously with redis-cli --cluster check, CLUSTER INFO, and per-node replication lag; alert immediately on cluster_state:fail.
Deploy and Operate Redis Cluster with JusDB
Redis Cluster shifts the operational complexity from Redis's internals to your team — someone has to manage topology changes, monitor slot coverage, tune timeouts, and respond to failover alerts at 3 AM. JusDB's managed Redis service handles all of that for you: automated cluster provisioning across multiple availability zones, continuous health monitoring, and hands-free failover tested against real failure scenarios. You get the full power of Redis Cluster with a fraction of the operational burden.
Explore JusDB managed Redis to see how your team can stop managing infrastructure and start focusing on your product.