Redis Cluster: Data Sharding, Replication, and High Availability

Running Redis at scale means eventually hitting the limits of a single node — memory ceilings, throughput bottlenecks, and the constant anxiety of a single point of failure. Redis Cluster solves all three by distributing data across multiple primary nodes, replicating each shard to one or more replicas, and automatically promoting replicas when a primary fails. The result is a horizontally scalable, fault-tolerant system that your application treats almost like a single Redis instance. This guide walks through every layer of Redis Cluster — from hash slot theory to production failover — so you can deploy and operate it with confidence.

TL;DR

Redis Cluster shards data across nodes using 16,384 hash slots, each key mapping to exactly one slot.
Use redis-cli --cluster create with --cluster-replicas to bootstrap a cluster in minutes.
Clients must be cluster-aware to handle MOVED and ASK redirects correctly.
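Automatic failover promotes a replica, typically within one to two times cluster-node-timeout, once a majority of primaries agree that a primary has failed; manual failover with CLUSTER FAILOVER is available for planned maintenance, with the FORCE and TAKEOVER options for cases where the primary or the majority of primaries is unreachable.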
Multi-key commands whose keys span different slots are rejected with a CROSSSLOT error; use hash tags {} to co-locate related keys.
Monitor cluster health continuously with redis-cli --cluster check and CLUSTER INFO.

What Is Redis Cluster?

Redis Cluster is Redis's built-in solution for horizontal scaling and high availability. Unlike Redis Sentinel — which handles failover for a single primary/replica setup — Cluster partitions the keyspace itself, spreading data across multiple independent shards. Each shard consists of one primary node and optionally one or more replica nodes. The cluster requires no external coordinator or proxy: nodes communicate with each other over a dedicated gossip bus (default port: data port + 10000) to maintain membership state, detect failures, and coordinate failover elections.

A minimum production cluster requires six nodes: three primaries and three replicas. This gives you three independent failure domains and enough replicas for automatic failover. Fewer nodes are supported for development but should never be used in production — a two-primary cluster cannot achieve quorum for failover.

Warning

Redis Cluster does not support multiple databases (SELECT is disabled). All keys live in database 0. If your application relies on database isolation, redesign with key prefixes before migrating to Cluster.

Hash Slots: How Sharding Works

Redis Cluster divides the keyspace into exactly 16,384 hash slots (0–16383). When you write a key, Redis computes a CRC16 of the key (or of the portion inside {} if a hash tag is present), then takes that value modulo 16,384 to determine the slot. Each slot is owned by exactly one primary node. The slot-to-node mapping is stored in each node's cluster configuration and propagated via gossip, so every node converges on the full topology, although a node's view can be briefly stale right after a change.
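The slot computation described above can be sketched in a few lines of Python. This is an illustrative reimplementation, not Redis's own code: the function names crc16_xmodem and key_slot are made up for this example, and it assumes the CRC16 variant is the XMODEM one (polynomial 0x1021, initial value 0) named in the Redis Cluster specification.

```python
def crc16_xmodem(data: bytes) -> int:
    """Bitwise CRC16/XMODEM: polynomial 0x1021, initial value 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """Return the hash slot (0-16383) for a key, honoring hash tags."""
    raw = key.encode("utf-8")
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        # Only a non-empty tag counts; "foo{}bar" is hashed as a whole key.
        if end != -1 and end != start + 1:
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % 16384


if __name__ == "__main__":
    for name in ("foo", "{user:1001}.profile", "{user:1001}.settings"):
        print(name, key_slot(name))
```

On a live cluster, CLUSTER KEYSLOT returns the authoritative value, so a helper like this is mainly useful for reasoning about key placement offline.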

With a six-node cluster (three primaries), the default distribution assigns roughly 5,461 slots per primary:

Node A: slots 0–5460
Node B: slots 5461–10922
Node C: slots 10923–16383
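The arithmetic behind those ranges can be reproduced with a short sketch. The function name split_slots is hypothetical, and the rounding scheme is an assumption chosen because it yields the ranges listed above; the allocation logic inside redis-cli may differ in detail.

```python
TOTAL_SLOTS = 16384


def split_slots(num_primaries: int) -> list:
    """Divide slots 0..16383 into contiguous, nearly equal ranges."""
    per_node = TOTAL_SLOTS / num_primaries
    ranges = []
    first = 0
    cursor = 0.0
    for index in range(num_primaries):
        last = round(cursor + per_node - 1)
        # The final primary always absorbs whatever remains.
        if index == num_primaries - 1 or last > TOTAL_SLOTS - 1:
            last = TOTAL_SLOTS - 1
        ranges.append((first, last))
        first = last + 1
        cursor += per_node
    return ranges


if __name__ == "__main__":
    for name, (first, last) in zip("ABC", split_slots(3)):
        print(f"Node {name}: slots {first}-{last} ({last - first + 1} slots)")
```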

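Adding a new primary later does not move any data by itself: the new node joins with no slots, and you then run a resharding operation (redis-cli --cluster reshard or redis-cli --cluster rebalance) that migrates slots and their keys between nodes while the cluster keeps serving traffic. During migration, a slot enters a transitional state that produces ASK redirects (explained in the client routing section below).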

Hash Tags for Co-locating Keys

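Multi-key commands like MGET and MSET, as well as Lua scripts that touch several keys, require all keys to reside in the same slot. Hash tags solve this: when a key contains a { followed later by a } with at least one character between them, Redis hashes only the substring between the first { and the first } after it.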

# Both keys hash to the slot for "user:1001" — they land on the same node
SET {user:1001}.profile "..."
SET {user:1001}.settings "..."

MGET {user:1001}.profile {user:1001}.settings   # works in Cluster

Warning

Overusing a single hash tag concentrates too many keys on one slot, creating hot spots that defeat the purpose of sharding. Apply hash tags only to keys that genuinely need to be operated on together.

Setting Up a Redis Cluster

The fastest way to bootstrap a cluster is with redis-cli --cluster create. First, start six Redis instances with clustering enabled.

Example redis-7000.conf (repeat for ports 7001–7005, adjusting port and directories):

port 7000
cluster-enabled yes
cluster-config-file nodes-7000.conf
cluster-node-timeout 5000
appendonly yes
loglevel notice
logfile /var/log/redis/redis-7000.log
dir /var/lib/redis/7000
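
Writing six nearly identical files by hand is error-prone, so the per-port configuration can be generated instead. The following is a minimal sketch that reuses the exact settings shown above; the names render_conf and write_confs and the output directory are assumptions made for this example.

```python
from pathlib import Path

# Same settings as the redis-7000.conf example, parameterized by port.
CONF_TEMPLATE = """\
port {port}
cluster-enabled yes
cluster-config-file nodes-{port}.conf
cluster-node-timeout 5000
appendonly yes
loglevel notice
logfile /var/log/redis/redis-{port}.log
dir /var/lib/redis/{port}
"""


def render_conf(port: int) -> str:
    """Return the configuration text for one instance."""
    return CONF_TEMPLATE.format(port=port)


def write_confs(out_dir: str, ports=range(7000, 7006)) -> list:
    """Write redis-<port>.conf for every port and return the created paths."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for port in ports:
        path = target / f"redis-{port}.conf"
        path.write_text(render_conf(port))
        written.append(path)
    return written
```

Calling write_confs("./cluster-conf") would produce redis-7000.conf through redis-7005.conf; the log and data directories referenced in each file still have to exist before the instances start.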

Start all six processes, then create the cluster:

redis-cli --cluster create \
  127.0.0.1:7000 127.0.0.1:7001 127.0.0.1:7002 \
  127.0.0.1:7003 127.0.0.1:7004 127.0.0.1:7005 \
  --cluster-replicas 1

The --cluster-replicas 1 flag tells redis-cli to assign one replica per primary. With six nodes and one replica each, you get three primaries and three replicas. The CLI prints the proposed assignment and asks for confirmation before writing the cluster configuration.

Tip

On real hardware, ensure that each primary and its replica live on different physical hosts (or at minimum different availability zones). The anti-affinity heuristic in redis-cli tries to avoid placing a primary and its replica on the same IP, but verify the assignment in the CLI output before confirming.

After creation, verify the cluster is healthy:

redis-cli -p 7000 CLUSTER INFO
# cluster_state:ok
# cluster_slots_assigned:16384
# cluster_known_nodes:6
# cluster_size:3

Replication in Redis Cluster

Each primary replicates its data to its assigned replica(s) using the same asynchronous replication mechanism as standalone Redis. The replica continuously streams commands from the primary's replication log. Because replication is asynchronous, a small window of data loss is possible if a primary fails before the replica has synced the latest writes — this is a fundamental trade-off of Redis's design.

To inspect the current node roles and replication relationships:

redis-cli -p 7000 CLUSTER NODES
# Output columns: node-id, ip:port@bus-port, flags, primary-id, ping-sent, pong-recv, config-epoch, link-state, slots
# Example:
# a1b2c3d4... 127.0.0.1:7000@17000 master - 0 1700000000000 1 connected 0-5460
# e5f6a7b8... 127.0.0.1:7003@17003 slave a1b2c3d4... 0 1700000000000 4 connected
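
For scripting, the CLUSTER NODES output can be split into fields. The sketch below assumes the space-separated layout with node ID, address, flags, primary ID, ping-sent, pong-recv, config epoch, link state, and then zero or more slot entries; the NodeEntry class and function names are invented for this example, and it expects the raw command output without the comment lines used in this article.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NodeEntry:
    node_id: str
    address: str
    flags: List[str]
    primary_id: Optional[str]  # None for primaries (the column holds "-")
    ping_sent: int
    pong_recv: int
    config_epoch: int
    link_state: str
    slot_count: int


def count_slots(tokens: List[str]) -> int:
    """Count owned slots from entries such as '0-5460' or '5461'."""
    total = 0
    for token in tokens:
        if token.startswith("["):  # migrating/importing markers, not owned ranges
            continue
        start, _, end = token.partition("-")
        total += int(end or start) - int(start) + 1
    return total


def parse_cluster_nodes(raw: str) -> List[NodeEntry]:
    nodes = []
    for line in raw.splitlines():
        parts = line.split()
        if len(parts) < 8:
            continue  # skip blank or truncated lines
        nodes.append(NodeEntry(
            node_id=parts[0],
            address=parts[1],
            flags=parts[2].split(","),
            primary_id=None if parts[3] == "-" else parts[3],
            ping_sent=int(parts[4]),
            pong_recv=int(parts[5]),
            config_epoch=int(parts[6]),
            link_state=parts[7],
            slot_count=count_slots(parts[8:]),
        ))
    return nodes
```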

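You can also add a replica to an existing primary at any time. Pass --cluster-master-id to pick the primary; without it, redis-cli attaches the new node to one of the primaries that currently has the fewest replicas:

# <primary-node-id> is a placeholder: use the node ID of the target primary from CLUSTER NODES
redis-cli --cluster add-node 127.0.0.1:7006 127.0.0.1:7000 --cluster-slave --cluster-master-id <primary-node-id>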

Automatic Failover

Redis Cluster's failover mechanism is driven by the cluster-node-timeout configuration. When a node gets no reply to its pings from a primary for longer than cluster-node-timeout milliseconds, it flags that primary locally as PFAIL (possible failure). Nodes exchange these flags over gossip, and once a majority of primaries report the node as PFAIL or FAIL, it is marked FAIL and a replica election begins.

During the election, the replicas of the failed primary rank themselves by replication offset, and the most up-to-date replica starts its election first. It is promoted once a majority of primaries vote for it. The entire process, from node failure to a promoted replica accepting writes, typically completes within one to two times the cluster-node-timeout value. With the default of 15000 ms (15 seconds), expect roughly 15–30 seconds of unavailability for the affected slots.
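
The numbers in this section reduce to simple arithmetic, sketched below. The function names are hypothetical, and the one-to-two-times window is the rule of thumb stated above rather than a guarantee.

```python
def fail_quorum(num_primaries: int) -> int:
    """Primaries that must agree before a node is marked FAIL or a replica is elected."""
    return num_primaries // 2 + 1


def can_fail_over(num_primaries: int, failed_primaries: int = 1) -> bool:
    """True if the surviving primaries still form a majority."""
    return num_primaries - failed_primaries >= fail_quorum(num_primaries)


def failover_window_ms(node_timeout_ms: int) -> tuple:
    """Rough (best, worst) unavailability window: one to two times the timeout."""
    return node_timeout_ms, 2 * node_timeout_ms


if __name__ == "__main__":
    print(fail_quorum(3), can_fail_over(3), can_fail_over(2))
    print(failover_window_ms(15000), failover_window_ms(5000))
```

With three primaries, losing one still leaves the two votes needed; with two primaries, losing one leaves a single vote, which is why a two-primary cluster cannot fail over.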

# Lower cluster-node-timeout for faster failover (at the cost of more false positives)
cluster-node-timeout 5000

Warning

Setting cluster-node-timeout too low (under 2000 ms) in environments with network jitter or high load can trigger unnecessary failovers. Tune carefully with load testing before deploying to production.

Manual Failover with CLUSTER FAILOVER

For planned maintenance — such as upgrading a primary node — initiate a graceful failover from the replica side. The replica will coordinate with the primary to hand off without data loss:

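# Connect to the replica you want to promote
redis-cli -p 7003 CLUSTER FAILOVER

# For a primary that is already unreachable, use FORCE (still needs votes from a majority of primaries)
redis-cli -p 7003 CLUSTER FAILOVER FORCE

# When a majority of primaries is unreachable as well, use TAKEOVER
redis-cli -p 7003 CLUSTER FAILOVER TAKEOVER

CLUSTER FAILOVER (without options) pauses client writes on the primary, waits for the replica to reach the primary's replication offset, then promotes it through a normal election. CLUSTER FAILOVER FORCE skips the handshake with the primary, so it works when the primary is down, but the replica still has to be voted in by a majority of primaries. CLUSTER FAILOVER TAKEOVER skips both the handshake and the election: the replica promotes itself unilaterally, which makes it the only option when cluster quorum cannot be achieved (for example, when the majority of primaries sits in an unreachable data center). Because it bypasses cluster consensus, reserve TAKEOVER for that situation.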

Client-Side Routing

Unlike a proxy-based setup, Redis Cluster requires clients to participate in routing. A cluster-aware client maintains a local slot map and sends commands directly to the node that owns the target slot. This eliminates proxy latency but requires client library support.

When a client sends a command to the wrong node — either because it has a stale slot map or during resharding — the node responds with a redirect:

MOVED: The slot has permanently moved to another node. The client must update its slot map and resend the command to the indicated node.
ASK: The slot is mid-migration. The client should send an ASKING command to the target node, then resend the original command, but should not update its slot map yet.

# Example MOVED response
-MOVED 3999 127.0.0.1:7002

# Example ASK response
-ASK 3999 127.0.0.1:7002
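
The difference between the two redirects is easiest to see in code. The sketch below parses the error strings shown above and applies the rule from the list: MOVED updates the local slot map, ASK does not. The Redirect type and both function names are invented for this example; real client libraries implement this internally.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class Redirect(NamedTuple):
    kind: str  # "MOVED" or "ASK"
    slot: int
    host: str
    port: int


def parse_redirect(message: str) -> Optional[Redirect]:
    """Parse '-MOVED 3999 127.0.0.1:7002' style errors; return None otherwise."""
    parts = message.lstrip("-").split()
    if len(parts) != 3 or parts[0] not in ("MOVED", "ASK"):
        return None
    host, _, port = parts[2].rpartition(":")
    return Redirect(parts[0], int(parts[1]), host, int(port))


def handle_redirect(
    slot_map: Dict[int, Tuple[str, int]], redirect: Redirect
) -> Tuple[Tuple[str, int], bool]:
    """Return (node to retry on, whether ASKING must be sent first)."""
    target = (redirect.host, redirect.port)
    if redirect.kind == "MOVED":
        slot_map[redirect.slot] = target  # permanent: remember the new owner
        return target, False
    return target, True  # ASK: one-off retry, slot map stays untouched
```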

Most mature cluster-aware client libraries handle both redirects transparently. Use one of them in your application stack:

Python: redis-py with RedisCluster class
Node.js: ioredis cluster mode
Java: Lettuce or Jedis cluster client
Go: go-redis with NewClusterClient

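# Python example (redis-py 4.1 or later)
from redis.cluster import ClusterNode, RedisCluster

rc = RedisCluster(
    # redis-py expects ClusterNode objects here, not plain dicts
    startup_nodes=[ClusterNode("127.0.0.1", 7000)],
    decode_responses=True,
    # Refuse to initialize unless all 16,384 slots are covered
    require_full_coverage=True,
)
rc.set("user:1001", "value")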

Tip

Enable connection pooling and set reasonable socket_timeout and socket_connect_timeout values in your client. During failover, commands targeting affected slots will receive errors for the duration of the election. Implement retry logic with exponential backoff to ride through these windows gracefully.
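
A retry wrapper with exponential backoff might look like the sketch below. It is deliberately library-agnostic: call_with_backoff and its parameters are assumptions for this example, and the caller decides which exceptions are retryable. With redis-py, the connection, timeout, and cluster-down exceptions from redis.exceptions would be likely candidates, but check what your client raises and whether it already retries internally.

```python
import random
import time


def call_with_backoff(func, retry_on, attempts=5, base_delay=0.1,
                      max_delay=2.0, sleep=time.sleep):
    """Call func(); on a retryable error wait with jittered exponential backoff.

    retry_on is a tuple of exception classes. The last failure is re-raised
    once `attempts` calls have been made.
    """
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise
            # Cap doubles each round: 0.1s, 0.2s, 0.4s ... up to max_delay.
            cap = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter keeps many clients from retrying in lockstep.
            sleep(random.uniform(0, cap))
```

A call such as call_with_backoff(lambda: rc.get("user:1001"), (ConnectionError,)) would then ride through a short failover window, where rc stands for whatever cluster client object the application already holds.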

Monitoring Cluster Health

Ongoing monitoring is non-negotiable for a production cluster. Use these commands and tools as your foundation.

redis-cli --cluster check

redis-cli --cluster check 127.0.0.1:7000

# Healthy output includes:
# [OK] All nodes agree about slots configuration.
# [OK] All 16384 slots covered.
# M: a1b2c3d4... 127.0.0.1:7000
#    slots:[0-5460] (5461 slots) master
#    1 additional replica(s)

Run this command from your monitoring pipeline after any topology change (node addition, removal, resharding). The output flags slot coverage gaps, misconfigured replicas, and nodes that disagree about the slot map.

CLUSTER INFO

redis-cli -p 7000 CLUSTER INFO

# Key fields to watch:
# cluster_state: ok | fail
# cluster_slots_ok: 16384   (should always equal total slots)
# cluster_known_nodes: 6
# cluster_stats_messages_sent / cluster_stats_messages_received  (gossip health)

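Alert on cluster_state:fail immediately. It means the node you queried cannot serve the cluster normally: either at least one slot is unassigned or owned by a failed node, or the node cannot reach the majority of primaries. With the default cluster-require-full-coverage yes, the cluster then rejects commands for every slot, not only the uncovered ones.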
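
A monitoring job can turn that output into alerts with a few lines of parsing. This is a minimal sketch that checks only the fields listed above; the function names and the expected_nodes parameter are assumptions, and fetching the raw text (for example through redis-cli or a client library) is left out.

```python
TOTAL_SLOTS = 16384


def parse_cluster_info(raw: str) -> dict:
    """Turn 'key:value' lines from CLUSTER INFO into a dict of strings."""
    info = {}
    for line in raw.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        info[key] = value
    return info


def cluster_problems(info: dict, expected_nodes: int) -> list:
    """Return human-readable problems; an empty list means healthy."""
    problems = []
    if info.get("cluster_state") != "ok":
        problems.append(f"cluster_state is {info.get('cluster_state', 'missing')}")
    slots_ok = int(info.get("cluster_slots_ok", 0))
    if slots_ok != TOTAL_SLOTS:
        problems.append(f"only {slots_ok} of {TOTAL_SLOTS} slots are ok")
    known = int(info.get("cluster_known_nodes", 0))
    if known != expected_nodes:
        problems.append(f"{known} nodes known, expected {expected_nodes}")
    return problems
```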

INFO replication (per node)

redis-cli -p 7003 INFO replication

# role:slave
# master_host:127.0.0.1
# master_port:7000
# master_link_status:up
# master_last_io_seconds_ago:0
# master_sync_in_progress:0
# slave_repl_offset:1048576
# master_repl_offset:1048576

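Track master_last_io_seconds_ago and the replication offset gap, comparing slave_repl_offset on the replica with master_repl_offset read from the primary's own INFO replication output. A growing gap indicates replication lag and increases your potential data loss window in the event of a failover.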
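
The offset gap can be computed from two INFO replication snapshots, one taken on the primary and one on the replica. The sketch below uses only field names that appear in the output above; the function names are hypothetical, and because the two snapshots are never taken at exactly the same instant, the result is an approximation.

```python
def parse_info(raw: str) -> dict:
    """Parse 'key:value' lines from INFO output, skipping section headers."""
    fields = {}
    for line in raw.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def replication_lag_bytes(primary_info: dict, replica_info: dict) -> int:
    """Bytes of the replication stream the replica has not yet applied."""
    primary_offset = int(primary_info["master_repl_offset"])
    replica_offset = int(replica_info["slave_repl_offset"])
    return max(0, primary_offset - replica_offset)


def replica_is_healthy(replica_info: dict, max_idle_seconds: int = 10) -> bool:
    """Link must be up and the primary heard from recently (threshold is arbitrary)."""
    return (
        replica_info.get("master_link_status") == "up"
        and int(replica_info.get("master_last_io_seconds_ago", -1)) >= 0
        and int(replica_info["master_last_io_seconds_ago"]) <= max_idle_seconds
    )
```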

Resetting a Node

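To remove a node from the cluster and wipe its cluster state (useful when decommissioning or re-provisioning):

# First, remove it from the cluster via another node
# (<node-id> is a placeholder for the ID of the node being removed, as shown by CLUSTER NODES)
redis-cli --cluster del-node 127.0.0.1:7000 <node-id>

# Then reset the node itself
redis-cli -p 7005 CLUSTER RESET HARD

CLUSTER RESET HARD makes the node forget every other node, releases its slots, resets its epochs to 0, and assigns it a new node ID. A replica is first turned into an empty primary and its dataset is flushed; a primary that still holds keys refuses the command. The node stays in cluster mode (cluster-enabled yes) as a blank, single-node cluster that can be joined to a cluster again. Use with extreme caution, because it is irreversible.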

Warning

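Never run CLUSTER RESET on a node that still holds live slots. First move its slots to other primaries with redis-cli --cluster reshard, then remove the now-empty node with redis-cli --cluster del-node, which refuses to delete a node that still owns slots, and only then reset it.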

Key Takeaways

Redis Cluster shards data across 16,384 hash slots distributed among primary nodes — no external proxy required.
Bootstrap with redis-cli --cluster create ... --cluster-replicas 1; minimum six nodes for production (three primaries, three replicas).
Automatic failover is driven by cluster-node-timeout; tune it carefully based on your network characteristics and downtime tolerance.
Use CLUSTER FAILOVER for graceful planned maintenance, CLUSTER FAILOVER FORCE when the primary is already unreachable, and CLUSTER FAILOVER TAKEOVER only when a majority of primaries cannot be reached.
Clients must be cluster-aware to handle MOVED and ASK redirects; implement retry logic with backoff for failover windows.
Multi-key commands require all keys in the same slot — use hash tags {} to co-locate, but avoid hot-spot concentrations.
Monitor continuously with redis-cli --cluster check, CLUSTER INFO, and per-node replication lag; alert immediately on cluster_state:fail.

Deploy and Operate Redis Cluster with JusDB

Redis Cluster shifts the operational complexity from Redis's internals to your team — someone has to manage topology changes, monitor slot coverage, tune timeouts, and respond to failover alerts at 3 AM. JusDB's managed Redis service handles all of that for you: automated cluster provisioning across multiple availability zones, continuous health monitoring, and hands-free failover tested against real failure scenarios. You get the full power of Redis Cluster with a fraction of the operational burden.

Explore JusDB managed Redis to see how your team can stop managing infrastructure and start focusing on your product.
