High Availability

MySQL InnoDB Cluster Explained: Complete Guide to High Availability

Deploy enterprise-grade MySQL high availability with InnoDB Cluster. Learn Group Replication integration, MySQL Router configuration, and automatic failover setup.

JusDB Team
March 22, 2022
4 min read
7806 views

A healthcare SaaS client called us at 2am: their primary MySQL node had crashed and the application was down. They'd been running a simple primary-replica setup with manual failover. The replica was healthy, data was intact — but promoting it required SSH access, config changes, and a DNS update. By the time the on-call engineer had done it, 47 minutes had passed.

After that incident, we migrated them to MySQL InnoDB Cluster. Three months later, their primary went down again — network partition, not hardware. InnoDB Cluster detected it in 10 seconds, elected a new primary, MySQL Router rerouted traffic automatically. Application downtime: zero seconds. The on-call engineer found out from a Slack alert after the fact.

That's what InnoDB Cluster is for. This guide explains how it works, how to set it up, and the operational details most guides skip.

TL;DR
  • InnoDB Cluster = Group Replication + MySQL Router + MySQL Shell packaged together
  • Automatic failover in <30 seconds (no human required, no DNS changes)
  • Requires minimum 3 nodes for quorum — 2-node setups cannot self-heal after a node failure
  • Single-primary mode (one writer) is the only production-safe choice — multi-primary has significant conflict risks
  • MySQL Router must be deployed on the same host as your application, not on the DB nodes

How InnoDB Cluster Actually Works

Three components work together. Understanding each is essential for operating the cluster correctly:

Group Replication — the consensus layer

Group Replication uses a Paxos-based consensus protocol to ensure all nodes agree on transaction order before committing. Unlike traditional async replication, where a replica might be seconds behind the primary, Group Replication enforces that a transaction is committed on a majority of nodes before the application gets an acknowledgment.

The trade-off: write latency increases by roughly the network round-trip time between nodes. For nodes in the same datacenter (<1ms RTT), this is imperceptible. For nodes across regions (50ms+ RTT), writes become noticeably slower — cross-region InnoDB Cluster is rarely the right architecture.

-- Check replication group members and their status
SELECT
  MEMBER_HOST,
  MEMBER_PORT,
  MEMBER_STATE,
  MEMBER_ROLE,
  MEMBER_VERSION
FROM performance_schema.replication_group_members;

-- Healthy cluster output:
-- MEMBER_STATE = ONLINE for all nodes
-- MEMBER_ROLE = PRIMARY for one, SECONDARY for the rest

MySQL Router — transparent application routing

MySQL Router sits between your application and the DB nodes. Your application connects to Router (port 6446 for read/write, 6447 for read-only). Router maintains an internal map of which node is currently primary and routes writes to it automatically.

Critical mistake we see constantly: teams deploy Router on the database nodes instead of the application servers. This defeats its purpose — if the DB node Router is on goes down, the application loses its connection to Router. Deploy Router on every application server (or load balancer tier), not on the database nodes.

# Bootstrap Router (run on the APPLICATION server, not DB server)
mysqlrouter --bootstrap root@primary-host:3306 --directory /etc/mysqlrouter

# This generates /etc/mysqlrouter/mysqlrouter.conf with:
# [routing:primary]    port 6446  -- read/write, goes to PRIMARY only
# [routing:secondary]  port 6447  -- read-only, load-balanced across SECONDARies

# Application connection strings:
# Writes: mysql://app_user@localhost:6446/mydb
# Reads:  mysql://app_user@localhost:6447/mydb  (optional, for read scaling)

MySQL Shell — management interface

# Connect to MySQL Shell
mysqlsh root@primary-host:3306

# All cluster management is done via the dba object
JS> var cluster = dba.getCluster('prodCluster')
JS> cluster.status()   -- health check
JS> cluster.describe() -- topology

Setting Up InnoDB Cluster: Step by Step

Prerequisites on each node

-- Each node needs these settings in my.cnf before joining the cluster:
[mysqld]
server_id = 1          -- unique per node (1, 2, 3...)
gtid_mode = ON
enforce_gtid_consistency = ON
binlog_checksum = NONE  -- required for Group Replication

Pre-flight check with MySQL Shell

# Run on each node before creating the cluster
mysqlsh root@node1:3306

JS> dba.checkInstanceConfiguration('root@node1:3306')
# Fix all issues it reports before proceeding
# Common issues: missing server_id, wrong gtid_mode, firewall blocking port 33060

Create the cluster from node 1

mysqlsh root@node1:3306

JS> dba.createCluster('prodCluster', {
  ipAllowlist: '10.0.0.0/8',       -- restrict to your VPC CIDR
  exitStateAction: 'READ_ONLY',    -- failed nodes become read-only, not offline
  memberWeight: 50                  -- default election weight
})

-- Verify:
JS> cluster.status()
-- Should show node1 as PRIMARY, cluster healthy

Add the other nodes

JS> cluster.addInstance('root@node2:3306', {
  recoveryMethod: 'clone'   -- use MySQL Clone plugin for initial sync (fastest)
})

JS> cluster.addInstance('root@node3:3306', {
  recoveryMethod: 'clone'
})

-- Final cluster status (all 3 nodes ONLINE):
JS> cluster.status()
{
  "clusterName": "prodCluster",
  "defaultReplicaSet": {
    "status": "OK",
    "topology": {
      "node1:3306": {"memberRole": "PRIMARY", "status": "ONLINE"},
      "node2:3306": {"memberRole": "SECONDARY", "status": "ONLINE"},
      "node3:3306": {"memberRole": "SECONDARY", "status": "ONLINE"}
    }
  }
}

Failover: What Actually Happens

When the primary node fails:

  1. Group Replication detects the failure (default: within ~10 seconds via group_replication_member_expel_timeout)
  2. Remaining nodes hold an election — the node with the highest memberWeight and most up-to-date transactions wins
  3. The winning node becomes primary and starts accepting writes
  4. MySQL Router detects the topology change (it polls the cluster metadata) and starts routing to the new primary
  5. Application connections drop briefly, then reconnect through Router to the new primary
code
-- Tune failover speed:
-- group_replication_member_expel_timeout: how long to wait before expelling a silent member
-- Lower = faster failover but more false positives from network glitches
SET GLOBAL group_replication_member_expel_timeout = 5;  -- default: 5 seconds

-- autoRejoinTries: how many times an expelled member tries to rejoin automatically
JS> cluster.setOption('autoRejoinTries', 3)

-- Simulate a failover (safe in staging):
JS> cluster.setPrimaryInstance('node2:3306')
-- Manually moves primary to node2. Takes ~5 seconds.

Recovering a failed node

code
-- A failed node that comes back online may be in RECOVERING or ERROR state
JS> cluster.status()

-- Rejoin it to the cluster:
JS> cluster.rejoinInstance('root@node3:3306')

-- If it's too far behind to catch up via incremental sync, use clone:
JS> cluster.rejoinInstance('root@node3:3306', {recoveryMethod: 'clone'})

-- If the node is permanently gone, remove it from metadata:
JS> cluster.removeInstance('root@node3:3306', {force: true})
-- Then add a replacement node

Production Configuration

code
# my.cnf for all cluster nodes

[mysqld]
# Core cluster requirements
server_id = 1  # unique per node
gtid_mode = ON
enforce_gtid_consistency = ON
binlog_checksum = NONE

# Group Replication plugin
plugin_load_add = group_replication.so
group_replication_group_name = "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa"  # UUID, same on all nodes
group_replication_start_on_boot = OFF  # let InnoDB Cluster manage startup
group_replication_local_address = "node1:33061"  # internal GR port
group_replication_group_seeds = "node1:33061,node2:33061,node3:33061"
group_replication_bootstrap_group = OFF

# InnoDB Cluster metadata
loose-group_replication_ip_allowlist = "10.0.0.0/8"
loose-group_replication_member_expel_timeout = 5

# Performance settings (same as single-node MySQL tuning)
innodb_buffer_pool_size = 12G  # 75% of RAM on a 16G node
innodb_io_capacity = 2000      # SSD
innodb_redo_log_capacity = 2147483648  # 2GB

# Binary log (required for GR)
log_bin = mysql-bin
binlog_format = ROW
binlog_row_image = FULL
sync_binlog = 1

Monitoring a Production Cluster

code
-- Real-time cluster health query (run this from a monitoring script)
SELECT
  MEMBER_HOST,
  MEMBER_PORT,
  MEMBER_ROLE,
  MEMBER_STATE,
  MEMBER_VERSION,
  @@group_replication_member_expel_timeout AS expel_timeout_s
FROM performance_schema.replication_group_members;

-- Replication lag per secondary
SELECT
  CHANNEL_NAME,
  LAST_QUEUED_TRANSACTION,
  LAST_APPLIED_TRANSACTION,
  LAST_APPLIED_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP,
  NOW() - LAST_APPLIED_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP AS lag_seconds
FROM performance_schema.replication_applier_status_by_worker
WHERE CHANNEL_NAME LIKE '%group_replication%';

-- Check for long-running transactions blocking replication
SELECT
  trx_id,
  trx_state,
  TIMESTAMPDIFF(SECOND, trx_started, NOW()) AS running_seconds,
  trx_query
FROM information_schema.INNODB_TRX
WHERE TIMESTAMPDIFF(SECOND, trx_started, NOW()) > 30
ORDER BY running_seconds DESC;

Alerting thresholds

MetricWarningCritical
Cluster member stateAny node RECOVERINGAny node ERROR or UNREACHABLE
Replication lag> 5 seconds> 30 seconds
Long transactions> 30 seconds> 120 seconds
Cluster sizeBelow 3 nodesBelow quorum (2 of 3 for a 3-node cluster)

Real Limitations to Know Before You Deploy

  • Writes slow down under load: Group Replication's consensus protocol means every write round-trips to a majority of nodes. Under very high write throughput (>20,000 writes/sec), you may see increased write latency. Batching writes helps significantly.
  • Multi-primary mode is not safe for most workloads: Multiple primaries allow concurrent writes, but conflict detection and rollback can cause unexpected application errors. Avoid it unless you have a very specific use case and have tested it extensively.
  • No cross-region support without significant latency penalty: Group Replication is synchronous — 50ms cross-region RTT means every write takes 50ms+ more. For multi-region HA, use async replication between InnoDB Clusters (InnoDB ClusterSet).
  • Large transactions can stall replication: A transaction that modifies millions of rows needs to be certified by all nodes. This can cause flow control throttling that impacts the entire cluster's write throughput.
  • Maximum 9 nodes recommended: The consensus protocol overhead scales with node count. 3 nodes is the sweet spot for most workloads; 5 nodes for extra redundancy.

InnoDB Cluster vs Galera vs Orchestrator+Async

Aspect InnoDB Cluster Galera / PXC Orchestrator + Async Replication
FailoverAutomatic (<30s)Automatic (<30s)Automated but configurable (seconds to minutes)
Write consistencyStrong (certified commit)Strong (wsrep)Eventual (async) or semi-sync
Write throughputLower (certification overhead)Lower (certification overhead)Highest (async has no overhead)
ToolingMySQL Shell, Router — nativeExternal LB + scripts requiredOrchestrator + VIP/ProxySQL/HAProxy
Oracle supportYesNoNo
Best forTeams wanting native, low-ops HATeams already familiar with PXC/GaleraVery high write throughput + tolerance for potential data loss

Working with JusDB on InnoDB Cluster

The most common issues we fix in InnoDB Cluster deployments are: MySQL Router deployed on DB nodes (wrong), no monitoring for replication lag (dangerous), and multi-primary mode used without understanding its conflict model (causes mysterious application errors).

We deploy and operate InnoDB Clusters as part of our MySQL consulting and managed MySQL SRE services. Typical engagements include migrating from standalone primary-replica setups, setting up InnoDB ClusterSet for multi-region deployments, and building proper monitoring for cluster health.

If you're planning a migration from an existing MySQL replication setup to InnoDB Cluster, or debugging an existing cluster deployment, reach out.

Related reading: MySQL Performance Tuning Guide | MySQL 8.0 EOL Upgrade Guide | MySQL Group Replication Guide

Share this article

JusDB Team

Official JusDB content team

Deeper Reading

Curated companion guides for readers who want to go deeper on this topic.