MySQL Replication Options: Native Replication vs Group Replication vs InnoDB Cluster

MySQL offers multiple replication topologies — traditional async replication, semi-sync, Group Replication, and InnoDB Cluster. Understanding the trade-offs determines whether your failover takes seconds or minutes.

JusDB Team
November 9, 2023
Updated May 30, 2026

A startup engineering team spends months building their MySQL primary-replica setup, confident they have high availability covered. Then, during a late-night primary failure, they discover their async replica is 45 seconds behind — and those 45 seconds of transactions are simply gone. The manual failover takes 20 minutes, their on-call engineer is promoting the wrong host, and the application is throwing errors the entire time. This scenario plays out constantly because MySQL offers several distinct replication approaches, and teams frequently choose based on familiarity rather than actual durability requirements.

MySQL's replication ecosystem has grown significantly: you have traditional asynchronous replication, semi-synchronous replication, Group Replication (GA since MySQL 5.7.17), and InnoDB Cluster, a full HA stack built on top of Group Replication. Each solves a different problem. Picking the wrong one means either over-engineering a simple use case or under-protecting a critical one. This guide breaks down exactly how each option behaves under failure, what you give up, and when to reach for each.

TL;DR: Async replication is easy to operate but loses data on failover. Semi-sync reduces data loss risk but adds latency. Group Replication provides distributed consensus with zero data loss but requires low-latency networks. InnoDB Cluster wraps Group Replication with automated failover and routing — it's the production-grade choice for teams that need HA without building custom orchestration. Choose based on your RPO, RTO, and operational maturity.

Traditional Asynchronous Replication

Async replication is MySQL's default and the most widely deployed topology. The primary commits transactions and writes them to the binary log. Replicas connect, read the binlog, and apply events independently — with no acknowledgment back to the primary. This makes writes on the primary completely decoupled from replica state.

How It Works

The replica runs two threads: an IO thread that fetches binlog events from the primary and writes them to the local relay log, and a SQL thread (or parallel applier threads) that replays those events. The primary never waits for the replica.

text
# Primary my.cnf
[mysqld]
server_id = 1
log_bin = /var/log/mysql/mysql-bin.log
binlog_format = ROW
gtid_mode = ON
enforce_gtid_consistency = ON

# Replica my.cnf
[mysqld]
server_id = 2
log_bin = /var/log/mysql/mysql-bin.log
relay_log = /var/log/mysql/relay-bin.log
gtid_mode = ON
enforce_gtid_consistency = ON
read_only = ON
text
-- On replica: configure and start replication
CHANGE REPLICATION SOURCE TO
  SOURCE_HOST = '10.0.1.10',
  SOURCE_USER = 'replicator',
  SOURCE_PASSWORD = 'strongpassword',
  SOURCE_AUTO_POSITION = 1;

START REPLICA;
SHOW REPLICA STATUS\G

Failure Behavior and Data Loss

When the primary crashes, any transactions it committed but the replica had not yet received are missing from the promoted replica. They are effectively lost unless you can recover them from the old primary's binary log. Replication lag is the enemy here: a replica that is 10 seconds behind at the moment of failure loses 10 seconds of data. Failover is manual unless you layer on external tooling such as Orchestrator, or ProxySQL with custom scripts.
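
To put a number on that loss after an incident, you can compare the gtid_executed value of the old primary with that of the promoted replica. The Python sketch below counts transactions present in one GTID set but missing from the other. The function names are hypothetical, and it assumes untagged GTIDs in the classic uuid:start-end format. On a live server, the built-in GTID_SUBTRACT() function performs the same comparison.

```python
def parse_gtid_set(gtid_set: str) -> dict[str, set[int]]:
    """Expand a GTID set string into {source_uuid: {transaction numbers}}."""
    result: dict[str, set[int]] = {}
    # MySQL may print long sets with a newline after each comma.
    for part in gtid_set.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *intervals = part.split(":")
        numbers = result.setdefault(uuid.lower(), set())
        for interval in intervals:
            start, _, end = interval.partition("-")
            # A bare number such as "7" is an interval of length one.
            numbers.update(range(int(start), int(end or start) + 1))
    return result


def missing_transactions(primary_set: str, replica_set: str) -> int:
    """Count transactions executed on the primary but absent on the replica."""
    primary = parse_gtid_set(primary_set)
    replica = parse_gtid_set(replica_set)
    return sum(
        len(numbers - replica.get(uuid, set()))
        for uuid, numbers in primary.items()
    )
```

Expanding intervals into sets keeps the sketch short but costs memory on servers with hundreds of millions of transactions. Interval arithmetic would be the better choice there.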

Consistency risk: With async replication, your RPO is equal to your replication lag at the moment of failure. Under write-heavy load, this can easily be 30–120 seconds. There is no guarantee of durability beyond what the primary itself commits.
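
Because RPO tracks lag, lag is worth alerting on. The sketch below evaluates one row of SHOW REPLICA STATUS, passed in as a dictionary, against a lag budget. The function name and the 10-second default are assumptions for illustration. The column names are those used by MySQL 8.0.22 and later. Note that Seconds_Behind_Source measures apply delay, so it only approximates how much data the replica has not yet received.

```python
def replica_lag_alerts(status: dict, max_lag_seconds: int = 10) -> list[str]:
    """Return human-readable problems found in one SHOW REPLICA STATUS row."""
    alerts = []
    if status.get("Replica_IO_Running") != "Yes":
        alerts.append("IO thread is not running")
    if status.get("Replica_SQL_Running") != "Yes":
        alerts.append("SQL thread is not running")

    lag = status.get("Seconds_Behind_Source")
    if lag is None:
        # MySQL reports NULL when the lag cannot be computed,
        # for example while a replication thread is stopped.
        alerts.append("lag is unknown (Seconds_Behind_Source is NULL)")
    elif int(lag) > max_lag_seconds:
        alerts.append(f"lag of {int(lag)}s exceeds the budget of {max_lag_seconds}s")
    return alerts
```

An empty list means the replica is running and within budget. Anything else should page someone before a failover turns the lag into lost data.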

When to Use It

Async replication is appropriate for read scaling where data loss is acceptable (analytics replicas, reporting), for disaster recovery with relaxed RPO requirements, and for teams that need simplicity above all else. Do not rely on it as your sole HA mechanism for transactional workloads where data loss is unacceptable.


Semi-Synchronous Replication

Semi-sync is a plugin-based enhancement that requires the primary to wait for at least one replica to acknowledge receipt of the binlog event before returning success to the client. "Acknowledged" means the replica has written the event to its relay log, not that it has applied it. That is enough for the event to survive a primary failure, provided the acknowledging replica itself survives and semi-sync has not fallen back to async.

Configuration

text
-- Install the source plugin on the primary
INSTALL PLUGIN rpl_semi_sync_source SONAME 'semisync_source.so';
-- Install the replica plugin on each replica
INSTALL PLUGIN rpl_semi_sync_replica SONAME 'semisync_replica.so';

-- On primary
SET GLOBAL rpl_semi_sync_source_enabled = 1;
SET GLOBAL rpl_semi_sync_source_timeout = 1000;  -- ms before fallback to async
SET GLOBAL rpl_semi_sync_source_wait_for_replica_count = 1;

-- On replica
SET GLOBAL rpl_semi_sync_replica_enabled = 1;
-- Restart the receiver thread so an already-running replica registers as semi-sync
STOP REPLICA IO_THREAD;
START REPLICA IO_THREAD;
text
# Persist in my.cnf (primary)
[mysqld]
plugin-load-add = semisync_source.so
rpl_semi_sync_source_enabled = 1
rpl_semi_sync_source_timeout = 1000
rpl_semi_sync_source_wait_for_replica_count = 1

The Timeout Fallback Problem

The critical gotcha with semi-sync is the timeout behavior. If a replica falls behind or disconnects, the primary waits up to rpl_semi_sync_source_timeout milliseconds and then falls back to fully asynchronous mode without returning any error to the client. It returns to semi-sync on its own once a replica catches up and acknowledges again. You can check the current mode with SHOW STATUS LIKE 'Rpl_semi_sync_source_status'. A value of OFF means you are currently running async, possibly without knowing it.

Silent async fallback: Semi-sync can silently degrade to async if replicas lag or disconnect. Monitor Rpl_semi_sync_source_status and Rpl_semi_sync_source_no_tx (count of transactions sent async) in your metrics pipeline. An alert on Rpl_semi_sync_source_status = OFF is essential.
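
As a rough illustration of that alert, the sketch below compares two samples of SHOW GLOBAL STATUS, each passed in as a dictionary of variable name to value. The function name and the sampling approach are assumptions. The status variable names are the ones MySQL exposes when the source plugin is installed.

```python
def semi_sync_alerts(previous: dict, current: dict) -> list[str]:
    """Compare two SHOW GLOBAL STATUS samples and report semi-sync degradation."""
    alerts = []
    if current.get("Rpl_semi_sync_source_status") != "ON":
        alerts.append("semi-sync is OFF: the primary is replicating asynchronously")
    if int(current.get("Rpl_semi_sync_source_clients", 0)) == 0:
        alerts.append("no semi-sync replicas are connected")

    # The no_tx counter only grows, so the difference between two samples
    # is the number of commits that went unacknowledged in that window.
    unacked = int(current.get("Rpl_semi_sync_source_no_tx", 0)) - int(
        previous.get("Rpl_semi_sync_source_no_tx", 0)
    )
    if unacked > 0:
        alerts.append(f"{unacked} commits were not acknowledged since the last sample")
    return alerts
```

Tracking the counter delta matters because the status flag can flip back to ON between samples, hiding a short window in which commits were unprotected.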

When to Use It

Semi-sync is appropriate when you need near-zero data loss without the operational complexity of Group Replication, when your replicas are on the same network segment (low RTT keeps added latency minimal), and when you have a small number of replicas — typically one or two. It is a meaningful improvement over async for durability but still requires manual failover orchestration.


Group Replication

Group Replication (GR) is MySQL's built-in distributed consensus mechanism, available since MySQL 5.7.17 and significantly improved in 8.0. Rather than a single primary writing to passive replicas, GR uses a Paxos-based group communication layer: every write transaction must be delivered to a majority of group members and pass certification (conflict detection) before it commits. This provides stronger consistency guarantees than async or semi-sync replication can offer.
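
The majority requirement translates directly into how many member failures a group can absorb. A minimal sketch of the arithmetic, with hypothetical function names:

```python
def quorum_size(members: int) -> int:
    """Smallest number of members that forms a majority."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can fail while the rest still form a majority."""
    return members - quorum_size(members)
```

A three-member group tolerates one failure and a five-member group tolerates two. A fourth member adds no tolerance over three, which is why odd group sizes are the usual choice.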

Single-Primary vs Multi-Primary Mode

GR supports two modes. Single-primary (recommended for most workloads) routes all writes to one elected primary while other members are read-only replicas. If the primary fails, the group elects a new primary automatically — no external orchestrator needed. Multi-primary allows writes on all members simultaneously, but requires application-level handling of write conflicts and has significant restrictions on DDL and foreign keys.

Setting Up Group Replication

text
# my.cnf — all members need these settings
[mysqld]
server_id = 1                          # unique per member
gtid_mode = ON
enforce_gtid_consistency = ON
binlog_checksum = NONE                 # only required before MySQL 8.0.21
log_replica_updates = ON
plugin-load-add = group_replication.so

group_replication_group_name = "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"
group_replication_start_on_boot = OFF
group_replication_local_address = "10.0.1.10:33061"
group_replication_group_seeds = "10.0.1.10:33061,10.0.1.11:33061,10.0.1.12:33061"
group_replication_bootstrap_group = OFF
text
-- Bootstrap the group from the first node only
SET GLOBAL group_replication_bootstrap_group = ON;
START GROUP_REPLICATION;
SET GLOBAL group_replication_bootstrap_group = OFF;

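-- Join remaining members: set recovery credentials on each one, then start
CHANGE REPLICATION SOURCE TO
  SOURCE_USER = 'replicator',
  SOURCE_PASSWORD = 'strongpassword'
  FOR CHANNEL 'group_replication_recovery';
START GROUP_REPLICATION;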

-- Check group membership and primary
SELECT MEMBER_ID, MEMBER_HOST, MEMBER_STATE, MEMBER_ROLE
FROM performance_schema.replication_group_members;
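
The rows returned by that query can be reduced to the three facts an operator usually wants: who the primary is, how many members are online, and whether the group still has a majority. The sketch below assumes the rows arrive as dictionaries keyed by column name. The function name and return shape are hypothetical.

```python
def summarize_group(rows: list[dict]) -> dict:
    """Summarize rows from performance_schema.replication_group_members."""
    online = [row for row in rows if row["MEMBER_STATE"] == "ONLINE"]
    primaries = [row["MEMBER_HOST"] for row in online if row["MEMBER_ROLE"] == "PRIMARY"]
    return {
        # In single-primary mode exactly one ONLINE member should be PRIMARY.
        "primary": primaries[0] if len(primaries) == 1 else None,
        "online": len(online),
        # A strict majority of the known members must be ONLINE.
        "has_quorum": len(online) > len(rows) // 2,
    }
```

The table reflects the view of the member you queried, so during a partition it is worth running the check against more than one member.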

Consistency and Failure Behavior

GR uses a certification process: before a transaction commits, its write set is broadcast to all members in an order agreed by a majority of the group. If no conflicting transaction was certified after the transaction took its snapshot, it passes certification and commits; otherwise it is rolled back. If the primary fails, surviving members elect a new primary within seconds, typically 5–15 seconds depending on failure detection settings. No data loss occurs for committed transactions because a transaction cannot commit until a majority of members has received it.
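
The conflict check can be pictured with a deliberately simplified model. In the sketch below, each transaction carries the sequence number it last saw (its snapshot) and the set of row keys it writes. A transaction fails certification if any of those rows was written by a transaction certified after that snapshot. The class and its fields are hypothetical. Real Group Replication tracks snapshots as GTID sets and identifies rows by hashes of their primary keys, so treat this as an illustration of the first-committer-wins rule, not of the actual implementation.

```python
class Certifier:
    """Toy first-committer-wins certifier; every member would run the same logic."""

    def __init__(self) -> None:
        self.sequence = 0                      # number of transactions certified so far
        self.last_writer: dict[str, int] = {}  # row key -> sequence of its last writer

    def certify(self, snapshot: int, write_set: set[str]) -> bool:
        """Return True if the transaction may commit, False if it must roll back."""
        for key in write_set:
            # Conflict: the row changed after this transaction read its snapshot.
            if self.last_writer.get(key, 0) > snapshot:
                return False
        self.sequence += 1
        for key in write_set:
            self.last_writer[key] = self.sequence
        return True
```

Because every member receives transactions in the same agreed order and applies the same deterministic rule, all members reach the same commit-or-rollback decision without a further round of voting.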

Network requirements: Group Replication requires low-latency, high-bandwidth network connections between members. Round-trip latency above 5ms between members noticeably impacts write throughput. GR is not suitable for geographically dispersed nodes without careful tuning of group_replication_member_expel_timeout and related parameters.

When to Use It

Group Replication is appropriate when you need automatic primary election without external tooling, when you require zero data loss for committed transactions, and when you can accept slightly higher write latency due to consensus overhead. It is the foundation for InnoDB Cluster, but you can run it standalone if you are prepared to handle cluster administration and client routing yourself instead of relying on MySQL Shell and MySQL Router.


MySQL InnoDB Cluster

InnoDB Cluster is Oracle's complete HA solution that bundles Group Replication (the consistency engine) with MySQL Shell (the management interface) and MySQL Router (the transparent application proxy). Where raw Group Replication requires you to manage failover awareness at the application layer, InnoDB Cluster handles it: MySQL Router monitors group membership and automatically redirects write traffic to the current primary and read traffic to replicas.

Architecture Overview

The three components work together. MySQL Shell provides a JavaScript/Python AdminAPI for cluster lifecycle management: creating clusters, adding instances, and switching the primary. MySQL Router sits between your application and the cluster, exposing a read/write port (default 6446) and a read-only port (default 6447). Router reads the cluster metadata (stored in the mysql_innodb_cluster_metadata schema) together with Group Replication's membership state, and updates its routing table when membership changes. Applications connect to Router and are transparently redirected without any awareness of which member is currently primary.

Creating an InnoDB Cluster with MySQL Shell

text
# Install MySQL Shell, then connect to the seed instance from your OS shell
mysqlsh --uri root@10.0.1.10:3306

// The remaining commands run inside MySQL Shell in JavaScript mode

// Check instance readiness
dba.checkInstanceConfiguration('root@10.0.1.10:3306')

// Configure instance (auto-fixes configuration issues)
dba.configureInstance('root@10.0.1.10:3306')

// Create the cluster
var cluster = dba.createCluster('ProductionCluster')

// Add secondary members (run dba.configureInstance() against each one first)
cluster.addInstance('root@10.0.1.11:3306')
cluster.addInstance('root@10.0.1.12:3306')

// Check cluster status
cluster.status()
text
# Deploy MySQL Router (run on application servers or dedicated proxy hosts)
mysqlrouter --bootstrap root@10.0.1.10:3306 --directory /etc/mysqlrouter
mysqlrouter --config /etc/mysqlrouter/mysqlrouter.conf &

# Application connects to Router ports:
# Read/Write:  127.0.0.1:6446
# Read-Only:   127.0.0.1:6447

Automatic Failover in Practice

When a primary fails, Group Replication elects a new primary within seconds. MySQL Router detects the topology change by refreshing its cached cluster metadata at a short interval (controlled by the ttl option, 0.5 seconds by default in recent 8.0 releases) and begins routing writes to the new primary. From the application's perspective, existing connections on port 6446 receive a connection error (which the application must handle with retry logic), and new connections route to the new primary. Total client-visible downtime is typically 10–30 seconds depending on detection timing.
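
One way to implement that retry logic is exponential backoff with jitter, sized so the total wait covers the failover window. The sketch below is driver-agnostic: `operation` is any callable that opens a connection through Router and runs a complete transaction, and `retryable` should list your driver's connection-lost exceptions. The function name, parameters, and defaults are all assumptions.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retry(
    operation: Callable[[], T],
    retryable: tuple = (ConnectionError,),
    attempts: int = 6,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying on connection errors with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except retryable:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter spreads out clients that all lost their connection at once.
            sleep(delay + random.uniform(0, delay / 2))
    raise AssertionError("unreachable")
```

With the defaults, the base delays add up to 15.5 seconds before the final attempt, plus jitter. Retry whole transactions only: a dropped connection rolls back uncommitted work, and a commit that was in flight when the connection dropped may or may not have succeeded, so the operation should be safe to repeat.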

text
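// Run inside MySQL Shell (JavaScript mode)
// Monitor cluster health
cluster.status()

// Manual switchover (planned maintenance)
cluster.setPrimaryInstance('root@10.0.1.11:3306')

// Restore quorum from a surviving member after a majority of members is lost.
// Not needed for an ordinary primary failure, where election is automatic.
cluster.forceQuorumUsingPartitionOf('root@10.0.1.11:3306')

// Rejoin a recovered member
cluster.rejoinInstance('root@10.0.1.10:3306')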

When to Use It

InnoDB Cluster is the right choice when you need a complete, supported HA stack without building custom orchestration, when you want automatic failover with transparent application routing, and when your team is willing to learn MySQL Shell's AdminAPI. It is more operationally complex than standalone async replication but dramatically less complex than building equivalent automation yourself. For production MySQL workloads where downtime is measured in revenue impact, InnoDB Cluster is the current best-practice recommendation.


Comparison Table

| Feature | Async Replication | Semi-Sync | Group Replication | InnoDB Cluster |
|---|---|---|---|---|
| Data Loss on Failover | Yes (lag-dependent) | Near-zero (relay log receipt) | Zero (committed txns) | Zero (committed txns) |
| Failover Type | Manual | Manual | Automatic | Automatic + transparent routing |
| Failover Time | Minutes (manual) | Minutes (manual) | 5–15 seconds | 10–30 seconds (includes router) |
| Write Latency Overhead | None | Low (1 RTT) | Medium (consensus round) | Medium (consensus round) |
| Read Scaling | Yes (replicas) | Yes (replicas) | Yes (secondary members) | Yes (Router port 6447) |
| Multi-Primary Writes | No | No | Yes (with restrictions) | Yes (not recommended) |
| External Orchestrator Needed | Yes (Orchestrator, etc.) | Yes | No | No (MySQL Router handles routing) |
| Minimum Nodes | 2 | 2 | 3 | 3 |
| Network Sensitivity | Low | Low-medium | High | High |
| Operational Complexity | Low | Medium | High | High (but managed via Shell) |
| Best For | Read replicas, DR | Improved durability, small clusters | Self-managed HA | Production HA with full automation |

Choosing the Right Option for Your Workload

Start with Your RPO and RTO Requirements

Recovery Point Objective (RPO) defines how much data loss you can tolerate. Recovery Time Objective (RTO) defines how long your system can be unavailable. If your RPO is zero and your RTO is under 30 seconds, InnoDB Cluster is the only option that reliably delivers both without custom tooling. If you can tolerate 5 minutes of downtime and some data loss, async replication with an orchestrator is far simpler to operate.
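
That decision rule can be written down as a small function, which is a handy way to make the thresholds explicit and debatable. The cut-offs below (60 seconds for "needs automatic failover" and 1 second for "near-zero loss") are illustrative assumptions, not figures from MySQL documentation. The function name is hypothetical.

```python
AUTOMATIC_FAILOVER_RTO = 60  # assumption: below this, manual promotion is too slow
NEAR_ZERO_RPO = 1            # assumption: "near-zero" loss means at most one second


def suggest_topology(rpo_seconds: float, rto_seconds: float) -> str:
    """Map recovery objectives to the topology this guide would start from."""
    if rpo_seconds <= 0 or rto_seconds < AUTOMATIC_FAILOVER_RTO:
        # Zero loss needs consensus; a tight RTO needs built-in failover and routing.
        return "InnoDB Cluster"
    if rpo_seconds <= NEAR_ZERO_RPO:
        return "Semi-sync replication with an orchestrator"
    return "Async replication with an orchestrator"
```

The result is a starting point for discussion. Network latency and team experience, covered in the next sections, can still overrule it.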

Operational Maturity Matters

Group Replication and InnoDB Cluster introduce new failure modes (network partitions with split-brain risk, quorum loss, members expelled after becoming unresponsive) that require operational knowledge to handle correctly. A team that has never run Group Replication will struggle during an incident. If your team is smaller or less experienced with MySQL internals, async replication with Orchestrator for automated failover is often a better risk-adjusted choice than deploying InnoDB Cluster without the expertise to operate it.

Network Architecture Is Non-Negotiable for GR

Group Replication requires reliable, low-latency connectivity between all members. Deploying GR across availability zones with 2–3ms RTT is reasonable. Deploying across regions with 50ms RTT will cause constant performance problems and member expulsions. If your architecture requires geographic distribution, consider async replication for the cross-region leg and Group Replication within each region.
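
A back-of-the-envelope estimate shows why RTT matters so much. If each commit on a single connection has to wait for roughly one network round trip to the group plus its local commit work, then the rate of sequential commits on that connection is bounded as sketched below. The one-round-trip model and the 1 ms local commit cost are simplifying assumptions. Concurrent sessions raise aggregate throughput, so read the result as a per-connection ceiling.

```python
def max_serial_commits_per_second(rtt_ms: float, local_commit_ms: float = 1.0) -> float:
    """Upper bound on back-to-back commits from one connection."""
    return 1000.0 / (local_commit_ms + rtt_ms)
```

Under these assumptions, a 2 ms RTT allows about 333 sequential commits per second on one connection, while a 50 ms RTT allows fewer than 20.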

Application Connection Handling

With async and semi-sync replication, your application likely connects directly to MySQL hosts or through a proxy like ProxySQL. With InnoDB Cluster, applications connect to MySQL Router — which means Router becomes a critical component in your stack. Ensure Router is deployed with redundancy (multiple Router instances) and that your application implements connection retry logic, since a failover will momentarily drop active connections.
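
If you run several Router instances without a load balancer in front, the client can pick the first one that accepts a connection. The sketch below shows one way to do that. The function names and example addresses are hypothetical. A successful TCP connect only shows that Router is listening, not that it currently has a primary to route to, so keep the retry logic around your queries as well.

```python
import socket
from typing import Callable, Optional

Endpoint = tuple[str, int]


def tcp_reachable(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def pick_router(
    endpoints: list[Endpoint],
    probe: Callable[[str, int], bool] = tcp_reachable,
) -> Optional[Endpoint]:
    """Return the first Router endpoint that answers, or None if all are down."""
    for host, port in endpoints:
        if probe(host, port):
            return (host, port)
    return None
```

For example, `pick_router([("10.0.2.10", 6446), ("10.0.2.11", 6446)])` tries the Router instances in order and returns the first reachable read/write endpoint.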

Key Takeaways
  • Async replication is simple but loses data proportional to replication lag at failover time — do not use it as your sole HA mechanism for transactional workloads.
  • Semi-sync reduces data loss risk to near-zero but can silently fall back to async mode; monitor Rpl_semi_sync_source_status continuously.
  • Group Replication provides automatic primary election and zero data loss for committed transactions, but requires low-latency networks and 3+ members for quorum.
  • InnoDB Cluster (Group Replication + MySQL Shell + MySQL Router) is the complete HA stack — it handles failover detection, primary election, and transparent write routing without custom orchestration.
  • Choose your replication topology based on RPO, RTO, network architecture, and your team's operational maturity — not default familiarity with async setups.
  • Always test failover under load before relying on any topology in production; theoretical behavior and real-world behavior diverge significantly under stress.

Working with JusDB on MySQL Replication

JusDB designs and manages MySQL replication topologies for engineering teams — from simple primary-replica setups to full InnoDB Cluster with automatic failover. Our DBAs match your replication strategy to your actual durability and availability requirements.
