An e-commerce platform asked us to review their Galera Cluster after it had suffered split-brain three times in two months. Their network had occasional 50-100ms latency spikes between availability zones, and Galera's synchronous replication was causing cascading timeouts. After they migrated to MySQL Group Replication in single-primary mode, the split-brain incidents stopped, and write latency on cross-AZ transactions improved because the Paxos-based consensus handled network variability better than Galera's approach.
This guide covers the technical differences between MySQL Group Replication and Galera Cluster, when each performs best, and the failure modes you need to understand before choosing.
- MySQL Group Replication uses Paxos consensus and handles network partitions more gracefully than Galera.
- Galera provides true multi-primary with optimistic concurrency — good for write-scaling across nodes.
- Group Replication integrates natively with MySQL 8.x and InnoDB Cluster tooling.
- Both require a low-latency network (<5ms RTT) for good performance; neither works well across a high-latency WAN.
Architecture Comparison
| Dimension | MySQL Group Replication | Galera Cluster (PXC) |
|---|---|---|
| Consensus protocol | Paxos (XCom implementation) | Virtual synchrony group communication (gcomm, exposed via the wsrep API) |
| Write modes | Single-primary (default), multi-primary | Multi-primary (default) |
| Conflict detection | Optimistic; certification at commit | Optimistic; certification at commit |
| Network partition | Majority partition continues | Quorum partition continues |
| Tooling | MySQL Shell, InnoDB Cluster | Percona Toolkit, Galera Arbitrator (garbd) |
| Distribution | Bundled with MySQL 8.x | Percona XtraDB Cluster (PXC) |
MySQL Group Replication Setup
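Group Replication refuses to start unless a few server settings are already in place. As a quick pre-flight check, the query below can be run on each node before the setup statements. It is a sketch that assumes MySQL 8.0 and covers only the core requirements, not every version-specific one.

```sql
-- Pre-flight check: run on every node before configuring the plugin.
-- Expected for Group Replication: gtid_mode = ON,
-- enforce_gtid_consistency = ON, binlog_format = ROW, log_bin = 1,
-- and a server_id that is unique across the group.
SELECT @@GLOBAL.gtid_mode                AS gtid_mode,
       @@GLOBAL.enforce_gtid_consistency AS enforce_gtid_consistency,
       @@GLOBAL.binlog_format            AS binlog_format,
       @@GLOBAL.log_bin                  AS log_bin,
       @@GLOBAL.server_id                AS server_id;
```

If any value differs, fix it in the server configuration and restart before continuing. Replicated tables also need to use InnoDB and have a primary key.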
-- Enable Group Replication (MySQL 8.x)
INSTALL PLUGIN group_replication SONAME 'group_replication.so';
SET GLOBAL group_replication_group_name = 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee';
SET GLOBAL group_replication_start_on_boot = ON;
SET GLOBAL group_replication_local_address = '10.0.1.1:33061';
SET GLOBAL group_replication_group_seeds = '10.0.1.1:33061,10.0.1.2:33061,10.0.1.3:33061';
-- Bootstrap the group (first node only)
SET GLOBAL group_replication_bootstrap_group = ON;
START GROUP_REPLICATION;
SET GLOBAL group_replication_bootstrap_group = OFF;

-- Monitor group membership
SELECT MEMBER_ID, MEMBER_HOST, MEMBER_STATE, MEMBER_ROLE
FROM performance_schema.replication_group_members;
-- Check transaction queue
SELECT * FROM performance_schema.replication_group_member_stats\G

Galera Cluster Setup
# /etc/mysql/conf.d/galera.cnf
[mysqld]
binlog_format=ROW
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
wsrep_on=ON
wsrep_provider=/usr/lib/galera4/libgalera_smm.so
wsrep_cluster_name="prod_cluster"
wsrep_cluster_address="gcomm://10.0.1.1,10.0.1.2,10.0.1.3"
wsrep_node_address="10.0.1.1"
wsrep_sst_method=xtrabackup-v2

-- Monitor Galera health
SHOW STATUS LIKE 'wsrep_%';
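-- On MySQL 8.0 / PXC 8.0, status variables live in performance_schema.global_status
-- (information_schema.GLOBAL_STATUS was removed in 8.0; MariaDB still provides it).
SELECT VARIABLE_NAME, VARIABLE_VALUE FROM performance_schema.global_status
WHERE VARIABLE_NAME IN ('wsrep_cluster_size','wsrep_local_state_comment',
'wsrep_ready','wsrep_flow_control_paused');

Failure Mode Comparison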
Network Partition Behavior
In a 3-node cluster with a network partition leaving one node isolated:
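- Group Replication: The majority partition (2 nodes) continues. The isolated node cannot reach a majority, so it stops committing writes. Once group_replication_unreachable_majority_timeout expires (if it is set), the node moves to ERROR state and, in recent 8.0 releases, becomes read-only by default. Paxos prevents split-brain because only a majority can agree on new transactions.
- Galera: The majority partition keeps quorum and continues as the Primary Component. The isolated node becomes non-Primary and rejects queries until it rejoins. If a cluster with an even number of nodes splits evenly (2+2), neither side has quorum, so a Galera Arbitrator (garbd) is needed as a tie-breaker.
Deploy Galera with an odd number of nodes (3, 5) or add a Galera Arbitrator (garbd) for even-node clusters. Without a tie-breaker, an even split leaves the cluster with no Primary Component. Forcing both halves back to primary by hand, or disabling quorum checks, is how two partitions end up both believing they are primary.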
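When testing a partition, it helps to confirm which side each node believes it is on. The statements below are a minimal diagnostic sketch to run on the node suspected of being isolated. They use only standard performance_schema tables and status variables, but the values in the comments are what one would typically expect, not captured output.

```sql
-- Group Replication: members this node cannot reach are reported
-- with MEMBER_STATE = 'UNREACHABLE'.
SELECT MEMBER_HOST, MEMBER_STATE, MEMBER_ROLE
FROM performance_schema.replication_group_members;

-- How long a minority member waits before leaving the group
-- (0 means it waits indefinitely and writes simply block).
SHOW GLOBAL VARIABLES LIKE 'group_replication_unreachable_majority_timeout';

-- Galera: 'Primary' means this node is in the quorum partition;
-- any other value means it has lost quorum and will reject queries.
SHOW GLOBAL STATUS LIKE 'wsrep_cluster_status';
```

Running the same statements on a node in the majority partition gives the contrasting view, which makes it easier to confirm that only one side is accepting writes.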
Write Conflict Handling
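Both systems certify transactions optimistically, so conflicting writes surface as rollbacks that are worth monitoring. On the Galera side the counters are status variables. A minimal sketch, assuming a Galera 4 / PXC node (the Group Replication counterpart follows):

```sql
-- Galera: local transactions that failed certification on this node
SHOW GLOBAL STATUS LIKE 'wsrep_local_cert_failures';

-- Galera: local transactions aborted by incoming replicated writes
-- (so-called brute-force aborts)
SHOW GLOBAL STATUS LIKE 'wsrep_local_bf_aborts';
```

In either system the losing transaction is rolled back, so applications that write to more than one node need retry logic.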
-- Group Replication: conflict detection at certification layer
-- If two nodes update the same row, one transaction is rolled back
-- Monitor conflicts
SELECT COUNT_TRANSACTIONS_IN_QUEUE, COUNT_TRANSACTIONS_CHECKED, COUNT_CONFLICTS_DETECTED
FROM performance_schema.replication_group_member_stats;

When to Choose Each
Choose MySQL Group Replication when: you want native MySQL tooling and MySQL Shell InnoDB Cluster management, you are on MySQL 8.x and want single-primary HA with read replicas, or you have had split-brain issues with Galera.
Choose Galera (PXC) when: you need true multi-primary writes distributed across nodes, you use Percona's tooling ecosystem, or you have existing Galera expertise in your team.
- MySQL Group Replication handles network variability better via Paxos — prefer it for multi-AZ deployments.
- Galera provides better multi-primary write throughput when writes are distributed across nodes.
- Both require <5ms RTT between nodes for good performance — do not deploy across high-latency networks.
- Run 3+ nodes and test network partition behavior before go-live with either solution.
Working with JusDB on MySQL High Availability
JusDB designs, deploys, and manages MySQL Group Replication and Galera Cluster configurations for production workloads. We have migrated teams between both solutions and resolved split-brain incidents that had stumped internal teams for days.