MariaDB Galera Cluster Production Guide: Mariabackup, MaxScale & Schema Strategy

Production MariaDB Galera Cluster: mariabackup-based SST, MaxScale Galera-aware routing, schema-change strategy with TOI vs RSU, and a troubleshooting playbook for the failures that actually take down production.

JusDB Team
May 3, 2026

The classic MariaDB Galera Cluster failure goes like this: write throughput is steady at thousands of transactions per second, every node reports healthy, replication is synchronous, and then a single oversized transaction (a misconfigured ETL job, a backfill script, a long-running migration) collapses cluster throughput by an order of magnitude. The cause is rarely the database itself. It's flow control kicking in because the cluster's slowest applier cannot keep up with the writeset queue, and one very large transaction that still fits under wsrep_max_ws_size is enough to stall every node at once. (Transactions that exceed wsrep_max_ws_size are rejected outright, which is why a tighter cap helps.)

This guide covers production MariaDB Galera Cluster deployment from a different angle than the typical "install three nodes" tutorial. The setup steps matter, but the differences that distinguish MariaDB Galera from the broader Galera ecosystem — mariabackup as the recommended SST tool, MaxScale as the MariaDB-native load balancer, the wsrep_provider package bundling in MariaDB, schema-change isolation modes — are where teams lose production hours. We cover what most setup tutorials skip: the cluster-killing edge cases, the tuning parameters that actually matter at scale, and a troubleshooting playbook for the failures that take down production.

TL;DR
  • MariaDB Galera ships with mariabackup as the recommended SST method — faster and more reliable than rsync, and avoids the donor-blocking problem of the legacy mysqldump path.
  • MaxScale is MariaDB Corporation's purpose-built proxy for Galera, with the galeramon monitor and readwritesplit router. ProxySQL ≥ 2.0 also has native Galera support via mysql_galera_hostgroups, so choose based on your existing stack rather than treating either as the only option.
  • Schema changes default to TOI (Total Order Isolation) — every ALTER TABLE stops the cluster until it completes. RSU (Rolling Schema Upgrade) is the production escape hatch, but only safe for backwards-compatible changes.
  • The killer production failure mode is flow-control runaway from oversized transactions — set wsrep_max_ws_size at 1 GB and add monitoring on wsrep_flow_control_paused before going live.
  • Do not use Galera for cross-WAN clusters without segment configuration: multi-region deployments need async replication or carefully assigned gmcast.segment values to keep cross-region replication traffic and commit latency under control.

What Makes MariaDB Galera Different From MySQL Galera

MariaDB Galera and Percona XtraDB Cluster (the MySQL flavor) share the same wsrep API and the same Galera library underneath. The differences are operational — and they matter more in production than tutorials suggest.

Bundling and Version Compatibility

MariaDB has built the wsrep API into the standard mariadb-server package since 10.1; there is no separate Galera build of the server. The Galera provider library itself comes from the galera-4 package (installed in Step 1 below). On Debian and Ubuntu it lives at /usr/lib/galera/libgalera_smm.so, and you reference it via wsrep_provider in your config. Percona XtraDB Cluster, by contrast, requires installing percona-xtradb-cluster-server as a distinct package: different binaries, different upgrade paths, different package conflict resolution. If your fleet is MariaDB-standardized, going with MariaDB Galera avoids the cross-distribution headache.

Galera version matters: MariaDB 10.4 ships Galera 4 (which adds streaming replication for large transactions and improved error reporting). MariaDB 10.3 ships Galera 3. Mixing major Galera versions across a single cluster is unsupported, and the upgrade path requires planned downtime — even rolling MariaDB upgrades can fail mid-flight if the Galera versions don't match.

SST Tooling: Why mariabackup Is the Right Choice

State Snapshot Transfer (SST) is how a new node receives a full data copy from an existing donor. MariaDB supports four SST methods, and the choice has direct production consequences:

  • mariabackup: non-blocking hot backup that copies the InnoDB data files and redo log. The donor stays writable apart from a brief lock at the end of the copy. This is the production default.
  • rsync: blocks the donor for the duration of the copy. For small datasets it works, but on larger ones it locks the donor out of writes for minutes to hours.
  • mysqldump: logical export, painfully slow for anything over 50 GB. Avoid except for tiny clusters.
  • xtrabackup-v2: the Percona XtraBackup method from older releases. Percona XtraBackup does not support MariaDB 10.3 and later, so use mariabackup on any current MariaDB version.
Common pitfall: Forgetting to install the mariadb-backup package on every node. The SST will start, fail at the donor side, and leave the joining node in a broken state requiring manual intervention. apt install mariadb-backup on every node before you bootstrap.

MaxScale vs ProxySQL for the Load Balancer Layer

MaxScale is MariaDB Corporation's official proxy. Its readwritesplit router has Galera-specific intelligence via the galeramon monitor: it tracks wsrep_local_state, recognizes when a node is in DESYNC or DONOR mode, and stops sending traffic to it automatically.

ProxySQL ≥ 2.0 has equivalent native Galera support through the mysql_galera_hostgroups table. It auto-promotes/demotes nodes based on wsrep_local_state and supports the same writer/reader/backup-writer/offline hostgroup pattern. The earlier ProxySQL versions did require custom monitoring scripts; that requirement no longer applies.

The pragmatic choice rule: if your fleet already runs ProxySQL for other MySQL workloads, use it for Galera too — config consistency outweighs marginal feature differences. If you're greenfield on MariaDB, MaxScale is the lower-friction default because it ships from the same vendor and the docs assume MariaDB conventions.


Production Architecture: 3-Node Galera + MaxScale

The minimum viable production topology is three Galera nodes plus a MaxScale router. Three is the practical floor because Galera requires majority quorum to accept writes — a two-node cluster cannot recover safely if one node fails (the survivor sees 50% of the original cluster, which is not a majority). Even-node clusters work but waste a node: a 4-node cluster needs 3 survivors for quorum, the same as a 5-node cluster. Three or five nodes are the standard production sizes.
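The quorum arithmetic above is easy to check by hand. The sketch below is a minimal illustration, assuming every node carries the default vote weight of 1; the function names are ours and are not part of any Galera tooling.

```shell
# quorum_size N: smallest number of nodes that forms a strict majority of N.
quorum_size() {
    echo $(( $1 / 2 + 1 ))
}

# failures_tolerated N: nodes that can fail while the survivors keep quorum.
failures_tolerated() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

# quorum_table: print the comparison for common cluster sizes.
quorum_table() {
    for qt_n in 2 3 4 5; do
        echo "nodes=$qt_n quorum=$(quorum_size "$qt_n") tolerated=$(failures_tolerated "$qt_n")"
    done
}
```

Calling quorum_table shows why a fourth node buys nothing: sizes 3 and 4 both tolerate a single failure, and only the fifth node raises that to two.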

[Figure: Application (writes + reads) → MaxScale (Galera-aware readwritesplit, port 4006, HA via VRRP) → three MariaDB 10.11 / Galera 4 nodes, all SYNCED, replicating synchronously over wsrep port 4567. An async binlog stream feeds a cross-region, read-only DR replica (RPO ~1s, failover target).]
Three-node MariaDB Galera Cluster behind MaxScale, with an async replica for cross-region disaster recovery.

Hardware and Network Requirements

Galera certification and apply cost grows with write rate and node count. Production sizing guidelines, drawn from MariaDB Galera deployments JusDB operates:

  • CPU: 8 cores minimum per node. Certification (the conflict check) is fast and synchronous; the apply phase is what wsrep_slave_threads parallelizes. With high write rates and many slave threads, more cores help — but past 8 threads the gains depend on workload.
  • RAM: 32 GB minimum, with innodb_buffer_pool_size at 60-70% of system memory. Galera adds ~2 GB overhead for the GCache.
  • Network: Dedicated 10 Gbps NIC for cluster traffic if write rate exceeds 5,000 TPS. Galera tolerates a wide latency range — single-AZ deployments at < 2 ms p99 are ideal, multi-AZ at 5-10 ms works with adequate gcache, and cross-region above 50 ms is feasible only with segment configuration. Plan for the latency you have, not for an idealized number.
  • Storage: NVMe SSD only. Galera's apply path is sensitive to fsync latency on the InnoDB redo log and system tablespace.

Setting Up a Production MariaDB Galera Cluster

The setup walkthrough below assumes Ubuntu 22.04 LTS and MariaDB 10.11 LTS. The steps are similar for RHEL 9 — package names change but the config and bootstrap sequence are identical.

Step 1: Install MariaDB and mariabackup on All Three Nodes

bash
# On each of node1, node2, node3:
sudo apt update
sudo apt install -y software-properties-common dirmngr

# Add MariaDB official repo
curl -LsS https://r.mariadb.com/downloads/mariadb_repo_setup | sudo bash -s -- --mariadb-server-version=10.11

sudo apt update
sudo apt install -y mariadb-server mariadb-backup galera-4

# Confirm Galera library is present
ls /usr/lib/galera/libgalera_smm.so

# Stop MariaDB before configuring (we'll bootstrap manually)
sudo systemctl stop mariadb
sudo systemctl disable mariadb

Step 2: Configure /etc/mysql/conf.d/galera.cnf on Each Node

The key configuration parameters are identical on all three nodes except for wsrep_node_address and wsrep_node_name, which are unique per node.

ini
[mysqld]
# Required for Galera
binlog_format               = ROW
default_storage_engine      = InnoDB
innodb_autoinc_lock_mode    = 2
bind-address                = 0.0.0.0

# Galera provider
wsrep_on                    = ON
wsrep_provider              = /usr/lib/galera/libgalera_smm.so

# Cluster-level
wsrep_cluster_name          = "production_galera"
wsrep_cluster_address       = "gcomm://10.0.1.11,10.0.1.12,10.0.1.13"

# Per-node — DIFFERENT on each node
wsrep_node_address          = "10.0.1.11"
wsrep_node_name             = "node1"

# SST configuration
wsrep_sst_method            = mariabackup
wsrep_sst_auth              = "sstuser:strong-random-password-here"

# Production tuning
wsrep_provider_options      = "gcache.size=2G;gcs.fc_limit=128;gcs.fc_factor=0.8"
wsrep_slave_threads         = 8
wsrep_max_ws_size           = 1073741824  # 1 GB cap on writeset size
wsrep_max_ws_rows           = 131072

# InnoDB
innodb_buffer_pool_size     = 24G
innodb_log_file_size        = 2G
innodb_flush_log_at_trx_commit = 2
innodb_flush_method         = O_DIRECT

# Logging
log_error                   = /var/log/mysql/error.log
log_slow_verbosity          = query_plan,explain
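Because a single wrong value in galera.cnf can keep a node from joining, it can help to lint the file before the first start. The following is a rough sketch rather than an official tool: cnf_value and check_galera_cnf are hypothetical helper names, the parser only understands simple key = value lines, and the comparison is case-sensitive even though the server is not.

```shell
# cnf_value FILE KEY: print the value assigned to KEY in an option file,
# with whitespace, surrounding quotes, and trailing "# comments" removed.
cnf_value() {
    awk -F= -v key="$2" '
        { k = $1; gsub(/[ \t]/, "", k) }
        k == key {
            v = $0
            sub(/^[^=]*=/, "", v)              # keep everything after the first "="
            sub(/[ \t]+#.*$/, "", v)           # drop a trailing comment
            gsub(/^[ \t"]+|[ \t"]+$/, "", v)   # trim spaces and quotes
            print v
            exit
        }' "$1"
}

# check_galera_cnf FILE: return non-zero if a required setting is missing or wrong.
check_galera_cnf() {
    cgc_rc=0
    for cgc_pair in binlog_format=ROW innodb_autoinc_lock_mode=2 wsrep_on=ON wsrep_sst_method=mariabackup; do
        cgc_key=${cgc_pair%%=*}
        cgc_want=${cgc_pair#*=}
        cgc_got=$(cnf_value "$1" "$cgc_key")
        if [ "$cgc_got" != "$cgc_want" ]; then
            echo "MISMATCH $cgc_key: expected '$cgc_want', found '$cgc_got'"
            cgc_rc=1
        fi
    done
    return "$cgc_rc"
}
```

Running check_galera_cnf /etc/mysql/conf.d/galera.cnf on each node before Step 3 would catch, for example, a node left on binlog_format = MIXED.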

Step 3: Bootstrap Node 1 and Create the SST User

Bootstrap node 1 first (this brings up the cluster with one node), then create the SST user on it. The user replicates to nodes 2 and 3 automatically when they join, so you do not need to create it on every node manually — but you do need it to exist before any other node attempts to join.

bash
# On node1 only — bootstrap the cluster as a single-node seed:
sudo galera_new_cluster

# Verify node1 is up and serving as cluster of size 1:
sudo mariadb -e "SHOW STATUS LIKE 'wsrep_cluster_size';"
# Expected: 1

# Create the SST user (replicates to other nodes when they join):
sudo mariadb -e "
CREATE USER 'sstuser'@'localhost' IDENTIFIED BY 'strong-random-password-here';
GRANT RELOAD, PROCESS, LOCK TABLES, BINLOG MONITOR ON *.* TO 'sstuser'@'localhost';
FLUSH PRIVILEGES;
"

Step 4: Join Nodes 2 and 3

bash
# On node2:
sudo systemctl start mariadb

# On node3 (after node2 is fully synced):
sudo systemctl start mariadb

# Verify cluster size from any node
sudo mariadb -e "SHOW STATUS LIKE 'wsrep_cluster_size';"
# Expected output: 3
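Step 4 depends on node2 being fully synced before node3 starts, and that wait is easy to script. This is a minimal sketch, assuming it runs on the joining node with local sudo access to the mariadb client; wsrep_status and wait_for_synced are illustrative names, and the retry count and delay are arbitrary defaults.

```shell
# wsrep_status NAME: print the value of one wsrep status variable on this node.
wsrep_status() {
    sudo mariadb -N -B -e "SHOW GLOBAL STATUS LIKE '$1';" | awk '{print $2}'
}

# wait_for_synced [TRIES] [DELAY_SECONDS]: poll until the node reports Synced.
wait_for_synced() {
    wfs_tries=${1:-60}
    wfs_delay=${2:-5}
    wfs_i=0
    while [ "$wfs_i" -lt "$wfs_tries" ]; do
        wfs_state=$(wsrep_status wsrep_local_state_comment)
        if [ "$wfs_state" = "Synced" ]; then
            echo "node is Synced"
            return 0
        fi
        echo "state is '${wfs_state:-unknown}', waiting..."
        sleep "$wfs_delay"
        wfs_i=$((wfs_i + 1))
    done
    echo "gave up after $wfs_tries checks" >&2
    return 1
}
```

On node2 you would run wait_for_synced right after systemctl start mariadb and only move on to node3 once it returns successfully. For a large SST, raise the number of tries instead of shortening the delay.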

Step 5: Verify Cluster Health

sql
SHOW STATUS LIKE 'wsrep_%' \G

-- Critical fields to verify:
-- wsrep_cluster_size       = 3
-- wsrep_cluster_status     = Primary
-- wsrep_local_state_comment = Synced
-- wsrep_ready              = ON
-- wsrep_connected          = ON
Production tip: Encrypt cluster traffic with TLS. Add socket.ssl_cert, socket.ssl_key, and socket.ssl_ca to wsrep_provider_options. The performance overhead is under 5% with modern CPU AES-NI, and you avoid plaintext replication of every transaction across your network.
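The five Step 5 checks can also be automated so they run the same way on every node. The sketch below assumes the status output is piped in as tab-separated name/value pairs (what the mariadb client prints with -N -B); check_cluster_health is a hypothetical helper, not a MariaDB utility.

```shell
# check_cluster_health EXPECTED_SIZE: reads "name<TAB>value" lines on stdin
# and prints one FAIL line per check that does not hold.
# Exit status is non-zero if any check fails.
check_cluster_health() {
    awk -F'\t' -v size="$1" '
        BEGIN {
            want["wsrep_cluster_size"] = size
            want["wsrep_cluster_status"] = "Primary"
            want["wsrep_local_state_comment"] = "Synced"
            want["wsrep_ready"] = "ON"
            want["wsrep_connected"] = "ON"
        }
        $1 in want { seen[$1] = $2 }
        END {
            bad = 0
            for (k in want) {
                got = (k in seen) ? seen[k] : "missing"
                if (got != want[k]) {
                    printf "FAIL %s: expected %s, got %s\n", k, want[k], got
                    bad = 1
                }
            }
            exit bad
        }'
}
```

A typical invocation would be: sudo mariadb -N -B -e "SHOW GLOBAL STATUS LIKE 'wsrep_%'" | check_cluster_health 3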

SST Tuning with mariabackup

The default mariabackup SST configuration works for small datasets, but production clusters benefit from explicit tuning of compression and parallelism.

Compression: Trade CPU for Network

ini
[sst]
streamfmt        = mbstream
compressor       = "qpress -T8"   # 8 threads
decompressor     = "qpress -d -T8"
inno-apply-opts  = "--use-memory=4G"
parallel         = 4               # Parallel file streams

For multi-terabyte clusters, qpress compression cuts SST time by 40-60% at the cost of moderate CPU usage on both donor and joiner. The parallel=4 setting allows mariabackup to read multiple InnoDB tablespace files concurrently — most useful when your data is spread across many tables.

Avoiding Donor Blocking

Even with mariabackup (a hot backup), the donor sees increased disk I/O during SST. If your donor is also serving production traffic, this can cause latency spikes. The mitigation is straightforward: set wsrep_sst_donor to a comma-separated list of preferred donor node names, which are tried in order, and put a non-traffic-serving node first. A trailing comma lets Galera fall back to any available node when none of the listed donors can serve. MaxScale should be configured to drain traffic from the donor during SST.


MaxScale Integration: Galera-Aware Routing

MaxScale's readwritesplit router becomes Galera-aware via the galeramon monitor module. This monitor polls wsrep_local_state on every node and automatically excludes nodes in DESYNC, DONOR, or JOINING states from the routing pool.

Sample MaxScale Config

ini
[Server-Node1]
type=server
address=10.0.1.11
port=3306
protocol=MariaDBBackend

[Server-Node2]
type=server
address=10.0.1.12
port=3306
protocol=MariaDBBackend

[Server-Node3]
type=server
address=10.0.1.13
port=3306
protocol=MariaDBBackend

[Galera-Monitor]
type=monitor
module=galeramon
servers=Server-Node1,Server-Node2,Server-Node3
user=maxscale_monitor
password=strong-monitor-password
monitor_interval=2000ms
disable_master_failback=true
available_when_donor=false
disable_master_role_setting=false

[Galera-Service]
type=service
router=readwritesplit
servers=Server-Node1,Server-Node2,Server-Node3
user=maxscale_app
password=strong-app-password
master_failure_mode=fail_on_write
master_reconnection=true

[Galera-Listener]
type=listener
service=Galera-Service
protocol=MariaDBClient
port=4006

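The key flags for production: available_when_donor=false stops MaxScale from routing reads to a node currently serving as an SST donor (donor I/O is already saturated). master_failure_mode=fail_on_write keeps client sessions open for reads when no node can accept writes and returns an explicit error on the next write, so the application fails loudly instead of hanging or silently losing the statement. Leave disable_master_role_setting at false: readwritesplit needs galeramon to label one node as the master, or it has nowhere to send writes.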


Schema Changes: TOI vs RSU

Schema changes on Galera Cluster have two execution modes, and choosing the wrong one is one of the most common production mistakes.

[Figure: timeline comparison. TOI: N1, N2, and N3 run the ALTER in lockstep with writes blocked; cluster availability is 0% during the ALTER and all apps see write timeouts; use for small tables and brief alters. RSU: N1, N2, and N3 run the ALTER one at a time while the other two keep serving; 2 of 3 nodes (66%) serve traffic throughout; use for backwards-compatible changes only (adding nullable columns, adding indexes, extending VARCHAR: yes; dropping columns, type changes, NOT NULL adds: no).]
TOI vs RSU: how the same ALTER TABLE affects cluster availability under each isolation mode.

TOI (Total Order Isolation) — The Default

Under TOI, the ALTER TABLE statement is replicated to all nodes via Galera and executed in the same total order on each. The cluster effectively pauses writes that conflict with the altered table until the change completes. For a 500 GB table, this means the cluster stalls for the duration of the alter — potentially hours.

RSU (Rolling Schema Upgrade) — The Production Escape

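RSU runs the alter on one node at a time, with the affected node temporarily desynced from the cluster. The cluster keeps serving traffic on the remaining nodes. When the alter completes, the modified node rejoins and catches up. The statement is not replicated under RSU, so you must repeat it yourself on every node in turn.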

sql
-- Switch the current session to RSU mode
SET SESSION wsrep_OSU_method = 'RSU';

-- Run the alter (it executes only on this node, with desync)
ALTER TABLE orders ADD COLUMN tracking_id VARCHAR(64) AFTER status;

-- Switch back to TOI for normal traffic
SET SESSION wsrep_OSU_method = 'TOI';
RSU is only safe for backwards-compatible changes. Adding a nullable column, adding an index, or extending a VARCHAR are RSU-safe. Removing a column, renaming a column, or changing a column type can break replication while nodes have differing schemas. Use TOI (and a maintenance window) for incompatible changes, or use pt-online-schema-change when the table is too large to lock.
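Since an RSU change has to be applied to each node separately, a small wrapper helps keep the rollout ordered. The sketch below is illustrative only: run_on_node is a hypothetical wrapper around the mariadb client (credentials are assumed to come from an option file), RSU_MAX_CHECKS and RSU_DELAY are made-up tuning knobs, and traffic draining through the proxy is left out.

```shell
# run_on_node HOST SQL: hypothetical wrapper around the mariadb client.
run_on_node() {
    mariadb -h "$1" -N -B -e "$2"
}

# rsu_rollout "SQL" HOST...: apply one backwards-compatible DDL statement to
# each node in turn, waiting for the node to report Synced before moving on.
rsu_rollout() {
    rsu_sql=$1
    shift
    for rsu_host in "$@"; do
        echo "applying on $rsu_host"
        # The RSU setting is session-scoped and ends with this connection.
        run_on_node "$rsu_host" "SET SESSION wsrep_OSU_method='RSU'; $rsu_sql;" || return 1
        rsu_i=0
        until [ "$(run_on_node "$rsu_host" "SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';" | awk '{print $2}')" = "Synced" ]; do
            rsu_i=$((rsu_i + 1))
            if [ "$rsu_i" -ge "${RSU_MAX_CHECKS:-30}" ]; then
                echo "$rsu_host did not return to Synced, stopping" >&2
                return 1
            fi
            sleep "${RSU_DELAY:-2}"
        done
    done
}
```

For the example above this would be called as rsu_rollout "ALTER TABLE orders ADD COLUMN tracking_id VARCHAR(64) AFTER status" 10.0.1.11 10.0.1.12 10.0.1.13. Stopping at the first node that fails to resync keeps at most one node out of step with the rest.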

pt-online-schema-change with Galera

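For very large tables where neither TOI nor RSU is acceptable, pt-online-schema-change works with Galera using triggers and a shadow table. The trigger pattern means writes during the migration are captured by both the old and new tables, and the cutover is atomic. The tool requires wsrep_OSU_method to be TOI and only handles InnoDB tables. The --no-check-replication-filters flag in the example below is not Galera-specific: it skips the tool's check for replication filters, so only keep it if you know why that check would otherwise fail.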

bash
pt-online-schema-change \
    --alter "ADD COLUMN tracking_id VARCHAR(64)" \
    --no-check-replication-filters \
    --execute \
    D=production,t=orders

Backup Strategy

Galera replication is not a backup. A logical corruption or accidental DROP TABLE propagates to every node within milliseconds. You need an out-of-cluster backup path.

mariabackup with Galera Awareness

bash
# Full backup
mariabackup --backup \
    --target-dir=/backups/full-$(date +%F) \
    --user=backup_user \
    --password=strong-pw \
    --galera-info \
    --compress \
    --compress-threads=8

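# Decompress (needs qpress installed), then prepare: apply the redo log
# so the copied data files are consistent. Use the directory created above.
mariabackup --decompress --target-dir=/backups/full-2026-05-03
mariabackup --prepare --target-dir=/backups/full-2026-05-03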

The --galera-info flag records the node's Galera position (the cluster state UUID and sequence number) at the time of the backup, which is essential for restoring a node or for using the backup as the seed for a new cluster.

Binary Log Streaming for Point-in-Time Recovery

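Combine the daily mariabackup with continuous binlog streaming via mysqlbinlog in pseudo-replica mode. This requires binary logging on the source node (log_bin plus log_slave_updates, so that writesets applied from other nodes are logged too); the galera.cnf shown earlier does not enable either. With that in place you get a sub-minute recovery point objective (RPO) without the performance cost of running an extra Galera node purely for backup purposes.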

bash
mysqlbinlog \
    --read-from-remote-server \
    --raw \
    --stop-never \
    --host=10.0.1.11 \
    --user=binlog_replica \
    --password=strong-pw \
    --result-file=/backups/binlogs/ \
    mysql-bin.000001 &

Async Replica for Cross-Region DR

For disaster recovery, an asynchronous replica in a separate region is the standard pattern. The replica reads from one of the Galera nodes via standard binlog replication. If the entire primary region fails, you promote the async replica and accept the (typically small) data loss between the last replicated event and the failure.


Production Monitoring

Of the dozens of wsrep_* status variables, six are critical enough to alert on. The rest become useful only during incident debugging.

Critical Metrics

Metric | Alert Threshold | What It Means
wsrep_cluster_size | < expected count for > 30s | A node has dropped from the cluster
wsrep_local_state_comment | not "Synced" for > 60s | This node is behind, joining, or in an error state
wsrep_flow_control_paused | > 0.1 sustained | Cluster is back-pressuring writes (most common cause of throughput collapse)
wsrep_local_recv_queue_avg | > 10 | This node cannot apply writes fast enough to keep up
wsrep_local_cert_failures | rate > 50/min | Application is generating conflicting transactions across nodes
wsrep_cluster_status | not "Primary" | Quorum is lost; no writes can be accepted
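Before a full Prometheus setup exists, the instantaneous thresholds from the table can be evaluated with a few lines of awk. This is a simplified sketch under stated assumptions: evaluate_alerts is a hypothetical helper, input is tab-separated status output, and it looks at a single sample, so the "sustained" windows and the cert-failure rate (which needs two samples) are deliberately not covered.

```shell
# evaluate_alerts EXPECTED_SIZE: reads "name<TAB>value" status lines on stdin
# and prints one ALERT line per threshold breached in this single sample.
evaluate_alerts() {
    awk -F'\t' -v size="$1" '
        $1 == "wsrep_cluster_size" && $2 + 0 < size + 0          { print "ALERT cluster_size=" $2 }
        $1 == "wsrep_local_state_comment" && $2 != "Synced"      { print "ALERT state=" $2 }
        $1 == "wsrep_flow_control_paused" && $2 + 0 > 0.1        { print "ALERT flow_control_paused=" $2 }
        $1 == "wsrep_local_recv_queue_avg" && $2 + 0 > 10        { print "ALERT recv_queue_avg=" $2 }
        $1 == "wsrep_cluster_status" && $2 != "Primary"          { print "ALERT cluster_status=" $2 }
    '
}
```

Fed from sudo mariadb -N -B -e "SHOW GLOBAL STATUS LIKE 'wsrep_%'" in a cron job, this is enough for a smoke alarm during the first days in production; real alerting should still apply the time windows listed in the table.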

Prometheus + mysqld_exporter Setup

bash
wget https://github.com/prometheus/mysqld_exporter/releases/download/v0.15.1/mysqld_exporter-0.15.1.linux-amd64.tar.gz
tar xzf mysqld_exporter-0.15.1.linux-amd64.tar.gz
sudo mv mysqld_exporter-0.15.1.linux-amd64/mysqld_exporter /usr/local/bin/

# Create a monitoring user with minimal privileges
sudo mariadb -e "
CREATE USER 'exporter'@'localhost' IDENTIFIED BY 'monitoring-pw' WITH MAX_USER_CONNECTIONS 3;
GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'exporter'@'localhost';
"

The --collect.global_status collector includes wsrep variables. Add a Grafana dashboard with panels for the six critical metrics above and the standard MySQL throughput, latency, and connection panels — Percona's PMM dashboard for MySQL Galera works for MariaDB Galera with no modification.


Troubleshooting Production Failures

Cluster Lost Quorum (Non-Primary State)

If wsrep_cluster_status shows non-Primary, no writes are accepted. This typically happens after a network partition or a simultaneous node crash. Recovery requires identifying the most up-to-date node (highest wsrep_last_committed sequence number) and bootstrapping the cluster from it.

bash
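# Case 1: nodes are still running but report non-Primary.
# On each node, check the last committed seqno:
sudo mariadb -e "SHOW STATUS LIKE 'wsrep_last_committed';"

# On the node with the highest seqno, re-form the Primary Component in place:
sudo mariadb -e "SET GLOBAL wsrep_provider_options='pc.bootstrap=YES';"

# Case 2: every node is stopped (full-cluster crash).
# Compare the seqno line in /var/lib/mysql/grastate.dat on each node
# (run galera_recovery if it shows -1), then on the most advanced node:
sudo nano /var/lib/mysql/grastate.dat
# Set: safe_to_bootstrap: 1

sudo galera_new_cluster

# Then start the other nodes normally. They rejoin via IST if the donor's
# GCache still holds the writesets they missed, and via SST otherwise.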
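Choosing the bootstrap node comes down to comparing sequence numbers, which is worth scripting so nobody does it by eye at 3 a.m. A minimal sketch follows, assuming you have already collected one "host seqno" pair per line; pick_bootstrap_node is a hypothetical helper name.

```shell
# pick_bootstrap_node: reads "host seqno" lines on stdin and prints the host
# with the highest seqno. Hosts reporting -1 (position unknown) are skipped;
# on a tie the first host listed wins. Exits non-zero if no host qualifies.
pick_bootstrap_node() {
    awk '
        $2 ~ /^[0-9]+$/ && (best == "" || $2 + 0 > max) { max = $2 + 0; best = $1 }
        END { if (best == "") exit 1; print best }
    '
}
```

For example, piping in "node1 1041", "node2 1057", and "node3 -1" selects node2. If the function exits non-zero, no node has a known position and each one needs a recovery pass before you bootstrap.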

SST Hangs Indefinitely

If a joining node sits in JOINER state for hours, the SST has likely deadlocked. Common causes:

  • The SST user on the donor lacks one of the required privileges (RELOAD, PROCESS, LOCK TABLES, BINLOG MONITOR)
  • The joiner ran out of disk space mid-transfer
  • A firewall blocked port 4444 (the default mariabackup SST port)
  • Different MariaDB versions across nodes

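Diagnose by checking the joiner's MariaDB error log first, then the donor's. The donor will log a clear error from mariabackup if authentication or permission fails. If the error logs are silent, check the SST tool's own logs in the data directory: mariabackup.backup.log on the donor, and mariabackup.prepare.log and mariabackup.move.log on the joiner.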

Flow Control Runaway

The opening scenario of this post — sudden throughput collapse — is almost always flow control. Galera triggers flow control when one node's apply queue exceeds gcs.fc_limit. The cluster slows writes to let the slow node catch up. If a single transaction is large enough, the slow node never catches up and the cluster grinds to a halt.

[Figure: three phases of apply queues on N1, N2, N3. (1) Healthy: all queues below fc_limit, flow_control_paused = 0, about 8K TPS. (2) One node slow: N2's queue exceeds fc_limit (128), flow control fires, throughput drops. (3) Cluster halts: flow_control_paused > 0.5, writes blocked cluster-wide, about 200 TPS. Alert on wsrep_flow_control_paused > 0.1 sustained for 30s to catch the trigger before the halt phase. Mitigations: cap wsrep_max_ws_size at 1 GB, raise wsrep_slave_threads, increase gcs.fc_limit to 128.]
Three-phase flow control runaway: healthy → one node falls behind → cluster-wide halt to let it catch up.

Mitigations, in order of effectiveness:

  1. Set wsrep_max_ws_size to 1 GB (default is 2 GB but rarely useful).
  2. Increase wsrep_slave_threads to match CPU core count on each node.
  3. Set gcs.fc_limit=128 in wsrep_provider_options (default 16 is too aggressive for production write rates).
  4. Identify and split large transactions in the application — never use a single transaction for bulk imports.
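The last mitigation is the one most often skipped, so here is one way it might look for a bulk purge. This is a sketch under assumptions, not a drop-in job: run_sql is a hypothetical wrapper, the orders_archive table and its created_at column are invented for illustration, and the batch size and pause should be tuned against your own flow-control metrics.

```shell
# run_sql SQL: hypothetical wrapper; runs one statement against the
# "production" schema and prints the number of rows it affected.
run_sql() {
    mariadb -N -B production -e "$1; SELECT ROW_COUNT();"
}

# purge_in_batches [BATCH_SIZE] [PAUSE_SECONDS]: delete old rows in small
# autocommit transactions so that no single writeset grows large.
purge_in_batches() {
    pib_batch=${1:-5000}
    pib_pause=${2:-1}
    pib_total=0
    while :; do
        pib_n=$(run_sql "DELETE FROM orders_archive WHERE created_at < NOW() - INTERVAL 90 DAY LIMIT $pib_batch") || return 1
        # Stop once a batch deletes nothing (or the count is unreadable).
        [ "$pib_n" -gt 0 ] 2>/dev/null || break
        pib_total=$((pib_total + pib_n))
        sleep "$pib_pause"   # give the appliers on other nodes time to drain
    done
    echo "deleted $pib_total rows"
}
```

Each DELETE commits on its own, so the cluster certifies and applies many small writesets instead of one huge one. The pause between batches is what lets a slower node keep its receive queue under gcs.fc_limit.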

When MariaDB Galera Cluster Is the Wrong Choice

Galera is excellent for synchronous multi-master HA within a single data center. It is a poor fit for several other patterns:

  • Heavy single-row UPDATE workloads: Certification conflicts on hot rows scale poorly. A counter table with thousands of UPDATEs per second to a single row will produce constant wsrep_local_cert_failures. Use Redis, a queue-based pattern, or async replication instead.
  • Cross-WAN clusters without segments: Galera between regions adds 50-200ms write latency per transaction. For multi-region, use Galera within each region and async replication between them, or assign gmcast.segment values per data center so that each writeset crosses the WAN once per segment instead of once per remote node.
  • Massive multi-row transactions: ETL jobs, schema migrations on huge tables, or analytics workloads that batch hundreds of MB into single transactions will repeatedly trigger flow control. Run these against a non-Galera read replica or an async standby.
  • Workloads requiring strict transaction isolation: Galera uses optimistic certification with first-committer-wins — this is technically not full SERIALIZABLE isolation. Workloads that depend on transaction ordering across the cluster need a different consistency model.

Key Takeaways
  • Use mariabackup as your SST method — install mariadb-backup on every node before bootstrap to avoid joiner failures.
  • Put a Galera-aware proxy in front of the cluster: MaxScale with the galeramon monitor, or ProxySQL ≥ 2.0 with mysql_galera_hostgroups. Pick whichever matches your existing stack.
  • Cap writeset size at 1 GB via wsrep_max_ws_size and alert on wsrep_flow_control_paused > 0.1 to catch throughput collapse before it cascades.
  • Default schema changes to TOI; switch to RSU only for backwards-compatible changes; use pt-online-schema-change for large tables you cannot afford to lock.
  • Maintain an out-of-cluster backup path: daily mariabackup with --galera-info, continuous binlog streaming, and an async replica in a separate region for DR.

Working with JusDB on MariaDB Galera Cluster

JusDB manages production MariaDB Galera Clusters for engineering teams who need synchronous multi-master HA without the operational overhead of building cluster expertise in-house. Our DBAs handle topology design, mariabackup-based SST tuning, MaxScale Galera-aware routing, schema-change planning with TOI/RSU strategy, and 24/7 incident response — so your team ships features instead of fighting flow-control storms.
