Your DBA just filed a ticket: add a NOT NULL index to the order_events table — 500 million rows, peak traffic at 11 PM, and a hard SLA of 99.9% uptime. A vanilla ALTER TABLE in MySQL 5.7 would lock the table for hours; even MySQL 8.0's ALGORITHM=INPLACE holds a metadata lock during the cutover that can block writes for seconds or longer on a busy server. That gap is exactly why online schema change tools exist. Choosing the wrong one — or running it wrong — can silently stall replication, spike CPU, or corrupt data during the cutover window.
- pt-online-schema-change (pt-osc) uses triggers to mirror writes to a shadow table; mature and widely supported but adds trigger overhead and risks deadlocks on high-write tables.
- gh-ost uses MySQL binary logs instead of triggers, decoupling the migration from the write path; it is GitHub's production-tested approach and generally safer on write-heavy workloads.
- MySQL 8.0 INSTANT DDL handles many common changes (adding nullable columns, changing defaults, renaming columns) in milliseconds with zero table copy — use it first before reaching for any external tool.
- Both pt-osc and gh-ost monitor replication lag and throttle automatically; gh-ost's binlog approach makes lag monitoring more accurate and its cutover pause-and-resume flag is safer for precise scheduling.
- The cutover step — swapping the original and shadow tables — is the highest-risk moment in any online schema change and each tool handles it differently.
- For production MySQL managed by JusDB, we run schema changes with a tested playbook including dry-runs, lag thresholds, and rollback scripts prepared before execution begins.
Background
MySQL's native ALTER TABLE has improved significantly over the years. MySQL 5.6 introduced ALGORITHM=INPLACE for certain operations, and MySQL 8.0 added ALGORITHM=INSTANT for an even broader set. But for large tables under continuous write load, even an "online" native ALTER TABLE introduces risks: the metadata lock (MDL) acquired at the end of the operation can queue and starve incoming connections for several seconds, long enough to trigger application-level timeouts.
Online schema change tools sidestep this by building a shadow copy of the table incrementally, replaying writes in real time, and then performing a fast atomic swap — keeping the MDL window as short as possible. Percona's pt-online-schema-change (part of Percona Toolkit) and GitHub's gh-ost (Go-based, open source) are the two tools that dominate MySQL production deployments.
Why You Need Online Schema Change
Consider what happens with a standard ALTER TABLE ADD INDEX on a 500-million-row InnoDB table:
- MySQL acquires an exclusive metadata lock on the table.
- A full table scan is performed to build the index — potentially 30–90 minutes depending on hardware.
- All DML statements (
INSERT,UPDATE,DELETE) queue behind the MDL for the entire duration. - Connection pool exhaustion follows within seconds on any busy application.
Even with ALGORITHM=INPLACE, LOCK=NONE, the final lock to swap the new table definition still causes a brief but real MDL hold. For read-heavy operations this may be acceptable; for write-heavy tables at peak, it is not.
Before reaching for pt-osc or gh-ost, always check whether MySQL 8.0's ALGORITHM=INSTANT covers your change. Adding a nullable column, renaming a column (8.0.4+), changing a column default, or adding/dropping a virtual generated column are all instant and require no external tooling.
How pt-osc Works
pt-online-schema-change follows a trigger-based pattern introduced by the Facebook Online Schema Change tool and refined by Percona:
The Three-Phase Approach
- Create shadow table. pt-osc creates
_tablename_newwith the desired schema already applied. - Install three triggers on the original table:
AFTER INSERT,AFTER UPDATE, andAFTER DELETE. These triggers mirror every write to the shadow table in real time. - Copy existing rows in chunks. pt-osc scans the original table in configurable chunk sizes (default 1000 rows) and inserts those rows into the shadow table. It pauses between chunks based on replication lag and server load.
- Atomic RENAME. Once all rows are copied and the shadow table is fully current, pt-osc performs an atomic
RENAME TABLE original TO _original_old, _original_new TO originalto swap the tables. - Drop the old table and triggers. Cleanup follows automatically.
Key pt-osc Configuration Parameters
pt-online-schema-change \
--host=prod-mysql.internal \
--user=dba \
--ask-pass \
--alter "ADD INDEX idx_user_created (user_id, created_at)" \
--chunk-size=2000 \
--max-lag=1 \
--max-load="Threads_running=50" \
--critical-load="Threads_running=80" \
--no-drop-old-table \
--dry-run \
D=app_db,t=order_eventsThe most important parameters to understand:
--chunk-size: Number of rows copied per iteration. Increase for faster migrations on low-traffic tables; decrease to reduce I/O impact on busy ones.--max-lag: Pause chunk copying if any replica's replication lag exceeds this value in seconds. Default is 1 second — keep it here or lower for synchronous replication setups.--max-loadand--critical-load: Throttle or abort if the MySQL status variable exceeds the threshold.Threads_runningis the most reliable indicator of live query pressure.--no-drop-old-table: Recommended for production runs. Keep the old table as a rollback option; drop it manually once you confirm the new table is correct.
pt-osc's triggers fire within the same transaction as the original DML. On tables with high write concurrency this adds latency to every INSERT, UPDATE, and DELETE for the entire duration of the migration. On tables doing thousands of writes per second, this can measurably increase p99 latency. Benchmark before running on peak traffic hours.
pt-osc cannot be used on tables that have no primary key or unique index — it requires a way to identify rows uniquely for the chunk copy. It also cannot handle tables that are themselves the target of foreign key constraints from other tables without additional --alter-foreign-keys-method configuration.
How gh-ost Works
gh-ost (GitHub Online Schema Transmogrifier) takes a fundamentally different approach: instead of using triggers to capture writes, it connects to MySQL's binary log stream and replays change events directly. This decouples the migration from the write transaction path entirely.
The Binlog-Based Architecture
- Create the ghost table (
_tablename_ghc) with the new schema. - Connect as a replica to the MySQL binary log stream. gh-ost registers itself as a fake MySQL replica and streams binlog events from the primary (or an actual replica).
- Copy rows in chunks using range queries on the primary key, similar to pt-osc but without triggers in the write path.
- Apply binlog events to the ghost table as they arrive, keeping the ghost table current with all ongoing DML.
- Execute the cutover once the ghost table is fully caught up.
Running gh-ost
gh-ost \
--host=prod-mysql.internal \
--user=gh-ost \
--password="${DB_PASS}" \
--database=app_db \
--table=order_events \
--alter="ADD INDEX idx_user_created (user_id, created_at)" \
--chunk-size=2000 \
--max-lag-millis=1500 \
--max-load="Threads_running=50" \
--critical-load="Threads_running=80" \
--switch-to-rbr \
--allow-master-master \
--cut-over=default \
--exact-rowcount \
--concurrent-rowcount \
--default-retries=120 \
--postpone-cut-over-flag-file=/tmp/ghost.postpone \
--panic-flag-file=/tmp/ghost.panic \
--initially-drop-old-table \
--initially-drop-ghost-table \
--verbose \
--executeThe Postpone Cutover Flag
One of gh-ost's most operationally valuable features is the postpone flag file. While gh-ost is running and the ghost table is fully caught up, it will not perform the cutover as long as the file at --postpone-cut-over-flag-file exists on disk. This lets you pre-run the migration, keep it in the "fully caught up and paused" state, and then initiate the cutover at your exact chosen moment by removing the file:
# Start migration — will pause before cutover as long as this file exists
touch /tmp/ghost.postpone
# Run gh-ost (it will copy all rows, stay current via binlog, and wait)
gh-ost ... --postpone-cut-over-flag-file=/tmp/ghost.postpone --execute
# When you are ready to cut over (e.g., during planned maintenance window)
rm /tmp/ghost.postpone
# gh-ost performs the atomic cutover within secondsgh-ost also supports a Unix socket for live interaction while the migration runs. You can inspect status, change the chunk size, adjust the lag threshold, or trigger an immediate cutover by writing to /tmp/gh-ost.tablename.sock — without stopping and restarting the process.
# Check migration status while running
echo "status" | nc -U /tmp/gh-ost.app_db.order_events.sock
# Throttle on the fly
echo "throttle" | nc -U /tmp/gh-ost.app_db.order_events.sock
# Resume after throttle
echo "no-throttle" | nc -U /tmp/gh-ost.app_db.order_events.sockThe Cutover Mechanism
gh-ost's cutover uses a lock-based atomic swap. It acquires a write lock on the original table, lets all in-flight writes complete, performs a sub-millisecond RENAME, and releases the lock. The design ensures that no writes are lost and that the lock window is as short as the RENAME latency — typically under 1 second on local storage, under 2–3 seconds on networked storage.
gh-ost requires binary logging to be enabled with binlog_format=ROW (or mixed). If your MySQL instance uses STATEMENT based replication, gh-ost will either fail to start or switch the session to ROW format (if you pass --switch-to-rbr). Verify that your replica topology handles ROW-format binlogs before running in production.
pt-osc vs gh-ost Comparison
The table below compares all four approaches side by side, including MySQL's native DDL options:
| Feature | pt-osc | gh-ost | MySQL INSTANT DDL | ALGORITHM=INPLACE |
|---|---|---|---|---|
| Write path impact | High — triggers fire in every write txn | None — binlog replay is async | None — metadata-only change | Low — no lock during copy phase |
| Mechanism | SQL triggers + row copy | Binlog streaming + row copy | Data dictionary update only | In-place rebuild, short MDL at end |
| Replication lag monitoring | Polls SHOW SLAVE STATUS |
Reads lag from binlog heartbeat (more accurate) | N/A | N/A |
| Cutover control | Automatic after copy completes | Postpone flag file, manual socket trigger | Immediate | Immediate (brief MDL) |
| Requires ROW binlog format | No | Yes | No | No |
| Foreign key support | Yes (with --alter-foreign-keys-method) |
No — must drop/re-add FKs manually | Yes (limited) | Yes |
| Tables without PK | No | No | Yes | Yes |
| Pausing / throttling | --max-lag, --max-load |
Flag file + live socket + --max-lag-millis |
N/A | N/A |
| Rollback method | --no-drop-old-table keeps original |
Ghost table kept until explicit cleanup | No rollback (instant) | Re-run original ALTER |
| Toolchain | Percona Toolkit (Perl) | gh-ost binary (Go, single binary) | Built-in MySQL 8.0+ | Built-in MySQL 5.6+ |
| Best for | Read-heavy tables, complex FK graphs | Write-heavy tables, precise cutover scheduling | Column additions, default changes (8.0) | Index additions on read-dominant workloads |
Running a Schema Change Safely
Whether you choose pt-osc or gh-ost, the following discipline applies to every production schema change on a large table.
Step 1: Check Whether INSTANT DDL Applies First
-- MySQL 8.0: test if your ALTER is instant before using any external tool
ALTER TABLE order_events
ADD COLUMN processed_at DATETIME NULL,
ALGORITHM=INSTANT;
-- If it succeeds without error, the operation is instant — no tool needed
-- If it returns "ERROR 1846 (0A000): ALGORITHM=INSTANT is not supported",
-- proceed with pt-osc or gh-ostStep 2: Run a Dry-Run First
# pt-osc dry-run
pt-online-schema-change \
--alter "ADD INDEX idx_user_created (user_id, created_at)" \
--dry-run \
D=app_db,t=order_events
# gh-ost dry-run (--execute omitted)
gh-ost \
--database=app_db \
--table=order_events \
--alter="ADD INDEX idx_user_created (user_id, created_at)" \
--verboseStep 3: Monitor Replication Lag During the Migration
-- Watch replication lag on all replicas
SELECT
MEMBER_ID,
MEMBER_HOST,
MEMBER_PORT,
MEMBER_STATE,
MEMBER_ROLE
FROM performance_schema.replication_group_members;
-- On traditional replication, monitor Seconds_Behind_Master
SHOW REPLICA STATUS\GBoth tools default to pausing chunk copies when replication lag exceeds 1 second. Do not increase this threshold to speed up migrations. Replica lag during a schema change can mask replication errors — keep the threshold at 1 second and let the migration take as long as it needs.
Step 4: Prepare a Rollback Script Before Starting
-- If using pt-osc with --no-drop-old-table, your rollback is:
RENAME TABLE order_events TO order_events_new_broken,
_order_events_old TO order_events;
DROP TABLE order_events_new_broken;
-- Verify triggers are cleaned up
SELECT TRIGGER_NAME, EVENT_MANIPULATION, EVENT_OBJECT_TABLE
FROM information_schema.TRIGGERS
WHERE EVENT_OBJECT_SCHEMA = 'app_db'
AND EVENT_OBJECT_TABLE = 'order_events';Step 5: Verify Row Counts After Completion
-- Confirm the new table has the expected row count
SELECT
table_name,
table_rows,
data_length,
index_length
FROM information_schema.tables
WHERE table_schema = 'app_db'
AND table_name IN ('order_events', '_order_events_old')
ORDER BY table_name;For mission-critical tables, run a CHECKSUM TABLE or a bounded row count comparison between the new table and the old table on a replica before dropping the old table. It takes a few minutes but confirms data integrity with certainty.
- Always check MySQL 8.0
ALGORITHM=INSTANTfirst — it handles many common changes in milliseconds and requires no external tooling. - Use pt-osc on tables with complex foreign key graphs or when your replication topology does not support ROW-format binlogs; use gh-ost on write-heavy tables where trigger overhead is unacceptable.
- gh-ost's postpone flag file gives you precise control over when the cutover happens, making it the better choice when you need to schedule the final swap to a defined maintenance window.
- Set
--max-lag(pt-osc) or--max-lag-millis(gh-ost) conservatively — 1 second or less — and never disable lag checking to speed up a migration. - Always run with
--no-drop-old-table(pt-osc) or keep the ghost table until you verify the new table is correct; prepare a rollback RENAME script before starting. - The cutover step is the highest-risk moment: keep it short by letting the migration fully catch up before scheduling the final swap, and avoid running it during peak write load.
Working with JusDB on MySQL
JusDB manages MySQL for engineering teams who need production-grade reliability without the operational overhead. Schema changes on large tables are one of the most common sources of production incidents — a misconfigured chunk size, a replication lag spike, or a cutover timed against peak traffic can turn a routine maintenance task into a multi-hour outage. Our DBAs design and execute schema changes with a tested playbook: dry-runs, per-replica lag monitoring, rollback scripts prepared and verified before execution begins, and cutover scheduling that accounts for your application's traffic patterns.
Explore JusDB MySQL Management → | Talk to a DBA
Related reading: