Your replica SQL thread stopped at 3:47 AM with error 1062: Duplicate entry. By the time ops gets the alert, the replica is 40 minutes behind primary and your read traffic is hammering the primary directly. MySQL binlog replication failures are predictable — and mostly preventable with the right configuration and monitoring.
- Most replication errors fall into three categories: duplicate keys, missing rows, and data type mismatches
- GTID-based replication (MySQL 8.4) eliminates position-tracking bugs but requires clean setup
- Use
STOP REPLICA; SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1; START REPLICA;only as a last resort - pt-table-checksum + pt-table-sync are the right tools for data drift repair
Understanding MySQL Binlog Replication
MySQL replication works by having the primary write all data changes to a binary log (binlog). The replica's IO thread connects to the primary, reads the binlog, and writes events to its own relay log. The SQL thread reads the relay log and replays events against the replica's data.
Replication Formats
-- Check current binlog format
SHOW VARIABLES LIKE 'binlog_format';
-- MySQL 8.0+ default and recommended:
SET GLOBAL binlog_format = 'ROW';
SET GLOBAL binlog_row_image = 'FULL';Diagnosing Replication Failures
Step 1: Read the Error Clearly
-- On the replica:
SHOW REPLICA STATUS\G
-- Key fields to check:
-- Replica_IO_Running: Yes/No
-- Replica_SQL_Running: Yes/No
-- Last_SQL_Error: error message
-- Seconds_Behind_Source: lag in secondsStep 2: Identify the Diverged Row
# Use mysqlbinlog to inspect the failing event
mysqlbinlog --base64-output=DECODE-ROWS -v \
--start-position=POSITION \
/var/lib/mysql/binlog.000042 | head -100Common Error Patterns and Fixes
| Error Code | Error Message | Root Cause | Fix |
|---|---|---|---|
| 1062 | Duplicate entry for key PRIMARY | Row exists on replica but not primary | Delete conflicting row on replica |
| 1032 | Can't find record in table | Row was deleted on replica before UPDATE arrived | Re-insert the expected row or skip |
| 1146 | Table doesn't exist | DDL ran on primary before replica was ready | Re-run DDL on replica manually |
| 1236 | Could not find first log file | Primary binlog purged before replica caught up | Rebuild replica from fresh dump |
Fixing Replication Errors
Fix Error 1062: Duplicate Entry
-- On the replica: identify the conflicting row
SELECT * FROM orders WHERE id = 12345;
-- Option 1: Delete the conflicting row and let replication re-insert it
DELETE FROM orders WHERE id = 12345;
START REPLICA;
-- Option 2: Skip ONE event (non-GTID mode only)
STOP REPLICA;
SET GLOBAL SLAVE_SKIP_COUNTER = 1;
START REPLICA;Never use SLAVE_SKIP_COUNTER in GTID mode — it's silently ignored and you'll skip the wrong transaction. In GTID mode, inject an empty transaction instead.
Fix Error 1062 in GTID Mode
-- Inject a no-op transaction to skip the failing GTID
STOP REPLICA;
SET GTID_NEXT = 'abc123:100';
BEGIN; COMMIT;
SET GTID_NEXT = 'AUTOMATIC';
START REPLICA;Repair Data Drift with pt-table-sync
# Step 1: Find diverged tables with pt-table-checksum
pt-table-checksum \
--user=checker --password=secret \
--host=primary.db.internal \
--replicate=percona.checksums \
--databases=myapp
# Step 2: Sync diverged rows
pt-table-sync \
--execute \
--sync-to-master \
--user=syncer --password=secret \
h=replica.db.internal,D=myapp,t=orderspt-table-sync --execute modifies data directly. Always run with --dry-run first and review the output before executing in production.
Preventing Replication Failures
[mysqld]
gtid_mode = ON
enforce_gtid_consistency = ON
relay_log_recovery = ON
relay_log_info_repository = TABLE
master_info_repository = TABLE
binlog_format = ROW
binlog_row_image = FULL
read_only = ON
super_read_only = ON- Enable GTID mode from the start — it eliminates position-tracking errors and simplifies failover with tools like Orchestrator or MHA.
- Use ROW-based binlog format with
binlog_row_image=FULLto ensure replicas get complete before/after row images. - Run pt-table-checksum weekly to detect data drift before it causes replication errors.
- Never skip errors with SLAVE_SKIP_COUNTER in GTID mode — inject empty transactions instead.
Working with JusDB on MySQL Replication
JusDB manages MySQL replication topologies for engineering teams who can't afford replication failures during peak traffic. Our DBAs configure GTID-based replication, set up automated drift detection, and provide 24/7 incident response.
Explore JusDB MySQL Management → | Talk to a DBA
Related reading: