In early 2023, an e-commerce company discovered that their nightly backup strategy had a fatal design flaw — their mysqldump jobs were writing backup files to the same physical server that housed the primary database. At 3 AM on a Tuesday, a RAID controller failure corrupted the primary disk array. The backup files were gone with it. By the time engineering was paged and a recovery plan was assembled, six hours of order data, customer records, and inventory updates had been permanently lost. The manual reconciliation effort — cross-referencing payment processor logs, email records, and warehouse system exports — cost an estimated $200,000 in engineering time and customer refunds. The backup had run successfully every night for two years. Nobody had ever tested a restore, and nobody had ever verified that "backup to the same server" was not actually a backup at all.
- RTO (Recovery Time Objective) defines how long your system can be down; RPO (Recovery Point Objective) defines how much data loss is acceptable — these two numbers should drive every backup decision you make.
- Logical backups (mysqldump, pg_dump) are portable and simple but restore slowly on large databases and cannot support point-in-time recovery on their own.
- Physical backups (Percona XtraBackup, pg_basebackup, pgBackRest) restore 5–20x faster and enable PITR when combined with WAL/binlog archiving.
- Continuous WAL archiving (PostgreSQL) or binlog archiving (MySQL) is the only way to achieve RPO under 5 minutes without synchronous replication.
- The 3-2-1 rule — 3 copies, 2 different media, 1 offsite — is the minimum baseline; the upgraded 3-2-1-1-0 adds an air-gapped copy and mandates zero untested backups.
- An untested backup is not a backup. Run a full restore drill monthly and automate restore verification in your CI/CD pipeline.
RTO and RPO — The Two Numbers That Define Your Backup Strategy
Before choosing a tool or writing a cron job, you need two numbers from your business stakeholders. Without them, every backup decision is guesswork.
Recovery Time Objective (RTO) is the maximum acceptable duration of downtime after a failure — from the moment the incident is declared to the moment the system is serving traffic again. An RTO of four hours means your business can survive four hours of database unavailability. An RTO of 15 minutes means your on-call team has 15 minutes to restore from backup before the business impact becomes unacceptable.
Recovery Point Objective (RPO) is the maximum acceptable age of data after recovery — in other words, how much data loss the business can tolerate. An RPO of 24 hours means losing up to one day of data is acceptable. An RPO of zero means no data loss is tolerable, which requires synchronous replication, not backup alone. An RPO of five minutes means your archiving pipeline must ship WAL segments or binlog files at least once per minute.
RPO = 0 cannot be achieved with backups alone. Zero data loss requires synchronous replication (PostgreSQL synchronous_commit = on, MySQL Group Replication in synchronous mode). Backups reduce RPO but never eliminate it entirely.
Use this table to map your business tier to concrete targets before selecting tools:
| Business Tier | RTO Target | RPO Target | Backup Approach |
|---|---|---|---|
| Internal tools, dev/staging | 4–24 hours | 24 hours | Nightly logical backup (mysqldump / pg_dump) to offsite storage |
| SaaS — non-transactional | 1–4 hours | 1–4 hours | Physical backup daily + WAL/binlog archiving every 15–60 min |
| SaaS — transactional | 15–60 minutes | 5–15 minutes | Physical backup + continuous WAL/binlog archiving (every minute) |
| Financial / healthcare | <15 minutes | <5 minutes | Physical backup + continuous archiving + hot standby replica |
The Backup Hierarchy (Not All Backups Are Equal)
There are four distinct backup types in common use. Each sits at a different position in the trade-off between speed, portability, and recovery granularity. Understanding where each type fits is prerequisite to building a strategy that actually meets your RTO and RPO.
Logical Backups (mysqldump, pg_dump)
Logical backups export database objects as SQL statements or structured file formats. They are human-readable, portable across versions, and can target individual tables or schemas. The critical limitation is restore speed: restoring a 500 GB logical backup means replaying hundreds of millions of INSERT statements. On a production-grade machine, expect mysqldump restore of a 1 TB database to take 4–8 hours. pg_dump with a parallel restore via pg_restore is faster but still measured in hours for large datasets. Logical backups alone cannot support point-in-time recovery.
Physical Backups (XtraBackup, pg_basebackup)
Physical backups copy the raw data files from the database storage directory. They are not human-readable but restore dramatically faster because recovery is a file copy operation rather than SQL replay. Percona XtraBackup can restore a 1 TB MySQL database in 30–60 minutes. pg_basebackup feeding into pgBackRest with parallel restore targets 15–30 minutes for the same size. Physical backups are the foundation of any strategy that requires RTO under two hours.
Continuous WAL/Binlog Archiving — Enables PITR
Write-Ahead Log (WAL) archiving in PostgreSQL and binary log (binlog) archiving in MySQL capture every change made to the database as it happens. Combined with a physical base backup, continuous archiving enables point-in-time recovery (PITR): you can restore the database to any specific timestamp, not just the moment the last backup completed. This is the mechanism that converts RPO from "however old my last backup was" to "as recent as the last shipped WAL segment." For RPO under five minutes, WAL or binlog archives must be shipped at least once per minute.
Snapshot Backups (EBS, Cloud Storage Snapshots)
Cloud volume snapshots (AWS EBS snapshots, GCP Persistent Disk snapshots, Azure Managed Disk snapshots) capture the storage layer at a point in time. They are fast to initiate, require minimal database-level configuration, and restore quickly by attaching a new volume. The limitation is granularity: you can only restore to a snapshot point, not to an arbitrary timestamp within a snapshot interval. For PostgreSQL, snapshot-based recovery still requires WAL replay from the snapshot point to the recovery target if you need sub-snapshot granularity.
MySQL Backup Tools
mysqldump — Logical Backup for Small Databases and Schema Exports
mysqldump is the lowest-barrier backup tool available for MySQL. It is pre-installed with every MySQL distribution, requires no additional dependencies, and produces portable SQL output that can be loaded into any MySQL-compatible database. For InnoDB tables, always use --single-transaction to take a consistent snapshot without acquiring table locks — without this flag, mysqldump holds a global read lock for the duration of the export.
# Consistent InnoDB backup — no table locks, uses REPEATABLE READ snapshot
mysqldump \
--single-transaction \
--routines \
--triggers \
--events \
--set-gtid-purged=OFF \
-u root -p mydb > /backup/mydb-$(date +%Y%m%d-%H%M%S).sql
# Compress on the fly for large databases
mysqldump --single-transaction -u root -p mydb | gzip > /backup/mydb-$(date +%Y%m%d).sql.gz
# Schema-only export (useful for documentation and migrations)
mysqldump --no-data --routines --triggers -u root -p mydb > /backup/schema-$(date +%Y%m%d).sqlUse mysqldump for databases under 50 GB, schema-only exports for any size, and point-in-time restore scenarios where you have binlog archives to replay after loading the dump.
Percona XtraBackup — Physical Hot Backup with Incremental Support
Percona XtraBackup performs a physical, non-blocking backup of InnoDB and XtraDB tables while the database continues serving reads and writes. It copies data files and records InnoDB redo log changes during the copy, then applies those changes during the prepare phase to produce a consistent backup. Incremental backups capture only pages changed since the last full or incremental backup, dramatically reducing backup window size and storage consumption.
# Full backup
xtrabackup \
--backup \
--user=backup_user \
--password=secret \
--target-dir=/backup/full
# Prepare the full backup (makes it consistent and ready for restore)
xtrabackup --prepare --target-dir=/backup/full
# Incremental backup based on the last full backup
xtrabackup \
--backup \
--user=backup_user \
--password=secret \
--target-dir=/backup/incr-$(date +%Y%m%d) \
--incremental-basedir=/backup/full
# Prepare incremental: apply incr on top of the full backup
xtrabackup --prepare --apply-log-only --target-dir=/backup/full
xtrabackup --prepare --target-dir=/backup/full --incremental-dir=/backup/incr-$(date +%Y%m%d)
# Restore: stop MySQL, move data files, restore, fix permissions
systemctl stop mysql
mv /var/lib/mysql /var/lib/mysql.old
xtrabackup --copy-back --target-dir=/backup/full
chown -R mysql:mysql /var/lib/mysql
systemctl start mysqlMySQL Binlog Archiving for PITR
Enable binary logging in my.cnf and ship binlog files to offsite storage continuously. Combined with an XtraBackup base backup, binlog archiving provides PITR to any transaction boundary.
[mysqld] log_bin = /var/log/mysql/mysql-bin.log binlog_format = ROW expire_logs_days = 7 sync_binlog = 1 server_id = 1
# Replay binlogs from a specific position to a target timestamp (PITR)
mysqlbinlog \
--start-datetime="2026-02-22 03:00:00" \
--stop-datetime="2026-02-22 08:59:59" \
/var/log/mysql/mysql-bin.000123 \
/var/log/mysql/mysql-bin.000124 | mysql -u root -pAlways test your restore process, not just the backup job. A backup that runs nightly without ever being validated is security theater. Set a calendar reminder and restore to a staging instance monthly — use the actual backup files your production cron produces, not a manually triggered one-off.
PostgreSQL Backup Tools
pg_dump / pg_dumpall — Logical Backup
pg_dump produces a consistent logical backup of a single PostgreSQL database. It uses a repeatable-read snapshot internally, so it does not block concurrent writes. pg_dumpall exports all databases plus global objects (roles, tablespaces). For parallel logical backup and restore on large databases, use the directory output format with pg_dump -Fd and restore with pg_restore -j.
# Single database backup — custom format (compressed, supports selective restore)
pg_dump \
-h localhost \
-U postgres \
-Fc \
-f /backup/mydb-$(date +%Y%m%d).dump \
mydb
# Parallel restore from custom format (4 parallel workers)
pg_restore \
-h localhost \
-U postgres \
-d mydb \
-j 4 \
/backup/mydb-20260222.dump
# All databases + globals (for full cluster backup)
pg_dumpall -h localhost -U postgres | gzip > /backup/cluster-$(date +%Y%m%d).sql.gzpg_basebackup — Physical Base Backup
pg_basebackup copies the entire PostgreSQL data directory while the server is running. It is the standard tool for creating a physical base backup for use with PITR. The -R flag writes a standby.signal and populates recovery.conf / postgresql.auto.conf automatically, making this the fastest path to creating a streaming replica or a PITR-ready backup.
# Physical base backup: tar format, gzip compressed, streaming WAL, progress output
# -R writes recovery configuration for standby/PITR use
pg_basebackup \
-h localhost \
-U replication_user \
-D /backup/base \
-Ft \
-z \
-Xs \
-P \
-RpgBackRest — Production-Grade Backup with Compression, Encryption, Retention, and Parallel Restore
pgBackRest is the production standard for PostgreSQL backup management. It handles full, differential, and incremental backups, WAL archiving, retention policy enforcement, AES-256-CBC encryption, S3/GCS/Azure storage, parallel backup and restore, and delta restore (only copying changed files during restore). For any PostgreSQL deployment with RTO requirements under two hours, pgBackRest is the right tool.
# /etc/pgbackrest/pgbackrest.conf [global] repo1-path=/backup/pgbackrest repo1-retention-full=7 repo1-retention-diff=30 repo1-cipher-type=aes-256-cbc repo1-cipher-pass=your-strong-passphrase-here compress-type=lz4 compress-level=6 process-max=4 [mydb] pg1-path=/var/lib/postgresql/16/main pg1-port=5432
# One-time stanza creation (initializes the backup repository for this cluster)
pgbackrest --stanza=mydb stanza-create
# Full backup
pgbackrest --stanza=mydb --type=full backup
# Differential backup (changes since last full)
pgbackrest --stanza=mydb --type=diff backup
# Incremental backup (changes since last full or diff)
pgbackrest --stanza=mydb --type=incr backup
# List backups with size and WAL range
pgbackrest --stanza=mydb info
# Restore to latest
pgbackrest --stanza=mydb restore
# Point-in-time restore to a specific timestamp
pgbackrest \
--stanza=mydb \
--type=time \
--target="2026-02-22 08:59:00" \
--target-action=promote \
restoreWAL Archiving for PITR
Configure PostgreSQL to archive WAL segments to pgBackRest as they are produced. With this in place, every transaction is captured and recovery to any point in time is possible within the retention window.
# postgresql.conf — WAL archiving settings archive_mode = on archive_command = 'pgbackrest --stanza=mydb archive-push %p' wal_level = replica max_wal_senders = 3
The 3-2-1 Backup Rule (and Why 3-2-1-1-0 Is Better)
The 3-2-1 rule was coined by photographer Peter Krogh in his 2005 book on digital asset management. The principle generalizes directly to database backups: keep at least 3 copies of your data, on at least 2 different types of storage media, with at least 1 copy stored offsite. This rule protects against the scenario the e-commerce company above lived through — a single hardware failure taking out both primary and backup simultaneously.
The extra 1 — air-gapped backup. The rise of ransomware attacks targeting backup repositories has exposed the weakness of the original 3-2-1 rule: if all three copies are network-accessible, a ransomware actor who has compromised your environment can encrypt all of them. An air-gapped copy — one that is not reachable from the production network, written to offline tape or to an object storage bucket with object lock enabled — survives ransomware. Major cloud providers offer object lock (WORM — Write Once Read Many) on S3, GCS, and Azure Blob Storage that enforces this protection without physical media.
The 0 — zero untested backups. Every backup in your retention window must have been successfully restored at least once. If you have 30 days of daily backups, you should have restored from at least one of them in the last 30 days. A backup whose restore has never been validated is not a backup — it is a file that might restore correctly, which is a meaningfully different thing when it is 3 AM and your production database is gone.
Enable S3 Object Lock (Governance or Compliance mode) on your backup bucket to prevent ransomware or accidental deletion from destroying your offsite copies. A locked bucket cannot have objects deleted or overwritten within the retention period, even by the AWS root account in Governance mode.
Backup to Cloud Storage (S3, GCS, Azure Blob)
Cloud object storage is the natural destination for database backups: it is cheap, durable (S3 and GCS target 11 nines of durability), geographically redundant by default, and accessible from any recovery target. Both pgBackRest and XtraBackup support direct upload to object storage.
pgBackRest S3 Configuration
# /etc/pgbackrest/pgbackrest.conf — S3 repository [global] repo1-type=s3 repo1-path=/pgbackrest repo1-s3-bucket=your-backup-bucket-name repo1-s3-region=us-east-1 repo1-s3-endpoint=s3.amazonaws.com repo1-cipher-type=aes-256-cbc repo1-cipher-pass=your-strong-passphrase-here repo1-retention-full=7 repo1-retention-diff=30 compress-type=lz4 process-max=4 [mydb] pg1-path=/var/lib/postgresql/16/main pg1-port=5432
Use an IAM role attached to the instance (not hardcoded access keys) to grant pgBackRest S3 access. The role needs s3:GetObject, s3:PutObject, s3:DeleteObject, and s3:ListBucket on the target bucket.
XtraBackup to S3 with xbcloud
# Stream XtraBackup directly to S3 via xbcloud — no local disk needed
xtrabackup --backup --stream=xbstream --user=backup_user --password=secret \
--encrypt=AES256 --encrypt-key-file=/etc/backup-encrypt.key | \
xbcloud put \
--storage=s3 \
--s3-bucket=your-backup-bucket-name \
--s3-region=us-east-1 \
mydb-$(date +%Y%m%d-%H%M%S)
# Restore from S3 via xbcloud
xbcloud get \
--storage=s3 \
--s3-bucket=your-backup-bucket-name \
--s3-region=us-east-1 \
mydb-20260222-030000 | \
xbstream -x -C /var/lib/mysqlTesting Your Backups (The Most Skipped Step)
In a survey of database incidents, the most common root cause of failed recovery was not the backup process itself — it was the discovery that the backup was incomplete, corrupted, or from the wrong database after the incident had already started. Backup testing is the only way to know, with certainty, that your recovery procedure works before you need it.
Monthly Restore Drill
Each month, pick a backup from your retention window (not the most recent — pick one from ten days ago to verify retention is working), restore it to a separate instance, run application smoke tests against the restored database, and record the wall-clock time from restore initiation to database available. Compare this against your RTO. If the restore took longer than your RTO, you have a strategy problem to solve before the next incident.
Automated Restore Verification Script
#!/bin/bash
# restore-verify.sh — run in CI or on a schedule to automatically verify backups
set -euo pipefail
STANZA="mydb"
RESTORE_HOST="restore-test-instance"
RESTORE_PORT="5433"
PGPASSWORD="${RESTORE_PG_PASS}"
export PGPASSWORD
# Restore the latest backup to a test instance
pgbackrest \
--stanza="${STANZA}" \
--pg1-path=/var/lib/postgresql/restore-test \
--pg1-port="${RESTORE_PORT}" \
restore
# Start the restored instance
pg_ctlcluster 16 restore-test start
# Wait for PostgreSQL to be ready
for i in $(seq 1 30); do
pg_isready -h "${RESTORE_HOST}" -p "${RESTORE_PORT}" && break
sleep 2
done
# Run basic sanity checks
psql -h "${RESTORE_HOST}" -p "${RESTORE_PORT}" -U postgres -c "
SELECT
count(*) AS table_count
FROM information_schema.tables
WHERE table_schema = 'public';
" mydb
# Check row counts on critical tables
psql -h "${RESTORE_HOST}" -p "${RESTORE_PORT}" -U postgres -c "
SELECT
relname AS table_name,
n_live_tup AS estimated_rows
FROM pg_stat_user_tables
ORDER BY n_live_tup DESC
LIMIT 10;
" mydb
echo "Restore verification passed at $(date)"
# Stop and clean up the test instance
pg_ctlcluster 16 restore-test stop
rm -rf /var/lib/postgresql/restore-testAn untested backup is not a backup. It is a file that may or may not contain a recoverable database. Schedule restore drills on a calendar, automate restore verification in your CI pipeline, and treat a failed restore verification with the same urgency as a production incident — because the next one might be.
Backup Retention Policies
Retention policy design involves balancing storage cost against recovery flexibility. The standard tiered approach — daily backups for one week, weekly backups for one month, monthly backups for one year — covers the most common recovery scenarios: accidental data deletion discovered the same day (recover from last night), data corruption discovered after a week (recover from a weekly), and compliance-driven recovery from months ago (monthly archive).
pgBackRest Retention Configuration
# pgbackrest.conf — retention settings # Full backups: keep the last 7 repo1-retention-full=7 # Differential backups: keep the last 30 repo1-retention-diff=30 # WAL archive: retained automatically to support the oldest backup # (pgBackRest calculates the minimum WAL needed and expires older segments) repo1-retention-archive=7 repo1-retention-archive-type=full
# Recommended backup schedule (crontab or systemd timers)
# Full backup: Sundays at 1 AM
0 1 * * 0 pgbackrest --stanza=mydb --type=full backup
# Differential backup: Monday–Saturday at 1 AM
0 1 * * 1-6 pgbackrest --stanza=mydb --type=diff backup
# Incremental backup: every 6 hours (for tighter RPO)
0 */6 * * * pgbackrest --stanza=mydb --type=incr backup
# Expire old backups according to retention policy (runs after each backup automatically)
# Manual expiry if needed:
# pgbackrest --stanza=mydb expireFor MySQL with XtraBackup, implement retention at the S3 level using S3 Lifecycle rules to transition backups older than 30 days to Glacier and delete them after 365 days. This keeps storage costs flat regardless of backup frequency.
Store your pgbackrest.conf and your backup encryption keys in a secrets manager (AWS Secrets Manager, HashiCorp Vault, or equivalent) that is independent of your database servers. During a disaster recovery scenario, your database servers may not be accessible — your recovery tooling must be reachable from a clean instance without depending on the failed infrastructure.
- Define RTO and RPO with business stakeholders before choosing any backup tool — these two numbers dictate every decision downstream.
- Use physical backups (XtraBackup, pgBackRest) for any database where RTO is under two hours; logical backups cannot restore fast enough at scale.
- Enable continuous WAL archiving (PostgreSQL) or binlog archiving (MySQL) to achieve RPO under five minutes — archiving without a base backup is incomplete.
- Apply the 3-2-1-1-0 rule: three copies, two media types, one offsite, one air-gapped or object-locked, zero untested backups.
- Encrypt backups at rest using AES-256-CBC (pgBackRest) or XtraBackup's built-in encryption and store keys separately from backup files.
- Run a monthly restore drill to a separate instance and automate restore verification — discovering that your backup does not restore during an incident is a second disaster on top of the first.
- Configure retention tiers (daily/weekly/monthly) and enforce them automatically via pgBackRest retention settings or cloud object storage lifecycle policies.
Working with JusDB on Database Backup Strategy
JusDB designs and implements backup strategies for MySQL and PostgreSQL deployments — including physical backups, WAL archiving, cloud storage integration, encryption, retention policies, and monthly restore drills. We make sure your backups actually work before you need them.
Explore JusDB PostgreSQL Services → | Talk to a DBA
Related reading: