Database SRE

PostgreSQL 17 Incremental Backup: pg_basebackup and pg_combinebackup Explained

PostgreSQL 17 adds native incremental backup support, allowing subsequent backups to capture only the blocks that changed since an earlier backup.

JusDB Team
February 9, 2024

If you manage a 2TB PostgreSQL database, you know the pain: daily full backups that chew through a 4-hour window, strain your storage budget, and leave a narrow margin for anything to go wrong before your backup overlaps with your next scheduled run. PostgreSQL has long relied on third-party tools like pgBackRest and Barman to handle incremental backups, leaving teams to bolt on complexity that the database core simply didn't support natively. PostgreSQL 17 finally changes that. With native incremental backup support built directly into pg_basebackup, you can now capture only the blocks that changed since your last backup — dramatically shrinking backup windows and storage costs without reaching for external tooling. This guide walks through everything you need to know to put it into production.

TL;DR: PostgreSQL 17 adds native incremental backup via pg_basebackup --incremental. Enable summarize_wal = on in postgresql.conf, take a full backup first, then take incremental backups referencing the previous backup's manifest. Use pg_combinebackup to merge a full + incrementals into a single restorable backup directory. Combine with WAL archiving for full PITR capability.

What's New in PostgreSQL 17 Backup?

Prior to PostgreSQL 17, pg_basebackup only supported full backups: every run copied every byte of every data file, regardless of what had changed since the last backup. Incremental backup was the domain of tools like pgBackRest, which detect changes using their own methods outside the core engine.

PostgreSQL 17 introduces two complementary features that together enable native incremental backup:

  • WAL summarization (summarize_wal): A background process that reads WAL records and builds compact summary files tracking which blocks in which data files were modified during each WAL range.
  • pg_combinebackup: A new standalone utility that takes a full backup plus one or more incremental backups and combines them into a single synthetic full backup ready for restore.

The incremental backup support in pg_basebackup is largely a server-side feature: the server generates the WAL summaries, pg_basebackup uploads the previous backup's manifest, the server consults the summaries to decide which blocks to send, and pg_combinebackup assembles the pieces offline. This means you get proper incremental backups without shipping a backup agent or installing a separate backup tool.

Version requirement: WAL summarization and incremental backup require PostgreSQL 17 or later on both the server and the pg_basebackup/pg_combinebackup binaries. There is no backport to PostgreSQL 14, 15, or 16. If you are on an older version, pgBackRest remains your best option for incremental backup.

How Incremental Backup Works

WAL Summary Tracking

When summarize_wal = on is set, PostgreSQL runs a WAL summarizer background process. This process reads WAL records as they are generated and writes summary files to $PGDATA/pg_wal/summaries/. Each summary file covers a range of WAL (from LSN X to LSN Y) and records, for each data file modified during that range, the exact set of blocks that were written.

Summary files are small: a heavily-written database generating gigabytes of WAL per day might produce only a few megabytes of summary files. They are also deduplicated: if block 42 of a relation is written 1,000 times during a WAL range, the summary records that block 42 was modified, not that it was modified 1,000 times.

Changed Block Detection

When you run pg_basebackup --incremental, you supply the manifest from your previous backup. The manifest records the WAL range (timeline, start LSN, and end LSN) that backup covers. pg_basebackup uploads it and in effect asks the server: "which blocks changed since that backup started?" The server consults the WAL summaries and works out the list. The backup client then receives only those blocks, plus the WAL needed to make the new backup consistent, plus a new manifest for the incremental backup itself.

The result: an incremental backup contains only the relation blocks that actually changed, not full copies of unchanged relations. Files that are not relation data, such as configuration files, are still copied whole, so a database where 5% of blocks change daily produces an incremental backup of roughly 5% the size of a full backup plus a small fixed overhead.
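
As a rough worked example of that arithmetic, the sketch below compares one week of storage for a full backup plus six incrementals against seven full backups. The 2048 GB size and 5% daily change rate are illustrative figures taken from this article, the function name is hypothetical, and per-backup overhead is ignored, so treat the result as a lower bound.

```bash
# Estimate weekly backup storage in GB.
# Arguments: full backup size in GB, percent of blocks changed per day.
# Hypothetical helper; the inputs are illustrative, not measured.
estimate_weekly_storage() {
  local full_gb=$1 change_pct=$2
  awk -v full="$full_gb" -v pct="$change_pct" 'BEGIN {
    incr  = full * pct / 100     # size of one daily incremental
    chain = full + 6 * incr      # Sunday full + Mon-Sat incrementals
    fulls = 7 * full             # seven daily full backups
    printf "incremental=%.1f chain=%.1f seven_fulls=%.1f\n", incr, chain, fulls
  }'
}

estimate_weekly_storage 2048 5
```

With these inputs the weekly chain needs about 2.6 TB instead of 14 TB for daily fulls.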

Setting Up the First Full Backup with pg_basebackup

Enable WAL Summarization

Before taking any backup, enable WAL summarization in postgresql.conf:

bash
summarize_wal = on

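Then reload the configuration; summarize_wal does not require a restart. The ALTER SYSTEM call below is an alternative to editing the file by hand: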

bash
psql -U postgres -c "ALTER SYSTEM SET summarize_wal = on;"
psql -U postgres -c "SELECT pg_reload_conf();"

Verify the summarizer is running:

bash
psql -U postgres -c "SELECT * FROM pg_stat_activity WHERE backend_type = 'walsummarizer';"
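
PostgreSQL 17 also provides SQL functions for inspecting the summarizer, which can tell you more than the process list. A minimal sketch, assuming psql can connect as the postgres user as in the commands above; the wrapper function names are hypothetical.

```bash
# Show how far WAL summarization has progressed (PostgreSQL 17+).
# summarizer_pid is NULL when the summarizer process is not running.
summarizer_state() {
  psql -U postgres -X -c "SELECT summarized_tli, summarized_lsn, pending_lsn, summarizer_pid FROM pg_get_wal_summarizer_state();"
}

# List the most recent WAL summary files the server still has on disk.
# Optional argument: how many rows to show (default 5).
recent_wal_summaries() {
  local limit=${1:-5}
  psql -U postgres -X -c "SELECT tli, start_lsn, end_lsn FROM pg_available_wal_summaries() ORDER BY end_lsn DESC LIMIT ${limit};"
}
```

Calling summarizer_state before each backup is a cheap way to notice that summarization has stalled.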

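WAL summarization requires wal_level = replica or higher. Incremental backups themselves depend on WAL summaries rather than on retained WAL: the summaries must cover the whole gap between the previous backup and the new one, and PostgreSQL removes them after wal_summary_keep_time (10 days by default). WAL archiving is not required for incremental backup, but you will want it for point-in-time recovery: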

bash
wal_level = replica
archive_mode = on
archive_command = 'cp %p /mnt/wal-archive/%f'
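
A plain cp in archive_command overwrites a segment that already exists in the archive. The PostgreSQL documentation suggests refusing to overwrite instead. Below is a minimal sketch of such a wrapper, assuming the same /mnt/wal-archive directory; the script path and function name are hypothetical.

```bash
# Copy one WAL segment into the archive, refusing to overwrite an existing file.
# A non-zero exit tells PostgreSQL the segment was not archived, so it retries.
# Assumed wiring: archive_command = '/usr/local/bin/archive-wal.sh %p %f'
archive_wal() {
  local src=$1 name=$2 dest_dir=${3:-/mnt/wal-archive}
  if [ -e "$dest_dir/$name" ]; then
    echo "refusing to overwrite $dest_dir/$name" >&2
    return 1
  fi
  # Copy to a temporary name first so a partial file never looks complete.
  cp "$src" "$dest_dir/$name.tmp" && mv "$dest_dir/$name.tmp" "$dest_dir/$name"
}
```

A script built around this function would end with a call such as archive_wal "$1" "$2".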

Take the Full Backup

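Take an initial full backup. --checkpoint=fast starts the backup without waiting for the next scheduled checkpoint, and --manifest-checksums=sha256 replaces the default CRC32C file checksums in the manifest with SHA-256. Any backup manifest can serve as the reference for later incrementals; the stronger checksum is optional: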

bash
pg_basebackup \
  --host=localhost \
  --port=5432 \
  --username=replicator \
  --pgdata=/mnt/backups/full-2025-01-20 \
  --format=plain \
  --wal-method=stream \
  --checkpoint=fast \
  --manifest-checksums=sha256 \
  --progress \
  --verbose

After completion, the backup directory contains a backup_manifest file. This file is the key input for all subsequent incremental backups. Store it alongside the backup and never modify it.

bash
ls /mnt/backups/full-2025-01-20/backup_manifest

Important: pg_combinebackup works on plain-format backup directories, so take every backup in the chain with --format=plain (a tar-format backup would have to be extracted before it could be combined). This guide also uses --wal-method=stream throughout, so that each backup directory carries the WAL needed to make it consistent.
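
The manifest is a JSON document, and its WAL-Ranges section records the timeline and LSN range of the backup. As a sketch of how to read those values without extra tooling, the hypothetical helper below extracts them with grep and sed. It assumes the key names Timeline, Start-LSN, and End-LSN appear together on one line, as in manifests written by PostgreSQL 17; a JSON parser such as jq would be more robust.

```bash
# Print "timeline start_lsn end_lsn" for each WAL range in a backup manifest.
manifest_wal_ranges() {
  local manifest=$1
  grep -o '"Timeline": *[0-9]*, *"Start-LSN": *"[0-9A-Fa-f/]*", *"End-LSN": *"[0-9A-Fa-f/]*"' "$manifest" |
    sed -E 's/"Timeline": *([0-9]+), *"Start-LSN": *"([^"]+)", *"End-LSN": *"([^"]+)"/\1 \2 \3/'
}
```

For example, manifest_wal_ranges /mnt/backups/full-2025-01-20/backup_manifest prints one line per range.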

Taking Incremental Backups

First Incremental

Reference the full backup's manifest using the --incremental flag:

bash
pg_basebackup \
  --host=localhost \
  --port=5432 \
  --username=replicator \
  --pgdata=/mnt/backups/incr-2025-01-21 \
  --incremental=/mnt/backups/full-2025-01-20/backup_manifest \
  --format=plain \
  --wal-method=stream \
  --checkpoint=fast \
  --manifest-checksums=sha256 \
  --progress \
  --verbose

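The incremental backup directory has the same layout as a full backup, but relation files that were only partly modified are stored as INCREMENTAL.<filename> files containing just the changed blocks. The directory cannot be started as a data directory on its own. On a lightly modified database, it may be a small fraction of the size of the full backup.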
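
To see how much of a backup was stored incrementally, you can count the files whose names start with INCREMENTAL., the prefix PostgreSQL 17 gives to relation files that hold changed blocks only. A small sketch with a hypothetical helper name:

```bash
# Report how many files in a backup directory are incremental vs. copied whole.
incremental_file_stats() {
  local dir=$1 incr total
  incr=$(find "$dir" -type f -name 'INCREMENTAL.*' | wc -l)
  total=$(find "$dir" -type f | wc -l)
  # $((...)) also strips the padding some wc implementations add
  echo "incremental=$((incr)) whole=$((total - incr))"
}
```

Running it against /mnt/backups/incr-2025-01-21 gives a quick sanity check that the backup really was incremental.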

Chaining Incrementals

Each incremental backup produces its own backup_manifest. To chain incrementals, reference the previous incremental's manifest:

bash
# Second incremental — references the first incremental's manifest
pg_basebackup \
  --host=localhost \
  --port=5432 \
  --username=replicator \
  --pgdata=/mnt/backups/incr-2025-01-22 \
  --incremental=/mnt/backups/incr-2025-01-21/backup_manifest \
  --format=plain \
  --wal-method=stream \
  --checkpoint=fast \
  --manifest-checksums=sha256 \
  --progress \
  --verbose

You can chain as many incrementals as needed. Each one only captures blocks that changed since the previous backup in the chain.
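
The --incremental option accepts the manifest of any earlier backup, not only the most recent one. Pointing every daily run at the full backup's manifest gives a differential-style backup: each one grows over the week, but a restore needs only the full backup plus one incremental. The sketch below chooses the reference manifest for either style. The function and mode names are hypothetical, and it assumes the full-DATE and incr-DATE directory naming used above.

```bash
# Print the manifest a new backup should reference.
#   chain        -> newest backup of any kind (smallest backups, longest restore chain)
#   differential -> newest full backup (larger backups, two-directory restore)
reference_manifest() {
  local base=$1 mode=$2 pattern newest
  case "$mode" in
    chain)        pattern='*-????-??-??' ;;
    differential) pattern='full-????-??-??' ;;
    *) echo "unknown mode: $mode" >&2; return 2 ;;
  esac
  # Directory names end in an ISO date, so sorting on that suffix finds the newest.
  newest=$(find "$base" -mindepth 1 -maxdepth 1 -type d -name "$pattern" |
    awk -F/ '{ n = $NF; print substr(n, length(n) - 9) "\t" $0 }' |
    sort | tail -n 1 | cut -f2-)
  [ -n "$newest" ] || { echo "no backup found in $base" >&2; return 1; }
  echo "$newest/backup_manifest"
}
```

The result can be passed straight to pg_basebackup, for example --incremental="$(reference_manifest /mnt/backups differential)".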

WAL summary retention: PostgreSQL removes WAL summary files once they are older than wal_summary_keep_time (10 days by default). If the summaries for the period between the referenced backup and now are missing (because they were removed or the summarizer was off), pg_basebackup --incremental fails with an error rather than falling back to a full backup, and you need to take a new full backup. Keep the interval between backups shorter than wal_summary_keep_time, monitor pg_wal/summaries/, and ensure the summarizer stays running between backups.
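
If an incremental run fails, for example because the required WAL summaries are no longer available, a scheduler has to decide what happens next. One option, sketched below under the assumption that an unscheduled full backup is acceptable, is to retry as a full backup. The function name and the cleanup of the partial directory are illustrative choices, not PostgreSQL behavior.

```bash
# Try an incremental backup; if it fails, discard the partial directory and
# take a full backup instead. Prints the directory that was finally written.
# Extra arguments (host, port, user, ...) are passed through to pg_basebackup.
backup_with_fallback() {
  local prev_manifest=$1 incr_dir=$2 full_dir=$3
  shift 3
  if pg_basebackup "$@" --pgdata="$incr_dir" --incremental="$prev_manifest" >&2; then
    echo "$incr_dir"
    return 0
  fi
  echo "incremental backup failed; falling back to a full backup" >&2
  rm -rf -- "$incr_dir"
  pg_basebackup "$@" --pgdata="$full_dir" >&2 && echo "$full_dir"
}
```

A fallback full backup starts a new chain, so whatever records the latest manifest should point at the directory this function prints.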

Restoring with pg_combinebackup

How pg_combinebackup Works

pg_combinebackup takes your full backup plus all incrementals in chronological order and produces a single synthetic full backup directory. You then restore from that synthetic backup exactly as you would from a plain full backup.

Combining a Full Backup with Two Incrementals

bash
pg_combinebackup \
  /mnt/backups/full-2025-01-20 \
  /mnt/backups/incr-2025-01-21 \
  /mnt/backups/incr-2025-01-22 \
  --output=/mnt/restore/combined-2025-01-22

The order matters: full backup first, then each incremental in the sequence they were taken. pg_combinebackup attempts to verify that the directories form a valid chain and errors out if you supply them out of order or skip one, but it does not track which backups depend on which, so keep that record yourself.
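
Because the argument order carries the meaning, it can help to wrap pg_combinebackup in a function that performs a few cheap checks first. The sketch below is a hypothetical wrapper: it checks each input directory for a backup_manifest, requires the first input to be a full backup (one with no INCREMENTAL.* files), combines the chain, and then runs pg_verifybackup on the result. It assumes both PostgreSQL 17 binaries are on PATH.

```bash
# Usage: combine_chain OUTPUT_DIR FULL_DIR [INCR_DIR...]
combine_chain() {
  local output=$1; shift
  local dir
  [ "$#" -ge 1 ] || { echo "need at least a full backup" >&2; return 2; }
  for dir in "$@"; do
    [ -f "$dir/backup_manifest" ] || { echo "no backup_manifest in $dir" >&2; return 1; }
  done
  # A full backup contains no INCREMENTAL.* files; the first input must be one.
  if [ -n "$(find "$1" -type f -name 'INCREMENTAL.*' | head -n 1)" ]; then
    echo "$1 is an incremental backup, expected a full backup first" >&2
    return 1
  fi
  pg_combinebackup --output="$output" "$@" || return 1
  # Check the synthesized backup against the manifest pg_combinebackup wrote.
  pg_verifybackup "$output"
}
```

For the chain above: combine_chain /mnt/restore/combined-2025-01-22 /mnt/backups/full-2025-01-20 /mnt/backups/incr-2025-01-21 /mnt/backups/incr-2025-01-22.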

Completing the Restore

After combining, the output directory is a valid PostgreSQL data directory. Copy it into place and apply WAL to reach your target recovery point:

bash
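# Stop PostgreSQL if running
pg_ctl stop -D /var/lib/postgresql/17/main

# Copy combined backup into place
rsync -av --delete /mnt/restore/combined-2025-01-22/ /var/lib/postgresql/17/main/

# Configure recovery target (if using PITR). The source listing is cut off here;
# the settings below are an assumed completion with a placeholder target time.
cat >> /var/lib/postgresql/17/main/postgresql.conf <<'EOF'
restore_command = 'cp /mnt/wal-archive/%f %p'
recovery_target_time = '2025-01-22 14:30:00'
EOF

# Request recovery on the next start, then start the server
touch /var/lib/postgresql/17/main/recovery.signal
pg_ctl start -D /var/lib/postgresql/17/main

Combine with WAL archiving for full PITR: An incremental backup captures changed blocks up to the moment of the backup. To recover to any point between backups, you need WAL. Configure archive_command to ship WAL to a durable location (such as S3 or NFS). With archived WAL plus your combined backup, you can recover to any point after that backup completed and within your WAL retention window, not just to the moment of your last incremental backup.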

Backup Scheduling Strategy

Weekly Full, Daily Incremental

A practical production schedule for a 2TB database:

  • Sunday 02:00: Full backup (4-hour window, run it overnight)
  • Monday–Saturday 02:00: Incremental backup (30–60 minutes depending on write volume)

Automate with cron. Create a wrapper script that tracks the manifest chain:

bash
#!/usr/bin/env bash
# /usr/local/bin/pg-incremental-backup.sh

set -euo pipefail

BACKUP_BASE="/mnt/backups"
HOST="localhost"
PORT="5432"
BACKUP_USER="replicator"  # not USER, which the shell already sets to the login name
DATE=$(date +%Y-%m-%d)
DAY_OF_WEEK=$(date +%u)  # 1=Monday, 7=Sunday
POINTER="$BACKUP_BASE/latest_manifest"  # path of the newest manifest in the chain

# Options shared by full and incremental runs
COMMON_OPTS=(
  --host="$HOST" --port="$PORT" --username="$BACKUP_USER"
  --format=plain --wal-method=stream
  --checkpoint=fast --manifest-checksums=sha256
  --progress
)

if [ "$DAY_OF_WEEK" -eq 7 ]; then
  # Sunday: full backup, which starts a new chain
  BACKUP_DIR="$BACKUP_BASE/full-$DATE"
  pg_basebackup "${COMMON_OPTS[@]}" --pgdata="$BACKUP_DIR"
else
  # Mon–Sat: incremental backup against the newest backup in the chain
  if [ ! -s "$POINTER" ]; then
    echo "No previous manifest recorded in $POINTER; take a full backup first" >&2
    exit 1
  fi
  PREV_MANIFEST=$(cat "$POINTER")
  BACKUP_DIR="$BACKUP_BASE/incr-$DATE"
  pg_basebackup "${COMMON_OPTS[@]}" --pgdata="$BACKUP_DIR" \
    --incremental="$PREV_MANIFEST"
fi

# Reached only if pg_basebackup succeeded (set -e), so the pointer never
# references a failed backup
echo "$BACKUP_DIR/backup_manifest" > "$POINTER"
echo "Backup completed: $BACKUP_DIR"

bash
# Add to cron
0 2 * * * /usr/local/bin/pg-incremental-backup.sh >> /var/log/pg-backup.log 2>&1
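
A long-running backup can still be in progress when cron fires the next one. One common guard, sketched here with a hypothetical function name and lock path, is an atomic mkdir lock around the backup command; flock(1) is an alternative where it is available.

```bash
# Run a command only if no other run holds the lock. mkdir is atomic, so two
# concurrent callers cannot both succeed in creating the lock directory.
with_backup_lock() {
  local lock_dir=$1; shift
  if ! mkdir "$lock_dir" 2>/dev/null; then
    echo "another backup appears to be running (lock: $lock_dir)" >&2
    return 75   # EX_TEMPFAIL in sysexits.h: temporary failure, try again later
  fi
  local status=0
  "$@" || status=$?
  rmdir "$lock_dir"
  return "$status"
}
```

For example: with_backup_lock /var/lock/pg-backup.lock /usr/local/bin/pg-incremental-backup.sh. If the process is killed, the lock directory stays behind and has to be removed by hand.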

Retention and Cleanup

Keep at least two full backup chains before deleting old backups. Deleting a full backup invalidates all incrementals in its chain — you cannot restore from incrementals alone without the full backup they chain from.

bash
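# Remove backups older than 20 days. With weekly fulls, every backup from the
# last 14 days then still has its full backup and earlier incrementals on disk.
find /mnt/backups -mindepth 1 -maxdepth 1 -type d \
  \( -name 'full-*' -o -name 'incr-*' \) -mtime +20 -exec rm -rf {} +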
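
Age-based deletion treats each directory separately, so it is easy to remove a full backup while incrementals that depend on it are still on disk. A chain-aware alternative, sketched below, groups directories into chains (each full-DATE directory starts a new chain) and lists only the members of chains older than the newest N. It assumes the full-YYYY-MM-DD and incr-YYYY-MM-DD naming used in this article. The function name is hypothetical, and it prints candidates instead of deleting them.

```bash
# Print backup directories that belong to all but the newest KEEP chains.
# Arguments: backup base directory, number of chains to keep (default 2).
expired_chain_dirs() {
  local base=$1 keep=${2:-2}
  find "$base" -mindepth 1 -maxdepth 1 -type d \
      \( -name 'full-????-??-??' -o -name 'incr-????-??-??' \) |
    awk -F/ '{ n = $NF; print substr(n, length(n) - 9) "\t" $0 }' | sort |
    awk -F'\t' -v keep="$keep" '
      BEGIN { chains = 0 }
      { path[NR] = $2
        name = $2; sub(/.*\//, "", name)
        if (name ~ /^full-/) chains++     # a full backup starts a new chain
        chain_of[NR] = chains }
      END {
        # Incrementals dated before the oldest full backup get chain 0, so they
        # are listed once at least KEEP full chains exist.
        for (i = 1; i <= NR; i++)
          if (chain_of[i] <= chains - keep) print path[i]
      }'
}
```

Review the output of expired_chain_dirs /mnt/backups 2 before passing it to rm -rf.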

How It Compares to pgBackRest's Incremental Mode

pgBackRest has offered incremental and differential backups for years, with a mature feature set that native PostgreSQL tooling is still catching up to. Here's an honest comparison:

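| Feature | pg_basebackup + pg_combinebackup (PG17) | pgBackRest |
|---|---|---|
| Incremental backup | Yes (block-level, PG17+) | Yes (file-level by default; block-level optional in recent releases) |
| Differential backup | No dedicated mode (an incremental that always references the full backup's manifest behaves like one) | Yes |
| Backup compression | --compress flag (lz4, zstd, gzip); plain-format output is stored uncompressed | zstd, lz4, gz, bz2 |
| S3/GCS/Azure storage | No (manual upload required) | Native support |
| Parallel backup/restore | No (pg_basebackup runs as a single process) | Yes, both directions |
| Backup verification | pg_verifybackup (checks files against manifest checksums) | Full verify command |
| Standby backup | Yes | Yes |
| External dependencies | None (built-in) | Separate installation |
| Minimum PostgreSQL version | 17 | 9.4+ |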

For teams on PostgreSQL 17 who want to reduce operational dependencies and don't need cloud object storage integration, native incremental backup is a compelling option. For teams with complex multi-server environments, existing pgBackRest infrastructure, or cloud storage requirements, pgBackRest remains the more capable solution — and the two are not mutually exclusive if you want to start with native tooling and layer in pgBackRest later.

Key Takeaways
  • PostgreSQL 17 adds native incremental backup via pg_basebackup --incremental, backed by WAL summarization (summarize_wal = on).
  • Incremental backups capture only the blocks changed since the referenced backup in the chain, which shrinks backup windows and storage most when only a small fraction of the database changes between backups.
  • Use pg_combinebackup to merge a full backup and all its incrementals into a single restorable directory before recovery; you cannot restore from an incremental alone.
  • Use --format=plain for every backup in the chain, since pg_combinebackup works on plain-format directories, and --wal-method=stream so each backup carries the WAL it needs.
  • Combine incremental backups with WAL archiving to enable point-in-time recovery to any moment between backups, not just to backup timestamps.
  • The manifest chain is sacred — never delete an intermediate incremental or the full backup it chains from while dependent incrementals exist.
  • pgBackRest remains stronger for cloud storage integration, differential backups, and pre-PG17 versions, but native tooling is now a viable zero-dependency option for PG17+ shops.

Working with JusDB on PostgreSQL Backup and Recovery

JusDB manages PostgreSQL backup strategies for engineering teams — incremental backups with PostgreSQL 17, WAL archiving to S3, pgBackRest for enterprise features, and automated restore testing to verify your recovery procedures actually work.
