A PostgreSQL database can be performing dozens of slow queries every hour without a single alert firing — because the slowness never crosses a hard error threshold, it just quietly degrades user experience. pgBadger turns the raw text inside your PostgreSQL log files into a structured HTML report that surfaces the slowest queries by total time, identifies lock contention, tracks checkpoint frequency, and flags autovacuum pressure, all from a single command. Unlike runtime sampling tools, it requires no database connection and no schema changes — just the logs you are already writing. This guide covers everything from initial postgresql.conf configuration through automated daily reporting.
- pgBadger parses PostgreSQL log files offline and generates richly detailed HTML (or JSON) reports covering slow queries, lock waits, connections, checkpoints, autovacuum, and temp files.
- The
log_line_prefixsetting inpostgresql.confmust match pgBadger's expected format exactly — an incorrect prefix will produce empty or misleading reports. - Use
log_min_duration_statement(notlog_statement = 'all') to log only slow queries; logging everything creates enormous files that cripple pgBadger's analysis speed. - Incremental mode (
-I) lets pgBadger append new log data to an existing report without reprocessing historical logs — essential for large or busy instances. - A nightly cron job can automate daily HTML report generation, self-linking the latest report, and purging reports older than 30 days.
What is pgBadger?
pgBadger is an open-source PostgreSQL log analyzer written in Perl. It reads the log files written by PostgreSQL's built-in logging subsystem and produces a comprehensive static HTML report — or JSON output for custom tooling. The project is maintained independently of the PostgreSQL project and supports log formats from PostgreSQL 8.0 through the current release.
The tool distinguishes itself from runtime analysis tools like pg_stat_statements in a critical way: it operates entirely offline. You can run pgBadger on a cold copy of last night's log file, on a compressed archive, or against logs exported from RDS — with no live database connection required. That makes it safe to run on production log files from a separate analysis host, and it makes it an excellent audit tool even for databases you no longer have access to.
pgBadger normalizes query text by replacing literal values with placeholders before grouping, so WHERE id = 1 and WHERE id = 99 are counted as the same query pattern. This is the same normalization strategy used by pg_stat_statements, and it makes it practical to find the ten query shapes responsible for 80% of your total database time.
| Tool | Data Source | Best For | Requires DB Connection |
|---|---|---|---|
| pgBadger | Log files (offline) | Historical analysis, auditing, reporting slow queries over a time window | No |
| pg_stat_statements | In-memory cumulative stats (live) | Real-time query performance tracking, finding top queries since last reset | Yes |
| EXPLAIN ANALYZE | Single query execution (live) | Deep inspection of a specific query's execution plan and actual row counts | Yes |
The three tools are complementary. Use pgBadger to identify which queries are slow over a historical window, pg_stat_statements to monitor query performance trends in real time, and EXPLAIN ANALYZE to drill into the execution plan of a specific problematic query once you have identified it.
Configuring PostgreSQL Logging for pgBadger
pgBadger's accuracy depends entirely on what PostgreSQL writes to its log files. An under-configured logging setup will produce reports that miss slow queries; an over-configured one will generate log files too large to analyze efficiently. The settings below represent the practical minimum for a useful pgBadger report.
postgresql.conf Settings
# Minimum required for pgBadger
log_min_duration_statement = 1000 # Log queries taking > 1 second
log_checkpoints = on
log_connections = on
log_disconnections = on
log_lock_waits = on
log_temp_files = 0 # Log all temp file creation
log_autovacuum_min_duration = 250ms # Track slow autovacuums
log_error_verbosity = default
# pgBadger requires this exact prefix format
log_line_prefix = '%t [%p]: [%l-1] user=%u,db=%d,app=%a,client=%h '
logging_collector = on
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_rotation_age = 1d
log_rotation_size = 0
# For statement analysis
log_duration = off # Use log_min_duration_statement instead
log_statement = 'none' # Don't log ALL statements, only slow onesSetting
log_statement = 'all' will capture every SQL statement and create enormous log files that make pgBadger analysis impractically slow. Always use log_min_duration_statement to log only queries exceeding your SLA threshold. A busy OLTP database writing 1,000 queries per second will generate roughly 500 MB of log data per hour with log_statement = 'all' — and pgBadger will spend hours parsing it.
After editing postgresql.conf, reload the configuration without a restart:
SELECT pg_reload_conf();
-- Or from the OS:
pg_ctlcluster 16 main reloadlog_line_prefix Format
The log_line_prefix is the most critical setting for pgBadger compatibility. Every log line must begin with the prefix so pgBadger can parse the timestamp, process ID, user, database, application name, and client host from each entry. The format string above — '%t [%p]: [%l-1] user=%u,db=%d,app=%a,client=%h ' — is the recommended pgBadger format and enables the full feature set of the report including per-application and per-client breakdowns.
If your existing prefix uses a different format, pgBadger can often detect it automatically with --autodetect, but relying on auto-detection reduces report accuracy. It is better to align the prefix with pgBadger's expected format and reload the configuration.
For RDS PostgreSQL, enable pgBadger-compatible logging by setting
log_line_prefix = '%t:%r:%u@%d:[%p]:' and log_min_duration_statement = 1000 in your parameter group. RDS does not allow direct file system access, so download logs via aws rds download-db-log-file-portion --db-instance-identifier mydb --log-file-name error/postgresql.log.2026-03-05 --output text and then run pgBadger against the downloaded file.
Installing and Running pgBadger
Step 1: Install pgBadger
# Install via package manager
sudo apt-get install pgbadger # Debian/Ubuntu
sudo yum install pgbadger # RHEL/CentOS
# Or install from CPAN (latest version)
cpan App::pgBadger
# Verify
pgbadger --version # pgBadger version 12.xThe CPAN install is recommended when you need the latest version, which typically adds support for new PostgreSQL log formats and fixes edge cases in parser logic. The package manager version may lag a release or two behind.
Step 2: Run a Basic Analysis
# Basic single-file analysis
pgbadger /var/log/postgresql/postgresql-2026-03-05.log -o report.html
# Analyze multiple log files (e.g., past 7 days)
pgbadger /var/log/postgresql/postgresql-2026-02-*.log -o weekly_report.html
# Incremental mode (append new data to existing report)
pgbadger -I -O /var/pgbadger/out/ /var/log/postgresql/postgresql-*.log
# High-performance mode: parallel processing
pgbadger --jobs 4 /var/log/postgresql/postgresql-*.log -o report.html
# Filter to specific database
pgbadger --dbname myapp /var/log/postgresql/*.log -o myapp_report.html
# Top 20 slowest queries with normalized form
pgbadger --top 20 --normalized /var/log/postgresql/*.log -o slow_queries.html
# JSON output for custom processing
pgbadger --format json /var/log/postgresql/*.log -o stats.jsonStep 3: Open the Report
The generated HTML file is fully self-contained — all JavaScript and CSS is embedded inline. Copy report.html to any web server or open it directly in a browser. For team sharing, drop it into an S3 bucket with a static website endpoint or behind an nginx location block restricted by IP.
On a server with multiple CPU cores,
--jobs 4 (or higher) dramatically reduces analysis time by splitting the log file across parallel workers. On a 2 GB log file, --jobs 4 typically cuts runtime from 8–10 minutes to 2–3 minutes. Set jobs to the number of physical cores minus one to leave headroom for other processes.
Understanding pgBadger Reports
A full pgBadger report is organized into sections accessible via the left-hand navigation panel. Each section provides both tabular data and time-series graphs. Understanding what each section tells you — and what it does not — is essential to acting on the data correctly.
Queries Statistics
This is the primary section for slow query analysis. pgBadger presents two ranked lists: queries ordered by total duration (cumulative time across all executions during the log window) and queries ordered by maximum duration (the single slowest execution). A query that runs for 200 ms but executes 50,000 times per hour costs the database 10,000 seconds of total execution time and will dominate the total-duration list even if it never appears in an alerting threshold check. Optimizing high-total-duration queries typically produces the largest reduction in overall database load.
Lock Waits
When log_lock_waits = on, PostgreSQL logs any time a query waits longer than deadlock_timeout (default 1 second) to acquire a lock. pgBadger aggregates these into a ranked list by total wait time. Lock contention is frequently the root cause of application-layer latency spikes that appear intermittent and do not show up in average-duration metrics.
Connections
The connections section shows peak simultaneous connections by hour, average connection duration, and a breakdown of connections by database and user. Unexpected spikes in connection count — visible as sharp peaks in the hourly graph — often indicate connection pool misconfiguration or a code path that opens connections without pooling them.
Checkpoints
PostgreSQL logs checkpoint start, duration, and whether the checkpoint was triggered by time (scheduled) or by the WAL fill rate exceeding max_wal_size (requested). A high ratio of requested-to-scheduled checkpoints indicates that write throughput is exceeding the checkpoint cadence and that max_wal_size or checkpoint_completion_target may need tuning.
Autovacuum
With log_autovacuum_min_duration = 250ms, PostgreSQL logs every autovacuum and autoanalyze run that takes longer than 250 milliseconds. pgBadger aggregates these by table, showing which tables are being vacuumed most frequently and which individual runs are taking the longest. Tables appearing at the top of both lists are candidates for autovacuum cost-delay tuning or a manual VACUUM ANALYZE to reset bloat.
Temp Files
With log_temp_files = 0, PostgreSQL logs every time a query operation spills to disk because it exceeds work_mem. pgBadger shows the total temp file volume generated by each normalized query pattern. Queries with high temp file usage are candidates for a targeted SET work_mem = '...' in the session or connection pool, or for query rewrites that avoid sort and hash operations on large datasets.
Error and Fatal Events
The errors section aggregates log lines at ERROR, FATAL, and PANIC severity. Common entries include connection reset by client, out of memory, authentication failures, and too many connections. A rising trend in connection errors is often the first visible signal of connection pool exhaustion before applications start failing openly.
Automating pgBadger with Cron
Running pgBadger manually is useful for one-off investigations, but the real value comes from automated daily reports that let you track query performance trends over time. The following script is designed to run as a daily cron job, generate a dated report for the previous day's logs, maintain a latest.html symlink, and purge reports older than 30 days.
#!/bin/bash
# /etc/cron.daily/pgbadger-report
PGLOG_DIR="/var/log/postgresql"
REPORT_DIR="/var/www/html/pgbadger"
DATE=$(date +%Y-%m-%d)
YESTERDAY=$(date -d "yesterday" +%Y-%m-%d)
mkdir -p $REPORT_DIR
pgbadger \
--jobs 4 \
--format stderr \
--outfile "$REPORT_DIR/report-$YESTERDAY.html" \
"$PGLOG_DIR/postgresql-${YESTERDAY}_*.log"
# Symlink as latest
ln -sf "$REPORT_DIR/report-$YESTERDAY.html" "$REPORT_DIR/latest.html"
# Cleanup reports older than 30 days
find $REPORT_DIR -name "report-*.html" -mtime +30 -deleteMake the script executable and place it in /etc/cron.daily/:
chmod +x /etc/cron.daily/pgbadger-reportFor large or busy databases where log files grow to several gigabytes per day, incremental mode avoids reprocessing historical data on every run:
# First run: build history from all existing logs
pgbadger -I -O /var/pgbadger/data/ /var/log/postgresql/postgresql-*.log
# Subsequent runs: only process today's new log entries
pgbadger -I -O /var/pgbadger/data/ /var/log/postgresql/postgresql-$(date +%Y-%m-%d)*.log
# Rebuild the HTML report from the incremental binary data store
pgbadger -I -O /var/pgbadger/data/ --rebuild -o /var/www/html/pgbadger/report.htmlThe -I flag stores parsed data in a binary format in the output directory. Each subsequent run reads only the new portion of log files, making each incremental run proportional to the volume of new log data rather than the total historical volume. This is the recommended mode for instances writing more than 500 MB of logs per day.
Ensure your cron job runs after log rotation completes. If
log_rotation_age = 1d and your logs rotate at midnight, schedule the pgBadger cron job for 00:15 to ensure the previous day's log file is fully written and closed before analysis begins. A partially written log file will produce an incomplete report without any error — pgBadger will simply analyze whatever lines exist at the time of the run.
Key Takeaways
- Start with the right log_line_prefix. An incorrect or missing prefix is the single most common reason pgBadger reports are empty or inaccurate. Use the format in this guide and verify with a test run before relying on the output.
- Log only what matters. Set
log_min_duration_statementto your SLA threshold (e.g., 500 ms for an application with a 1-second response time budget). Do not uselog_statement = 'all'in production. - Sort by total duration, not max duration. The query that ran for 30 seconds once is less of a problem than the query that runs for 200 ms fifty thousand times a day. Total duration reveals the real cost drivers.
- Temp files are a memory pressure signal. High temp file volume from a specific query pattern means
work_memis too low for that operation. Raise it targeted — not globally — to avoid OOM risk. - Use incremental mode for busy databases. Once your daily log volume exceeds a few hundred megabytes, incremental mode (
-I) is the only practical way to maintain a continuously updated historical report. - pgBadger complements, not replaces,
pg_stat_statements. Use pgBadger for historical reporting and auditing; usepg_stat_statementsfor live monitoring and alerting; useEXPLAIN ANALYZEto fix individual query plans once you have identified the problem queries.
Working with JusDB on PostgreSQL Log Analysis
Configuring pgBadger, tuning log_min_duration_statement thresholds, and building the automation infrastructure to generate and distribute daily reports is straightforward on a single self-managed PostgreSQL instance. It becomes considerably more complex across a fleet of databases with different log formats, retention policies, and access controls — especially in mixed environments that include self-managed instances, RDS, Aurora, and AlloyDB alongside each other.
JusDB's PostgreSQL managed services include log analysis as a standard component of ongoing database management. Our team configures pgBadger-compatible logging on every managed instance, automates daily report generation, and reviews slow query reports as part of the regular performance review cycle. When a query pattern starts appearing at the top of the total-duration rankings, we investigate it proactively — before it becomes a user-facing issue.
If your team is spending engineering time on PostgreSQL performance debugging that could be redirected to product work, or if you are operating PostgreSQL at a scale where manual log review is no longer feasible, reach out to discuss how JusDB can take that operational burden off your plate.