Database Performance

auto_explain: Automatic Query Plan Logging for Slow Queries in PostgreSQL

Configure auto_explain to automatically log EXPLAIN output for slow PostgreSQL queries. Covers log_analyze, log_buffers, sample_rate, JSON format, and RDS setup.

JusDB Team
March 5, 2026
9 min read
235 views

Slow query logs tell you which queries are slow — but not why. When a query that ran in 50ms suddenly takes 4 seconds, you need the actual execution plan from that specific moment, not an average aggregated over thousands of runs. auto_explain fills exactly that gap: it automatically captures and logs the full EXPLAIN output for any query that exceeds your defined threshold. No manual intervention, no reproducing the condition in a staging environment — just the plan, right there in your PostgreSQL logs.

TL;DR
  • auto_explain is a PostgreSQL contrib module that automatically logs query execution plans for slow queries without manual EXPLAIN calls.
  • Enable globally via shared_preload_libraries (requires restart) or per-session with LOAD 'auto_explain' (no restart needed).
  • Set auto_explain.log_analyze = on to capture actual row counts and timing per plan node — at the cost of 10–30% execution overhead for logged queries.
  • On PostgreSQL 13+, use auto_explain.sample_rate to log only a fraction of qualifying queries and reduce log volume on high-traffic databases.
  • Pair with pg_stat_statements (aggregate trends) and pgBadger (log reports) for a complete slow query diagnostic workflow.

What is auto_explain?

auto_explain is a contrib module bundled with PostgreSQL that hooks into the executor and logs the EXPLAIN output for any statement whose execution time exceeds a configured threshold. It ships with every standard PostgreSQL installation — you do not need to compile anything or install an external package. A simple LOAD command or a line in postgresql.conf activates it.

The key distinction from every other slow-query tool is that auto_explain captures the actual plan for each individual execution. pg_stat_statements tells you that query X has run 10,000 times with an average latency of 800ms. That is useful. But when query X starts spiking to 5,000ms on Monday mornings, you need to know what plan the planner chose during those specific executions. Did it switch from an index scan to a sequential scan? Did a nested loop replace a hash join? auto_explain answers those questions in real time, without you having to reproduce the workload.

This makes auto_explain indispensable for two categories of problems:

  • Plan regressions: A statistics refresh, a new autovacuum run, or a PostgreSQL minor upgrade causes the planner to choose a different — worse — plan. The query was fast yesterday; it is slow today. The plan that was chosen during those slow executions is exactly what you need to see.
  • Parameter-sensitive plans: Queries with highly skewed data distributions can get radically different plans depending on the parameter value bound at execution time. auto_explain logs the plan alongside the query text, so you can correlate specific parameter values with bad plan choices.

Enabling and Configuring auto_explain

Method 1 — Global load via shared_preload_libraries (requires restart)

For persistent, database-wide coverage, add auto_explain to shared_preload_libraries in postgresql.conf. This loads the module at server startup and applies it to every session automatically.

ini
# postgresql.conf — requires PostgreSQL service restart
shared_preload_libraries = 'auto_explain'

auto_explain.log_min_duration = 1000   # Log queries taking > 1 second
auto_explain.log_analyze     = on      # Include actual row counts (costs CPU)
auto_explain.log_buffers     = on      # Include buffer hit/miss stats
auto_explain.log_timing      = on      # Include per-node timing
auto_explain.log_triggers    = off     # Include trigger execution stats
auto_explain.log_verbose     = off     # Keep off in production (too verbose)
auto_explain.log_format      = text    # or json, xml, yaml
auto_explain.log_nested_statements = off

After editing postgresql.conf, restart PostgreSQL:

bash
# systemd-based systems
sudo systemctl restart postgresql

# Verify the module loaded
psql -c "SHOW shared_preload_libraries;"

Method 2 — Per-session load (no restart needed)

When you are debugging a specific issue and cannot restart the server, load auto_explain for your session only. This is ideal for targeted investigations in production or staging without touching the server-wide configuration.

sql
-- Load for this session only — no restart required
LOAD 'auto_explain';
SET auto_explain.log_min_duration = 500;
SET auto_explain.log_analyze = ON;

-- Run the slow query — plan will be logged to PostgreSQL log
SELECT * FROM orders WHERE customer_id = 12345 ORDER BY created_at DESC LIMIT 100;

The plan appears in the PostgreSQL server log (typically /var/log/postgresql/ or wherever log_directory points). The session ends, and the module is unloaded — no persistent change to the server.

Method 3 — RDS PostgreSQL

On Amazon RDS for PostgreSQL, shared_preload_libraries is managed through Parameter Groups and the available libraries are restricted. Use the per-session method instead:

sql
-- On RDS PostgreSQL — per-session load is the practical approach
LOAD 'auto_explain';
SET auto_explain.log_min_duration = '1s';
SET auto_explain.log_analyze = true;

-- Slow query plans will appear in RDS Enhanced Monitoring logs
-- or in the PostgreSQL error log exported to CloudWatch

Key Configuration Parameters

Parameter Default Recommended (Production) Description
log_min_duration -1 (disabled) 1000ms Threshold in milliseconds; queries longer than this get their plan logged
log_analyze off on Includes actual row counts and timing per plan node (adds 10–30% overhead)
log_buffers off on Shows shared/local buffer hits, reads, writes per node
log_timing on on Per-node wall clock timing (requires log_analyze)
log_triggers off off Include trigger execution statistics in the plan output
log_format text text Plan output format: text, json, xml, or yaml
log_nested_statements off off Logs plans for queries executed inside PL/pgSQL functions
sample_rate 1.0 0.1–1.0 Fraction of qualifying queries to log (PostgreSQL 13+); reduces log volume
Performance overhead of log_analyze
auto_explain.log_analyze = on executes the query and measures actual rows and timing at each plan node. This adds 10–30% execution overhead for the logged queries. On high-QPS databases, use auto_explain.sample_rate = 0.01 (1%) or log_analyze = off to minimize impact.

Using sample_rate on high-traffic databases (PostgreSQL 13+)

ini
# Log only 10% of qualifying slow queries — reduces log volume by 90%
auto_explain.sample_rate      = 0.1
auto_explain.log_min_duration = 500   # 500ms threshold
auto_explain.log_analyze      = on
auto_explain.log_buffers      = on

The sample_rate parameter is applied probabilistically per query execution. At 0.1, roughly 1 in 10 slow queries will have its plan logged. This is a practical compromise between diagnostic coverage and log storage costs.

Tip: Layer your slow query thresholds
Set auto_explain.log_min_duration 2–3x higher than your log_min_duration_statement so that auto_explain only captures plans for the very worst offenders, while pgBadger or pg_stat_statements handles the broader slow query landscape. For example: log_min_duration_statement = 500ms and auto_explain.log_min_duration = 1500ms.

Reading auto_explain Output in Logs

When auto_explain fires, the log entry contains the query text and the full plan tree. Here is a realistic example showing a problematic query that performs two sequential scans and a hash join on large tables:

bash
2026-03-05 10:42:17 UTC [12345] LOG:  duration: 2341.827 ms  plan:
Query Text: SELECT o.*, u.email FROM orders o JOIN users u ON u.id = o.user_id WHERE o.status = 'pending' AND o.created_at > '2026-02-03'
Hash Join  (cost=1847.53..45231.76 rows=12847 width=156) (actual time=89.432..2341.123 rows=12847 loops=1)
  Hash Cond: (o.user_id = u.id)
  Buffers: shared hit=2847 read=38923
  ->  Seq Scan on orders o  (cost=0.00..38234.00 rows=12847 width=89) (actual time=0.023..1823.456 rows=12847 loops=1)
        Filter: ((status = 'pending') AND (created_at > '2026-02-03'))
        Rows Removed by Filter: 487153
        Buffers: shared hit=847 read=38076
  ->  Hash  (cost=1234.00..1234.00 rows=50000 width=67) (actual time=87.123..87.123 rows=50000 loops=1)
        Buckets: 65536  Batches: 1  Memory Usage: 4096kB
        Buffers: shared hit=2000 read=847
        ->  Seq Scan on users u  (cost=0.00..1234.00 rows=50000 width=67) (actual time=0.018..45.234 rows=50000 loops=1)

Reading this plan reveals several actionable signals immediately:

  • Seq Scan on orders with 487,153 rows removed by filter: The planner is reading the entire table and discarding 97% of rows. A composite index on (status, created_at) would eliminate this.
  • 38,923 shared buffer reads on orders: Nearly all data is coming from disk (or OS page cache miss). The table is too large to fit in shared_buffers for this query.
  • 2,341ms total duration with 1,823ms spent in the orders Seq Scan: 78% of the query time is in that one scan, confirming the index as the fix to pursue.

Using JSON format for structured log parsing

For environments that ship PostgreSQL logs into ELK, Datadog, or a custom log aggregator, the JSON plan format makes machine parsing straightforward:

ini
# Use JSON for log parsing / ingestion into ELK / Datadog
auto_explain.log_format = json
bash
# Extract slow query plans from PostgreSQL log (text post-processing example)
grep -A 50 "auto_explain" /var/log/postgresql/postgresql.log | \
  python3 -c "
import sys, json
for line in sys.stdin:
    if 'plan:' in line:
        # Extract and pretty-print JSON plan
        pass
"

Debugging PL/pgSQL stored procedures

ini
# Enable to see plans for queries inside stored procedures
# Use only in dev/staging — log volume will be very high with log_min_duration = 0
auto_explain.log_nested_statements = on
auto_explain.log_min_duration      = 0

With log_nested_statements = on, every SQL statement executed inside a PL/pgSQL function, trigger, or DO block that exceeds the threshold will have its plan logged. This is invaluable when you have a stored procedure that runs fine on small data but degrades on production-scale inputs.

auto_explain vs pg_stat_statements vs pgBadger — When to Use Each

These three tools address overlapping but distinct questions. Using them together gives a complete picture of query performance:

Tool Captures Actual plan? Plan per execution? Best for
auto_explain Full query plan with actual stats Yes (with log_analyze) Yes Plan regression diagnosis, parameter-sensitive plans
pg_stat_statements Aggregated execution statistics No No Finding the top-N slowest query shapes overall
pgBadger Log file analysis and reporting No No Historical trends, per-hour reports, lock waits summary
EXPLAIN ANALYZE Single query plan on demand Yes Yes Interactive debugging when you can reproduce the query

A typical workflow: pg_stat_statements identifies that SELECT ... FROM orders WHERE status = $1 is responsible for 40% of total query time this week. pgBadger confirms the spikes happen between 9am and 11am UTC. You then enable auto_explain with a threshold targeting those executions and capture the actual plan during the next spike window. Now you have the plan — and you can fix it.

Production Tuning and Log Management

Running auto_explain in production requires balancing diagnostic value against log volume and execution overhead. The following configuration represents a reasonable starting point for most workloads above 500 QPS:

ini
# Production-balanced auto_explain configuration
shared_preload_libraries = 'pg_stat_statements,auto_explain'

# Only log the worst 1% of slow queries to limit I/O
auto_explain.log_min_duration = 2000    # 2 seconds — well above p99 baseline
auto_explain.log_analyze      = on
auto_explain.log_buffers      = on
auto_explain.log_timing       = on
auto_explain.log_verbose      = off     # Never enable in production
auto_explain.log_format       = text
auto_explain.sample_rate      = 0.25   # Log 25% of qualifying queries
auto_explain.log_nested_statements = off

Key decisions to revisit over time:

  • Threshold tuning: Start at 2–3x your observed p99 latency. If your p99 is 300ms, set the threshold at 800ms–1000ms. Adjust down only after confirming log volume is manageable.
  • Log rotation: Plan lines can be hundreds of lines long. A single hour of activity with log_min_duration = 100 can produce gigabytes of logs. Ensure logrotate is configured with appropriate size and frequency limits.
  • log_analyze in practice: The overhead is real but bounded — it only applies to queries that are already slow (above the threshold). A 20% overhead on a 2-second query is 400ms extra. For the diagnostic value, this is usually acceptable when sample_rate is kept low.
  • Pairing with pg_stat_statements: Both modules can coexist in shared_preload_libraries. They complement each other without interference.

Key Takeaways

  • auto_explain captures what no other tool does: the actual query plan for a specific slow execution, logged automatically without manual intervention.
  • Load per-session with LOAD 'auto_explain' when you need to debug without restarting PostgreSQL — critical for production investigations on RDS or live systems.
  • Always pair log_analyze = on with a meaningful threshold: the overhead is proportional to logged queries, not all queries, so a high threshold keeps cost manageable.
  • Use sample_rate on PostgreSQL 13+ to cap log volume on high-QPS databases without sacrificing visibility into plan regressions.
  • auto_explain, pg_stat_statements, and pgBadger are complementary: use pg_stat_statements and pgBadger to identify problem queries, then auto_explain to diagnose the plan at execution time.
  • For JSON log pipelines, set auto_explain.log_format = json to make plan ingestion into ELK, Datadog, or custom tooling straightforward.

Working with JusDB on PostgreSQL Query Analysis

Configuring auto_explain is straightforward. Interpreting the plan output at scale — correlating plan changes with deployment events, statistics refreshes, or workload shifts — is where deep PostgreSQL expertise pays off. At JusDB, our PostgreSQL engagements routinely involve exactly this kind of plan regression analysis: establishing performance baselines, enabling targeted diagnostic instrumentation, and translating raw plan output into concrete index, schema, or configuration changes.

If you are dealing with intermittent slow queries that are hard to reproduce, parameter-sensitive plan instability, or a PostgreSQL upgrade that changed planner behavior, we can help you work through it systematically. Get in touch to discuss your PostgreSQL environment.

Share this article

JusDB Team

Official JusDB content team