Database Performance

pg_stat_statements: Query Performance Analysis in PostgreSQL

How to use pg_stat_statements to identify slow queries, analyze execution stats, and optimize PostgreSQL performance

JusDB Team
January 1, 2026
12 min read
233 views

Query performance regressions rarely announce themselves — they surface as P95 latency spikes at 2 AM, connection pool exhaustion under load, or a gradual degradation that nobody notices until the on-call alert fires. PostgreSQL ships with a first-class extension for catching these problems before they escalate: pg_stat_statements. It records execution statistics for every normalized query shape your database runs, giving you a persistent, aggregated view of where CPU cycles and I/O are actually being spent. Unlike EXPLAIN ANALYZE, which requires you to know which query to investigate first, pg_stat_statements tells you which queries are worth investigating at all.

TL;DR
  • Enable pg_stat_statements by adding it to shared_preload_libraries and running CREATE EXTENSION.
  • Focus on mean_exec_time, total_exec_time, calls, and rows to find slow and expensive queries.
  • Use pg_stat_statements_reset() after major schema or index changes to get clean baselines.
  • Combine with pg_stat_activity to correlate historical stats with currently running queries.
  • Watch autovacuum correlation — tables with bloat often show up as slow-query hotspots in pg_stat_statements.

What is pg_stat_statements?

pg_stat_statements is a contrib module that tracks planning and execution statistics for all SQL statements run against a PostgreSQL instance. It was first introduced in PostgreSQL 8.4 and has been refined through every major release since. The module hooks into the executor and planner, accumulating counters per normalized query — meaning that SELECT * FROM orders WHERE id = 1 and SELECT * FROM orders WHERE id = 99 are tracked as the same query shape.

Each row in the pg_stat_statements view represents a unique combination of database, user, and query fingerprint. The fingerprint is a hash of the normalized query text (with literal constants replaced by parameter placeholders). This normalization is what makes the view actionable at scale: instead of thousands of rows for every distinct literal value, you see one row per query pattern with aggregated statistics across all its executions.

The view is available to superusers by default. In PostgreSQL 14+, members of the pg_read_all_stats role can also query it, which is useful for granting read access to monitoring users without full superuser privileges.

Tip

In PostgreSQL 14 and later, pg_stat_statements gained a toplevel column that distinguishes whether a statement was called at the top level or from within a function or procedure. Filter on toplevel = true to avoid double-counting when your workload uses PL/pgSQL heavily.

How it Works

The extension stores its data in shared memory, in a fixed-size hash table controlled by the pg_stat_statements.max GUC (default: 5000 entries). When the table is full, the least-recently-used entry is evicted. This means that on a highly polyglot workload — many distinct query shapes — you may lose statistics for infrequently run but critical queries. Tuning pg_stat_statements.max is the first lever to pull if you notice gaps.

Each entry tracks:

  • calls — number of times the statement was executed
  • total_exec_time — cumulative execution time in milliseconds
  • mean_exec_time — average execution time per call (total_exec_time / calls)
  • min_exec_time / max_exec_time — best and worst single execution
  • stddev_exec_time — standard deviation of execution time (high stddev = inconsistent performance)
  • rows — total rows returned or affected across all executions
  • shared_blks_hit / shared_blks_read — buffer cache hits vs. physical reads
  • shared_blks_dirtied / shared_blks_written — blocks dirtied and written
  • temp_blks_read / temp_blks_written — temp file usage (signals sort/hash spills)
  • total_plan_time / mean_plan_time — planning overhead (available when pg_stat_statements.track_planning = on)

In PostgreSQL 17, the wal_bytes column was added, letting you correlate query patterns with WAL generation — critical for understanding replication lag contributors on write-heavy primaries.

Warning

Statistics in pg_stat_statements are cumulative from the last reset. A query that ran 10,000 times last month but only 5 times this week will still show a high total_exec_time. Always interpret totals relative to calls, and reset stats after major schema changes or load pattern shifts to get a clean baseline.

Configuration and Setup

Enabling the extension requires a server restart because it must be loaded at startup via shared_preload_libraries.

Step 1 — add the library to postgresql.conf or via ALTER SYSTEM:

bash
# In postgresql.conf
shared_preload_libraries = 'pg_stat_statements'

# Or via ALTER SYSTEM (requires reload, but the actual load needs restart)
ALTER SYSTEM SET shared_preload_libraries = 'pg_stat_statements';

Step 2 — restart PostgreSQL, then create the extension in each target database:

sql
-- Run in each database you want to monitor
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Verify it's active
SELECT * FROM pg_stat_statements LIMIT 1;

Step 3 — tune the relevant GUCs in postgresql.conf:

bash
pg_stat_statements.max = 10000          # default 5000; increase for polyglot workloads
pg_stat_statements.track = all          # 'top' (default) omits nested statements
pg_stat_statements.track_planning = on  # adds plan time columns (minor overhead)
pg_stat_statements.save = on            # persist stats across restarts
Important

Setting pg_stat_statements.track = all captures statements called from functions and procedures. This is essential if your application logic lives in PL/pgSQL, but it increases memory consumption and can cause LRU eviction of top-level query stats faster on busy systems. Start with top and switch to all only when debugging stored procedure performance.

Performance Tuning with pg_stat_statements

Finding Your Top-N Slowest Queries by Mean Execution Time

The most common starting point: which query patterns have the highest average latency?

sql
SELECT
  round(mean_exec_time::numeric, 2)    AS mean_ms,
  round(total_exec_time::numeric, 2)   AS total_ms,
  calls,
  round(rows::numeric / calls, 1)      AS avg_rows,
  left(query, 120)                     AS query_snippet
FROM pg_stat_statements
WHERE calls > 50                        -- ignore one-off queries
  AND query NOT ILIKE '%pg_stat%'       -- exclude monitoring queries
ORDER BY mean_exec_time DESC
LIMIT 10;

Example output from a production OLTP system:

sql
mean_ms  | total_ms   | calls  | avg_rows | query_snippet
----------+------------+--------+----------+--------------------------------------
  1842.31 |  9211550.0 |   5000 |      1.0 | SELECT * FROM orders WHERE customer_id = $1 AND status = $2
   312.88 |  9073520.0 |  29000 |     24.3 | SELECT p.*, i.quantity FROM products p JOIN inventory i ...
    89.14 |  8914000.0 | 100000 |      1.0 | INSERT INTO audit_log (event, payload) VALUES ($1, $2)

The first query has a 1.8-second mean — almost certainly missing an index on (customer_id, status). The third query runs 100,000 times but averages only 89ms; the aggregate cost is high purely due to volume.

Finding Queries with the Highest Total Cost

Mean latency tells you which queries are slow per call; total execution time tells you which queries are consuming the most database resources overall:

sql
SELECT
  round(total_exec_time::numeric / 1000, 1) AS total_sec,
  calls,
  round(mean_exec_time::numeric, 2)          AS mean_ms,
  round(
    100.0 * total_exec_time
    / sum(total_exec_time) OVER (), 2
  )                                          AS pct_total_time,
  left(query, 120)                           AS query_snippet
FROM pg_stat_statements
WHERE calls > 10
ORDER BY total_exec_time DESC
LIMIT 10;

Identifying I/O-Heavy Queries

Buffer cache misses drive disk I/O. Queries with a low cache hit ratio relative to their shared_blks_read are candidates for index tuning or shared_buffers expansion:

sql
SELECT
  calls,
  shared_blks_hit,
  shared_blks_read,
  round(
    100.0 * shared_blks_hit
    / nullif(shared_blks_hit + shared_blks_read, 0), 2
  )                    AS cache_hit_pct,
  left(query, 120)     AS query_snippet
FROM pg_stat_statements
WHERE shared_blks_read > 1000
ORDER BY shared_blks_read DESC
LIMIT 10;

Finding Queries That Spill to Disk

Non-zero temp_blks_written means sorts or hash joins exceeded work_mem and wrote temporary files — a significant performance penalty:

sql
SELECT
  calls,
  temp_blks_written,
  round(mean_exec_time::numeric, 2) AS mean_ms,
  left(query, 120)                  AS query_snippet
FROM pg_stat_statements
WHERE temp_blks_written > 0
ORDER BY temp_blks_written DESC
LIMIT 10;
Tip

If a specific query consistently spills to disk, try increasing work_mem at the session level before committing to a global change: SET work_mem = '64MB'; EXPLAIN ANALYZE .... A global increase of work_mem multiplies by the number of concurrent queries, so a 64 MB setting with 200 connections can theoretically consume 12 GB of RAM for sort/hash operations alone.

Correlating with pg_stat_activity

To find currently running queries that match historically expensive patterns, join pg_stat_statements with pg_stat_activity on queryid:

sql
SELECT
  a.pid,
  a.state,
  a.wait_event_type,
  a.wait_event,
  round(
    extract(epoch FROM (now() - a.query_start))::numeric, 1
  )                                         AS running_sec,
  round(s.mean_exec_time::numeric, 2)       AS historical_mean_ms,
  s.calls                                   AS historical_calls,
  left(a.query, 100)                        AS query_snippet
FROM pg_stat_activity a
JOIN pg_stat_statements s
  ON a.query_id = s.queryid          -- pg_stat_activity.query_id available in PG 14+
WHERE a.state = 'active'
  AND a.query_start < now() - interval '5 seconds'
ORDER BY running_sec DESC;
Tip

The query_id column on pg_stat_activity was added in PostgreSQL 14. On older versions, you can still correlate by normalizing the query text manually, but the join is not reliable — another reason to prioritize upgrading to PostgreSQL 14+.

Autovacuum Correlation

Bloated tables are a frequent root cause of slow sequential scans that surface in pg_stat_statements. Cross-referencing with pg_stat_user_tables helps confirm whether autovacuum is keeping up:

sql
-- Tables with high dead tuple counts that may be causing slow queries
SELECT
  schemaname,
  relname,
  n_dead_tup,
  n_live_tup,
  round(100.0 * n_dead_tup / nullif(n_live_tup + n_dead_tup, 0), 2) AS dead_pct,
  last_autovacuum,
  last_autoanalyze
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY dead_pct DESC
LIMIT 20;

If a table from this output appears in your top slow queries in pg_stat_statements, trigger a manual VACUUM ANALYZE and observe whether mean_exec_time drops after resetting stats:

sql
-- Reset stats to get a clean post-vacuum baseline
SELECT pg_stat_statements_reset();

-- Or reset stats for a specific database only (PG 12+)
SELECT pg_stat_statements_reset(0, oid, 0)
FROM pg_database
WHERE datname = 'your_database_name';

Detecting High-Frequency, Low-Latency Query Storms

Some of the worst production problems are caused not by slow queries but by extremely frequent fast ones — N+1 patterns, cache-bypass bugs, or misconfigured connection pools hammering the database:

sql
SELECT
  calls,
  round(mean_exec_time::numeric, 3) AS mean_ms,
  round(total_exec_time::numeric / 1000, 1) AS total_sec,
  left(query, 120) AS query_snippet
FROM pg_stat_statements
WHERE mean_exec_time < 5            -- fast queries only
ORDER BY calls DESC
LIMIT 10;

A query averaging 1.2ms but called 2,000,000 times in a single day is contributing 2,400 seconds of cumulative CPU time. These are candidates for application-level caching or query batching.

Warning

Calling pg_stat_statements_reset() without arguments clears all statistics for all databases on the instance. This is destructive and irreversible. In production, prefer the per-database or per-query reset functions available in PostgreSQL 12+ to preserve stats for databases you are not actively tuning.

Key Takeaways
  • Enable pg_stat_statements via shared_preload_libraries and CREATE EXTENSION; it requires a server restart but has negligible runtime overhead.
  • Tune pg_stat_statements.max (default 5000) upward on polyglot workloads to prevent LRU eviction of critical query statistics.
  • mean_exec_time finds slow queries per call; total_exec_time finds the highest aggregate resource consumers — use both.
  • Non-zero temp_blks_written signals work_mem spills; low cache_hit_pct on shared_blks_hit signals index or buffer cache gaps.
  • Reset stats with pg_stat_statements_reset() after index additions, vacuums, or schema changes to establish clean before/after baselines.
  • In PostgreSQL 14+, join pg_stat_activity.query_id to pg_stat_statements.queryid to correlate historical aggregates with currently running sessions.
  • High dead tuple counts in pg_stat_user_tables often explain slow sequential scans that appear as top entries in pg_stat_statements.
  • High-call, low-latency queries can collectively dominate CPU time — do not ignore them just because individual executions are fast.

Working with JusDB on PostgreSQL Query Performance

pg_stat_statements is the best diagnostic starting point PostgreSQL gives you, but translating raw statistics into an optimization plan — choosing between index tuning, partitioning, connection pooling, or schema redesign — requires experience with the full stack. At JusDB, we work with engineering teams to build systematic query performance programs: baselining with pg_stat_statements, correlating with application-level traces, validating changes under production load, and instrumenting alerting so regressions are caught before they page someone at 2 AM.

If you are dealing with unexplained latency spikes, ballooning total_exec_time on queries you cannot easily rewrite, or need a second opinion on a PostgreSQL tuning plan, reach out to JusDB. We offer PostgreSQL performance reviews, hands-on optimization engagements, and ongoing database advisory for teams running PostgreSQL at scale.

Share this article