Database Performance

PostgreSQL Wait Events: Understanding What Slows Your Database

Decode PostgreSQL wait events to diagnose lock contention, I/O bottlenecks, and CPU pressure in production

JusDB Team
January 2, 2026
11 min read
198 views

A PostgreSQL query that ran in 12ms yesterday is now taking 4 seconds. No code changed, no schema changed, but something is clearly wrong. Wait events are PostgreSQL's built-in mechanism for exposing exactly what each backend process is waiting for at any given moment — lock acquisition, disk I/O, WAL flushing, network round-trips. Reading them correctly is the difference between 30 minutes of guesswork and a precise, five-minute diagnosis. This post covers the wait event categories, the queries to surface them, and real-world examples of how to act on what you find.

TL;DR
  • Wait events in pg_stat_activity show what each backend is blocked on right now — lock contention, I/O, WAL, CPU, or network
  • The five major categories are Lock, LWLock, IO, Client, and Timeout
  • Query wait_event_type and wait_event together; neither column alone tells the full story
  • Join pg_stat_activity with pg_locks to trace the full blocker chain
  • Aggregate wait counts over time with pg_wait_sampling or a simple polling loop to find chronic bottlenecks, not just instantaneous ones

What Are PostgreSQL Wait Events?

Every PostgreSQL backend process spends its time either executing work on the CPU or waiting for something external to complete. When a process is waiting, PostgreSQL records the category and specific name of the wait in two columns of pg_stat_activity: wait_event_type and wait_event. When a process is actively running, both columns are NULL.

Wait events were introduced in PostgreSQL 9.6 and have been expanded significantly in each major release since. As of PostgreSQL 16, there are over 200 distinct named wait events grouped into roughly a dozen categories. For most production tuning work, you will focus on five: Lock, LWLock, IO, Client, and Timeout.

The critical insight is that a wait event tells you what the process needs but cannot yet have. A backend showing wait_event_type = 'Lock' and wait_event = 'relation' is not slow because of bad SQL — it is blocked waiting for another transaction to release a table-level lock. No amount of index tuning will fix that.

Tip

Always observe wait events under real production load. Running your diagnostic queries on a quiet replica or during off-hours will show mostly Client waits (idle connections) and almost nothing useful. Capture during peak traffic or immediately when a slowdown is reported.


How Wait Events Work

PostgreSQL's process model assigns one OS process per client connection. When a process needs a resource it cannot immediately acquire — a mutex, a disk block, a network acknowledgement — the kernel puts that process to sleep. Before sleeping, the PostgreSQL code sets the two wait event fields in shared memory so that any monitoring query against pg_stat_activity can observe the state.

The Five Core Categories

Lock — heavyweight locks managed by PostgreSQL's lock manager. These protect concurrent access to tables, rows, pages, and other database objects. Common waits in this category:

  • relation — waiting for a table-level lock (e.g., concurrent DDL, explicit LOCK TABLE)
  • tuple — waiting for a row-level lock held by another transaction
  • extend — waiting to extend a relation (append a new page); common under heavy INSERT load
  • transactionid — waiting for another transaction to commit or rollback
  • advisory — application-level advisory locks

LWLock — lightweight locks used internally to protect shared data structures. Unlike heavyweight locks, LWLocks are not visible to SQL users, but their contention is very visible to DBAs. Common waits:

  • WALWrite — waiting to write WAL buffers to disk; appears under commit-heavy OLTP workloads
  • WALInsert — waiting to insert a record into the WAL buffer; also commit-heavy
  • buffer_content / BufferIO — shared buffer contention
  • lock_manager — the lock table itself is contended; occurs at very high connection counts

IO — actual disk I/O operations. These indicate that the data or WAL you need is not in cache.

  • DataFileRead — reading a heap or index page from disk (cache miss)
  • WALWrite — flushing WAL to disk at commit (note: the same name appears in both IO and LWLock categories)
  • SLRURead / SLRUWrite — reading or writing Simple LRU caches (pg_clog, pg_subtrans)

Client — the backend is waiting for the client to send or receive data. The single most common wait event in most databases is ClientRead, which simply means the connection is idle between queries. This is normal. What is not normal is seeing many non-idle backends stuck on ClientRead during active query processing, which usually indicates a saturated application tier or very slow clients.

Timeout — the process is sleeping for a fixed interval. Examples include PgSleep (explicit pg_sleep() call), AutoVacuumDelay (autovacuum throttling), and VacuumDelay (manual VACUUM cost delay). Seeing many backends in AutoVacuumMain (technically a Activity category wait) indicates autovacuum is very busy, which usually means tables have high dead-tuple counts.


Querying Wait Events in Practice

Snapshot: What Is Waiting Right Now?

sql
SELECT
    pid,
    usename,
    application_name,
    state,
    wait_event_type,
    wait_event,
    LEFT(query, 80)    AS query_snippet,
    now() - query_start AS query_duration
FROM pg_stat_activity
WHERE state != 'idle'
  AND wait_event IS NOT NULL
ORDER BY query_duration DESC NULLS LAST;

This gives you a live picture of every non-idle backend that is currently blocked on something. Sort by query_duration descending to find the longest-running blocked queries first.

Aggregate: Which Waits Are Most Common?

sql
SELECT
    wait_event_type,
    wait_event,
    COUNT(*)                          AS waiter_count,
    MAX(now() - query_start)          AS max_wait_duration
FROM pg_stat_activity
WHERE state != 'idle'
  AND wait_event IS NOT NULL
GROUP BY wait_event_type, wait_event
ORDER BY waiter_count DESC;
Important

pg_stat_activity is a point-in-time snapshot. A fast transient wait will not appear unless you happen to query at exactly the right moment. For chronic bottleneck analysis, poll every 1–5 seconds into a staging table, or install the pg_wait_sampling extension which uses statistical sampling to build histograms without you writing any polling code.

Polling Loop Into a Staging Table

sql
-- Run once to create the capture table
CREATE TABLE IF NOT EXISTS wait_event_samples (
    captured_at      TIMESTAMPTZ DEFAULT now(),
    pid              INT,
    wait_event_type  TEXT,
    wait_event       TEXT,
    query_duration   INTERVAL,
    query_snippet    TEXT
);

-- Insert a sample (schedule this every 5 seconds via pg_cron or an external script)
INSERT INTO wait_event_samples
    (pid, wait_event_type, wait_event, query_duration, query_snippet)
SELECT
    pid,
    wait_event_type,
    wait_event,
    now() - query_start,
    LEFT(query, 120)
FROM pg_stat_activity
WHERE state != 'idle'
  AND wait_event IS NOT NULL;
bash
# Shell polling alternative — 60 samples at 5-second intervals
for i in $(seq 1 60); do
  psql -c "
    INSERT INTO wait_event_samples
        (pid, wait_event_type, wait_event, query_duration, query_snippet)
    SELECT pid, wait_event_type, wait_event, now()-query_start, LEFT(query,120)
    FROM pg_stat_activity
    WHERE state != 'idle' AND wait_event IS NOT NULL;
  "
  sleep 5
done

Combining Wait Events with pg_locks

When you see wait_event_type = 'Lock', the next question is always: who is holding the lock that this backend is waiting for? pg_stat_activity alone cannot answer that. Join with pg_locks to trace the full blocker chain.

sql
-- Find blocker → blocked chains for all current lock waits
SELECT
    blocked.pid                     AS blocked_pid,
    blocked.usename                 AS blocked_user,
    blocked.query                   AS blocked_query,
    blocker.pid                     AS blocker_pid,
    blocker.usename                 AS blocker_user,
    blocker.query                   AS blocker_query,
    blocker.state                   AS blocker_state,
    now() - blocked.query_start     AS blocked_duration
FROM pg_stat_activity AS blocked
JOIN pg_locks         AS blocked_lock
    ON blocked_lock.pid = blocked.pid
   AND NOT blocked_lock.granted
JOIN pg_locks         AS blocker_lock
    ON  blocker_lock.locktype    = blocked_lock.locktype
   AND  blocker_lock.database   IS NOT DISTINCT FROM blocked_lock.database
   AND  blocker_lock.relation   IS NOT DISTINCT FROM blocked_lock.relation
   AND  blocker_lock.page       IS NOT DISTINCT FROM blocked_lock.page
   AND  blocker_lock.tuple      IS NOT DISTINCT FROM blocked_lock.tuple
   AND  blocker_lock.transactionid IS NOT DISTINCT FROM blocked_lock.transactionid
   AND  blocker_lock.classid    IS NOT DISTINCT FROM blocked_lock.classid
   AND  blocker_lock.objid      IS NOT DISTINCT FROM blocked_lock.objid
   AND  blocker_lock.objsubid   IS NOT DISTINCT FROM blocked_lock.objsubid
   AND  blocker_lock.granted
JOIN pg_stat_activity AS blocker
    ON blocker.pid = blocker_lock.pid
ORDER BY blocked_duration DESC;
Warning

In a lock chain with three or more levels (A blocks B, B blocks C), the query above only reveals one level at a time. If blocker_state shows 'idle in transaction' for the blocker, that transaction has stalled with locks held. Check idle_in_transaction_session_timeout — if it is not set, a forgotten BEGIN with no matching COMMIT can hold locks indefinitely.


Performance Tuning by Wait Category

High Lock / transactionid Waits

Root cause: long-running transactions holding row or table locks. Remediation:

  • Set idle_in_transaction_session_timeout = '30s' to kill stalled transactions automatically
  • Set lock_timeout = '5s' on application connections to fail fast rather than pile up
  • Review ORM batch operations — many ORMs wrap large operations in a single transaction unnecessarily
sql
-- Kill a specific blocker after confirming it is safe
SELECT pg_terminate_backend(blocker_pid);

-- Set session-level timeout for a critical connection
SET LOCAL lock_timeout = '3s';
SET LOCAL idle_in_transaction_session_timeout = '30s';

High WALWrite / WALInsert LWLock Waits

Root cause: commit-heavy workload saturating WAL I/O. Remediation:

  • Increase wal_buffers (default 4MB; try 64MB on high-commit systems)
  • Move WAL to a separate, faster disk (NVMe vs. spinning disk)
  • Switch to synchronous_commit = off for workloads that can tolerate up to one wal_writer_delay of data loss (async apps, analytics inserts)
  • Batch commits: group writes into fewer, larger transactions
sql
-- Check current WAL configuration
SHOW wal_buffers;
SHOW synchronous_commit;
SHOW wal_writer_delay;
SHOW commit_delay;
SHOW commit_siblings;

High DataFileRead IO Waits

Root cause: insufficient shared_buffers or working set larger than RAM. Remediation:

  • Increase shared_buffers to 25% of RAM as a starting point
  • Verify indexes exist for the queries causing the reads: check pg_stat_user_tables for seq_scan counts
  • Use pg_prewarm after a restart to warm the buffer cache for critical tables
  • Consider partitioning or archiving hot tables to reduce working set size

High AutoVacuumMain / AutoVacuumDelay Waits

Root cause: autovacuum is struggling to keep up with dead-tuple accumulation. Remediation:

  • Increase autovacuum_max_workers (default 3)
  • Reduce autovacuum_vacuum_cost_delay (try 2ms instead of the default 20ms)
  • Set per-table storage parameters for hotspots: ALTER TABLE orders SET (autovacuum_vacuum_scale_factor = 0.01);
sql
-- Tables with the most dead tuples and last vacuum time
SELECT
    schemaname,
    relname,
    n_dead_tup,
    n_live_tup,
    ROUND(n_dead_tup::numeric / NULLIF(n_live_tup + n_dead_tup, 0) * 100, 1) AS dead_pct,
    last_autovacuum,
    last_autoanalyze
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY n_dead_tup DESC
LIMIT 20;
Tip

If you see a single table consistently topping the dead-tuple list despite autovacuum running, that table likely has an extremely high UPDATE or DELETE rate. Consider application-level changes — such as batching deletes, using BRIN indexes on timestamp columns, or range partitioning with partition detach/attach — rather than solely tuning autovacuum parameters.


Real Diagnosis Examples

Example 1: Deployment Locks Everything

Symptom: during a schema migration deployment, all application queries begin timing out. The wait event snapshot shows 40+ backends with wait_event_type = 'Lock', wait_event = 'relation'. The blocker query reveals a long-running ALTER TABLE with ACCESS EXCLUSIVE waiting behind a 90-second OLAP query that took the same lock first. The fix: run the OLAP query outside peak hours, or rewrite the migration to use a lock-compatible approach (CREATE INDEX CONCURRENTLY, column addition without default values in Postgres 11+).

Example 2: Writes Slow Only at End of Month

Symptom: INSERT latency spikes to 200ms at month-end batch processing. Wait event aggregation shows the dominant wait is LWLock / WALWrite. The month-end job inserts 10 million rows in small autocommit transactions. Fix: batch the inserts into groups of 1,000–10,000 rows per transaction, and increase wal_buffers from the default. Insert latency drops to under 5ms per batch.

Example 3: Idle Connections Hiding True Waits

Symptom: the aggregate wait query shows 95% of waits are Client / ClientRead, making it look like the database is fine. But application response time is high. Filtering the snapshot query to state != 'idle' (as shown above) removes the idle connections and reveals 12 backends stuck on Lock / tuple from row-level lock contention in a payment processing queue table. Fix: implement SKIP LOCKED on the queue dequeue query.

sql
-- Queue dequeue with SKIP LOCKED to avoid row-level lock contention
SELECT id, payload
FROM job_queue
WHERE status = 'pending'
ORDER BY created_at
LIMIT 10
FOR UPDATE SKIP LOCKED;

Key Takeaways
  • pg_stat_activity.wait_event_type and wait_event expose what each backend is blocked on; always filter to state != 'idle' to exclude idle connections
  • The five most actionable wait categories are Lock, LWLock, IO, Client, and Timeout — each maps to a distinct class of infrastructure or application problem
  • Lock waits require joining pg_locks to find the blocking transaction; the fix is almost always shorter transactions or appropriate timeouts
  • WALWrite LWLock contention is solved by larger wal_buffers, faster storage, batched commits, or async commit — not by query tuning
  • DataFileRead IO waits indicate cache misses; check buffer hit ratios and add indexes before throwing more RAM at the problem
  • Point-in-time snapshots miss transient waits; poll every few seconds or use pg_wait_sampling for statistically reliable histograms
  • SKIP LOCKED is the correct pattern for queue tables; it eliminates tuple lock pileups with no schema changes required

Working with JusDB on PostgreSQL Performance Analysis

Wait events give you the most direct signal available inside PostgreSQL about what is actually hurting performance, but interpreting them correctly under production load — distinguishing a transient spike from a systemic bottleneck, tracing a three-level lock chain, correlating wait event histograms with application metrics — takes experience with a wide range of workload patterns.

JusDB works with engineering teams on exactly these kinds of PostgreSQL performance investigations: building wait event baselines, setting up continuous monitoring, tuning autovacuum and WAL configuration for specific workload shapes, and designing schema changes that avoid lock escalation. If you are seeing unexplained latency spikes, connection pile-ups, or autovacuum falling behind on critical tables, get in touch with the JusDB team. We can usually identify the root cause within a single session of looking at your wait event data.

Share this article