A PostgreSQL query that ran in 12ms yesterday is now taking 4 seconds. No code changed, no schema changed, but something is clearly wrong. Wait events are PostgreSQL's built-in mechanism for exposing exactly what each backend process is waiting for at any given moment — lock acquisition, disk I/O, WAL flushing, network round-trips. Reading them correctly is the difference between 30 minutes of guesswork and a precise, five-minute diagnosis. This post covers the wait event categories, the queries to surface them, and real-world examples of how to act on what you find.
- Wait events in
pg_stat_activityshow what each backend is blocked on right now — lock contention, I/O, WAL, CPU, or network - The five major categories are Lock, LWLock, IO, Client, and Timeout
- Query
wait_event_typeandwait_eventtogether; neither column alone tells the full story - Join
pg_stat_activitywithpg_locksto trace the full blocker chain - Aggregate wait counts over time with
pg_wait_samplingor a simple polling loop to find chronic bottlenecks, not just instantaneous ones
What Are PostgreSQL Wait Events?
Every PostgreSQL backend process spends its time either executing work on the CPU or waiting for something external to complete. When a process is waiting, PostgreSQL records the category and specific name of the wait in two columns of pg_stat_activity: wait_event_type and wait_event. When a process is actively running, both columns are NULL.
Wait events were introduced in PostgreSQL 9.6 and have been expanded significantly in each major release since. As of PostgreSQL 16, there are over 200 distinct named wait events grouped into roughly a dozen categories. For most production tuning work, you will focus on five: Lock, LWLock, IO, Client, and Timeout.
The critical insight is that a wait event tells you what the process needs but cannot yet have. A backend showing wait_event_type = 'Lock' and wait_event = 'relation' is not slow because of bad SQL — it is blocked waiting for another transaction to release a table-level lock. No amount of index tuning will fix that.
Always observe wait events under real production load. Running your diagnostic queries on a quiet replica or during off-hours will show mostly Client waits (idle connections) and almost nothing useful. Capture during peak traffic or immediately when a slowdown is reported.
How Wait Events Work
PostgreSQL's process model assigns one OS process per client connection. When a process needs a resource it cannot immediately acquire — a mutex, a disk block, a network acknowledgement — the kernel puts that process to sleep. Before sleeping, the PostgreSQL code sets the two wait event fields in shared memory so that any monitoring query against pg_stat_activity can observe the state.
The Five Core Categories
Lock — heavyweight locks managed by PostgreSQL's lock manager. These protect concurrent access to tables, rows, pages, and other database objects. Common waits in this category:
relation— waiting for a table-level lock (e.g., concurrent DDL, explicitLOCK TABLE)tuple— waiting for a row-level lock held by another transactionextend— waiting to extend a relation (append a new page); common under heavy INSERT loadtransactionid— waiting for another transaction to commit or rollbackadvisory— application-level advisory locks
LWLock — lightweight locks used internally to protect shared data structures. Unlike heavyweight locks, LWLocks are not visible to SQL users, but their contention is very visible to DBAs. Common waits:
WALWrite— waiting to write WAL buffers to disk; appears under commit-heavy OLTP workloadsWALInsert— waiting to insert a record into the WAL buffer; also commit-heavybuffer_content/BufferIO— shared buffer contentionlock_manager— the lock table itself is contended; occurs at very high connection counts
IO — actual disk I/O operations. These indicate that the data or WAL you need is not in cache.
DataFileRead— reading a heap or index page from disk (cache miss)WALWrite— flushing WAL to disk at commit (note: the same name appears in both IO and LWLock categories)SLRURead/SLRUWrite— reading or writing Simple LRU caches (pg_clog, pg_subtrans)
Client — the backend is waiting for the client to send or receive data. The single most common wait event in most databases is ClientRead, which simply means the connection is idle between queries. This is normal. What is not normal is seeing many non-idle backends stuck on ClientRead during active query processing, which usually indicates a saturated application tier or very slow clients.
Timeout — the process is sleeping for a fixed interval. Examples include PgSleep (explicit pg_sleep() call), AutoVacuumDelay (autovacuum throttling), and VacuumDelay (manual VACUUM cost delay). Seeing many backends in AutoVacuumMain (technically a Activity category wait) indicates autovacuum is very busy, which usually means tables have high dead-tuple counts.
Querying Wait Events in Practice
Snapshot: What Is Waiting Right Now?
SELECT
pid,
usename,
application_name,
state,
wait_event_type,
wait_event,
LEFT(query, 80) AS query_snippet,
now() - query_start AS query_duration
FROM pg_stat_activity
WHERE state != 'idle'
AND wait_event IS NOT NULL
ORDER BY query_duration DESC NULLS LAST;This gives you a live picture of every non-idle backend that is currently blocked on something. Sort by query_duration descending to find the longest-running blocked queries first.
Aggregate: Which Waits Are Most Common?
SELECT
wait_event_type,
wait_event,
COUNT(*) AS waiter_count,
MAX(now() - query_start) AS max_wait_duration
FROM pg_stat_activity
WHERE state != 'idle'
AND wait_event IS NOT NULL
GROUP BY wait_event_type, wait_event
ORDER BY waiter_count DESC;pg_stat_activity is a point-in-time snapshot. A fast transient wait will not appear unless you happen to query at exactly the right moment. For chronic bottleneck analysis, poll every 1–5 seconds into a staging table, or install the pg_wait_sampling extension which uses statistical sampling to build histograms without you writing any polling code.
Polling Loop Into a Staging Table
-- Run once to create the capture table
CREATE TABLE IF NOT EXISTS wait_event_samples (
captured_at TIMESTAMPTZ DEFAULT now(),
pid INT,
wait_event_type TEXT,
wait_event TEXT,
query_duration INTERVAL,
query_snippet TEXT
);
-- Insert a sample (schedule this every 5 seconds via pg_cron or an external script)
INSERT INTO wait_event_samples
(pid, wait_event_type, wait_event, query_duration, query_snippet)
SELECT
pid,
wait_event_type,
wait_event,
now() - query_start,
LEFT(query, 120)
FROM pg_stat_activity
WHERE state != 'idle'
AND wait_event IS NOT NULL;# Shell polling alternative — 60 samples at 5-second intervals
for i in $(seq 1 60); do
psql -c "
INSERT INTO wait_event_samples
(pid, wait_event_type, wait_event, query_duration, query_snippet)
SELECT pid, wait_event_type, wait_event, now()-query_start, LEFT(query,120)
FROM pg_stat_activity
WHERE state != 'idle' AND wait_event IS NOT NULL;
"
sleep 5
doneCombining Wait Events with pg_locks
When you see wait_event_type = 'Lock', the next question is always: who is holding the lock that this backend is waiting for? pg_stat_activity alone cannot answer that. Join with pg_locks to trace the full blocker chain.
-- Find blocker → blocked chains for all current lock waits
SELECT
blocked.pid AS blocked_pid,
blocked.usename AS blocked_user,
blocked.query AS blocked_query,
blocker.pid AS blocker_pid,
blocker.usename AS blocker_user,
blocker.query AS blocker_query,
blocker.state AS blocker_state,
now() - blocked.query_start AS blocked_duration
FROM pg_stat_activity AS blocked
JOIN pg_locks AS blocked_lock
ON blocked_lock.pid = blocked.pid
AND NOT blocked_lock.granted
JOIN pg_locks AS blocker_lock
ON blocker_lock.locktype = blocked_lock.locktype
AND blocker_lock.database IS NOT DISTINCT FROM blocked_lock.database
AND blocker_lock.relation IS NOT DISTINCT FROM blocked_lock.relation
AND blocker_lock.page IS NOT DISTINCT FROM blocked_lock.page
AND blocker_lock.tuple IS NOT DISTINCT FROM blocked_lock.tuple
AND blocker_lock.transactionid IS NOT DISTINCT FROM blocked_lock.transactionid
AND blocker_lock.classid IS NOT DISTINCT FROM blocked_lock.classid
AND blocker_lock.objid IS NOT DISTINCT FROM blocked_lock.objid
AND blocker_lock.objsubid IS NOT DISTINCT FROM blocked_lock.objsubid
AND blocker_lock.granted
JOIN pg_stat_activity AS blocker
ON blocker.pid = blocker_lock.pid
ORDER BY blocked_duration DESC;In a lock chain with three or more levels (A blocks B, B blocks C), the query above only reveals one level at a time. If blocker_state shows 'idle in transaction' for the blocker, that transaction has stalled with locks held. Check idle_in_transaction_session_timeout — if it is not set, a forgotten BEGIN with no matching COMMIT can hold locks indefinitely.
Performance Tuning by Wait Category
High Lock / transactionid Waits
Root cause: long-running transactions holding row or table locks. Remediation:
- Set
idle_in_transaction_session_timeout = '30s'to kill stalled transactions automatically - Set
lock_timeout = '5s'on application connections to fail fast rather than pile up - Review ORM batch operations — many ORMs wrap large operations in a single transaction unnecessarily
-- Kill a specific blocker after confirming it is safe
SELECT pg_terminate_backend(blocker_pid);
-- Set session-level timeout for a critical connection
SET LOCAL lock_timeout = '3s';
SET LOCAL idle_in_transaction_session_timeout = '30s';High WALWrite / WALInsert LWLock Waits
Root cause: commit-heavy workload saturating WAL I/O. Remediation:
- Increase
wal_buffers(default 4MB; try 64MB on high-commit systems) - Move WAL to a separate, faster disk (NVMe vs. spinning disk)
- Switch to
synchronous_commit = offfor workloads that can tolerate up to onewal_writer_delayof data loss (async apps, analytics inserts) - Batch commits: group writes into fewer, larger transactions
-- Check current WAL configuration
SHOW wal_buffers;
SHOW synchronous_commit;
SHOW wal_writer_delay;
SHOW commit_delay;
SHOW commit_siblings;High DataFileRead IO Waits
Root cause: insufficient shared_buffers or working set larger than RAM. Remediation:
- Increase
shared_buffersto 25% of RAM as a starting point - Verify indexes exist for the queries causing the reads: check
pg_stat_user_tablesforseq_scancounts - Use
pg_prewarmafter a restart to warm the buffer cache for critical tables - Consider partitioning or archiving hot tables to reduce working set size
High AutoVacuumMain / AutoVacuumDelay Waits
Root cause: autovacuum is struggling to keep up with dead-tuple accumulation. Remediation:
- Increase
autovacuum_max_workers(default 3) - Reduce
autovacuum_vacuum_cost_delay(try 2ms instead of the default 20ms) - Set per-table storage parameters for hotspots:
ALTER TABLE orders SET (autovacuum_vacuum_scale_factor = 0.01);
-- Tables with the most dead tuples and last vacuum time
SELECT
schemaname,
relname,
n_dead_tup,
n_live_tup,
ROUND(n_dead_tup::numeric / NULLIF(n_live_tup + n_dead_tup, 0) * 100, 1) AS dead_pct,
last_autovacuum,
last_autoanalyze
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY n_dead_tup DESC
LIMIT 20;If you see a single table consistently topping the dead-tuple list despite autovacuum running, that table likely has an extremely high UPDATE or DELETE rate. Consider application-level changes — such as batching deletes, using BRIN indexes on timestamp columns, or range partitioning with partition detach/attach — rather than solely tuning autovacuum parameters.
Real Diagnosis Examples
Example 1: Deployment Locks Everything
Symptom: during a schema migration deployment, all application queries begin timing out. The wait event snapshot shows 40+ backends with wait_event_type = 'Lock', wait_event = 'relation'. The blocker query reveals a long-running ALTER TABLE with ACCESS EXCLUSIVE waiting behind a 90-second OLAP query that took the same lock first. The fix: run the OLAP query outside peak hours, or rewrite the migration to use a lock-compatible approach (CREATE INDEX CONCURRENTLY, column addition without default values in Postgres 11+).
Example 2: Writes Slow Only at End of Month
Symptom: INSERT latency spikes to 200ms at month-end batch processing. Wait event aggregation shows the dominant wait is LWLock / WALWrite. The month-end job inserts 10 million rows in small autocommit transactions. Fix: batch the inserts into groups of 1,000–10,000 rows per transaction, and increase wal_buffers from the default. Insert latency drops to under 5ms per batch.
Example 3: Idle Connections Hiding True Waits
Symptom: the aggregate wait query shows 95% of waits are Client / ClientRead, making it look like the database is fine. But application response time is high. Filtering the snapshot query to state != 'idle' (as shown above) removes the idle connections and reveals 12 backends stuck on Lock / tuple from row-level lock contention in a payment processing queue table. Fix: implement SKIP LOCKED on the queue dequeue query.
-- Queue dequeue with SKIP LOCKED to avoid row-level lock contention
SELECT id, payload
FROM job_queue
WHERE status = 'pending'
ORDER BY created_at
LIMIT 10
FOR UPDATE SKIP LOCKED;pg_stat_activity.wait_event_typeandwait_eventexpose what each backend is blocked on; always filter tostate != 'idle'to exclude idle connections- The five most actionable wait categories are Lock, LWLock, IO, Client, and Timeout — each maps to a distinct class of infrastructure or application problem
- Lock waits require joining
pg_locksto find the blocking transaction; the fix is almost always shorter transactions or appropriate timeouts - WALWrite LWLock contention is solved by larger
wal_buffers, faster storage, batched commits, or async commit — not by query tuning - DataFileRead IO waits indicate cache misses; check buffer hit ratios and add indexes before throwing more RAM at the problem
- Point-in-time snapshots miss transient waits; poll every few seconds or use
pg_wait_samplingfor statistically reliable histograms SKIP LOCKEDis the correct pattern for queue tables; it eliminates tuple lock pileups with no schema changes required
Working with JusDB on PostgreSQL Performance Analysis
Wait events give you the most direct signal available inside PostgreSQL about what is actually hurting performance, but interpreting them correctly under production load — distinguishing a transient spike from a systemic bottleneck, tracing a three-level lock chain, correlating wait event histograms with application metrics — takes experience with a wide range of workload patterns.
JusDB works with engineering teams on exactly these kinds of PostgreSQL performance investigations: building wait event baselines, setting up continuous monitoring, tuning autovacuum and WAL configuration for specific workload shapes, and designing schema changes that avoid lock escalation. If you are seeing unexplained latency spikes, connection pile-ups, or autovacuum falling behind on critical tables, get in touch with the JusDB team. We can usually identify the root cause within a single session of looking at your wait event data.