A production order management system was returning results in 11 seconds for a query that filtered by status = 'pending' and sorted by created_at DESC. The table had 80 million rows, a B-tree index on status, and another on created_at — but PostgreSQL was ignoring both and doing a full sequential scan. The fix was a single composite index on (status, created_at DESC), and query time dropped to 4ms. The individual indexes existed; the composite did not, and the planner had no way to satisfy both predicates efficiently with separate indexes.
PostgreSQL supports six index types, each optimized for a different access pattern. Using B-tree where GIN is needed, skipping partial indexes on flag columns, or leaving bloated unused indexes in place all carry real performance costs — slow reads, slow writes, and wasted storage. Indexing strategy is one of the highest-leverage activities in database performance tuning, and it is also one of the most commonly done wrong.
This guide covers every major PostgreSQL index type, how to choose between them, partial and composite index patterns, covering indexes, and the system catalog queries that surface missing, unused, and bloated indexes in production databases.
- B-tree is the default and correct choice for equality, range, and sort operations on scalar values.
- GIN is the right choice for
JSONB, arrays, full-text search, andpg_trgmLIKE patterns. - BRIN is ideal for append-only tables where data is physically ordered by a column (like timestamps in time-series tables).
- Partial indexes index only rows matching a
WHEREclause — use them on flag columns and status fields to shrink index size dramatically. - Composite index column order matters: put equality columns first, range columns last.
- Covering indexes with
INCLUDEeliminate heap fetches for read-heavy queries. - Query
pg_stat_user_tablesandpg_stat_user_indexesto find missing and unused indexes. - Use
REINDEX CONCURRENTLYandCREATE INDEX CONCURRENTLYto avoid locking production tables.
Index Types in PostgreSQL
PostgreSQL ships with six index access methods. Each is implemented as a pluggable module and optimized for a specific category of data and query patterns. Choosing the wrong type means either poor query performance or wasted write overhead on an index the planner cannot use.
B-tree
B-tree is the default index type and the correct choice for the vast majority of production use cases. It supports equality (=), range (<, >, BETWEEN), prefix matching (LIKE 'foo%'), and sort operations (ORDER BY). PostgreSQL automatically uses B-tree indexes to satisfy ORDER BY clauses, which can eliminate explicit sort steps from query plans.
-- Default B-tree index (type is implicit)
CREATE INDEX idx_orders_customer_id ON orders (customer_id);
-- Explicit type declaration (same result)
CREATE INDEX idx_orders_status ON orders USING btree (status);
-- Descending index — useful when queries sort DESC on this column
CREATE INDEX idx_orders_created_at_desc ON orders (created_at DESC);Hash
Hash indexes support only equality comparisons (=). They are faster than B-tree for pure equality lookups on high-cardinality columns, but they cannot support range queries, sorting, or multi-column indexes. Since PostgreSQL 10, hash indexes are WAL-logged and safe for crash recovery. In practice, B-tree is almost always preferred because its equality performance is nearly identical and it supports additional operators.
-- Hash index — equality only
CREATE INDEX idx_sessions_token ON sessions USING hash (session_token);GIN — Generalized Inverted Index
GIN indexes are designed for composite values where each document or row contains multiple indexable elements: JSONB documents, arrays, full-text search tsvector columns, and trigram patterns with pg_trgm. A GIN index maps individual elements back to the rows that contain them — the same structure as a search engine's inverted index.
-- GIN index on JSONB column — supports @>, ?, ?&, ?| operators
CREATE INDEX idx_products_metadata ON products USING gin (metadata);
-- GIN index for full-text search
CREATE INDEX idx_articles_search ON articles USING gin (to_tsvector('english', title || ' ' || body));
-- GIN index for array containment queries
CREATE INDEX idx_posts_tags ON posts USING gin (tags);
-- GIN index for LIKE/ILIKE with pg_trgm
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX idx_customers_name_trgm ON customers USING gin (name gin_trgm_ops);GIN indexes are slower to build and update than B-tree indexes because inserting a single row may update hundreds of posting list entries. The fastupdate storage parameter (enabled by default) buffers pending updates to reduce write amplification at the cost of slightly slower reads during buffer flushes.
GiST — Generalized Search Tree
GiST is a framework for building indexes over arbitrary data types. It is used for geometric data (point, box, polygon, circle), PostGIS spatial data, range types (daterange, tsrange), and nearest-neighbor searches. Unlike GIN, GiST supports ORDER BY ... <-> nearest-neighbor queries.
-- GiST index for range overlap queries (e.g., booking systems)
CREATE INDEX idx_reservations_period ON reservations USING gist (during);
-- GiST index for geometric containment
CREATE INDEX idx_locations_coords ON locations USING gist (coordinates);
-- Full-text search also works with GiST (smaller than GIN, slower to query)
CREATE INDEX idx_articles_fts_gist ON articles USING gist (to_tsvector('english', body));BRIN — Block Range Index
BRIN indexes are extremely small. Rather than indexing individual values, they store the minimum and maximum value in each range of consecutive disk blocks (block ranges, defaulting to 128 blocks each). This makes them useful only when data is physically ordered in a way that correlates with the indexed column — typically append-only tables where rows are inserted in timestamp order.
-- BRIN on a time-series events table — data appended in time order
CREATE INDEX idx_events_occurred_at_brin ON events USING brin (occurred_at);
-- BRIN with smaller block range for finer granularity (default is 128)
CREATE INDEX idx_logs_created_brin ON logs USING brin (created_at) WITH (pages_per_range = 32);
-- Check physical correlation of a column (values close to 1.0 or -1.0 = good BRIN candidate)
SELECT attname, correlation
FROM pg_stats
WHERE tablename = 'events'
ORDER BY abs(correlation) DESC;A BRIN index on a 1-billion-row time-series table may be just a few hundred kilobytes, versus several gigabytes for a B-tree. The trade-off is that BRIN has much lower selectivity — it narrows down candidate blocks but then PostgreSQL reads entire block ranges, making it unsuitable for high-selectivity point lookups.
SP-GiST — Space-Partitioned GiST
SP-GiST supports space-partitioned data structures like k-d trees, quad trees, and radix trees. It is useful for non-overlapping partitioned data such as IP address ranges, phone numbers, and certain geographic applications. For most production teams, PostGIS with GiST covers geospatial use cases adequately, and SP-GiST is used only when the built-in types such as inet or point benefit from it specifically.
-- SP-GiST for inet (IP address) range lookups
CREATE INDEX idx_blocklist_cidr ON ip_blocklist USING spgist (cidr_range);When to Use Each Index Type
The decision tree for PostgreSQL index type selection is straightforward once you know the column data type and the operators your queries use.
| Data / Query Pattern | Recommended Index | Why |
|---|---|---|
| Equality, range, sort on scalar types (integers, timestamps, UUIDs, text) | B-tree | Supports all comparison operators and ORDER BY |
| Pure equality on high-cardinality column with no range queries | B-tree (or Hash) | B-tree is simpler and nearly as fast; Hash only if benchmarks prove otherwise |
JSONB containment (@>), key existence (?), path queries |
GIN | GIN inverts JSONB keys to enable fast containment lookups |
Array containment (@>, <@, &&) |
GIN | Only index type that supports array operators efficiently |
Full-text search with tsvector |
GIN (or GiST) | GIN is faster for reads; GiST is smaller and faster to update |
LIKE/ILIKE with leading wildcard (%term%) |
GIN with pg_trgm | B-tree cannot use leading wildcards; trigrams handle arbitrary substrings |
Range type overlap (&&), containment (@>, <@) |
GiST | GiST supports range operators; GIN does not for range types |
| PostGIS geometry / geography | GiST (or BRIN for partitioned) | PostGIS operators are built for GiST; BRIN for large append-only spatial tables |
| Append-only table with physically ordered timestamp | BRIN | Tiny index footprint; effective when physical order matches query predicates |
| Nearest-neighbor search on geometric data | GiST | GiST supports ORDER BY ... <-> KNN queries |
EXPLAIN (ANALYZE, BUFFERS) on your query. If the planner is not using the index, check whether the query operators match what B-tree supports. A %term% LIKE pattern or a JSONB containment check will tell you immediately that you need GIN instead.
Partial Indexes — Index Only What You Need
A partial index includes a WHERE clause that limits which rows are indexed. This is one of the most powerful and underused indexing techniques in PostgreSQL. If a column has low cardinality — say, a status column where 97% of rows are 'completed' and only 3% are 'pending' — indexing all rows wastes space and write overhead. A partial index on just the 'pending' rows is a fraction of the size and covers 100% of the queries that need fast access.
-- Index only pending orders — 3% of rows, 100% of the interesting queries
CREATE INDEX idx_orders_pending ON orders (created_at DESC)
WHERE status = 'pending';
-- This query uses the partial index efficiently
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, customer_id, total
FROM orders
WHERE status = 'pending'
ORDER BY created_at DESC
LIMIT 50;
-- Partial index on soft-deleted rows (exclude deleted)
CREATE INDEX idx_users_active_email ON users (email)
WHERE deleted_at IS NULL;
-- Unique constraint only on active records (allows multiple deleted rows with same email)
CREATE UNIQUE INDEX idx_users_email_unique_active ON users (email)
WHERE deleted_at IS NULL;Partial indexes also reduce index bloat. When rows transition out of the indexed condition (e.g., an order moves from 'pending' to 'completed'), the index entry is removed and not replaced — the index stays lean automatically.
(tenant_id, entity_id) with WHERE deleted_at IS NULL to cover the common active-record query without indexing historical data. The index shrinks as soft-deleted rows accumulate, instead of growing with the table.
Composite Indexes — Column Order Matters
A composite (multi-column) B-tree index can satisfy queries that filter or sort on multiple columns simultaneously. The most critical and commonly misunderstood rule: put equality columns before range columns. The planner can use the prefix of a composite index but cannot efficiently skip to a range column if an equality column precedes it in the query but not the index.
-- WRONG: range column (created_at) before equality column (status)
-- This index cannot satisfy: WHERE status = 'pending' ORDER BY created_at DESC
CREATE INDEX idx_orders_bad ON orders (created_at DESC, status);
-- CORRECT: equality column first, then range/sort column
CREATE INDEX idx_orders_good ON orders (status, created_at DESC);
-- This query now uses a single index scan efficiently
SELECT id, customer_id, total
FROM orders
WHERE status = 'pending'
ORDER BY created_at DESC
LIMIT 100;A composite index also acts as a covering index for any query that only needs the indexed columns — the planner can do an index-only scan and never touch the heap. Consider which additional columns your most frequent queries SELECT and whether adding them to a composite index would eliminate heap fetches.
-- Multi-tenant composite index: tenant equality first, then filter column
CREATE INDEX idx_invoices_tenant_status ON invoices (tenant_id, status, due_date DESC);
-- Supports all of these queries efficiently:
-- WHERE tenant_id = $1
-- WHERE tenant_id = $1 AND status = 'overdue'
-- WHERE tenant_id = $1 AND status = 'overdue' ORDER BY due_date DESC
-- Check which prefix columns are actually used in plans
EXPLAIN (ANALYZE, FORMAT TEXT)
SELECT id, amount, due_date
FROM invoices
WHERE tenant_id = '3f7a2c1d' AND status = 'overdue'
ORDER BY due_date DESC
LIMIT 20;ANALYZE tablename to refresh statistics. Stale statistics cause the planner to make wrong decisions about whether to use an index.
Covering Indexes with INCLUDE
PostgreSQL 11 introduced the INCLUDE clause, which adds non-key columns to a B-tree index leaf pages without making them part of the search key. The planner can perform an index-only scan — retrieving all needed column values directly from the index without fetching the heap row — as long as the visibility map shows the heap page is all-visible (i.e., VACUUM has run recently enough).
-- Without INCLUDE: query needs heap fetch to get 'amount' and 'currency'
CREATE INDEX idx_payments_status_created ON payments (status, created_at DESC);
-- With INCLUDE: index-only scan satisfies the full query without touching the heap
CREATE INDEX idx_payments_status_created_covering ON payments (status, created_at DESC)
INCLUDE (amount, currency);
-- This query now shows "Index Only Scan" in EXPLAIN output
EXPLAIN (ANALYZE, BUFFERS)
SELECT status, created_at, amount, currency
FROM payments
WHERE status = 'settled'
ORDER BY created_at DESC
LIMIT 200;
-- Verify index-only scan is happening
-- Look for: "Index Only Scan using idx_payments_status_created_covering"
-- And: "Heap Fetches: 0" (or close to 0 if visibility map is current)The INCLUDE columns do not affect the sort order of the index and cannot be used as search predicates — they are purely payload for index-only scans. Avoid including large columns such as text or jsonb fields, as this bloats the index without proportional benefit. Focus on small, frequently-SELECTed columns like amounts, flags, and short identifiers.
VACUUM ANALYZE tablename to update the visibility map. Without a current visibility map, PostgreSQL still has to fetch the heap to confirm row visibility, and your "Index Only Scan" will report high Heap Fetches. Regular autovacuum handles this automatically in steady-state, but force it after bulk operations.
Finding Missing Indexes
PostgreSQL's statistics collector tracks sequential scans on every table. A high sequential scan count on a large table is a strong signal that a missing index is causing full table scans for queries that should be using an index.
Using pg_stat_user_tables
-- Tables with high sequential scans relative to index scans — missing index candidates
SELECT
schemaname,
relname AS table_name,
seq_scan,
seq_tup_read,
idx_scan,
seq_scan - idx_scan AS seq_scan_excess,
n_live_tup AS live_rows,
pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_stat_user_tables
WHERE seq_scan > idx_scan
AND n_live_tup > 10000 -- only tables large enough to care about
ORDER BY seq_scan_excess DESC
LIMIT 20;-- Tables with zero index scans but many sequential scans — likely missing all indexes
SELECT
schemaname,
relname AS table_name,
seq_scan,
seq_tup_read,
n_live_tup AS live_rows,
pg_size_pretty(pg_total_relation_size(relid)) AS table_size
FROM pg_stat_user_tables
WHERE idx_scan = 0
AND seq_scan > 100
AND n_live_tup > 5000
ORDER BY seq_tup_read DESC;Slow Query Analysis with pg_stat_statements
-- Slowest queries by total execution time — targets for index optimization
SELECT
query,
calls,
round(total_exec_time::numeric, 2) AS total_ms,
round(mean_exec_time::numeric, 2) AS mean_ms,
round(stddev_exec_time::numeric, 2) AS stddev_ms,
rows
FROM pg_stat_statements
WHERE calls > 50
ORDER BY total_exec_time DESC
LIMIT 25;For each slow query identified, run EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) to see whether a sequential scan is occurring and which filter predicates are evaluated without index support. Look for "Rows Removed by Filter" counts that are orders of magnitude larger than the actual result set — this indicates a full scan with post-scan filtering.
-- Run EXPLAIN ANALYZE on a slow query to diagnose index usage
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT order_id, customer_id, total_amount
FROM orders
WHERE customer_id = 'cust-7821'
AND status IN ('pending', 'processing')
ORDER BY created_at DESC
LIMIT 50;
-- Key things to look for in the output:
-- "Seq Scan" (bad) vs "Index Scan" or "Index Only Scan" (good)
-- "Rows Removed by Filter" (high number = full scan)
-- "Buffers: shared hit=X read=Y" (high 'read' = going to disk)
-- "Execution Time" at the bottomDetecting Unused and Bloated Indexes
Unused indexes are a write performance tax. Every INSERT, UPDATE, and DELETE on a table must update all indexes on that table, regardless of whether any query ever uses them. Removing unused indexes reduces write overhead directly and can meaningfully improve throughput on write-heavy tables.
Finding Unused Indexes
-- Indexes with zero or very low scan counts since last statistics reset
SELECT
schemaname,
tablename,
indexname,
idx_scan AS index_scans,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
pg_size_pretty(pg_relation_size(indrelid)) AS table_size
FROM pg_stat_user_indexes
JOIN pg_index USING (indexrelid)
WHERE idx_scan = 0
AND NOT indisprimary -- exclude primary keys
AND NOT indisunique -- exclude unique constraints (enforced on writes)
AND pg_relation_size(indexrelid) > 8192 -- skip trivially small indexes
ORDER BY pg_relation_size(indexrelid) DESC;pg_stat_bgwriter.stats_reset or pg_stat_user_indexes.last_idx_scan (PostgreSQL 16+) to understand the observation window before dropping indexes. On busy systems, monitor for at least two weeks across all traffic patterns including end-of-month reporting or weekly batch jobs before concluding an index is truly unused.
-- Check when statistics were last reset (confirms observation window)
SELECT stats_reset FROM pg_stat_bgwriter;
-- PostgreSQL 16+: see the last time each index was actually scanned
SELECT
schemaname,
tablename,
indexname,
idx_scan,
last_idx_scan,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE last_idx_scan < now() - interval '30 days'
OR last_idx_scan IS NULL
ORDER BY pg_relation_size(indexrelid) DESC;Detecting Bloated Indexes
Index bloat occurs when dead tuples accumulate in index pages faster than VACUUM can reclaim them. Bloated indexes waste storage and slow down scans because they contain many empty or mostly-dead pages that the planner must still traverse.
-- Estimate index bloat using pgstattuple extension
CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT
indexname,
pg_size_pretty(index_size) AS index_size,
round(avg_leaf_density::numeric, 1) AS leaf_density_pct,
round((1 - avg_leaf_density / 100) * 100, 1) AS bloat_pct,
leaf_fragmentation
FROM (
SELECT
indexname,
pg_relation_size(indexrelid) AS index_size,
(pgstatindex(indexrelid::regclass)).*
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
) AS idx_stats
WHERE index_size > 10 * 1024 * 1024 -- only indexes larger than 10MB
ORDER BY bloat_pct DESC
LIMIT 20;An index with leaf density below 50% is moderately bloated; below 30% warrants immediate rebuilding. High-churn tables — those with frequent deletes and updates — accumulate index bloat fastest. The orders, sessions, events, and notifications tables in most production schemas are common culprits.
Index Maintenance
Indexes require ongoing maintenance: rebuilding bloated indexes, removing unused ones, and creating new indexes on live tables without locking writes. PostgreSQL provides concurrency-safe tools for all of these operations.
Creating Indexes Without Locking
-- CREATE INDEX CONCURRENTLY — does not lock writes to the table
-- Takes longer to build (multiple passes) but safe for production
CREATE INDEX CONCURRENTLY idx_orders_customer_status
ON orders (customer_id, status)
WHERE status != 'archived';
-- Monitor progress in PostgreSQL 12+ (check pg_stat_progress_create_index)
SELECT
phase,
blocks_done,
blocks_total,
round(100.0 * blocks_done / nullif(blocks_total, 0), 1) AS pct_complete
FROM pg_stat_progress_create_index
WHERE relid = 'orders'::regclass;CREATE INDEX CONCURRENTLY can fail if another transaction holds a long-running lock during the build. If it fails, it leaves behind an invalid index marked as indisvalid = false in pg_index. This invalid index still imposes write overhead but provides no query benefit. Always check for and drop invalid indexes: SELECT indexname FROM pg_indexes JOIN pg_index ON indexrelid = (SELECT oid FROM pg_class WHERE relname = indexname) WHERE NOT indisvalid;
Rebuilding Bloated Indexes
-- REINDEX CONCURRENTLY — rebuilds index without locking reads or writes
-- Available from PostgreSQL 12+
REINDEX INDEX CONCURRENTLY idx_orders_customer_status;
-- Rebuild all indexes on a table concurrently
REINDEX TABLE CONCURRENTLY orders;
-- Rebuild all indexes in a schema concurrently (PostgreSQL 14+)
REINDEX SCHEMA CONCURRENTLY public;
-- Check for invalid indexes left by failed concurrent reindex
SELECT schemaname, tablename, indexname
FROM pg_indexes
JOIN pg_class ON relname = indexname
JOIN pg_index ON indexrelid = pg_class.oid
WHERE NOT indisvalid;VACUUM and Index Cleanup
-- VACUUM reclaims dead tuple space and updates visibility map (enables index-only scans)
VACUUM ANALYZE orders;
-- VACUUM FULL reclaims space aggressively but locks the table — use rarely
-- Prefer REINDEX CONCURRENTLY + VACUUM ANALYZE over VACUUM FULL for indexes
VACUUM FULL orders; -- acquires AccessExclusiveLock — blocks all reads and writes
-- Check autovacuum activity on a specific table
SELECT
last_vacuum,
last_autovacuum,
last_analyze,
last_autoanalyze,
vacuum_count,
autovacuum_count,
n_dead_tup,
n_live_tup,
round(100.0 * n_dead_tup / nullif(n_live_tup + n_dead_tup, 0), 1) AS dead_pct
FROM pg_stat_user_tables
WHERE relname = 'orders';For high-churn tables where autovacuum cannot keep up with dead tuple accumulation, increase autovacuum aggressiveness at the table level rather than globally:
-- Speed up autovacuum for a high-churn table without affecting others
ALTER TABLE orders SET (
autovacuum_vacuum_scale_factor = 0.01, -- trigger at 1% dead tuples (default 20%)
autovacuum_vacuum_cost_delay = 2, -- reduce throttling (ms, default 20)
autovacuum_analyze_scale_factor = 0.005 -- analyze at 0.5% changes (default 10%)
);- B-tree covers equality, range, and sort — use it as the default. Switch to GIN for JSONB, arrays, and full-text search. Use BRIN only on physically-ordered append-only tables.
- Composite index column order is critical: equality predicates first, range and sort predicates last. A mis-ordered composite index is often ignored by the planner entirely.
- Partial indexes on low-cardinality flag columns (status, is_active, deleted_at IS NULL) are dramatically smaller than full indexes and cover 100% of the queries that matter.
- Covering indexes with
INCLUDEeliminate heap fetches for read-heavy queries — verify withEXPLAIN ANALYZEand look for "Index Only Scan" and "Heap Fetches: 0". - Query
pg_stat_user_tablesregularly to find tables with high sequential scan counts relative to index scans — these are missing index candidates. - Never drop an index that shows zero scans without verifying the observation window spans all traffic patterns including batch jobs and end-of-period reporting.
- Use
CREATE INDEX CONCURRENTLYandREINDEX CONCURRENTLYin production — never lock your table for index operations on a live system. - Run
VACUUM ANALYZEafter bulk operations to refresh the visibility map (enabling index-only scans) and update planner statistics.
Working with JusDB on PostgreSQL Performance
JusDB tunes PostgreSQL index strategies for engineering teams — identifying missing indexes, removing bloated unused indexes, and designing composite index strategies that match your actual query patterns. Our DBAs have turned multi-second queries into sub-millisecond operations.
Explore JusDB PostgreSQL Management → | Talk to a DBA
Related reading: