Database Performance

hypopg: Test PostgreSQL Index Impact Without Building the Index

Use hypopg to test how a new PostgreSQL index would affect query plans before building it. Includes install, hypothetical index creation, EXPLAIN workflow, and index sizing.

JusDB Team
March 5, 2026
8 min read
259 views

What if you could know whether a new index would actually be used by the query planner — before spending hours building it? On a 500GB production table, a misguided CREATE INDEX is not just wasted time; it locks resources, consumes disk, and may never improve a single query. That is exactly the problem hypopg solves. It lets you test hypothetical indexes in PostgreSQL without writing a single byte to disk, giving you planner-level evidence to guide index decisions before you commit.

TL;DR
  • hypopg is a PostgreSQL extension that creates in-memory "hypothetical" indexes — the planner can see and use them in EXPLAIN, but they do not exist on disk.
  • Use hypopg_create_index() to define a hypothetical index and immediately run EXPLAIN to see whether the planner would switch from a Seq Scan to an Index Scan.
  • hypopg_relation_size() estimates how large the real index would be, so you can compare a full index against a partial index before committing disk space.
  • Always use EXPLAIN, never EXPLAIN ANALYZE — hypothetical indexes cannot execute; ANALYZE will ignore them.
  • Hypothetical indexes are session-scoped and auto-drop on disconnect; call hypopg_reset() to clear them manually within a session.

What is hypopg?

hypopg is an open-source PostgreSQL extension developed by Julien Rouhaud that injects virtual index definitions into the planner's catalog without writing anything to disk. When you call hypopg_create_index(), PostgreSQL records metadata describing the index — column list, index type, predicate for partial indexes — and exposes it to the query planner exactly as a real index would be exposed. The planner evaluates these entries during cost estimation, so an EXPLAIN on any query will reflect whether that hypothetical index would be chosen.

Because no pages are written to disk and no index scans can actually execute, the only output you get is a cost estimate. That is all you need. If the planner's estimated cost drops by two orders of magnitude after you add the hypothetical index, you have strong evidence that building the real index will pay off. If the planner ignores it, you know the index would be useless before spending hours on CREATE INDEX CONCURRENTLY.

hypopg supports B-tree, GiST, GIN, and BRIN index types, covering the vast majority of indexing scenarios in production PostgreSQL workloads. It works with PostgreSQL 9.2 and above, including the current PostgreSQL 17 release.

Installing hypopg

On Ubuntu or Debian, the simplest path is the official PostgreSQL apt repository package:

bash
# Ubuntu/Debian (PostgreSQL 17)
sudo apt-get install postgresql-17-hypopg

# Or build from source
git clone https://github.com/HypoPG/hypopg
cd hypopg && make && sudo make install

After the shared library is installed, enable the extension inside the target database:

sql
-- Enable in database
CREATE EXTENSION hypopg;

No postgresql.conf changes or server restarts are required. The extension is fully session-scoped: hypothetical indexes created in one session are invisible to all other sessions and are automatically dropped when the session terminates.

Creating and Testing Hypothetical Indexes

Step 1 — Create a hypothetical index

Pass any valid CREATE INDEX statement as a string to hypopg_create_index(). The function returns the internal indexrelid and a generated name:

sql
-- Create a hypothetical index (not actually built on disk)
SELECT * FROM hypopg_create_index('CREATE INDEX ON orders(customer_id, created_at DESC)');
-- Returns: indexrelid | indexname
-- ─────────────────────────────────────────
-- 24601      | <24601>btree_orders_customer_id

-- Create a partial hypothetical index
SELECT * FROM hypopg_create_index(
  'CREATE INDEX ON orders(status) WHERE status IN (''pending'', ''processing'')'
);

-- Create hypothetical index on expression
SELECT * FROM hypopg_create_index(
  'CREATE INDEX ON users(lower(email))'
);

-- List all hypothetical indexes
SELECT * FROM hypopg_list_indexes();

Step 2 — Run EXPLAIN before and after

Always use EXPLAIN (not EXPLAIN ANALYZE) when testing hypothetical indexes. EXPLAIN ANALYZE actually executes the query and cannot use hypothetical indexes — it will revert to the real (non-indexed) plan while reporting the hypothetical plan's cost estimates. This is confusing and misleading.
sql
-- Before hypothetical index
EXPLAIN SELECT * FROM orders WHERE customer_id = 12345 AND created_at > NOW() - INTERVAL '30 days';
-- Seq Scan on orders (cost=0.00..45231.00 rows=847 width=89)
--   Filter: ((customer_id = 12345) AND (created_at > ...))

-- After creating hypothetical index
SELECT * FROM hypopg_create_index('CREATE INDEX ON orders(customer_id, created_at DESC)');

EXPLAIN SELECT * FROM orders WHERE customer_id = 12345 AND created_at > NOW() - INTERVAL '30 days';
-- Index Scan using <24601>btree_orders_customer_id on orders (cost=0.56..89.23 rows=847 width=89)
--   Index Cond: ((customer_id = 12345) AND (created_at > ...))
-- Cost dropped from 45231 → 89: 508x improvement

The planner switched from a Seq Scan costing 45,231 to an Index Scan costing 89 — a 508x estimated improvement. That single EXPLAIN output is your green light to build the real index.

Step 3 — Estimate index size

Before committing disk space, use hypopg_relation_size() to get an estimated byte count for the real index:

sql
-- hypopg can also estimate how large the real index would be
SELECT hypopg_relation_size(indexrelid) AS estimated_index_size_bytes,
       pg_size_pretty(hypopg_relation_size(indexrelid)) AS human_readable
FROM hypopg_list_indexes();

Step 4 — Compare full vs. partial indexes

One of the most practical uses of hypopg is comparing a full index against a partial index to find the smallest index that still gets used by the planner:

sql
-- Full index
SELECT * FROM hypopg_create_index('CREATE INDEX ON events(type, created_at)');
EXPLAIN SELECT * FROM events WHERE type = 'click' AND created_at > NOW() - INTERVAL '1 day';
SELECT hypopg_relation_size(indexrelid) FROM hypopg_list_indexes();
SELECT hypopg_reset();

-- Partial index (smaller, faster for high-cardinality filter)
SELECT * FROM hypopg_create_index(
  'CREATE INDEX ON events(created_at) WHERE type = ''click'''
);
EXPLAIN SELECT * FROM events WHERE type = 'click' AND created_at > NOW() - INTERVAL '1 day';
SELECT hypopg_relation_size(indexrelid) FROM hypopg_list_indexes();
SELECT hypopg_reset();
-- Compare: partial index is typically 70-90% smaller

If both plans show an Index Scan with similar cost, take the partial index. You get the same query plan improvement at a fraction of the disk footprint and maintenance overhead.

Real-World Workflow: Index Impact Analysis Before Deployment

The most valuable workflow combines pg_stat_statements with hypopg to build an evidence-based index proposal before touching production.

sql
-- Step 1: Find slow queries from pg_stat_statements
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
WHERE mean_exec_time > 1000  -- queries averaging > 1 second
ORDER BY total_exec_time DESC
LIMIT 5;

-- Step 2: Run EXPLAIN on the slow query
EXPLAIN SELECT o.*, u.email
FROM orders o JOIN users u ON u.id = o.user_id
WHERE o.status = 'pending' AND o.created_at > NOW() - INTERVAL '7 days'
ORDER BY o.created_at DESC;
-- Shows Seq Scan (expensive)

-- Step 3: Create hypothetical indexes and test
SELECT * FROM hypopg_create_index('CREATE INDEX ON orders(status, created_at DESC)');
SELECT * FROM hypopg_create_index('CREATE INDEX ON orders(user_id) WHERE status = ''pending''');

EXPLAIN SELECT o.*, u.email
FROM orders o JOIN users u ON u.id = o.user_id
WHERE o.status = 'pending' AND o.created_at > NOW() - INTERVAL '7 days'
ORDER BY o.created_at DESC;
-- Now shows Index Scan

-- Step 4: Compare costs
-- Step 5: Drop hypothetical indexes
SELECT hypopg_drop_index(indexrelid) FROM hypopg_list_indexes();
-- Or drop all at once:
SELECT hypopg_reset();

-- Step 6: Create the winning index for real
CREATE INDEX CONCURRENTLY idx_orders_status_created
ON orders(status, created_at DESC);
hypopg is particularly valuable before running CREATE INDEX CONCURRENTLY on a 500GB table that takes 4 hours to build. Verify the planner will actually use your new index on the specific queries you care about before committing to the build time.

This workflow takes five minutes and produces hard numbers. You can present the EXPLAIN output as a before/after diff in a deployment proposal, giving reviewers confidence without requiring a staging environment with production-scale data.

hypopg vs EXPLAIN with Real Indexes — When Each Applies

hypopg is not a replacement for testing with real indexes. It is a fast first pass that filters out bad ideas before you invest real time. Here is when to use each approach:

Scenario Use hypopg Use real index
Table is > 100GB and index build takes hours Yes — validate first Only after hypopg confirms
Comparing 5 candidate index designs quickly Yes — iterate in seconds Too slow for iteration
Verifying runtime performance improvement No — no actual execution Yes — measure wall clock time
Checking planner cost estimate direction Yes — sufficient Overkill
CI/CD pre-deploy index validation Yes — low overhead May be impractical
Confirming index is actually used at runtime No Yes — use EXPLAIN ANALYZE

A healthy workflow uses both: hypopg to validate planner intent in minutes, then CREATE INDEX CONCURRENTLY followed by EXPLAIN ANALYZE on staging to confirm real execution behavior before production rollout.

Limitations

hypopg is purpose-built and transparent about its constraints. Understanding them prevents misuse:

  • EXPLAIN only, not EXPLAIN ANALYZE. Hypothetical indexes exist only in planner metadata. Any query that actually executes — including EXPLAIN ANALYZE — cannot use them. The planner will produce a cost estimate reflecting the hypothetical index but ANALYZE will execute against the real table without it, producing meaningless timing data.
  • No actual query execution. The only output is a cost estimate. You cannot measure real latency or throughput with hypothetical indexes. Use them for directional validation, not benchmarking.
  • Supported index types: B-tree, GiST, GIN, BRIN. Hash indexes and extension-provided index types (e.g., bloom) are not supported.
  • Session-scoped. All hypothetical indexes are dropped when the session ends. In pooled environments with short-lived connections, you may need to recreate them every time. Call hypopg_reset() within a session to clear them manually between tests.
  • Statistics dependency. hypopg relies on pg_statistic data for row count estimates. On tables with stale statistics (post-bulk-load or post-significant-churn), planner estimates may be inaccurate even with real indexes. Run ANALYZE before relying on hypopg results in those cases.
  • No write amplification simulation. hypopg does not model the overhead a real index would add to INSERT, UPDATE, and DELETE operations. Consider write workload separately when making the final decision.

Key Takeaways

  • hypopg lets the PostgreSQL planner evaluate an index that does not exist on disk — giving you cost estimates in seconds rather than after a multi-hour index build.
  • Use hypopg_create_index() with any valid CREATE INDEX DDL, including partial indexes and expression indexes. The planner treats them as real candidates during EXPLAIN.
  • Always use plain EXPLAIN, never EXPLAIN ANALYZE — ANALYZE executes the query and bypasses hypothetical indexes entirely.
  • hypopg_relation_size() estimates disk footprint before you commit, enabling direct comparisons between full and partial index designs.
  • The recommended workflow: identify slow queries via pg_stat_statements → test candidate indexes with hypopg → pick the winning design → build with CREATE INDEX CONCURRENTLY → verify with EXPLAIN ANALYZE on staging.
  • Hypothetical indexes are session-scoped and invisible to other connections. Call hypopg_reset() to clear them when done, or let them drop on disconnect.

Working with JusDB on PostgreSQL Index Optimization

Knowing that an index would help is the first step. Knowing which index design to pick — composite vs. partial, covering vs. filtered, B-tree vs. GIN — requires understanding your query patterns, data distribution, and write workload together. hypopg answers the planner question; the broader index strategy question takes deeper analysis.

At JusDB's PostgreSQL consulting practice, index optimization engagements typically start with a workload audit: mining pg_stat_statements for the highest-impact queries, profiling table statistics and bloat, and mapping index candidates to execution plans using exactly this kind of hypopg workflow. The result is an index proposal with documented evidence — cost comparisons, size estimates, and write overhead projections — that teams can review before any production DDL runs.

If your PostgreSQL cluster has slow queries, growing index bloat, or an upcoming schema migration involving large tables, get in touch. We can run a targeted index audit and deliver a prioritized set of changes with hypopg-backed validation for each one.

Related reading:

Share this article

JusDB Team

Official JusDB content team