TL;DR — PostgreSQL in 60 seconds: PostgreSQL is the world's most advanced open-source relational database, used in production at Apple, Reddit, Instagram, and most modern SaaS companies. Its core differentiators: MVCC (lock-free concurrent reads/writes), JSONB (indexed JSON columns rival document databases), extensions (PostGIS, pgvector, TimescaleDB, Citus), logical replication (CDC + zero-downtime migration), and a query optimizer that handles 100-table JOINs better than any open-source competitor. PostgreSQL 17 (released Sept 2026) adds incremental backups, JSON_TABLE() support, and per-query memory limits. PostgreSQL 16 hit 2× write throughput vs 15. Use it for: OLTP, mixed OLTP+analytics, geospatial, vector search, time-series (via TimescaleDB). Avoid only for: write-heavy single-table analytics at huge scale (use ClickHouse).
In 2024, Stack Overflow's annual developer survey found PostgreSQL the most-used database for the third year running — ahead of MySQL, MongoDB, SQLite, and Redis. The reason isn't marketing. PostgreSQL has quietly become the default choice for teams that need a serious relational database: full ACID transactions, a query optimizer that actually works, JSON support that competes with MongoDB, vector search via pgvector, and extensions that turn it into a time-series DB, geospatial engine, or column store. All in one system, maintained by a community that's been doing this since 1996.
This guide covers what PostgreSQL actually does, how its internals work, and how to make the decisions that matter — storage, connections, replication, and when to choose something else.
- PostgreSQL is a full-featured ACID-compliant relational database — the safest default for most new applications
- MVCC means reads never block writes; autovacuum reclaims dead tuples from updates/deletes
- Streaming replication is built-in; logical replication enables selective table sync and zero-downtime upgrades
- Extensions (pgvector, TimescaleDB, PostGIS, pg_partman) add capabilities without switching databases
- Vertical scaling gets you very far — a well-tuned PostgreSQL on 64 vCPU / 512GB RAM handles most production workloads
MVCC: How PostgreSQL Handles Concurrent Access
PostgreSQL uses Multi-Version Concurrency Control (MVCC). When you update or delete a row, PostgreSQL doesn't modify it in place — it writes a new version and marks the old one as dead. This means:
- Reads never block writes — a long-running SELECT doesn't hold locks that prevent INSERTs or UPDATEs
- Consistent snapshots — a transaction sees a consistent view of the database as of when it started, regardless of concurrent writes
- Dead tuples accumulate — autovacuum runs in the background to reclaim space from dead row versions
-- See MVCC in action: transaction isolation BEGIN; SET TRANSACTION ISOLATION LEVEL REPEATABLE READ; -- This snapshot is frozen at transaction start SELECT count(*) FROM orders; -- returns 10,000 -- Even if another session inserts 100 rows here, this transaction still sees 10,000 COMMIT; -- Isolation levels: -- READ COMMITTED (default): sees committed data as of each statement -- REPEATABLE READ: consistent snapshot for the entire transaction -- SERIALIZABLE: full serializability — transactions behave as if run sequentially SET default_transaction_isolation = 'repeatable read';
Storage: Tables, Indexes, and TOAST
PostgreSQL stores table data in 8KB pages on disk. Each row has a header (transaction visibility info) plus the column data. Large values (>2KB by default) are automatically compressed and stored in a separate TOAST table, keeping the main table pages compact.
-- Check table and index sizes SELECT relname, pg_size_pretty(pg_table_size(oid)) AS table_size, pg_size_pretty(pg_indexes_size(oid)) AS indexes_size, pg_size_pretty(pg_total_relation_size(oid)) AS total_size FROM pg_class WHERE relkind = 'r' AND relnamespace = 'public'::regnamespace ORDER BY pg_total_relation_size(oid) DESC; -- Index types available: -- B-tree (default): equality, range queries, ORDER BY, most use cases CREATE INDEX orders_created_idx ON orders(created_at); -- Partial index: index only a subset of rows (faster builds, smaller size) CREATE INDEX orders_pending_idx ON orders(created_at) WHERE status = 'pending'; -- Covering index: include extra columns to enable index-only scans CREATE INDEX orders_user_covering ON orders(user_id) INCLUDE (status, total); -- GIN index: for JSONB, full-text search, array contains queries CREATE INDEX products_attrs_idx ON products USING GIN(attributes jsonb_path_ops); -- BRIN index: for naturally ordered data (timestamps, auto-increment IDs) — tiny size CREATE INDEX events_ts_brin ON events USING BRIN(recorded_at); -- Build indexes concurrently (no write lock) CREATE INDEX CONCURRENTLY orders_status_idx ON orders(status);
Replication: Streaming vs Logical
Streaming replication (physical)
The standard PostgreSQL HA setup: the primary streams WAL (write-ahead log) to one or more standby servers in real time. Standbys are byte-for-byte copies of the primary — same PostgreSQL version, same data directory layout.
-- postgresql.conf on primary wal_level = replica max_wal_senders = 5 wal_keep_size = 1GB # retain WAL for standbys that lag -- Create replication user CREATE USER replicator REPLICATION LOGIN PASSWORD 'secret'; -- pg_hba.conf on primary host replication replicator standby-ip/32 scram-sha-256 -- Set up standby (run on standby server): pg_basebackup -h primary-host -U replicator -D /var/lib/postgresql/data --wal-method=stream -- Recovery config on standby (postgresql.conf): primary_conninfo = 'host=primary-host user=replicator password=secret' hot_standby = on -- allow read queries on standby -- Check replication lag SELECT client_addr, state, sent_lsn, write_lsn, flush_lsn, replay_lsn, (sent_lsn - replay_lsn) AS replication_lag_bytes FROM pg_stat_replication;
Logical replication
Logical replication copies individual tables (not the whole cluster) and works across different PostgreSQL versions. Use it for: zero-downtime major version upgrades, selective table sync, and feeding data to analytics systems.
-- Publisher (source): ALTER SYSTEM SET wal_level = logical; SELECT pg_reload_conf(); CREATE PUBLICATION my_pub FOR TABLE orders, users; -- Subscriber (target): CREATE SUBSCRIPTION my_sub CONNECTION 'host=primary-host dbname=mydb user=replicator password=secret' PUBLICATION my_pub; -- Check subscription status SELECT * FROM pg_stat_subscription; -- lag_bytes should stay near 0 for a healthy subscription
JSON and JSONB
PostgreSQL's JSONB type stores JSON in a binary format that supports indexing and efficient querying. For semi-structured data that doesn't need MongoDB's horizontal scale, JSONB is a compelling alternative that keeps your data in the same database as everything else.
-- Store and query JSONB
CREATE TABLE products (
id BIGSERIAL PRIMARY KEY,
name TEXT NOT NULL,
attributes JSONB
);
INSERT INTO products (name, attributes) VALUES
('Widget Pro', '{"color": "blue", "weight_kg": 0.5, "tags": ["sale", "new"]}');
-- Query specific fields
SELECT name, attributes->>'color' AS color FROM products;
-- Filter on nested values (uses GIN index if present)
SELECT * FROM products WHERE attributes @> '{"tags": ["sale"]}';
SELECT * FROM products WHERE (attributes->>'weight_kg')::float < 1.0;
-- JSONB aggregation
SELECT
attributes->>'color' AS color,
COUNT(*) AS count
FROM products
GROUP BY attributes->>'color';
-- GIN index for contains queries
CREATE INDEX products_attrs_gin ON products USING GIN(attributes);
Key Extensions
-- pgvector: vector similarity search for AI/ML applications
CREATE EXTENSION vector;
ALTER TABLE documents ADD COLUMN embedding vector(1536);
CREATE INDEX doc_embedding_idx ON documents USING HNSW(embedding vector_cosine_ops);
-- TimescaleDB: time-series with automatic partitioning and compression
CREATE EXTENSION timescaledb;
SELECT create_hypertable('sensor_readings', 'recorded_at');
-- PostGIS: geospatial queries (distances, polygons, spatial indexes)
CREATE EXTENSION postgis;
SELECT ST_Distance(location, ST_MakePoint(-73.9857, 40.7484)) AS distance_meters
FROM venues
ORDER BY distance_meters LIMIT 10;
-- pg_partman: automatic table partitioning management
CREATE EXTENSION pg_partman;
SELECT create_parent('public.events', 'created_at', 'native', 'monthly');
-- pg_stat_statements: query performance tracking (install on all production instances)
CREATE EXTENSION pg_stat_statements;
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC LIMIT 20;
Essential Configuration for Production
# postgresql.conf — starting point for a dedicated DB server with 64GB RAM # Memory shared_buffers = 16GB # 25% of RAM effective_cache_size = 48GB # 75% of RAM (planner hint, not allocated) work_mem = 64MB # per sort/hash operation (×connections × operations) maintenance_work_mem = 2GB # for VACUUM, index builds # WAL and checkpoints wal_buffers = 64MB checkpoint_completion_target = 0.9 max_wal_size = 4GB min_wal_size = 1GB # Connections max_connections = 200 # use PgBouncer for connection pooling above ~200 connection_pooling via PgBouncer # transaction-mode pooling for web workloads # Query planner random_page_cost = 1.1 # for SSD (default 4.0 assumes HDD) effective_io_concurrency = 200 # for SSD # Autovacuum (tune for high-write tables) autovacuum_vacuum_scale_factor = 0.05 # trigger at 5% dead tuples (default 20%) autovacuum_vacuum_cost_delay = 2ms # reduce for SSD
When PostgreSQL Is the Right Choice
PostgreSQL is the right default for nearly every new application. The cases where you'd choose something else:
- Need to write 500K+ rows/sec across multiple regions → Cassandra or TiDB
- Pure key-value at sub-millisecond latency → Redis or Aerospike
- OLAP over billions of rows with complex aggregations → ClickHouse or StarRocks
- Highly variable document structure that changes constantly → MongoDB
For everything else — transactional apps, APIs, SaaS backends, reporting, vector search under 50M vectors, geospatial — PostgreSQL handles it, often better than the specialized alternative.
Working with JusDB on PostgreSQL
PostgreSQL is deceptively easy to get started with and deceptively hard to operate at scale. The defaults work fine for development but underperform in production: shared_buffers too small, work_mem too low, autovacuum too conservative. Connection pooling, replication setup, and major version upgrades all have real gotchas.
We tune, operate, and upgrade PostgreSQL clusters as part of our PostgreSQL consulting and managed database SRE services. If you're starting a new PostgreSQL deployment, optimizing an existing one, or planning a major version upgrade, reach out.
Related reading: PostgreSQL VACUUM Tuning | PostgreSQL as a Vector Database | PostgreSQL 18 What's New
Frequently Asked Questions
What is PostgreSQL and what is it best for?
PostgreSQL vs MySQL — which should I choose in 2026?
What is MVCC in PostgreSQL?
Is JSONB fast enough to replace MongoDB?
@> (contains) and @@ (jsonpath match) are index-accelerated and competitive with MongoDB. PostgreSQL adds JOINs, ACID, and full SQL on top. MongoDB still wins on: sharded write throughput >100k/s, schema-on-read where schemas vary wildly per document. PostgreSQL+JSONB wins everywhere else.