PostgreSQL Explained 2026: Architecture, MVCC, JSONB & Patterns

Q: PostgreSQL vs MySQL — which should I choose in 2026?

PostgreSQL wins on: complex queries, JSONB performance, extensions (PostGIS, pgvector), MVCC concurrency, strict SQL standard compliance, advanced indexing (GiST, GIN, BRIN). MySQL wins on: simple read-heavy CRUD, replication tooling maturity, ecosystem familiarity. For new projects in 2026, default to PostgreSQL unless you have a specific reason — its momentum and feature velocity are unmatched.

Q: What's new in PostgreSQL 16 and 17?

PG 16 (Sept 2025): logical replication from standbys, parallel hash JOIN improvements (2-3× faster), pg_stat_io for I/O observability, SQL/JSON constructors. PG 17 (Sept 2026): JSON_TABLE() for relational-style JSON queries, incremental backups via pg_basebackup, per-query memory limit (work_mem_per_query), improved vacuum performance on large indexes, native MERGE / RETURNING combinations.

Q: How do I scale PostgreSQL beyond a single node?

Three patterns: (1) Read replicas (streaming replication) for read scale. (2) Logical replication for selective table sync, zero-downtime upgrades, and CDC. (3) Sharding via Citus (extension, native PG) for write scale. Past 10TB: consider Aurora PostgreSQL or partitioning + Citus. Avoid homegrown sharding — it's almost always wrong.

Q: Should I use Aurora PostgreSQL, RDS, or self-managed?

RDS PostgreSQL : fully managed, EBS-backed, predictable cost, vanilla PG. Aurora PostgreSQL : distributed storage layer, 5× write throughput, instant snapshots, but 20-50% more expensive. Self-managed on EC2 : full control, lower cost at scale (>32 vCPUs), but you own everything. Rule: RDS until you hit p99 latency walls; Aurora for write-heavy multi-AZ; self-managed only with a dedicated DBA team.

Q: What's the best PostgreSQL HA solution?

Patroni + etcd/Consul for self-hosted: automated failover in 5-15s, battle-tested by GitLab, Zalando, and the Patroni community. repmgr : simpler but less feature-complete. RDS Multi-AZ : managed, ~60s failover, no manual config. Aurora : sub-30s failover via storage-level replication. For new self-managed deployments in 2026, Patroni is the default; we run it for 500+ clients.

TL;DR — PostgreSQL in 60 seconds: PostgreSQL is the world's most advanced open-source relational database, used in production at Apple, Reddit, Instagram, and most modern SaaS companies. Its core differentiators: MVCC (lock-free concurrent reads/writes), JSONB (indexed JSON columns rival document databases), extensions (PostGIS, pgvector, TimescaleDB, Citus), logical replication (CDC + zero-downtime migration), and a query optimizer that handles 100-table JOINs better than any open-source competitor. PostgreSQL 17 (released Sept 2026) adds incremental backups, JSON_TABLE() support, and per-query memory limits. PostgreSQL 16 hit 2× write throughput vs 15. Use it for: OLTP, mixed OLTP+analytics, geospatial, vector search, time-series (via TimescaleDB). Avoid only for: write-heavy single-table analytics at huge scale (use ClickHouse).

In 2024, Stack Overflow's annual developer survey found PostgreSQL the most-used database for the third year running — ahead of MySQL, MongoDB, SQLite, and Redis. The reason isn't marketing. PostgreSQL has quietly become the default choice for teams that need a serious relational database: full ACID transactions, a query optimizer that actually works, JSON support that competes with MongoDB, vector search via pgvector, and extensions that turn it into a time-series DB, geospatial engine, or column store. All in one system, maintained by a community that's been doing this since 1996.

This guide covers what PostgreSQL actually does, how its internals work, and how to make the decisions that matter — storage, connections, replication, and when to choose something else.

TL;DR

PostgreSQL is a full-featured ACID-compliant relational database — the safest default for most new applications
MVCC means reads never block writes; autovacuum reclaims dead tuples from updates/deletes
Streaming replication is built-in; logical replication enables selective table sync and zero-downtime upgrades
Extensions (pgvector, TimescaleDB, PostGIS, pg_partman) add capabilities without switching databases
Vertical scaling gets you very far — a well-tuned PostgreSQL on 64 vCPU / 512GB RAM handles most production workloads

MVCC: How PostgreSQL Handles Concurrent Access

PostgreSQL uses Multi-Version Concurrency Control (MVCC). When you update or delete a row, PostgreSQL doesn't modify it in place — it writes a new version and marks the old one as dead. This means:

Reads never block writes — a long-running SELECT doesn't hold locks that prevent INSERTs or UPDATEs
Consistent snapshots — a transaction sees a consistent view of the database as of when it started, regardless of concurrent writes
Dead tuples accumulate — autovacuum runs in the background to reclaim space from dead row versions

text

-- See MVCC in action: transaction isolation
BEGIN;
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

-- This snapshot is frozen at transaction start
SELECT count(*) FROM orders;  -- returns 10,000

-- Even if another session inserts 100 rows here, this transaction still sees 10,000
COMMIT;

-- Isolation levels:
-- READ COMMITTED (default): sees committed data as of each statement
-- REPEATABLE READ: consistent snapshot for the entire transaction
-- SERIALIZABLE: full serializability — transactions behave as if run sequentially
SET default_transaction_isolation = 'repeatable read';

Storage: Tables, Indexes, and TOAST

PostgreSQL stores table data in 8KB pages on disk. Each row has a header (transaction visibility info) plus the column data. Large values (>2KB by default) are automatically compressed and stored in a separate TOAST table, keeping the main table pages compact.

text

-- Check table and index sizes
SELECT
  relname,
  pg_size_pretty(pg_table_size(oid)) AS table_size,
  pg_size_pretty(pg_indexes_size(oid)) AS indexes_size,
  pg_size_pretty(pg_total_relation_size(oid)) AS total_size
FROM pg_class
WHERE relkind = 'r' AND relnamespace = 'public'::regnamespace
ORDER BY pg_total_relation_size(oid) DESC;

-- Index types available:
-- B-tree (default): equality, range queries, ORDER BY, most use cases
CREATE INDEX orders_created_idx ON orders(created_at);

-- Partial index: index only a subset of rows (faster builds, smaller size)
CREATE INDEX orders_pending_idx ON orders(created_at) WHERE status = 'pending';

-- Covering index: include extra columns to enable index-only scans
CREATE INDEX orders_user_covering ON orders(user_id) INCLUDE (status, total);

-- GIN index: for JSONB, full-text search, array contains queries
CREATE INDEX products_attrs_idx ON products USING GIN(attributes jsonb_path_ops);

-- BRIN index: for naturally ordered data (timestamps, auto-increment IDs) — tiny size
CREATE INDEX events_ts_brin ON events USING BRIN(recorded_at);

-- Build indexes concurrently (no write lock)
CREATE INDEX CONCURRENTLY orders_status_idx ON orders(status);

Replication: Streaming vs Logical

Streaming replication (physical)

The standard PostgreSQL HA setup: the primary streams WAL (write-ahead log) to one or more standby servers in real time. Standbys are byte-for-byte copies of the primary — same PostgreSQL version, same data directory layout.

text

-- postgresql.conf on primary
wal_level = replica
max_wal_senders = 5
wal_keep_size = 1GB      # retain WAL for standbys that lag

-- Create replication user
CREATE USER replicator REPLICATION LOGIN PASSWORD 'secret';

-- pg_hba.conf on primary
host replication replicator standby-ip/32 scram-sha-256

-- Set up standby (run on standby server):
pg_basebackup -h primary-host -U replicator -D /var/lib/postgresql/data --wal-method=stream

-- Recovery config on standby (postgresql.conf):
primary_conninfo = 'host=primary-host user=replicator password=secret'
hot_standby = on  -- allow read queries on standby

-- Check replication lag
SELECT
  client_addr,
  state,
  sent_lsn,
  write_lsn,
  flush_lsn,
  replay_lsn,
  (sent_lsn - replay_lsn) AS replication_lag_bytes
FROM pg_stat_replication;

Logical replication

Logical replication copies individual tables (not the whole cluster) and works across different PostgreSQL versions. Use it for: zero-downtime major version upgrades, selective table sync, and feeding data to analytics systems.

text

-- Publisher (source):
ALTER SYSTEM SET wal_level = logical;
SELECT pg_reload_conf();

CREATE PUBLICATION my_pub FOR TABLE orders, users;

-- Subscriber (target):
CREATE SUBSCRIPTION my_sub
  CONNECTION 'host=primary-host dbname=mydb user=replicator password=secret'
  PUBLICATION my_pub;

-- Check subscription status
SELECT * FROM pg_stat_subscription;
-- lag_bytes should stay near 0 for a healthy subscription

JSON and JSONB

PostgreSQL's JSONB type stores JSON in a binary format that supports indexing and efficient querying. For semi-structured data that doesn't need MongoDB's horizontal scale, JSONB is a compelling alternative that keeps your data in the same database as everything else.

text

-- Store and query JSONB
CREATE TABLE products (
  id BIGSERIAL PRIMARY KEY,
  name TEXT NOT NULL,
  attributes JSONB
);

INSERT INTO products (name, attributes) VALUES
  ('Widget Pro', '{"color": "blue", "weight_kg": 0.5, "tags": ["sale", "new"]}');

-- Query specific fields
SELECT name, attributes->>'color' AS color FROM products;

-- Filter on nested values (uses GIN index if present)
SELECT * FROM products WHERE attributes @> '{"tags": ["sale"]}';
SELECT * FROM products WHERE (attributes->>'weight_kg')::float < 1.0;

-- JSONB aggregation
SELECT
  attributes->>'color' AS color,
  COUNT(*) AS count
FROM products
GROUP BY attributes->>'color';

-- GIN index for contains queries
CREATE INDEX products_attrs_gin ON products USING GIN(attributes);

Key Extensions

text

-- pgvector: vector similarity search for AI/ML applications
CREATE EXTENSION vector;
ALTER TABLE documents ADD COLUMN embedding vector(1536);
CREATE INDEX doc_embedding_idx ON documents USING HNSW(embedding vector_cosine_ops);

-- TimescaleDB: time-series with automatic partitioning and compression
CREATE EXTENSION timescaledb;
SELECT create_hypertable('sensor_readings', 'recorded_at');

-- PostGIS: geospatial queries (distances, polygons, spatial indexes)
CREATE EXTENSION postgis;
SELECT ST_Distance(location, ST_MakePoint(-73.9857, 40.7484)) AS distance_meters
FROM venues
ORDER BY distance_meters LIMIT 10;

-- pg_partman: automatic table partitioning management
CREATE EXTENSION pg_partman;
SELECT create_parent('public.events', 'created_at', 'native', 'monthly');

-- pg_stat_statements: query performance tracking (install on all production instances)
CREATE EXTENSION pg_stat_statements;
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC LIMIT 20;

Essential Configuration for Production

text

# postgresql.conf — starting point for a dedicated DB server with 64GB RAM

# Memory
shared_buffers = 16GB              # 25% of RAM
effective_cache_size = 48GB        # 75% of RAM (planner hint, not allocated)
work_mem = 64MB                    # per sort/hash operation (×connections × operations)
maintenance_work_mem = 2GB         # for VACUUM, index builds

# WAL and checkpoints
wal_buffers = 64MB
checkpoint_completion_target = 0.9
max_wal_size = 4GB
min_wal_size = 1GB

# Connections
max_connections = 200              # use PgBouncer for connection pooling above ~200
connection_pooling via PgBouncer   # transaction-mode pooling for web workloads

# Query planner
random_page_cost = 1.1             # for SSD (default 4.0 assumes HDD)
effective_io_concurrency = 200     # for SSD

# Autovacuum (tune for high-write tables)
autovacuum_vacuum_scale_factor = 0.05   # trigger at 5% dead tuples (default 20%)
autovacuum_vacuum_cost_delay = 2ms      # reduce for SSD

When PostgreSQL Is the Right Choice

PostgreSQL is the right default for nearly every new application. The cases where you'd choose something else:

Need to write 500K+ rows/sec across multiple regions → Cassandra or TiDB
Pure key-value at sub-millisecond latency → Redis or Aerospike
OLAP over billions of rows with complex aggregations → ClickHouse or StarRocks
Highly variable document structure that changes constantly → MongoDB

For everything else — transactional apps, APIs, SaaS backends, reporting, vector search under 50M vectors, geospatial — PostgreSQL handles it, often better than the specialized alternative.

Working with JusDB on PostgreSQL

PostgreSQL is deceptively easy to get started with and deceptively hard to operate at scale. The defaults work fine for development but underperform in production: shared_buffers too small, work_mem too low, autovacuum too conservative. Connection pooling, replication setup, and major version upgrades all have real gotchas.

We tune, operate, and upgrade PostgreSQL clusters as part of our PostgreSQL consulting and managed database SRE services. If you're starting a new PostgreSQL deployment, optimizing an existing one, or planning a major version upgrade, reach out.

Frequently Asked Questions

What is PostgreSQL and what is it best for?

PostgreSQL is an open-source object-relational database with strong ACID guarantees, rich SQL features, MVCC concurrency, and an extension ecosystem (PostGIS, pgvector, TimescaleDB, Citus). Best for OLTP applications, mixed OLTP/analytics workloads, geospatial data, vector search, multi-tenant SaaS, and replacing Oracle. Used in production at Apple, Reddit, Instagram, and most modern SaaS companies.

PostgreSQL vs MySQL — which should I choose in 2026?

PostgreSQL wins on: complex queries, JSONB performance, extensions (PostGIS, pgvector), MVCC concurrency, strict SQL standard compliance, advanced indexing (GiST, GIN, BRIN). MySQL wins on: simple read-heavy CRUD, replication tooling maturity, ecosystem familiarity. For new projects in 2026, default to PostgreSQL unless you have a specific reason — its momentum and feature velocity are unmatched.

What is MVCC in PostgreSQL?

MVCC (Multi-Version Concurrency Control) lets readers and writers operate without blocking each other. Each transaction sees a snapshot of the data; writes create new row versions instead of locking. Tombstone rows are cleaned by VACUUM. The cost is bloat — long-running transactions block VACUUM, accumulating dead tuples. Tuning autovacuum is the #1 PostgreSQL production skill.

Is JSONB fast enough to replace MongoDB?

For most workloads, yes. JSONB stores parsed binary JSON with GIN indexes on keys/values. Operators like @> (contains) and @@ (jsonpath match) are index-accelerated and competitive with MongoDB. PostgreSQL adds JOINs, ACID, and full SQL on top. MongoDB still wins on: sharded write throughput >100k/s, schema-on-read where schemas vary wildly per document. PostgreSQL+JSONB wins everywhere else.

What's new in PostgreSQL 16 and 17?

PG 16 (Sept 2025): logical replication from standbys, parallel hash JOIN improvements (2-3× faster), pg_stat_io for I/O observability, SQL/JSON constructors. PG 17 (Sept 2026): JSON_TABLE() for relational-style JSON queries, incremental backups via pg_basebackup, per-query memory limit (work_mem_per_query), improved vacuum performance on large indexes, native MERGE / RETURNING combinations.

How do I scale PostgreSQL beyond a single node?

Three patterns: (1) Read replicas (streaming replication) for read scale. (2) Logical replication for selective table sync, zero-downtime upgrades, and CDC. (3) Sharding via Citus (extension, native PG) for write scale. Past 10TB: consider Aurora PostgreSQL or partitioning + Citus. Avoid homegrown sharding — it's almost always wrong.

Should I use Aurora PostgreSQL, RDS, or self-managed?

RDS PostgreSQL: fully managed, EBS-backed, predictable cost, vanilla PG. Aurora PostgreSQL: distributed storage layer, 5× write throughput, instant snapshots, but 20-50% more expensive. Self-managed on EC2: full control, lower cost at scale (>32 vCPUs), but you own everything. Rule: RDS until you hit p99 latency walls; Aurora for write-heavy multi-AZ; self-managed only with a dedicated DBA team.

What's the best PostgreSQL HA solution?

Patroni + etcd/Consul for self-hosted: automated failover in 5-15s, battle-tested by GitLab, Zalando, and the Patroni community. repmgr: simpler but less feature-complete. RDS Multi-AZ: managed, ~60s failover, no manual config. Aurora: sub-30s failover via storage-level replication. For new self-managed deployments in 2026, Patroni is the default; we run it for 500+ clients.

Share this article

JusDB Team

Official JusDB content team

Keep reading

PostgreSQL

PostgreSQL 19 Beta: Every New Feature That Matters to DBAs

PostgreSQL 19 Beta 1 (June 4, 2026) brings parallel autovacuum, the native REPACK command for online table rebuilds, 2x faster inserts under foreign-key load, online logical replication without a restart, WAIT FOR LSN for read-your-writes consistency, and default changes (JIT off, lz4 TOAST, RADIUS removed). A DBA-focused walkthrough of what changed and what to test before GA.

PostgreSQL14 minJun 15, 2026

Read

PostgreSQL

PostgreSQL Performance Tuning Playbook: A Top-Down Method for Faster Queries

A repeatable, top-down method for tuning PostgreSQL: measure with pg_stat_statements, read plans with EXPLAIN (ANALYZE, BUFFERS), fix queries and indexes before parameters, then tune memory, I/O, WAL, connection pooling, and autovacuum — with a ready-to-adapt postgresql.conf baseline.

PostgreSQL16 minMay 31, 2026

Read

PostgreSQL

PostgreSQL Architecture Deep Dive: Process Model, MVCC, WAL & Replication Explained

Walk through PostgreSQL's multi-process architecture, shared/local memory layout, page-organized storage, MVCC tuple versioning, the WAL write path, the query execution pipeline, and physical + logical replication — all with ASCII flow diagrams that show how data and control actually move through the system.

PostgreSQL18 minMay 31, 2026

Read

Need Expert Help?

Need expert help? Our specialists can assist with these related services:

PostgreSQL Consulting

Architecture reviews, migration planning, performance tuning & database health checks

Learn more

PostgreSQL Migration

Oracle → PG, MySQL → PG, version upgrades 14→17, zero-downtime cutovers

Learn more

PostgreSQL Support

24/7 PostgreSQL support: VACUUM tuning, WAL/replication issues, 15-min critical-incident SLA

Learn more

PostgreSQL High Availability

Patroni + streaming replication, 99.99% uptime, sub-30s automatic failover

Learn more

PostgreSQL Cloud Migration

AWS RDS/Aurora, GCP Cloud SQL, Azure Flexible Server - on-prem to cloud with zero downtime

Learn more

PostgreSQL on Kubernetes

CloudNativePG, Zalando operator, StatefulSet design, Patroni inside K8s

Learn more

Explore all our database services

PostgreSQL Explained (2026): Architecture, MVCC, JSONB & Production Patterns

MVCC: How PostgreSQL Handles Concurrent Access

Storage: Tables, Indexes, and TOAST

Replication: Streaming vs Logical

Streaming replication (physical)

Logical replication

JSON and JSONB

Key Extensions

Essential Configuration for Production

When PostgreSQL Is the Right Choice

Working with JusDB on PostgreSQL

Frequently Asked Questions

Share this article

JusDB Team

Keep reading

PostgreSQL 19 Beta: Every New Feature That Matters to DBAs

PostgreSQL Performance Tuning Playbook: A Top-Down Method for Faster Queries

PostgreSQL Architecture Deep Dive: Process Model, MVCC, WAL & Replication Explained

Need Expert Help?

PostgreSQL Consulting

PostgreSQL Migration

PostgreSQL Support

PostgreSQL High Availability

PostgreSQL Cloud Migration

PostgreSQL on Kubernetes