NoSQL Databases

MongoDB vs PostgreSQL in 2026: Choosing the Right Database for Your Data Model

An honest comparison of MongoDB and PostgreSQL in 2026 — document model vs relational model, JSONB as a document store, ACID transaction trade-offs, horizontal scaling, licensing, and migration paths.

JusDB Team
February 10, 2025
12 min read
154 views

In the spring of 2023, an e-commerce platform running MySQL 8.0 deployed what their engineering team described as a "routine schema change" — adding a NOT NULL column with no default value to a 200-million-row orders table. The ALTER TABLE statement acquired a metadata lock, blocked every incoming write, and held it for 45 minutes while MySQL rebuilt the table in place. Revenue-impacting checkout errors started within 30 seconds. The on-call engineer killed the query at the 45-minute mark and discovered the operation had to be rolled back and restarted from scratch. Total impact: 47 minutes of degraded service, $180,000 in lost transactions, and a post-mortem that consumed the entire next sprint.

This is not an edge case. Database migrations are one of the most reliable sources of unplanned production downtime, yet the patterns that prevent it are well-established and, once understood, not difficult to implement. The problem is that most engineering teams learn these patterns reactively — after their first painful incident — rather than proactively building them into their deployment workflow.

This guide covers the mechanisms behind migration-induced downtime, the expand-contract approach that eliminates it, specific safe and unsafe migration patterns, and the tooling ecosystem that automates the hard parts. Everything here is applicable to teams running PostgreSQL or MySQL in production today.

TL;DR
  • ALTER TABLE acquires an ACCESS EXCLUSIVE lock in PostgreSQL and a metadata lock in MySQL, blocking all reads and writes for the duration of the table rebuild — this is the root cause of migration downtime.
  • The Expand-Contract pattern (also called blue-green schema migration) decouples schema changes from deployments by splitting each change into three backward-compatible phases: add, backfill, activate.
  • Adding a nullable column and adding an index CONCURRENTLY (PostgreSQL) or with ALGORITHM=INPLACE (MySQL 8.0) are online operations that do not block reads or writes.
  • Adding NOT NULL without a default, renaming a column, and changing a column type are the three most dangerous migration patterns — each requires a multi-phase approach with at least one full deploy cycle between phases.
  • pt-online-schema-change and gh-ost enable online table rebuilds for MySQL by copying rows to a shadow table and replaying changes via triggers or the binary log.
  • Schema versioning tools (Flyway, Liquibase, Alembic, golang-migrate) enforce migration ordering, track applied versions in the database, and integrate with CI/CD pipelines to prevent drift between environments.

Background

Every production database migration is a coordination problem between two moving parts: the schema version the database expects and the schema version the application code expects. During a rolling deploy, both versions of your application are running simultaneously — the old version and the new version. If a migration breaks the old code path, your rolling deploy either stalls or causes errors until every instance of the old code is shut down. If the migration takes a lock, your database stops serving requests until the lock is released.

The foundational insight behind zero-downtime migrations is that schema changes and application deployments are separate concerns that must be deliberately sequenced. The naive approach — run the migration as part of the deploy — conflates them. The correct approach treats migrations as infrastructure changes with their own lifecycle: plan, execute in phases, validate, clean up.

This requires accepting a few constraints. During the transition period between phases, both old and new application code must be able to run against the same schema without errors. This means old columns cannot be dropped until no code references them, new columns must be nullable or have defaults until all code writes to them, and renamed columns must exist under both names simultaneously for at least one full deploy cycle.

These constraints sound burdensome but become mechanical once you internalize the patterns. Most schema changes that appear atomic can be decomposed into a sequence of backward-compatible steps, each deployable independently with no downtime risk.

Why Migrations Cause Downtime

Understanding the locking mechanisms in PostgreSQL and MySQL explains exactly which operations are safe and which are not.

PostgreSQL: ACCESS EXCLUSIVE Lock

PostgreSQL uses a multi-level lock system. Most ALTER TABLE operations that modify column definitions, add constraints, or change types acquire an ACCESS EXCLUSIVE lock — the most restrictive lock level, incompatible with every other lock including ACCESS SHARE (which is held by every running SELECT).

The sequence is: PostgreSQL waits for all current transactions on the table to complete, acquires the ACCESS EXCLUSIVE lock, performs the table rewrite or catalog update, and releases the lock. During the wait-and-hold phase, every new query against the table queues behind the lock. On a busy table with long-running transactions — OLAP queries, slow API calls — this wait can stretch from seconds to minutes. During the actual table rewrite on a large table, it can stretch to hours.

sql
-- This acquires ACCESS EXCLUSIVE and rebuilds the table on disk.
-- On a 50M-row table, expect minutes to hours of lock hold time.
ALTER TABLE orders ADD COLUMN shipping_provider TEXT NOT NULL DEFAULT 'fedex';

-- Check what locks are currently held and waiting:
SELECT
    pid,
    wait_event_type,
    wait_event,
    state,
    query,
    now() - query_start AS duration
FROM pg_stat_activity
WHERE wait_event_type = 'Lock'
ORDER BY duration DESC;
Warning

In PostgreSQL, even a fast ALTER TABLE operation that only updates the system catalog (not a table rewrite) can cause visible downtime if there are long-running transactions on the table. The ALTER TABLE queues behind those transactions and blocks all subsequent queries while it waits. A 10ms catalog update can cause minutes of application-visible latency if it gets stuck waiting for a slow transaction to commit.

MySQL: Metadata Lock

MySQL uses a metadata lock (MDL) system introduced in MySQL 5.5. Any DDL statement that modifies a table acquires an exclusive MDL for the duration of the operation. Exclusive MDLs are incompatible with all other MDLs — including the shared MDL held by any active SELECT, INSERT, UPDATE, or DELETE against the table.

MySQL's InnoDB engine also introduced the Online DDL framework (full coverage in 5.6, significantly improved in 8.0). Online DDL allows certain ALTER TABLE operations to proceed concurrently with reads and writes while the table rebuild happens in the background. However, it still acquires a brief exclusive MDL at the start and end of the operation, and the definition of "certain operations" requires careful verification — the supported matrix varies by operation type, storage engine, and column data type.

sql
-- Check the Online DDL support for a given operation before running it:
-- ALGORITHM=INPLACE means row data is modified in place without a full copy.
-- ALGORITHM=COPY means a full table copy is made — always blocking.
-- LOCK=NONE means concurrent DML (reads and writes) is allowed.

ALTER TABLE orders
    ADD COLUMN shipping_provider VARCHAR(100) NULL,
    ALGORITHM=INPLACE, LOCK=NONE;

-- Verify the operation is truly online before running on the real table:
-- Check MySQL documentation for the specific operation's Online DDL support.
-- Not all ALGORITHM=INPLACE operations support LOCK=NONE.
Important

MySQL's Online DDL with ALGORITHM=INPLACE, LOCK=NONE does not guarantee zero impact. During a large table rebuild, InnoDB accumulates DML changes in an online DDL log. If the log fills up before the rebuild finishes (controlled by innodb_online_alter_log_max_size, default 128MB), the operation fails and must restart from scratch — after having held DML for seconds at both the start and end of the aborted operation.

Expand-Contract Pattern

The Expand-Contract pattern, also described as parallel change or blue-green schema migration, is the core framework for zero-downtime schema evolution. It was formalized by Martin Fowler and has become standard practice at high-availability teams at companies running MySQL and PostgreSQL at scale.

The pattern decomposes every schema change into three sequential phases, each deployed independently:

  1. Expand: Add the new schema element in a backward-compatible way. The old application code still runs correctly. No existing behavior changes.
  2. Migrate: Update application code to use the new schema element. Backfill existing data. Both old and new code paths coexist safely.
  3. Contract: Remove the old schema element. By this phase, no running code references the old element.

The critical rule: each phase requires a full, stable deployment before the next phase begins. You cannot compress phases into a single deploy. This means a schema change that would traditionally be a single ALTER TABLE now requires three separate deployments spread over days or weeks. This feels slow. It is also the only approach that eliminates downtime risk on large production tables.

Example: Renaming a Column

Renaming orders.customer_name to orders.full_name seems trivial. In production with a rolling deploy, it is one of the most dangerous operations possible. The correct sequence:

Phase 1 — Expand: Add the new column. Update application code to write to both columns. Deploy and stabilize.

sql
-- Phase 1: Add new column (nullable, no default required)
ALTER TABLE orders ADD COLUMN full_name TEXT;

Phase 2 — Migrate: Backfill historical data. Verify completeness. Update all application reads to use the new column. Deploy and stabilize.

sql
-- Phase 2: Backfill existing rows in batches (do NOT use a single UPDATE on a large table)
UPDATE orders SET full_name = customer_name
WHERE full_name IS NULL
  AND id BETWEEN 1 AND 100000;
-- Repeat in batches until all rows are backfilled.

-- After backfill, verify no nulls remain:
SELECT COUNT(*) FROM orders WHERE full_name IS NULL AND customer_name IS NOT NULL;

Phase 3 — Contract: Drop the old column. Only safe after no running application code references customer_name.

sql
-- Phase 3: Drop old column
ALTER TABLE orders DROP COLUMN customer_name;
Tip

Batch your backfill updates. A single UPDATE orders SET full_name = customer_name on a 100M-row table will hold a transaction open for potentially hours, generate enormous WAL/binlog, and block VACUUM (PostgreSQL) or slow replication (MySQL). Instead, update in chunks of 1,000–10,000 rows at a time with a short sleep between batches to give replicas time to catch up.

Specific Safe Migration Patterns

Adding a Nullable Column

Adding a nullable column with no default is the safest possible schema change. In PostgreSQL (since version 11) and MySQL with Online DDL, this is a metadata-only operation that does not rebuild the table. It completes in milliseconds regardless of table size and acquires the lock only briefly.

sql
-- PostgreSQL: metadata-only operation, near-instant even on 100M+ rows
ALTER TABLE orders ADD COLUMN shipping_notes TEXT;

-- MySQL 8.0: ALGORITHM=INSTANT is available for adding nullable columns
ALTER TABLE orders ADD COLUMN shipping_notes TEXT NULL, ALGORITHM=INSTANT;
Tip

In MySQL 8.0+, ALGORITHM=INSTANT is the fastest possible DDL execution strategy for supported operations. It modifies only the data dictionary, not the actual row format, and acquires a lock only briefly. Supported operations include adding a nullable column at the end of the column list. Use ALGORITHM=INSTANT explicitly to get an error rather than a silent fallback to a slower algorithm if the operation is not supported.

Adding Indexes Without Locking

Standard CREATE INDEX in PostgreSQL acquires a SHARE lock that blocks writes for the entire index build duration. On a large table under write load, this is unacceptable in production.

sql
-- PostgreSQL: CREATE INDEX CONCURRENTLY allows reads AND writes during build.
-- Takes longer than a normal index build (2-3x) but does not block anything.
-- Must be run outside a transaction block.
CREATE INDEX CONCURRENTLY idx_orders_user_id_created_at
    ON orders (user_id, created_at DESC);

-- Verify the index built cleanly (CONCURRENTLY can leave an INVALID index on failure):
SELECT indexname, indisvalid
FROM pg_indexes
JOIN pg_class ON pg_class.relname = pg_indexes.indexname
JOIN pg_index ON pg_index.indexrelid = pg_class.oid
WHERE tablename = 'orders';

-- If indisvalid = false, drop and rebuild:
DROP INDEX CONCURRENTLY idx_orders_user_id_created_at;
sql
-- MySQL 8.0: Adding indexes is online by default for InnoDB.
-- Explicitly specify to prevent silent fallback:
ALTER TABLE orders
    ADD INDEX idx_orders_user_id_created_at (user_id, created_at DESC),
    ALGORITHM=INPLACE, LOCK=NONE;

Adding NOT NULL Safely

Adding a NOT NULL constraint is the operation most likely to cause the kind of 45-minute outage described in the opening. The reason: the database must verify every existing row satisfies the constraint, which requires a full table scan.

In PostgreSQL, the safe approach separates constraint addition from constraint validation:

sql
-- Phase 1: Add the column as nullable with a default value
ALTER TABLE orders ADD COLUMN payment_method TEXT DEFAULT 'card';

-- Phase 2: Backfill any rows where the value is null
UPDATE orders SET payment_method = 'card' WHERE payment_method IS NULL;

-- Phase 3a: Add NOT VALID constraint (fast — skips existing row scan)
-- The constraint is enforced for new inserts/updates immediately,
-- but existing rows are not checked yet.
ALTER TABLE orders
    ADD CONSTRAINT orders_payment_method_not_null
    CHECK (payment_method IS NOT NULL) NOT VALID;

-- Phase 3b: Validate the constraint (uses ShareUpdateExclusiveLock, not AccessExclusive)
-- This scans existing rows but allows concurrent reads and writes.
ALTER TABLE orders
    VALIDATE CONSTRAINT orders_payment_method_not_null;
Important

The NOT VALID / VALIDATE CONSTRAINT split is PostgreSQL-specific. VALIDATE CONSTRAINT acquires a ShareUpdateExclusiveLock, which is compatible with reads and writes but incompatible with other schema changes. It still performs a full table scan, so it can take minutes on a large table, but it does not block your application during that time. MySQL does not have a direct equivalent — use pt-osc or gh-ost for MySQL table rebuilds.

Dangerous Patterns to Avoid

These operations should never be run directly against a large production table during a deploy:

  • Renaming a column — always breaks running code that references the old name. Use the three-phase expand-contract approach.
  • Changing a column type (e.g., VARCHAR(100) to TEXT, or INT to BIGINT) — triggers a full table rewrite in most cases. In PostgreSQL, changing between binary-compatible types can sometimes avoid a rewrite; use USING clause judiciously. For MySQL, use gh-ost.
  • Dropping a column — safe as an operation (fast metadata change in PostgreSQL 16+), but only safe after all application code that references the column has been deployed and is no longer running.
  • Adding a foreign key with validation — acquires a ShareRowExclusiveLock in PostgreSQL while scanning both tables. Use NOT VALID first, then VALIDATE CONSTRAINT in a separate step.

Tools

Schema Versioning: Flyway, Liquibase, Alembic, golang-migrate

Schema versioning tools solve the drift problem: ensuring every environment (development, staging, production, every engineer's local database) applies the same set of migrations in the same order. They work by maintaining a version table in the database and comparing it against the set of migration files on disk.

  • Flyway: SQL-first. Migration files are plain SQL files named with a version prefix (e.g., V3__add_payment_method_column.sql). Minimal configuration, excellent CI/CD integration, and a free community edition that covers most use cases. Best for teams that want SQL migrations without Java dependencies.
  • Liquibase: Supports SQL, XML, YAML, and JSON changelog formats. More powerful than Flyway for complex scenarios (rollbacks, per-database branching, preconditions), but significantly more configuration overhead. The enterprise edition adds drift detection.
  • Alembic: The standard for Python/SQLAlchemy teams. Generates migration files from schema diffs, supports autogenerate, and integrates cleanly with FastAPI, Flask, and Django via SQLAlchemy's ORM layer.
  • golang-migrate: Lightweight Go library and CLI. Supports PostgreSQL, MySQL, SQLite, and others. Migration files are plain SQL. Minimal dependencies, easy to embed in Go services or run as a standalone CLI in CI/CD pipelines.
bash
# golang-migrate: apply pending migrations
migrate -database "postgresql://user:password@localhost:5432/mydb?sslmode=disable" \
        -path ./migrations up

# Check current migration version
migrate -database "postgresql://user:password@localhost:5432/mydb?sslmode=disable" \
        -path ./migrations version

# Flyway: apply pending migrations
flyway -url=jdbc:postgresql://localhost:5432/mydb \
       -user=myuser -password=mypassword migrate

# Flyway: check migration status without applying
flyway -url=jdbc:postgresql://localhost:5432/mydb \
       -user=myuser -password=mypassword info

pt-online-schema-change and gh-ost

For MySQL table rebuilds that cannot be performed online by InnoDB's native DDL engine, pt-online-schema-change (pt-osc) and gh-ost provide application-transparent table rebuilds by copying rows to a shadow table and keeping it synchronized during the copy.

pt-online-schema-change (from Percona Toolkit) uses triggers on the original table to capture DML changes during the copy, applying them to the shadow table. When the copy is complete, it performs a quick atomic table swap.

bash
# pt-osc: add a column to a large MySQL table without locking
pt-online-schema-change \
  --host=localhost \
  --user=dba_user \
  --password=secret \
  --database=myapp \
  --table=orders \
  --alter="ADD COLUMN shipping_provider VARCHAR(100) NULL" \
  --execute \
  --no-drop-old-table \
  --chunk-size=5000 \
  --max-load="Threads_running=50" \
  --critical-load="Threads_running=100"

gh-ost (from GitHub) takes a different approach: instead of triggers, it tails the MySQL binary log (binlog) to capture changes to the original table and replays them against the ghost table. This has lower runtime overhead than trigger-based approaches and is safer on high-write-throughput tables.

bash
# gh-ost: same operation via binlog-based approach
gh-ost \
  --host=localhost \
  --user=dba_user \
  --password=secret \
  --database=myapp \
  --table=orders \
  --alter="ADD COLUMN shipping_provider VARCHAR(100) NULL" \
  --execute \
  --chunk-size=1000 \
  --max-load="Threads_running=50" \
  --critical-load="Threads_running=100,Threads_connected=500" \
  --ok-to-drop-table
Tip

Both pt-osc and gh-ost support a --dry-run / --execute=false mode that validates the migration plan without making any changes. Always run a dry run first against a representative staging environment. gh-ost additionally supports a --postpone-cut-over-flag-file option that pauses the final table swap until you manually remove a sentinel file — giving you manual control over the exact moment of the cutover in production.

Rolling Deploys and Backward Compatibility

In a Kubernetes or container-based deployment, a rolling deploy means multiple versions of the application pod run simultaneously during the transition. Any schema change deployed during a rolling deploy must be simultaneously compatible with both the old and new application version.

The practical rule: never deploy a schema change and an application change in the same deployment if the schema change is not backward-compatible with the old application code. Deploy the schema change first, verify it, then deploy the application code change. The schema should always be a superset of what both versions need.

bash
# Correct deployment order for a non-backward-compatible change:
# 1. Deploy migration (Phase 1: expand — add new column, keep old column)
kubectl apply -f k8s/migration-job.yaml
kubectl wait --for=condition=complete job/schema-migration-v2 --timeout=300s

# 2. Deploy new application version (reads/writes both columns)
kubectl set image deployment/api-server api=myapp:v2.0.0
kubectl rollout status deployment/api-server

# 3. (Later, after v2 is stable) Deploy Phase 3 migration (drop old column)
kubectl apply -f k8s/migration-job-phase3.yaml

Key Takeaways

Key Takeaways
  • The root cause of migration downtime is lock acquisition during DDL. ACCESS EXCLUSIVE (PostgreSQL) and exclusive MDL (MySQL) block all concurrent queries for the duration of the operation on large tables.
  • Use the Expand-Contract pattern for any schema change that is not purely additive. Decompose the change into add, backfill, and drop phases across three separate deployments, with a stable deploy between each phase.
  • Adding nullable columns and adding indexes CONCURRENTLY (PostgreSQL) or with ALGORITHM=INPLACE, LOCK=NONE (MySQL) are safe online operations suitable for direct production execution.
  • Never rename a column, change a column type, or add NOT NULL without a default in a single migration on a large table — each requires multi-phase handling.
  • Use NOT VALID followed by VALIDATE CONSTRAINT in PostgreSQL to add check constraints and foreign keys without acquiring an exclusive lock for the row-scan phase.
  • For MySQL table rebuilds, gh-ost is the safer choice on high-throughput tables; pt-osc is simpler for lower-throughput scenarios where triggers are acceptable.
  • Schema versioning (Flyway, Liquibase, Alembic, golang-migrate) is not optional for production systems — version drift between environments is a leading cause of "works in staging, fails in production" migration incidents.

Working with JusDB on Database Migrations

Zero-downtime migrations require upfront planning that most teams only invest in after their first painful production incident. JusDB helps engineering teams design migration strategies before they encounter problems — reviewing schema change plans, identifying lock risks, designing multi-phase migration sequences, and implementing schema versioning pipelines that fit your existing CI/CD workflow.

Our database consultants have run expand-contract migrations on PostgreSQL and MySQL tables with hundreds of millions of rows, including full schema rewrites for teams moving between data types or normalizing legacy schemas without any maintenance windows. We cover the full migration lifecycle: planning, tooling selection (gh-ost vs pt-osc vs native Online DDL), backfill strategy, cutover coordination, and post-migration validation.

Database Migration Services Talk to a Database Expert

If you have an upcoming migration that involves large tables, type changes, or a tight deployment window, reach out before you run it. A 30-minute migration review call often surfaces risks that would otherwise surface as production incidents.

Share this article