Database Engineering

Database Migration Anti-Patterns: 5 Mistakes That Cause Production Incidents

Teams take down production for hours with migrations that could be zero-downtime operations. Learn to use gh-ost and CREATE INDEX CONCURRENTLY, implement rollback plans, and apply the expand-contract pattern for breaking changes.

JusDB Team
November 25, 2025
10 min read
158 views

Teams take down production at 2 AM for a 4-hour migration window that could have been a 30-second zero-downtime operation. Database migrations are where engineering careers go to have bad weekends. Here are the seven anti-patterns that cause the most incidents, and the patterns that eliminate them.

TL;DR
  • Never add a NOT NULL column without a default value to a table with existing rows in a single migration
  • Large table ALTER TABLE statements require online DDL tools — pt-online-schema-change or gh-ost for MySQL, pg_reorg for PostgreSQL
  • Test rollback scripts in staging before migration night — rollbacks always take 3x longer than expected
  • Blue-green deployment with dual-write is the safest pattern for schema-incompatible changes

Anti-Pattern 1: Direct ALTER TABLE on Large Tables

A plain ALTER TABLE orders ADD COLUMN discount_pct DECIMAL(5,2) acquires an exclusive lock and rebuilds the entire table. On a 500M-row table, this blocks all reads and writes for 30–90 minutes.

bash
# MySQL: use gh-ost for zero-downtime ALTER TABLE
# gh-ost creates a ghost table, applies DDL, syncs via binlog, then swaps atomically

gh-ost \
  --host=mysql-primary.internal \
  --user=ghoster --password=pass \
  --database=appdb \
  --table=orders \
  --alter="ADD COLUMN discount_pct DECIMAL(5,2) DEFAULT NULL" \
  --execute \
  --chunk-size=1000 \
  --max-load="Threads_running=100" \
  --critical-load="Threads_running=200" \
  --switch-to-rbr
bash
# PostgreSQL: use pg_repack for zero-downtime table rebuilds
# Or use CREATE INDEX CONCURRENTLY for index additions

# Add index without blocking reads/writes:
CREATE INDEX CONCURRENTLY idx_orders_discount
    ON orders (discount_pct)
    WHERE discount_pct IS NOT NULL;

# Add NOT NULL column safely (multi-step):
-- Step 1: Add nullable column with default
ALTER TABLE orders ADD COLUMN discount_pct DECIMAL(5,2) DEFAULT 0;
-- Step 2: Backfill in batches (no lock)
UPDATE orders SET discount_pct = 0 WHERE discount_pct IS NULL AND id BETWEEN 1 AND 100000;
-- Step 3: Add NOT NULL constraint (validates, brief lock in PG 12+)
ALTER TABLE orders ALTER COLUMN discount_pct SET NOT NULL;

Anti-Pattern 2: No Rollback Plan

bash
# Always have a tested rollback script before migration night

# Pre-migration: snapshot or dump
mysqldump --single-transaction appdb orders > /backup/orders-pre-migration.sql

# Rollback script (tested in staging):
# ALTER TABLE orders DROP COLUMN discount_pct;

# For PostgreSQL: use transactions for schema changes
BEGIN;
ALTER TABLE orders ADD COLUMN discount_pct DECIMAL(5,2);
-- Test that application works
-- If something goes wrong:
ROLLBACK;  -- clean rollback, no data loss

Anti-Pattern 3: Migrating Without Feature Flags

Deploying code that uses a new column before the migration runs (or vice versa) causes 500 errors the moment deployment begins. Feature flags decouple schema changes from code changes.

python
# Bad: code assumes column exists immediately after deployment
def get_order_discount(order_id):
    order = db.query("SELECT discount_pct FROM orders WHERE id = %s", order_id)
    return order.discount_pct

# Good: feature flag protects new code path
def get_order_discount(order_id):
    if feature_flags.is_enabled("orders_discount_column"):
        order = db.query("SELECT discount_pct FROM orders WHERE id = %s", order_id)
        return order.discount_pct
    return 0.0  # fallback to default

# Migration order:
# 1. Deploy migration (add column)
# 2. Enable feature flag
# 3. Deploy code that uses column

Anti-Pattern 4: Synchronous Backfills

python
# WRONG: backfill in one UPDATE statement (locks table)
# UPDATE orders SET new_col = old_col;  -- blocks for hours on large tables

# RIGHT: backfill in batches with sleep between
import time

def backfill_in_batches(db, batch_size=1000, sleep_ms=100):
    last_id = 0
    while True:
        result = db.execute(
            "UPDATE orders SET new_col = old_col "
            "WHERE id > %s AND new_col IS NULL "
            "ORDER BY id LIMIT %s",
            (last_id, batch_size)
        )
        rows_updated = result.rowcount
        if rows_updated == 0:
            break
        last_id = db.execute(
            "SELECT MAX(id) FROM orders WHERE id > %s AND new_col IS NULL",
            (last_id,)
        ).scalar()
        time.sleep(sleep_ms / 1000)
        print(f"Backfilled through id={last_id}")

Anti-Pattern 5: Breaking API Contracts in Migrations

Renaming a column in a single deployment breaks every query that uses the old column name simultaneously. The expand-contract pattern handles this safely.

sql
-- WRONG: rename column in one step
-- ALTER TABLE users RENAME COLUMN username TO user_login;
-- Breaks all existing queries that use 'username'

-- RIGHT: expand-contract pattern
-- Phase 1 (Expand): add the new column, dual-write in code
ALTER TABLE users ADD COLUMN user_login VARCHAR(255);
UPDATE users SET user_login = username;
-- Deploy code that writes to BOTH username and user_login

-- Phase 2 (Migrate): after code is deployed and reads shifted
-- Deploy code reading from user_login instead of username

-- Phase 3 (Contract): remove old column after next release
ALTER TABLE users DROP COLUMN username;
Warning

The expand-contract pattern requires three separate deployments over days or weeks. The most common mistake is collapsing phases 1 and 3 into a single deployment to "save time" — this eliminates the safety net and guarantees a production incident.

Key Takeaways
  • Use gh-ost (MySQL) or CREATE INDEX CONCURRENTLY / pg_repack (PostgreSQL) for zero-downtime schema changes on large tables.
  • Always test your rollback script in staging before migration night — rollbacks always take longer than expected under production pressure.
  • The expand-contract pattern is the safest approach for column renames and schema-incompatible changes — never collapse the three phases.
  • Backfill data in batches with rate limiting, not a single UPDATE statement that locks the table for hours.

Working with JusDB on Database Migrations

JusDB plans and executes database migrations for engineering teams — from simple column additions to full-table rebuilds and zero-downtime schema migrations. We design rollback plans, test migrations in staging environments, and provide on-call support during production migration windows.

Explore JusDB Migration Services →  |  Talk to a DBA

Related reading:

Share this article

JusDB Team

Official JusDB content team