Teams take down production at 2 AM for a 4-hour migration window that could have been a 30-second zero-downtime operation. Database migrations are where engineering careers go to have bad weekends. Here are the seven anti-patterns that cause the most incidents, and the patterns that eliminate them.
- Never add a NOT NULL column without a default value to a table with existing rows in a single migration
- Large table ALTER TABLE statements require online DDL tools — pt-online-schema-change or gh-ost for MySQL, pg_reorg for PostgreSQL
- Test rollback scripts in staging before migration night — rollbacks always take 3x longer than expected
- Blue-green deployment with dual-write is the safest pattern for schema-incompatible changes
Anti-Pattern 1: Direct ALTER TABLE on Large Tables
A plain ALTER TABLE orders ADD COLUMN discount_pct DECIMAL(5,2) acquires an exclusive lock and rebuilds the entire table. On a 500M-row table, this blocks all reads and writes for 30–90 minutes.
# MySQL: use gh-ost for zero-downtime ALTER TABLE
# gh-ost creates a ghost table, applies DDL, syncs via binlog, then swaps atomically
gh-ost \
--host=mysql-primary.internal \
--user=ghoster --password=pass \
--database=appdb \
--table=orders \
--alter="ADD COLUMN discount_pct DECIMAL(5,2) DEFAULT NULL" \
--execute \
--chunk-size=1000 \
--max-load="Threads_running=100" \
--critical-load="Threads_running=200" \
--switch-to-rbr# PostgreSQL: use pg_repack for zero-downtime table rebuilds
# Or use CREATE INDEX CONCURRENTLY for index additions
# Add index without blocking reads/writes:
CREATE INDEX CONCURRENTLY idx_orders_discount
ON orders (discount_pct)
WHERE discount_pct IS NOT NULL;
# Add NOT NULL column safely (multi-step):
-- Step 1: Add nullable column with default
ALTER TABLE orders ADD COLUMN discount_pct DECIMAL(5,2) DEFAULT 0;
-- Step 2: Backfill in batches (no lock)
UPDATE orders SET discount_pct = 0 WHERE discount_pct IS NULL AND id BETWEEN 1 AND 100000;
-- Step 3: Add NOT NULL constraint (validates, brief lock in PG 12+)
ALTER TABLE orders ALTER COLUMN discount_pct SET NOT NULL;Anti-Pattern 2: No Rollback Plan
# Always have a tested rollback script before migration night
# Pre-migration: snapshot or dump
mysqldump --single-transaction appdb orders > /backup/orders-pre-migration.sql
# Rollback script (tested in staging):
# ALTER TABLE orders DROP COLUMN discount_pct;
# For PostgreSQL: use transactions for schema changes
BEGIN;
ALTER TABLE orders ADD COLUMN discount_pct DECIMAL(5,2);
-- Test that application works
-- If something goes wrong:
ROLLBACK; -- clean rollback, no data lossAnti-Pattern 3: Migrating Without Feature Flags
Deploying code that uses a new column before the migration runs (or vice versa) causes 500 errors the moment deployment begins. Feature flags decouple schema changes from code changes.
# Bad: code assumes column exists immediately after deployment
def get_order_discount(order_id):
order = db.query("SELECT discount_pct FROM orders WHERE id = %s", order_id)
return order.discount_pct
# Good: feature flag protects new code path
def get_order_discount(order_id):
if feature_flags.is_enabled("orders_discount_column"):
order = db.query("SELECT discount_pct FROM orders WHERE id = %s", order_id)
return order.discount_pct
return 0.0 # fallback to default
# Migration order:
# 1. Deploy migration (add column)
# 2. Enable feature flag
# 3. Deploy code that uses columnAnti-Pattern 4: Synchronous Backfills
# WRONG: backfill in one UPDATE statement (locks table)
# UPDATE orders SET new_col = old_col; -- blocks for hours on large tables
# RIGHT: backfill in batches with sleep between
import time
def backfill_in_batches(db, batch_size=1000, sleep_ms=100):
last_id = 0
while True:
result = db.execute(
"UPDATE orders SET new_col = old_col "
"WHERE id > %s AND new_col IS NULL "
"ORDER BY id LIMIT %s",
(last_id, batch_size)
)
rows_updated = result.rowcount
if rows_updated == 0:
break
last_id = db.execute(
"SELECT MAX(id) FROM orders WHERE id > %s AND new_col IS NULL",
(last_id,)
).scalar()
time.sleep(sleep_ms / 1000)
print(f"Backfilled through id={last_id}")Anti-Pattern 5: Breaking API Contracts in Migrations
Renaming a column in a single deployment breaks every query that uses the old column name simultaneously. The expand-contract pattern handles this safely.
-- WRONG: rename column in one step
-- ALTER TABLE users RENAME COLUMN username TO user_login;
-- Breaks all existing queries that use 'username'
-- RIGHT: expand-contract pattern
-- Phase 1 (Expand): add the new column, dual-write in code
ALTER TABLE users ADD COLUMN user_login VARCHAR(255);
UPDATE users SET user_login = username;
-- Deploy code that writes to BOTH username and user_login
-- Phase 2 (Migrate): after code is deployed and reads shifted
-- Deploy code reading from user_login instead of username
-- Phase 3 (Contract): remove old column after next release
ALTER TABLE users DROP COLUMN username;The expand-contract pattern requires three separate deployments over days or weeks. The most common mistake is collapsing phases 1 and 3 into a single deployment to "save time" — this eliminates the safety net and guarantees a production incident.
- Use gh-ost (MySQL) or CREATE INDEX CONCURRENTLY / pg_repack (PostgreSQL) for zero-downtime schema changes on large tables.
- Always test your rollback script in staging before migration night — rollbacks always take longer than expected under production pressure.
- The expand-contract pattern is the safest approach for column renames and schema-incompatible changes — never collapse the three phases.
- Backfill data in batches with rate limiting, not a single UPDATE statement that locks the table for hours.
Working with JusDB on Database Migrations
JusDB plans and executes database migrations for engineering teams — from simple column additions to full-table rebuilds and zero-downtime schema migrations. We design rollback plans, test migrations in staging environments, and provide on-call support during production migration windows.
Explore JusDB Migration Services → | Talk to a DBA
Related reading: