It is 2:47 AM on a Friday when the on-call alert fires. Your team's marketing campaign launches in five hours, the application needs a new composite index to support the new query pattern, and your primary database is a 2 TB TiDB cluster serving 40,000 requests per minute. In MySQL, you would have two bad choices: block writes for the duration of the index build, or spend the next hour configuring pt-online-schema-change, praying it doesn't deadlock, and babysitting a shadow table migration. In TiDB, you type ALTER TABLE orders ADD INDEX idx_user_status (user_id, status);, press Enter, and go back to sleep. The index builds in the background, online, without locking a single row. When you wake up, the cluster is healthy, the index exists, and the campaign launches on time. This post explains exactly how TiDB makes that possible — and what you need to know to operate it confidently at scale.
- TiDB implements online DDL using a distributed, multi-phase state machine derived from Google's F1 schema change algorithm — no table locks, no shadow tables, no external tools required.
- DDL transitions through five states: absent → delete-only → write-only → write-reorganization → public. Each state is safe for concurrent DML operations across all TiDB nodes.
- From TiDB 6.2+, multiple DDL owner nodes run jobs concurrently; from 7.1+, the index backfill (reorg) phase itself is distributed across TiKV nodes.
- Monitor progress with
ADMIN SHOW DDL JOBS, cancel stuck jobs withADMIN CANCEL DDL JOBS job_id, and tune backfill throughput withtidb_ddl_reorg_worker_cntandtidb_ddl_reorg_batch_size. - Unlike MySQL's pt-osc and gh-ost, TiDB requires no external tooling, no replica lag management, and no binlog-based row-copy — the engine handles everything natively.
Why Standard Online DDL Breaks Down in Distributed Databases
MySQL's built-in online DDL (introduced in 5.6) works well on a single-node instance. The storage engine holds an internal lock, tracks DML changes in an online DDL log, and applies them at the end of the rebuild. The whole thing is coordinated by one process, on one machine, with one consistent view of the data.
TiDB is a distributed SQL database. Data is sharded across dozens or hundreds of TiKV storage nodes as Raft-replicated regions. Multiple TiDB stateless SQL nodes sit in front of the storage layer. There is no single lock manager. There is no single in-memory DDL log. When you add a column or an index, every TiDB node is simultaneously processing DML from your application — each node must agree on what the schema looks like, or you get data corruption.
The naive solution — broadcast a schema change and have every node switch at the same moment — fails in any distributed system because of network partitions and clock skew. The correct solution, published by Google in 2013, is a state machine where adjacent states are always mutually safe.
The F1 Online Schema Change Algorithm
Google's F1 database team published "Online, Asynchronous Schema Changes in F1" to solve exactly this problem. TiDB's DDL subsystem is a direct implementation of that paper. The core insight is: if a schema change is broken into states where any two adjacent states can coexist safely, the cluster can migrate through those states one by one — even if different nodes are briefly in different states simultaneously.
The paper defines the requirement formally: at any point during a schema change, no two nodes may be more than one state apart. TiDB enforces this by making the DDL owner (a single TiDB node elected via etcd) advance the schema version in etcd, and requiring all other TiDB nodes to acknowledge the new version before the owner advances to the next state. This acknowledgment step — called the schema lease — defaults to 45 seconds. A DDL job pauses at each transition until all SQL nodes have caught up.
The schema lease interval is controlled by lease in the TiDB configuration file (or --lease at startup), defaulting to 45 seconds. Lowering it speeds up DDL transitions but increases etcd polling load. For most production clusters, the default is the right value.
The Five DDL States in Detail
Every schema object in TiDB transitions through a fixed sequence of states during creation. For ADD INDEX, the canonical operation, the states are:
1. Absent
The index does not exist in any node's schema cache. No node writes to it or reads from it. This is the starting state before the DDL job begins.
2. Delete-Only
The index exists internally but is invisible to DML. Critically: any row deleted from the base table also has its index entry deleted. This prevents a scenario where a stale index entry points to a row that has since been removed. Read queries cannot use the index yet.
3. Write-Only
The index is now written to by all INSERT, UPDATE, and DELETE operations. However, the optimizer still cannot use it for reads. This ensures the index is kept consistent with ongoing DML before the expensive backfill begins — if the backfill then crashes and restarts, all writes since the write-only transition are already captured in the index.
4. Write-Reorganization (Write-Reorg)
The backfill phase. TiDB scans all existing rows in the base table and populates the index for data that existed before the DDL job started. New DML continues writing to the index normally. The key invariant: the backfill must handle the case where a row is updated or deleted after being scanned but before the backfill write is committed. TiDB resolves this with MVCC — it reads at a fixed snapshot timestamp and then reconciles with any newer versions found in the write-only buffer.
5. Public
The index is fully built and visible to the query optimizer. All DML and read queries use it normally. The DDL job is marked complete.
The transition from write-reorg to public requires that the backfill is complete and that all TiDB nodes have acknowledged the public schema version. If a TiDB node is isolated (network partition) or slow to refresh its schema lease, the DDL job will stall at this boundary. Monitor for jobs stuck in running state for longer than a few lease intervals — this is the most common production DDL incident pattern.
Running DDL Operations
TiDB accepts standard MySQL-compatible DDL syntax. Most common operations are fully online:
-- Add a composite index (online, non-blocking)
ALTER TABLE orders ADD INDEX idx_user_status (user_id, status);
-- Add a column with a default value (online in TiDB 5.x+)
ALTER TABLE orders ADD COLUMN archived BOOLEAN NOT NULL DEFAULT FALSE;
-- Modify a column type (online when safe — e.g., widening VARCHAR)
ALTER TABLE users MODIFY COLUMN email VARCHAR(512) NOT NULL;
-- Drop a column (online)
ALTER TABLE events DROP COLUMN legacy_field;
-- Multiple operations in one statement (each runs as a separate job internally)
ALTER TABLE products
ADD COLUMN weight_kg DECIMAL(8,3),
ADD INDEX idx_category_price (category_id, price),
DROP INDEX idx_old_search;Note that MODIFY COLUMN operations that change data types in ways requiring row-level data conversion (for example, INT to VARCHAR) require a full table rewrite and will take longer. TiDB will still perform these online but the reorg phase is proportional to table size.
Some DDL operations are not online in TiDB: adding a foreign key constraint, and certain MODIFY COLUMN changes that require value validation (for example, adding a NOT NULL constraint to a column that contains nulls). These will block or fail. Check the TiDB documentation for your specific version before running type-changing DDL on large tables in production.
Concurrent DDL Jobs: TiDB 6.2 and Later
Before TiDB 6.2, all DDL jobs ran serially through a single DDL owner node. If a large index build was running, all other DDL statements — even fast operations like adding a comment or dropping an index on a tiny table — queued behind it.
TiDB 6.2 introduced concurrent DDL execution. The DDL framework now distinguishes between jobs that conflict (operate on the same table) and jobs that do not. Non-conflicting jobs run in parallel across multiple DDL worker goroutines. In TiDB 7.x, the framework was further extended to support multiple DDL owner nodes elected simultaneously, distributing ownership across the cluster.
From an operator perspective, this means you no longer need to carefully sequence DDL statements across different tables. Two teams can each run ADD INDEX on different tables at the same time without one blocking the other.
-- These two jobs will run concurrently in TiDB 6.2+ (different tables)
-- Session 1:
ALTER TABLE orders ADD INDEX idx_created (created_at);
-- Session 2 (simultaneously):
ALTER TABLE products ADD INDEX idx_sku (sku);Monitoring DDL Progress
TiDB provides two purpose-built commands for DDL observability:
ADMIN SHOW DDL JOBS
-- Show the last 10 DDL jobs (running and recently completed)
ADMIN SHOW DDL JOBS 10;The output includes: JOB_ID, DB_NAME, TABLE_NAME, JOB_TYPE, SCHEMA_STATE, SCHEMA_ID, TABLE_ID, ROW_COUNT, CREATE_TIME, START_TIME, END_TIME, and STATE. The ROW_COUNT column shows backfill progress — it increments as rows are processed during the write-reorg phase. Divide by the total row count in information_schema.tables for a rough percentage.
-- Estimate percentage complete for a running ADD INDEX job
SELECT
j.job_id,
j.table_name,
j.job_type,
j.schema_state,
j.row_count AS rows_processed,
t.table_rows AS total_rows,
ROUND(j.row_count / NULLIF(t.table_rows, 0) * 100, 1) AS pct_complete
FROM (
SELECT job_id, table_name, job_type, schema_state, row_count
FROM information_schema.ddl_jobs
WHERE state = 'running'
) j
JOIN information_schema.tables t
ON t.table_name = j.table_name
AND t.table_schema = DATABASE();ADMIN SHOW DDL JOB QUERIES
-- Show the original SQL statement for job ID 182
ADMIN SHOW DDL JOB QUERIES 182;This is invaluable when you find a long-running DDL job in the queue and need to confirm which exact statement it corresponds to — especially in environments where multiple teams run migrations concurrently.
Cancelling a Stuck DDL Job
DDL jobs can get stuck for several reasons: a TiDB node was killed mid-reorg, a TiKV region leader is unavailable, or a misconfigured batch size caused the backfill to throttle to near-zero. When this happens, use:
-- Find the job ID
ADMIN SHOW DDL JOBS 20;
-- Cancel job 182 (the job transitions to 'cancelling' then 'cancelled')
ADMIN CANCEL DDL JOBS 182;
-- Verify
ADMIN SHOW DDL JOBS 5;
-- STATE should show 'cancelled'Cancelling a DDL job is safe and fully reversible. The partial index or column is cleaned up automatically. You can re-submit the DDL statement immediately after cancellation. If the job was cancelled due to resource pressure, tune the reorg parameters (see below) before resubmitting.
You can also cancel multiple jobs in a single command:
ADMIN CANCEL DDL JOBS 182, 183, 184;Tuning ADD INDEX Performance
The write-reorg (backfill) phase is the bottleneck for ADD INDEX on large tables. Two session variables control throughput:
tidb_ddl_reorg_worker_cnt
Controls how many goroutines in each TiDB node scan and write index rows during backfill. Default is 4. Increasing this speeds up backfill at the cost of higher CPU and TiKV write pressure.
-- Set for the current session before running the DDL
SET SESSION tidb_ddl_reorg_worker_cnt = 16;
-- Or set globally to affect all future DDL jobs
SET GLOBAL tidb_ddl_reorg_worker_cnt = 16;tidb_ddl_reorg_batch_size
Controls how many rows are processed in each backfill batch. Default is 256. Larger batches reduce overhead from transaction commits and are faster for sequential scans, but increase memory usage and can cause write stalls on TiKV if set too high.
SET SESSION tidb_ddl_reorg_batch_size = 1024;A practical starting point for large tables on well-provisioned clusters: tidb_ddl_reorg_worker_cnt = 16 and tidb_ddl_reorg_batch_size = 1024. Monitor TiKV write latency and CPU during the backfill. If p99 write latency climbs above 50ms, back off worker count first, then batch size. The goal is the fastest backfill that does not visibly degrade application query latency.
Distributed DDL Reorg: TiDB 7.1+
TiDB 7.1 introduced a major architectural change: the index backfill phase itself is distributed across TiKV nodes. In earlier versions, all backfill reads flowed through TiDB nodes — even though the data lived in TiKV. Each TiDB worker would read a batch of rows from TiKV over the network, compute the index key-value pairs, and write them back to TiKV.
In TiDB 7.1+, with the tidb_enable_dist_task variable enabled, TiKV coprocessors participate directly in the backfill. The index build is split into sub-tasks by key range, and each sub-task runs closer to the data on TiKV nodes. This eliminates significant cross-node data transfer and allows the backfill to scale with the number of TiKV nodes rather than TiDB nodes.
-- Enable distributed task framework (TiDB 7.1+)
SET GLOBAL tidb_enable_dist_task = ON;
-- After enabling, ADD INDEX jobs automatically use distributed execution
-- Monitor via the distributed task table
SELECT * FROM mysql.tidb_global_task WHERE task_type = 'backfill'\GIn practice, distributed reorg can reduce large index build time by 50–80% on clusters with many TiKV nodes, because the backfill throughput scales horizontally rather than being bounded by TiDB node count and network bandwidth.
TiDB Online DDL vs. pt-online-schema-change and gh-ost
MySQL DBAs migrating to TiDB often ask whether they still need pt-osc or gh-ost. The short answer is no — but understanding why requires understanding what those tools actually do and what TiDB's native DDL replaces.
| Dimension | pt-osc / gh-ost | TiDB Native DDL |
|---|---|---|
| Mechanism | Shadow table + trigger/binlog row copy | Distributed state machine + MVCC backfill |
| External dependency | Requires running tool, replica access (gh-ost), or triggers (pt-osc) | None — built into the database engine |
| Replica lag risk | High — row copy generates significant replication traffic | None — TiDB replication is Raft-based and DDL-aware |
| Foreign key support | Limited (pt-osc drops and re-adds FKs) | Limited (same constraint) |
| Progress monitoring | Tool-specific output or log parsing | ADMIN SHOW DDL JOBS — native SQL |
| Cancellation | Kill the tool process; cleanup required | ADMIN CANCEL DDL JOBS job_id — atomic, self-cleaning |
| Distributed scale-out | No — single-threaded row copy | Yes — distributed reorg in TiDB 7.1+ |
| DML impact during reorg | Throttled row copy; cutover requires brief lock | No cutover lock; state machine transitions are atomic |
The most important difference for production operations: gh-ost requires a brief write lock at cutover — typically 1–10 seconds — while it swaps the shadow table into place and updates triggers. For high-traffic tables, even a 1-second lock is a customer-facing incident. TiDB has no cutover step: the transition from write-reorg to public is a metadata update propagated via the schema lease mechanism, invisible to running DML.
If you are running a hybrid architecture where TiDB replicates to downstream MySQL consumers via TiCDC or Dumpling, be aware that DDL events are propagated in order. Downstream MySQL may not support all TiDB DDL operations natively. Always test your DDL against the full replication chain in a staging environment before running in production.
Operational Checklist for Production DDL
- Check existing queue length — Run
ADMIN SHOW DDL JOBS 20before submitting a new DDL. If large jobs are already running, assess whether adding to the queue is safe given current cluster load. - Set reorg parameters before the DDL statement — Session variables must be set in the same session before the
ALTER TABLEis issued. - Monitor TiKV write latency during backfill — Use Grafana (TiDB's built-in dashboard ships with TiUP) to watch
TiKV - Commit DurationandTiKV - Storage async write duration. These are your canaries. - Keep a cancellation runbook ready — Know the job ID and have
ADMIN CANCEL DDL JOBSready to paste. Designate who is on-call during large migrations. - Test on a staging cluster with production data volume — DDL timing is highly data-dependent. A 500 GB table with a poorly selective index will take far longer than estimated from row counts alone.
- TiDB's online DDL is built on the F1 algorithm: a five-state machine (absent, delete-only, write-only, write-reorg, public) that guarantees no two cluster nodes differ by more than one schema state at any time — making all transitions safe for concurrent DML.
- The schema lease interval (default 45 seconds) gates each state transition. DDL jobs that appear "stuck" are usually waiting for a slow or partitioned TiDB node to acknowledge the new schema version.
- TiDB 6.2+ supports concurrent DDL jobs across non-conflicting tables. TiDB 7.1+ distributes the backfill reorg phase across TiKV nodes, enabling linear throughput scaling for large index builds.
- Use
ADMIN SHOW DDL JOBSfor progress monitoring,ADMIN SHOW DDL JOB QUERIESto identify what a job is doing, andADMIN CANCEL DDL JOBS job_idfor safe, self-cleaning cancellation. - Tune backfill performance with
tidb_ddl_reorg_worker_cnt(parallelism) andtidb_ddl_reorg_batch_size(rows per commit). Watch TiKV write latency as your throttle signal. - Unlike pt-osc and gh-ost, TiDB requires no external tooling, no shadow tables, no triggers, no binlog access, and no cutover lock. The engine handles schema changes as first-class distributed operations.
Running TiDB at scale — tuning DDL jobs, diagnosing reorg stalls, sizing TiKV regions, and managing distributed schema migrations across multi-region deployments — is the kind of work JusDB specializes in. Our team of distributed database engineers has operated TiDB clusters from dozens of gigabytes to multi-petabyte deployments. If your organization is planning a large schema migration, migrating from MySQL, or building greenfield infrastructure on TiDB, reach out to JusDB for an architecture review and hands-on operational support.