Running Debezium in development feels straightforward — spin up a connector, watch the events flow, and call it done. Production is a different story. Schema migrations break pipelines at the worst possible moment, connector restarts cause duplicate events or missed records, and a single misconfigured offset can corrupt hours of downstream data. These are the problems that separate a demo from a production-grade CDC pipeline. This guide tackles the advanced operational concerns that Debezium's getting-started docs gloss over, giving you the knowledge to build connectors that survive the chaos of real-world databases.
- Debezium stores schema history in a dedicated Kafka topic (schema.history.internal.kafka.topic); understand it before you hit a production schema migration.
- Schema evolution behavior differs by change type: new columns are safe, dropped columns require care, and type changes can break downstream consumers.
- Offsets live in the connect-offsets Kafka topic; resetting them means deleting the offset entry with a null-value tombstone message.
- Snapshot mode (initial, schema_only, never, when_needed) determines what Debezium does on restart; choose carefully.
- Incremental snapshots (Debezium 1.9+) let you re-snapshot specific tables without stopping the connector.
- Heartbeat events and signal tables are essential for monitoring idle tables and triggering ad-hoc snapshots.
Recap: What Debezium Is Actually Doing
Before diving into the advanced topics, it helps to have a crisp mental model. Debezium is a log-based CDC system — it reads the database's native replication log (the MySQL binary log, PostgreSQL WAL, or SQL Server transaction log) and converts those raw log entries into structured change events published to Kafka topics.
This approach has a critical implication: Debezium is not polling your database. It is tailing a sequential log. That means two things are true simultaneously: (1) you get low-latency, high-fidelity change events without hammering your database with queries, and (2) you are tightly coupled to the log's current state. If you fall too far behind, the log gets rotated and you have a gap. If the schema changes, the log entries have a different shape than what Debezium expects.
All of the advanced challenges covered below stem from this fundamental architecture.
Schema Change Handling
The Schema History Topic
For MySQL connectors, Debezium maintains a full history of every DDL statement it has observed in a dedicated Kafka topic. You configure it with:
{
  "connector.class": "io.debezium.connector.mysql.MySqlConnector",
  "schema.history.internal.kafka.topic": "dbhistory.myapp",
  "schema.history.internal.kafka.bootstrap.servers": "kafka:9092"
}
This topic is append-only and must never be compacted. When Debezium restarts, it replays the entire schema history to reconstruct the exact schema that was in effect at the current binlog position. Without this history, it cannot parse log entries that were written under an older schema version.
Never set a finite retention policy or enable log compaction on the schema history topic. If entries are lost from this topic, Debezium cannot restart: it fails with a schema reconstruction error and requires a full re-snapshot of your database.
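The retention rule above lends itself to an automated check. The following is a minimal sketch, assuming you have already fetched the topic's effective configuration as a plain dictionary of strings; the function name is hypothetical, while cleanup.policy, retention.ms, and retention.bytes are standard Kafka topic settings.

```python
def schema_history_topic_problems(topic_config: dict) -> list:
    """Return human-readable problems found in a topic's effective config."""
    problems = []

    # Compaction would discard older DDL records, so only "delete" is acceptable.
    policies = {p.strip() for p in topic_config.get("cleanup.policy", "").split(",")}
    if "compact" in policies:
        problems.append("cleanup.policy includes 'compact'")

    # retention.ms must be exactly -1 (retain forever). A missing value is
    # flagged too, because the broker default would then apply.
    if topic_config.get("retention.ms") != "-1":
        problems.append("retention.ms is not -1")

    # retention.bytes defaults to -1, so only an explicit other value is a problem.
    if topic_config.get("retention.bytes", "-1") != "-1":
        problems.append("retention.bytes is not -1")

    return problems
```

An empty list means the topic matches the rule; anything else is worth alerting on before the next connector restart.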
New Columns
Adding a new column to a tracked table is the safest schema change. Debezium records the ALTER TABLE DDL in the schema history topic. After the migration, change events for that table will include the new column. Downstream consumers that are unaware of the new field will simply ignore it (in formats like Avro with schema evolution rules enabled, or JSON with flexible parsing).
Dropped Columns
Dropping a column is more dangerous. From Debezium's perspective the change is simple: it stops emitting that field in future events. Any downstream consumer that requires the field will break. Plan drops carefully, and consider a two-phase migration where you first stop consuming the field downstream, deploy, then drop the column.
Use Debezium's column.exclude.list configuration to mask sensitive or irrelevant columns before they ever reach your Kafka topic. This is cleaner than filtering at the consumer level and reduces topic payload size.
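To make the added-column and dropped-column cases concrete from the consumer's side, here is a minimal sketch of a tolerant consumer. It assumes events arrive as decoded JSON dictionaries with Debezium's "after" image; the field names in REQUIRED_FIELDS and the function name are hypothetical.

```python
REQUIRED_FIELDS = {"id", "email"}  # hypothetical fields this consumer depends on


def extract_row(event: dict) -> dict:
    """Return only the required fields from a change event's 'after' image."""
    after = event.get("after") or {}
    missing = REQUIRED_FIELDS - set(after)
    if missing:
        # A dropped column surfaces here as a missing required field.
        raise KeyError(f"required fields missing from event: {sorted(missing)}")
    # Unknown fields, such as a newly added column, are silently ignored.
    return {name: after[name] for name in REQUIRED_FIELDS}
```

A consumer written this way survives an added column untouched, and fails loudly rather than silently when a column it needs disappears.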
Type Changes
Type changes are the most dangerous category. Changing a column from INT to BIGINT may seem compatible but can break strongly-typed consumers or cause Avro schema incompatibilities. Changing VARCHAR(100) to TEXT may be transparent in some serialization formats and breaking in others. The rule of thumb: always test type changes against your full consumer stack in a staging environment before applying them to production tracked tables.
Managing Connector Restarts
Snapshot Modes
Snapshot mode controls what Debezium does when a connector starts (or restarts) and there is no existing offset stored for it. The four primary modes are:
- initial (default): On first start, Debezium performs a full consistent snapshot of all configured tables, then transitions to streaming from the log. On restart with a stored offset, it skips the snapshot entirely and resumes from the stored position.
- schema_only: Snapshots only the schema (DDL), not the table data. Useful when you only care about future changes and do not need historical data in your Kafka topics.
- never: Skips the snapshot unconditionally. If no offset exists, Debezium starts streaming from the current log position, so any historical data is simply absent. Use this when you know the downstream system already has the historical data.
- when_needed: Performs a snapshot only when the connector cannot resume from its stored offset (e.g., the binlog position no longer exists). This is a safety net mode useful for long-lived connectors where log rotation is a risk.
{
  "snapshot.mode": "when_needed"
}
Using snapshot.mode: never on a fresh connector with no offset means you will silently miss all existing data. Only use never when you have explicitly accounted for the historical data through another mechanism.
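The four modes can be summarized as a small decision function. This is a simplified model of the descriptions above, not Debezium's actual startup code. The function name and return labels are hypothetical. Two behaviours are assumptions rather than statements from this guide: that when_needed also snapshots when no offset exists at all, and that the other modes fail when a stored offset is no longer resumable.

```python
def startup_action(mode: str, has_offset: bool, offset_resumable: bool = True) -> str:
    """Model what a connector does at startup.

    Returns one of: "snapshot-data", "snapshot-schema", "stream", "fail".
    """
    if mode not in {"initial", "schema_only", "never", "when_needed"}:
        raise ValueError(f"unknown snapshot mode: {mode}")

    if has_offset and offset_resumable:
        return "stream"  # every mode resumes from a usable stored offset

    if has_offset and not offset_resumable:
        # Assumption: only when_needed recovers on its own; other modes stop.
        return "snapshot-data" if mode == "when_needed" else "fail"

    # No stored offset: the outcome depends on the mode.
    return {
        "initial": "snapshot-data",
        "schema_only": "snapshot-schema",
        "never": "stream",            # historical rows are never emitted
        "when_needed": "snapshot-data",
    }[mode]
```

Walking a planned restart through this table before changing snapshot.mode is a cheap way to catch the "never on a fresh connector" mistake.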
Debezium Server vs Kafka Connect Deployment
Debezium can be deployed in two modes. The first is as a Kafka Connect plugin — this is the most common production deployment and ties Debezium's lifecycle to Kafka Connect's offset management and distributed mode for high availability. The second is Debezium Server, a standalone application that can emit events directly to sinks like Amazon Kinesis, Google Pub/Sub, or Apache Pulsar without requiring a Kafka cluster.
For most data engineering teams, Kafka Connect is the right choice because it provides built-in distributed mode, REST API management, and the mature connector ecosystem. Debezium Server is the right choice when you are operating in an environment without Kafka or need to minimize infrastructure footprint.
Offset Management
Where Offsets Live
In a Kafka Connect deployment, Debezium stores its current position — the binlog filename and position for MySQL, or the LSN for PostgreSQL — in the internal Kafka Connect offsets topic, conventionally named connect-offsets. This topic uses log compaction to retain only the latest offset entry per connector.
You can inspect the current offset for a connector by reading the topic and filtering by the connector's name key:
kafka-console-consumer \
  --bootstrap-server kafka:9092 \
  --topic connect-offsets \
  --from-beginning \
  --property print.key=true \
  --property key.separator=" => "
Resetting Offsets
To reset a connector's offset (forcing it to re-snapshot or start from a different log position), you must write a tombstone message — a record with the connector's key and a null value — to the connect-offsets topic. Log compaction will then remove the offset entry.
# Remove the connector first so it stops committing offsets
# (DELETE removes its configuration; re-create the connector afterwards)
curl -X DELETE http://connect:8083/connectors/my-mysql-connector
# Write a null tombstone to clear the offset
kafka-console-producer \
  --bootstrap-server kafka:9092 \
  --topic connect-offsets \
  --property parse.key=true \
  --property key.separator="|" \
  <
Resetting offsets without also clearing downstream topic data will cause duplicate records. Coordinate offset resets with your downstream consumers and, where necessary, add replay-safe processing logic (idempotent writes or deduplication).
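The tombstone only works if its key matches the stored offset key byte for byte and lands in the same partition as the existing record, because compaction operates per partition. The sketch below builds such a key/value pair. It assumes the key is a JSON array of the connector name and a source-partition object such as {"server": "myapp"}; verify this against the key you actually see when inspecting connect-offsets, and prefer copying that key verbatim. The function name is hypothetical.

```python
import json


def build_offset_tombstone(connector_name: str, source_partition: dict) -> tuple:
    """Return (key_bytes, None) for a record that deletes a stored offset."""
    # Compact separators: the key must match the stored key exactly,
    # so no extra whitespace may be introduced.
    key = json.dumps([connector_name, source_partition], separators=(",", ":"))
    # A None value is what makes the record a tombstone.
    return key.encode("utf-8"), None
```

For example, build_offset_tombstone("my-mysql-connector", {"server": "myapp"}) yields the key bytes and a None value, which you can hand to whichever Kafka producer client you use.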
Incremental Snapshots and Signal Tables
Incremental Snapshots (Debezium 1.9+)
Before Debezium 1.9, re-snapshotting a table required stopping the connector, resetting offsets, and triggering a full snapshot — a disruptive operation. Incremental snapshots change this. They allow Debezium to snapshot specific tables in chunks while the connector continues streaming live changes, interleaving snapshot events with streaming events using a watermarking protocol.
The result is that you can backfill a table or add a new table to CDC coverage without any connector downtime.
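The core idea behind the watermarking step can be sketched in a few lines. This is a deliberately simplified illustration, not Debezium's implementation: a chunk of rows is read between a low and a high watermark, and any row whose key also changed in the log during that window is dropped from the chunk, because the streamed change is newer. The function and parameter names are hypothetical.

```python
def merge_chunk(chunk_rows: dict, window_keys: list) -> dict:
    """Drop snapshot rows superseded by changes seen between the watermarks.

    chunk_rows maps primary key -> row as read by the chunk query.
    window_keys lists primary keys that changed in the log between the
    low and high watermark.
    """
    changed = set(window_keys)
    # Keep only rows the stream did not touch; the stream's version wins.
    return {pk: row for pk, row in chunk_rows.items() if pk not in changed}
```

This is why snapshot and streaming events can be interleaved without a consumer seeing a stale snapshot row overwrite a newer change.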
Signal Table
Incremental snapshots are triggered via a signal table — a special table in your database that Debezium monitors for commands:
-- Create the signal table
CREATE TABLE debezium_signal (
  id VARCHAR(42) PRIMARY KEY,
  type VARCHAR(32) NOT NULL,
  data TEXT NULL
);
{
  "signal.data.collection": "mydb.debezium_signal"
}
To trigger an incremental snapshot of specific tables, insert a signal row:
INSERT INTO debezium_signal (id, type, data)
VALUES (
  'ad-hoc-1',
  'execute-snapshot',
  '{"data-collections": ["mydb.orders", "mydb.customers"]}'
);
Debezium detects this insert via the CDC stream itself and begins the incremental snapshot process for the specified tables.
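If you trigger snapshots from tooling rather than by hand, the signal row can be generated programmatically. The following is a minimal sketch that only builds the three column values for the INSERT shown above; the function name is hypothetical, and using a random UUID as the id is an assumption (any unique string that fits the column works).

```python
import json
import uuid


def build_snapshot_signal(tables: list) -> tuple:
    """Return (id, type, data) values for an execute-snapshot signal row."""
    if not tables:
        raise ValueError("at least one table is required")
    signal_id = str(uuid.uuid4())  # 36 characters, fits the VARCHAR(42) id column
    data = json.dumps({"data-collections": tables})
    return signal_id, "execute-snapshot", data
```

Pass the returned tuple as bound parameters to a parameterized INSERT in your database driver rather than formatting it into the SQL string.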
Error Handling and Monitoring
Heartbeat Intervals
When a tracked table has no activity, the database's replication log still advances (other tables are writing, transactions are committing). Without heartbeat events, Debezium's committed offset falls behind the log position — it has processed no events, so it has committed no new offsets. If the log then rotates, Debezium loses its position.
Heartbeats solve this by periodically writing a small event to a dedicated heartbeat topic, forcing Debezium to commit an updated offset even when the tracked tables are idle:
{
  "heartbeat.interval.ms": "10000",
  "heartbeat.topics.prefix": "__debezium-heartbeat"
}
Set heartbeat.interval.ms to a value well below your database's binary log retention period. For databases with 24-hour log retention, a heartbeat interval of 60 seconds is conservative and safe.
Filtering with Single Message Transforms
Single Message Transforms (SMTs) allow you to filter, rename, or restructure events within the Kafka Connect pipeline before they reach the topic. Common production patterns include routing events from different tables to different topics, extracting only the after state for INSERT/UPDATE events, and dropping events for specific operations:
{
  "transforms": "unwrap,route",
  "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
  "transforms.unwrap.drop.tombstones": "false",
  "transforms.route.type": "org.apache.kafka.connect.transforms.RegexRouter",
  "transforms.route.regex": "myapp\\.mydb\\.(.*)",
  "transforms.route.replacement": "cdc.$1"
}
Exactly-Once Semantics Considerations
Debezium provides at-least-once delivery by default — in failure scenarios, some events may be replayed and delivered more than once. Achieving exactly-once semantics requires coordination at multiple levels: Kafka transactions (enable with exactly.once.source.support=enabled in Kafka Connect 3.3+), idempotent producers, and idempotent consumers. Most production teams implement exactly-once at the consumer level via deduplication on the event's source.ts_ms and primary key, which is simpler and more portable than relying on Kafka's transactional machinery.
Include the Debezium event's source.lsn (PostgreSQL) or source.pos (MySQL) in your downstream records. This gives you a durable, ordered identifier for deduplication that is independent of wall-clock time.
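A position-based deduplicator along these lines might look as follows. This is a minimal in-memory sketch under stated assumptions: positions are comparable values (an integer LSN, or a (binlog file, pos) tuple for MySQL), events for a given row arrive in log order, and the class name is hypothetical. A real consumer would persist the last applied position together with the write itself.

```python
class PositionDeduplicator:
    """Skip events at or before the last applied log position for each row."""

    def __init__(self):
        self._last_applied = {}  # primary key -> last applied position

    def should_apply(self, primary_key, position) -> bool:
        last = self._last_applied.get(primary_key)
        if last is not None and position <= last:
            return False  # replayed event: this position was already applied
        self._last_applied[primary_key] = position
        return True
```

Because the comparison uses the log position rather than a timestamp, a replay after a connector restart is recognized even when several changes share the same millisecond.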
Production Hardening Checklist
- Schema history topic retention: Set to -1 (infinite) with compaction disabled. Add monitoring to alert if the topic size grows unexpectedly fast.
- Binlog retention alignment: Ensure your database's binary log retention is longer than your maximum expected connector downtime. For MySQL, set binlog_expire_logs_seconds to at least 72 hours (259200 seconds) for production connectors.
- Dead letter queue: Configure errors.deadletterqueue.topic.name to capture events that fail serialization or transformation, rather than halting the entire connector. Kafka Connect applies dead letter queue settings to sink connectors only, so set them on the sink connectors that read your CDC topics; on the Debezium source connector itself, errors.tolerance decides whether such failures are skipped.
- Lag monitoring: Track how far the connector is behind the database log, for example through Debezium's MilliSecondsBehindSource streaming metric exposed over JMX. Sustained lag indicates the connector is not keeping up with write volume.
- Column exclusions for PII: Use column.mask.with.length.chars or column.exclude.list to prevent sensitive fields from ever entering the Kafka topic.
- Topic partitioning strategy: By default, Debezium keys events by primary key, which guarantees ordering per row. If you need ordering across related rows, implement a custom partitioner or use a composite key SMT.
{
  "errors.deadletterqueue.topic.name": "cdc-dlq",
  "errors.deadletterqueue.topic.replication.factor": "3",
  "errors.deadletterqueue.context.headers.enable": "true",
  "errors.tolerance": "all",
  "column.exclude.list": "mydb.users.password_hash,mydb.users.ssn"
}
- The schema history topic is the single most critical piece of Debezium's state for MySQL connectors; protect it with infinite retention and no compaction.
- Schema evolution safety decreases as you move from new columns (safe) to dropped columns (careful) to type changes (test exhaustively).
- Offset resets require writing a null tombstone to the connect-offsets topic and must be coordinated with downstream consumers to avoid duplicate data.
- Choose your snapshot mode deliberately: initial for most cases, when_needed for long-lived connectors at risk of log rotation, schema_only when historical data is irrelevant.
- Incremental snapshots (Debezium 1.9+) with a signal table eliminate the need for disruptive full re-snapshots when adding new tables or backfilling data.
- Set heartbeat.interval.ms to prevent offset staleness on idle tables and protect against log rotation gaps.
- Exactly-once delivery is best achieved at the consumer level via idempotent writes using Debezium's source metadata as a deduplication key.