PostgreSQL to Kafka CDC

Running PostgreSQL at scale means your application writes are only half the story — the real challenge is getting those changes downstream to analytics pipelines, search indexes, and microservices without burning your primary under the load of polling queries. Logical replication, introduced in PostgreSQL 10 and matured significantly since, gives you a structured change stream directly from the write-ahead log. Paired with Debezium and Kafka, you get a durable, replayable event stream that decouples your producers from every downstream consumer. This guide walks you through a production-grade end-to-end setup, covering the PostgreSQL side, Debezium connector configuration, topic structure, lag monitoring, and schema evolution with Confluent Schema Registry.

TL;DR

Set wal_level = logical on your PostgreSQL instance and create a replication slot using the pgoutput plugin.
Define a CREATE PUBLICATION to control which tables are streamed.
Configure the Debezium PostgreSQL connector with plugin.name: pgoutput, your slot name, and publication name.
Kafka topics are auto-named server.schema.table; each message carries a before, after, and op envelope.
Monitor pg_replication_slots.confirmed_flush_lsn vs pg_current_wal_lsn() to detect lag before your disk fills up.
Inactive replication slots will bloat your WAL indefinitely — always drop slots you no longer need.

Architecture Overview

The logical replication pipeline has four distinct layers that each need to be configured correctly for the whole thing to work reliably.

PostgreSQL writes every row change to the write-ahead log (WAL) first, before applying it to the heap. With wal_level = logical, the WAL entries include enough information — column-level old and new values — for an external decoder to reconstruct the logical meaning of each change. A replication slot tracks how far a downstream consumer has processed the WAL, ensuring PostgreSQL retains WAL segments until they are acknowledged. A publication is a named filter that declares which tables (and which operations) are included in the stream.

Debezium is a Kafka Connect source connector that acts as a PostgreSQL replication client. It connects via the replication protocol, receives decoded change events from the slot, and publishes them as Kafka messages. Kafka then acts as the durable log — consumers can replay from any offset, and you can fan out to as many downstream systems as you need without touching PostgreSQL again.

Schema Registry sits alongside Kafka and stores Avro schemas for each topic. Producers register a schema on first write; consumers fetch and cache it by ID. This gives you schema evolution with backward/forward compatibility guarantees, and prevents a bad producer from breaking all consumers with an incompatible change.

PostgreSQL Setup

Start by verifying and setting the WAL level. This requires a PostgreSQL restart, so plan accordingly on production systems.

sql

-- Check current setting
SHOW wal_level;

-- If not already 'logical', set it in postgresql.conf:
-- wal_level = logical
-- Then restart PostgreSQL.

You also need to raise max_replication_slots and max_wal_senders if you plan to run multiple connectors. A safe starting point for most setups:

ini

# postgresql.conf
wal_level = logical
max_replication_slots = 10
max_wal_senders = 10

Next, create a dedicated replication user. Never use a superuser for this in production — scope it precisely:

sql

CREATE ROLE debezium_replication WITH
  REPLICATION
  LOGIN
  PASSWORD 'use-a-strong-secret-here';

-- Grant SELECT on tables you want to replicate
GRANT SELECT ON ALL TABLES IN SCHEMA public TO debezium_replication;
ALTER DEFAULT PRIVILEGES IN SCHEMA public
  GRANT SELECT ON TABLES TO debezium_replication;

Now create the publication. A publication is your authoritative declaration of what gets streamed. You can include all tables or enumerate them explicitly — explicit is safer in production because it prevents accidentally streaming a table you add later:

sql

-- Stream specific tables
CREATE PUBLICATION debezium_pub
  FOR TABLE public.orders, public.customers, public.products
  WITH (publish = 'insert, update, delete');

-- Or stream everything in a schema (use with caution)
-- CREATE PUBLICATION debezium_pub FOR ALL TABLES;

Finally, create the replication slot using the pgoutput plugin. The pgoutput plugin is built into PostgreSQL 10+ and requires no additional installation, unlike the older wal2json plugin:

sql

SELECT pg_create_logical_replication_slot(
  'debezium_slot',   -- slot name, referenced in connector config
  'pgoutput'         -- built-in logical decoder plugin
);

Warning

Creating the replication slot before starting Debezium means the slot will accumulate WAL from the moment of creation. If Debezium is not started promptly, WAL will pile up on disk. Either create the slot and start Debezium immediately, or let Debezium create the slot automatically on first connect (the default behavior when slot.name does not already exist).

Debezium PostgreSQL Connector

Deploy Kafka Connect with the Debezium PostgreSQL connector plugin on the classpath. The connector configuration is submitted as a JSON payload to the Kafka Connect REST API:

bash

curl -X POST http://kafka-connect:8083/connectors \
  -H "Content-Type: application/json" \
  -d @debezium-postgres.json

json

{
  "name": "postgres-debezium-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "tasks.max": "1",

    "database.hostname": "postgres.internal",
    "database.port": "5432",
    "database.user": "debezium_replication",
    "database.password": "${file:/opt/secrets/db.properties:password}",
    "database.dbname": "production",
    "database.server.name": "prod_pg",

    "plugin.name": "pgoutput",
    "slot.name": "debezium_slot",
    "publication.name": "debezium_pub",

    "table.include.list": "public.orders,public.customers,public.products",

    "topic.prefix": "prod_pg",

    "heartbeat.interval.ms": "10000",
    "heartbeat.action.query": "UPDATE debezium_heartbeat SET ts = now() WHERE id = 1",

    "decimal.handling.mode": "double",
    "time.precision.mode": "connect",
    "interval.handling.mode": "string",

    "snapshot.mode": "initial",

    "key.converter": "io.confluent.connect.avro.AvroConverter",
    "key.converter.schema.registry.url": "http://schema-registry:8081",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "value.converter.schema.registry.url": "http://schema-registry:8081"
  }
}

Tip

The heartbeat.interval.ms and heartbeat.action.query settings are critical when you have low-traffic tables in your publication. Without heartbeats, the replication slot's LSN only advances when tracked tables see changes. On a quiet replica or a table that rarely changes, the slot's confirmed LSN can lag far behind, accumulating WAL and consuming disk. The heartbeat query updates a small sentinel table (which must itself be in the publication) to force regular LSN advancement.

Create the heartbeat table and add it to the publication:

sql

CREATE TABLE debezium_heartbeat (
  id   INT PRIMARY KEY,
  ts   TIMESTAMPTZ NOT NULL DEFAULT now()
);
INSERT INTO debezium_heartbeat VALUES (1, now());

-- Add it to the existing publication
ALTER PUBLICATION debezium_pub ADD TABLE debezium_heartbeat;

Topic Structure and Message Envelope

Debezium creates Kafka topics automatically using the pattern {topic.prefix}.{schema}.{table}. With the configuration above, your topics will be:

text

prod_pg.public.orders
prod_pg.public.customers
prod_pg.public.products
prod_pg.public.debezium_heartbeat

Each message follows the Debezium envelope format. The key contains the table's primary key column(s); the value contains the full change event:

json

{
  "schema": { ... },
  "payload": {
    "before": {
      "id": 1042,
      "status": "pending",
      "total": 149.99
    },
    "after": {
      "id": 1042,
      "status": "shipped",
      "total": 149.99
    },
    "source": {
      "version": "2.5.0.Final",
      "connector": "postgresql",
      "name": "prod_pg",
      "ts_ms": 1708540800000,
      "snapshot": "false",
      "db": "production",
      "schema": "public",
      "table": "orders",
      "txId": 7392841,
      "lsn": 24589123456,
      "xmin": null
    },
    "op": "u",
    "ts_ms": 1708540800123
  }
}

The op field identifies the operation type: c for insert (create), u for update, d for delete, and r for read (snapshot). For delete events, after is null; for insert events, before is null. Your consumers must handle all four cases. The source.lsn field gives you the WAL position, which you can use for deduplication or ordering guarantees.

Warning

For before values to be populated on UPDATE and DELETE events, the table must have REPLICA IDENTITY FULL set, or at minimum use the default identity (primary key only). With the default replica identity, before for updates contains only the primary key column(s). If you need full before-images for change data capture downstream, run ALTER TABLE public.orders REPLICA IDENTITY FULL; — but be aware this increases WAL volume for that table.

Monitoring Replication Slot Lag

Replication slot lag is the most operationally dangerous aspect of this setup. If Debezium falls behind — or goes offline entirely — PostgreSQL cannot reclaim WAL segments that the slot has not yet consumed. Your disk will fill up and PostgreSQL will stop accepting writes.

Query slot lag in bytes and LSN distance:

sql

SELECT
  slot_name,
  plugin,
  active,
  pg_size_pretty(
    pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)
  ) AS lag_bytes_pretty,
  pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) AS lag_bytes,
  confirmed_flush_lsn,
  pg_current_wal_lsn() AS current_lsn
FROM pg_replication_slots
WHERE slot_type = 'logical'
ORDER BY lag_bytes DESC;

Set up alerting at two thresholds: warn when lag exceeds 5 GB and page when it exceeds 20 GB (adjust for your disk capacity and WAL generation rate). Also alert when active = false — an inactive slot is accumulating WAL with no consumer making progress.

Warning

An inactive replication slot with high lag is a ticking clock for your PostgreSQL instance. If you need to stop Debezium for maintenance, either drop the slot first (SELECT pg_drop_replication_slot('debezium_slot');) and recreate it when you restart, or ensure the outage is brief and disk headroom is sufficient. Never leave an inactive slot unmonitored.

Also monitor Kafka consumer lag from the Debezium connector's output side using the Kafka consumer group offset:

bash

kafka-consumer-groups.sh \
  --bootstrap-server kafka:9092 \
  --describe \
  --group your-downstream-consumer-group

PostgreSQL slot lag and Kafka consumer group lag are two independent metrics — both need monitoring. A fast Debezium connector that is keeping up with PostgreSQL does not help if your downstream consumer is 10 million messages behind in Kafka.

Schema Registry and Avro

Using Avro with Confluent Schema Registry gives you schema evolution with enforced compatibility rules. When Debezium writes to a topic for the first time, the Avro converter registers the schema derived from the table's column definitions. Each subsequent message is serialized with a 5-byte header containing a magic byte and the schema ID, followed by the Avro binary payload.

Schema Registry compatibility modes determine what changes are allowed:

bash

# Set BACKWARD compatibility for a topic's value schema
# (new schema can read data written with old schema)
curl -X PUT \
  http://schema-registry:8081/config/prod_pg.public.orders-value \
  -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  -d '{"compatibility": "BACKWARD"}'

The compatibility matrix for common column changes:

Adding a nullable column — backward compatible; add with a default of null in the Avro schema.
Adding a NOT NULL column without default — breaking change; requires a full snapshot or explicit schema migration strategy.
Dropping a column — forward compatible only; consumers reading old messages will see the field as its default value.
Changing a column type — almost always breaking; requires a new topic or a schema version bump with a union type.

Tip

When you need to make a breaking schema change, the safest pattern is: create a new publication and slot pointing to the renamed or restructured table, produce to a new Kafka topic, migrate consumers to the new topic, then decommission the old slot. Trying to evolve in place across a breaking change almost always causes consumer failures.

To inspect registered schemas and check for compatibility before a migration:

bash

# List all schema versions for a subject
curl http://schema-registry:8081/subjects/prod_pg.public.orders-value/versions

# Check if a new schema is compatible
curl -X POST \
  http://schema-registry:8081/compatibility/subjects/prod_pg.public.orders-value/versions/latest \
  -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  -d @new-schema.json

Key Takeaways

wal_level = logical is a hard requirement and needs a PostgreSQL restart — plan this change in advance for production systems.
Use the pgoutput plugin (built-in since PostgreSQL 10) rather than wal2json — it requires no separate installation and is the Debezium default.
Always configure heartbeat queries to prevent slot lag accumulation on low-traffic tables or publications.
Monitor pg_replication_slots.confirmed_flush_lsn vs pg_current_wal_lsn() continuously — unmonitored inactive slots will fill your disk.
Debezium message envelopes carry before, after, and op fields; consumers must handle all four operation types (c, u, d, r).
Kafka topic names follow the pattern {prefix}.{schema}.{table} — standardize your prefix naming convention before going to production.
Use Schema Registry with Avro and set explicit compatibility modes per topic — this is the only way to enforce schema governance across many producer/consumer teams.
Drop replication slots you are not actively consuming; an abandoned slot is one of the fastest ways to bring a PostgreSQL instance to its knees.

Stream Your PostgreSQL Changes with JusDB

Setting up logical replication, managing slot lag, and operating Schema Registry is a significant operational surface area — and that is before you factor in connector upgrades, Kafka cluster management, and schema migration runbooks. JusDB provides managed PostgreSQL with built-in change data capture support, pre-configured replication monitoring, and alerting that catches slot lag before it becomes a disk-full incident.

If you are evaluating this stack for a new project or looking to reduce the operational burden on an existing pipeline, explore JusDB's managed database platform and see how much of this infrastructure you can hand off.

PostgreSQL Logical Replication to Kafka: End-to-End Setup

Architecture Overview

PostgreSQL Setup

Debezium PostgreSQL Connector

Topic Structure and Message Envelope

Monitoring Replication Slot Lag

Schema Registry and Avro

Stream Your PostgreSQL Changes with JusDB

Share this article

Need Expert Help?

PostgreSQL Services

PostgreSQL Migration

PostgreSQL Support

PostgreSQL High Availability

PostgreSQL Cloud Migration

PostgreSQL on Kubernetes

Related Articles

Open Source Databases (2026): PostgreSQL, MySQL, ClickHouse, Cassandra & Beyond

Liquibase vs Flyway: Which Database Migration Tool to Choose?

Apache Airflow for Database Workflows: Scheduling and Orchestration