Solving Cache Invalidation and Stampede in Valkey with Debezium CDC

Eliminate stale cache reads and thundering-herd stampedes by streaming MySQL binlog changes to Valkey via Debezium CDC. Full Java setup with embedded engine and XFetch.

JusDB Team
March 7, 2026

A SaaS team running MySQL behind a Valkey caching layer notices something alarming during a flash sale: a single product-price UPDATE invalidates the cache key, 1,200 concurrent requests miss the cache simultaneously, and every one of them hammers MySQL with the same SELECT. The primary's CPU spikes to 97%, p99 latency goes from 4ms to 2.8 seconds, and the checkout queue stalls. The root cause isn't the database or the cache; it's the gap between them. Application-driven cache invalidation is inherently racy: you can't guarantee the cache is updated atomically with the database commit. Debezium's Change Data Capture (CDC) engine narrows that gap by streaming row-level changes directly from MySQL's binlog to Valkey. Stale reads shrink to the pipeline's propagation delay (roughly 50–200ms), and updates overwrite keys in place instead of deleting them, so a write no longer triggers a stampede. None of this requires cache logic in the application's write path.

TL;DR
  • Cache invalidation and cache stampede are two distinct but related problems — invalidation causes stale reads; stampede causes thundering-herd spikes on the database after invalidation.
  • Application-level DELETE-on-write invalidation is racy: the window between DB commit and cache eviction allows stale reads and concurrent recomputation.
  • Debezium's embedded engine captures MySQL binlog events and pushes them to Valkey in near-real-time (~50–200ms latency), removing application code from the invalidation path entirely.
  • Combine CDC-driven invalidation with probabilistic early expiration (XFetch) or mutex locking to eliminate stampede on high-traffic keys.
  • The Debezium 3.x AsyncEmbeddedEngine runs as a library inside your service — no Kafka cluster required for single-instance deployments.
  • This pattern works with MySQL, PostgreSQL, and MongoDB as CDC sources, with Valkey or Redis as the cache target.

What Is Cache Invalidation and Why Is It Hard?

Cache invalidation is the process of removing or updating a cached entry when the underlying source data changes. Phil Karlton's famous quip — "There are only two hard things in Computer Science: cache invalidation and naming things" — endures because the problem is fundamentally about distributed consistency. Your database is the source of truth; the cache is a stale copy by definition. The question is how stale you can tolerate and how much infrastructure you'll build to close the gap.

The standard application-level pattern is cache-aside (lazy-loading): the app checks the cache on read, queries the database on miss, and writes the result back to cache. On write, the app either deletes the cache key (invalidate-on-write) or writes the new value to both the database and cache (write-through). Both approaches have a race condition window:

text
Timeline of a stale-read race:

T0: App instance B misses the cache and reads price=24.99 from MySQL
T1: App instance A commits UPDATE price=19.99 to MySQL
T2: Other readers hit the cache → still see price=24.99 (stale)
T3: App instance A issues DEL products:123 to Valkey
T4: App instance B writes its T0 result, price=24.99, back to cache (!!!)
T5: All subsequent reads see price=24.99 until TTL expires

Between T1 and T3 (typically 1–15ms in well-tuned systems, but potentially seconds under load) every read returns stale data. Worse, as shown at T4, a concurrent reader that fetched the row before the commit can repopulate the cache with the old value after the invalidation, leaving a stale entry that persists until the TTL expires.
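The interleaving above can be replayed deterministically. The following is a minimal sketch in which plain dictionaries stand in for MySQL and Valkey (an assumption for illustration; no real client is involved). The write-back at T4 is only possible because reader B fetched the row from the database before the writer committed.

```python
# Deterministic replay of the stale write-back race.
# `db` and `cache` are plain dicts standing in for MySQL and Valkey.
db = {"products:123": 24.99}
cache = {}

# Reader B misses the cache and reads the row before the writer commits.
b_value = cache.get("products:123")
if b_value is None:
    b_value = db["products:123"]      # B now holds 24.99 in local memory

# T1: writer A commits the new price.
db["products:123"] = 19.99

# T3: writer A invalidates the cache key.
cache.pop("products:123", None)

# T4: reader B finishes its cache-aside fill with the value it read earlier.
cache["products:123"] = b_value

# T5: the cache now disagrees with the database until the key expires.
stale = cache["products:123"] != db["products:123"]
print(cache["products:123"], db["products:123"], stale)
```

No ordering of the DEL relative to the commit fixes this on its own, because the stale value lives in the reader's memory rather than in the cache at the moment of invalidation.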

What Is a Cache Stampede?

A cache stampede (also called thundering herd) occurs when a popular cache key expires or is invalidated, and many concurrent requests simultaneously attempt to regenerate it by hitting the database. If 500 threads find the key missing at the same instant, all 500 issue the same expensive query. The database gets 500x the expected load for that key, response times spike, connection pools saturate, and the resulting timeouts cascade upstream.

Three factors determine stampede severity:

| Factor | Low Risk | High Risk |
|---|---|---|
| Request concurrency per key | <10 req/s | >500 req/s |
| Query cost to regenerate | <5ms | >200ms (joins, aggregations) |
| TTL uniformity | Jittered TTLs | All keys expire at the same second |

[Figure: Cache stampede sequence after a single invalidation. (1) Cache hit, healthy: 1,200 req/s served by Valkey at p99 4ms, MySQL CPU at 12%. (2) Key invalidated: UPDATE price=19.99 commits and the app issues DEL products:123, opening the race window. (3) Stampede: 1,200 concurrent misses drive MySQL CPU to 97% and p99 latency to 2.8s.]
Three-phase stampede: healthy cache → single invalidation → 1,200 concurrent misses collapse the origin database.

Traditional mitigations — mutex locking, probabilistic early expiration, request coalescing — all operate at the application or cache layer. They reduce stampede impact but don't solve the root cause: the cache doesn't know when the database changed.
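To make the difference between an uncoalesced miss and a mutex-guarded one concrete, here is a small self-contained simulation. It is a sketch under stated assumptions: threads stand in for concurrent requests, a dictionary stands in for Valkey, and `run`, `query_db`, and `handle_request` are hypothetical names. A barrier forces every request to observe the miss before any of them refills the key, which is the worst case described above.

```python
import threading

def run(n_requests, coalesce):
    """Return how many database queries n concurrent misses produce."""
    cache = {}
    db_queries = 0
    counter_lock = threading.Lock()   # protects db_queries
    fill_lock = threading.Lock()      # single-flight guard
    barrier = threading.Barrier(n_requests)

    def query_db():
        nonlocal db_queries
        with counter_lock:
            db_queries += 1
        return 19.99

    def handle_request():
        barrier.wait()                        # all requests arrive together
        if coalesce:
            with fill_lock:
                if "products:123" not in cache:   # re-check under the lock
                    cache["products:123"] = query_db()
        else:
            missed = "products:123" not in cache
            barrier.wait()                    # every request has seen the miss
            if missed:
                cache["products:123"] = query_db()

    threads = [threading.Thread(target=handle_request)
               for _ in range(n_requests)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return db_queries

stampede = run(50, coalesce=False)
coalesced = run(50, coalesce=True)
print(stampede, coalesced)
```

The guarded variant issues one query regardless of how many requests arrive, but it only limits the damage of a miss. It does nothing to tell the cache that the row changed.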

How Debezium CDC Solves Both Problems

Change Data Capture flips the invalidation model from push-from-app to pull-from-binlog. Instead of the application telling the cache "I changed this row," Debezium reads MySQL's binary log (the authoritative change stream) and emits structured events for every INSERT, UPDATE, and DELETE. A lightweight consumer receives these events and writes or evicts the corresponding Valkey keys.
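A minimal sketch of that consumer logic follows. It assumes the event envelope carries `op`, `before`, `after`, and `source.table` fields, as the Java consumer later in this guide reads them, and that every table has an `id` primary key. A plain dictionary stands in for Valkey, and `apply_change_event` and `make_event` are hypothetical helper names.

```python
import json

def apply_change_event(cache, event_json):
    """Apply one Debezium-style change event to a dict-backed cache."""
    payload = json.loads(event_json)["payload"]
    table = payload["source"]["table"]
    op = payload["op"]
    if op in ("c", "u", "r"):             # create, update, snapshot read
        row = payload["after"]
        cache[f"{table}:{row['id']}"] = json.dumps(row)
    elif op == "d":                       # delete
        row = payload["before"]
        cache.pop(f"{table}:{row['id']}", None)

def make_event(op, before, after):
    """Build a hand-written event with the same shape, for illustration."""
    return json.dumps({"payload": {"op": op, "before": before, "after": after,
                                   "source": {"table": "products"}}})

cache = {}
events = [
    make_event("c", None, {"id": 123, "price": 24.99}),
    make_event("u", {"id": 123, "price": 24.99}, {"id": 123, "price": 19.99}),
]
for event in events:
    apply_change_event(cache, event)

# Replaying the whole batch in order converges to the same final state,
# which matters when events are redelivered after a restart.
snapshot = dict(cache)
for event in events:
    apply_change_event(cache, event)
print(cache == snapshot, cache["products:123"])
```

Because each event carries the full row, applying it is an overwrite rather than a read-modify-write, so redelivery is harmless as long as events are applied in order.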

Architecture Overview

[Figure: Debezium CDC architecture. The application writes to MySQL 8.x (binlog in ROW format with FULL row images, GTID on) and reads from Valkey 8 via cache-aside. The Debezium AsyncEmbeddedEngine 3.x, embedded as a library with no Kafka, tails the binlog, stores its resume offsets, and issues SET/DEL with JSON values to Valkey (SETEX with a 3600s TTL), about 50–200 ms end to end.]
Debezium tails MySQL's binlog and writes SET/DEL events to Valkey — the application's write path never touches cache invalidation logic.

This architecture provides three guarantees that application-level invalidation cannot:

  • No missed changes: Every committed transaction appears in the binlog — there is no code path that "forgets" to invalidate.
  • Ordering: Binlog events arrive in commit order, so the cache always converges to the latest state.
  • Decoupled latency: The application write path never touches cache invalidation logic — the CDC pipeline handles it asynchronously in ~50–200ms.

Why Valkey Instead of Redis?

Valkey is the Linux Foundation's community-driven fork of Redis, created after Redis Ltd. changed to a dual-license model in March 2024. Valkey 8.x is wire-compatible with Redis 7.2, supports the same data structures and commands, and is fully open-source under the BSD-3-Clause license. For new deployments that require an open-source cache, Valkey is the default choice — and AWS ElastiCache now defaults to Valkey as its engine.

Setting Up Debezium Embedded Engine with MySQL and Valkey

For single-instance deployments, Debezium's embedded engine runs as a library inside your JVM service — no Kafka cluster required. This is ideal for services that own their own cache layer and don't need cross-service event distribution.

Step 1: Add Maven Dependencies

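xml
<properties>
    <debezium.version>3.1.0.Final</debezium.version>
</properties>

<dependencies>
    <!-- Debezium embedded engine and MySQL connector -->
    <dependency>
        <groupId>io.debezium</groupId>
        <artifactId>debezium-api</artifactId>
        <version>${debezium.version}</version>
    </dependency>
    <dependency>
        <groupId>io.debezium</groupId>
        <artifactId>debezium-embedded</artifactId>
        <version>${debezium.version}</version>
    </dependency>
    <dependency>
        <groupId>io.debezium</groupId>
        <artifactId>debezium-connector-mysql</artifactId>
        <version>${debezium.version}</version>
    </dependency>

    <!-- Valkey Java client -->
    <dependency>
        <groupId>io.valkey</groupId>
        <artifactId>valkey-java</artifactId>
        <version>6.0.0</version>
    </dependency>
</dependencies>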

Step 2: Enable MySQL Binlog for CDC

Debezium requires MySQL's row-based binary logging and a dedicated replication user:

sql
-- my.cnf (or SET GLOBAL for testing)
-- server-id        = 1
-- log_bin          = mysql-bin
-- binlog_format    = ROW
-- binlog_row_image = FULL
-- gtid_mode        = ON
-- enforce_gtid_consistency = ON

-- Verify binlog is active
SHOW VARIABLES LIKE 'log_bin';          -- ON
SHOW VARIABLES LIKE 'binlog_format';    -- ROW

-- Create a dedicated CDC user with replication privileges
CREATE USER 'debezium_cdc'@'%' IDENTIFIED BY 'strong_password_here';
GRANT SELECT, RELOAD, SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT
  ON *.* TO 'debezium_cdc'@'%';
FLUSH PRIVILEGES;
Warning

Never use binlog_row_image = MINIMAL with Debezium — it omits unchanged columns from UPDATE events, breaking cache value reconstruction. Always use FULL.
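The effect of the row-image setting can be shown with two hand-written after-images. This is an illustrative sketch only: the dictionaries below are assumptions about the general shape of the data (the `name` and `stock` columns are invented for the example), not actual Debezium output, and the way absent columns are represented in a real event may differ.

```python
# The row as it currently sits in the cache.
cached_row = {"id": 123, "name": "Widget", "price": 24.99, "stock": 40}

# Illustrative after-images for:
#   UPDATE products SET price = 19.99 WHERE id = 123
after_full = {"id": 123, "name": "Widget", "price": 19.99, "stock": 40}
after_minimal = {"price": 19.99}   # only the changed column is logged

def cache_value_from(after_image):
    """What a consumer stores if it caches the after-image as-is."""
    return dict(after_image)

full_value = cache_value_from(after_full)
minimal_value = cache_value_from(after_minimal)

# Columns that silently disappear from the cached value under MINIMAL.
missing = sorted(set(cached_row) - set(minimal_value))
print(missing)
```

With FULL, the after-image is a complete replacement for the cached row. With MINIMAL, the consumer would have to merge into the existing value, which reintroduces a read-modify-write race on the cache.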

Step 3: Configure and Start the Debezium Engine

java
import io.debezium.engine.ChangeEvent;
import io.debezium.engine.DebeziumEngine;
import io.debezium.engine.format.Json;

import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ValkeyCdcInvalidator {

    public static DebeziumEngine<ChangeEvent<String, String>> createEngine() {
        Properties props = new Properties();

        // Engine settings
        props.setProperty("name", "valkey-cache-invalidator");
        props.setProperty("connector.class",
            "io.debezium.connector.mysql.MySqlConnector");

        // Offset storage (file-based for single instance)
        props.setProperty("offset.storage",
            "org.apache.kafka.connect.storage.FileOffsetBackingStore");
        props.setProperty("offset.storage.file.filename",
            "/var/lib/debezium/offsets.dat");
        props.setProperty("offset.flush.interval.ms", "10000");

        // Schema history
        props.setProperty("schema.history.internal",
            "io.debezium.storage.file.history.FileSchemaHistory");
        props.setProperty("schema.history.internal.file.filename",
            "/var/lib/debezium/schema-history.dat");

        // MySQL connection
        props.setProperty("database.hostname", "mysql-primary.internal");
        props.setProperty("database.port", "3306");
        props.setProperty("database.user", "debezium_cdc");
        props.setProperty("database.password", "strong_password_here");
        props.setProperty("database.server.id", "85744");
        props.setProperty("topic.prefix", "ecommerce");

        // Capture only the tables you cache
        props.setProperty("table.include.list",
            "ecommerce.products,ecommerce.inventory,ecommerce.pricing");

        // Skip initial snapshot if cache is warm
        props.setProperty("snapshot.mode", "no_data");

        return DebeziumEngine.create(Json.class)
            .using(props)
            .notifying(new ValkeyCacheConsumer())
            .build();
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        executor.execute(createEngine());
    }
}

Step 4: Write the Valkey Cache Consumer

java
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import io.debezium.engine.ChangeEvent;
import io.debezium.engine.DebeziumEngine;
import io.valkey.JedisPooled;

import java.util.List;

public class ValkeyCacheConsumer implements
        DebeziumEngine.ChangeConsumer<ChangeEvent<String, String>> {

    private final JedisPooled valkey = new JedisPooled("valkey.internal", 6379);
    private static final int CACHE_TTL_SECONDS = 3600;

    @Override
    public void handleBatch(
            List<ChangeEvent<String, String>> records,
            DebeziumEngine.RecordCommitter<ChangeEvent<String, String>> committer)
            throws InterruptedException {

        for (ChangeEvent<String, String> record : records) {
            if (record.value() == null) {
                // Tombstone event (hard delete) — skip or handle
                committer.markProcessed(record);
                continue;
            }

            JsonObject payload = JsonParser.parseString(record.value())
                .getAsJsonObject().getAsJsonObject("payload");

            String operation = payload.get("op").getAsString();
            String table = payload.getAsJsonObject("source")
                .get("table").getAsString();

            switch (operation) {
                case "c": // INSERT — warm the cache proactively
                case "u": // UPDATE — refresh the cached value
                    JsonObject after = payload.getAsJsonObject("after");
                    String id = after.get("id").getAsString();
                    String cacheKey = table + ":" + id;

                    // Write the full row as JSON to Valkey
                    valkey.setex(cacheKey, CACHE_TTL_SECONDS,
                        after.toString());
                    break;

                case "d": // DELETE — evict from cache
                    JsonObject before = payload.getAsJsonObject("before");
                    String deletedId = before.get("id").getAsString();
                    valkey.del(table + ":" + deletedId);
                    break;
            }

            committer.markProcessed(record);
        }
        committer.markBatchFinished();
    }
}
Tip

Use setex (SET with TTL) instead of plain SET even for CDC-driven writes. The TTL acts as a safety net — if the CDC pipeline stalls, keys auto-expire rather than serving indefinitely stale data.
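The safety-net TTL pairs naturally with the jittered TTLs listed in the risk-factor table: if every CDC-written key gets exactly 3600 seconds, keys written in the same burst also expire in the same second. A minimal sketch, assuming a 10% spread (an arbitrary illustrative value) and a hypothetical helper named `jittered_ttl`:

```python
import random

def jittered_ttl(base_seconds=3600, spread=0.10, rng=random):
    """Return base_seconds adjusted by a random factor within +/- spread."""
    factor = 1.0 + rng.uniform(-spread, spread)
    return max(1, int(base_seconds * factor))

# Sample the distribution to see the range of expiry times it produces.
ttls = [jittered_ttl() for _ in range(1000)]
print(min(ttls), max(ttls))
```

The result would be passed as the TTL argument wherever the consumer currently passes the fixed `CACHE_TTL_SECONDS`.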

Preventing Cache Stampede with CDC + XFetch

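CDC-driven invalidation shrinks the stale-read window to the pipeline lag, but it doesn't fully prevent stampede on ultra-hot keys. Because the consumer overwrites keys with SETEX instead of deleting them, an UPDATE no longer opens a miss window: readers keep seeing the old value for ~50–200ms until the new one arrives. Hard misses still happen when a key's TTL expires without an intervening change, when a row is deleted, or after a cache restart or snapshot. For keys receiving 1,000+ reads/second, any of those misses produces a stampede.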

The solution is to combine CDC with probabilistic early expiration (XFetch):

XFetch Algorithm

python
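# XFetch read path with probabilistic early recomputation.
# Assumes every writer stores values in the JSON envelope produced by
# store_xfetch; the Java consumer in Step 4 writes the raw row, so it would
# need to write this envelope instead. Helper names are illustrative.
import json, math, random, threading, time

def store_xfetch(valkey_client, key, value, ttl, delta=0.0):
    """Store value plus its recompute cost (delta, seconds) and expiry time."""
    envelope = {"value": value, "delta": delta, "expiry": time.time() + ttl}
    valkey_client.set(key, json.dumps(envelope), ex=ttl)

def decode_xfetch(raw):
    envelope = json.loads(raw)
    return envelope["value"], envelope["delta"], envelope["expiry"]

def refresh_in_background(valkey_client, key, recompute_fn, ttl):
    def worker():
        start = time.time()
        value = recompute_fn()
        store_xfetch(valkey_client, key, value, ttl, delta=time.time() - start)
    threading.Thread(target=worker, daemon=True).start()

def xfetch_get(valkey_client, key, recompute_fn, ttl=3600, beta=1.0):
    """
    Read from cache with probabilistic early recomputation.
    Beta controls eagerness: higher = more aggressive refresh.
    """
    raw = valkey_client.get(key)
    if raw is None:
        # Hard miss: single-caller recompute with mutex (recompute_with_lock)
        return recompute_with_lock(valkey_client, key, recompute_fn, ttl)

    value, delta, expiry = decode_xfetch(raw)  # stored as JSON envelope
    now = time.time()

    # Probabilistic early refresh: as we approach expiry,
    # the probability of triggering a background refresh increases
    gap = expiry - now
    # 1.0 - random.random() lies in (0, 1], so log() never receives 0
    threshold = delta * beta * -math.log(1.0 - random.random())

    if gap < threshold:
        # This request "volunteers" to refresh; others keep using cached value
        refresh_in_background(valkey_client, key, recompute_fn, ttl)

    return value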
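The `gap < threshold` test has a closed form worth seeing once. For a uniform random `u`, the inequality `gap < delta * beta * -ln(u)` holds exactly when `u < exp(-gap / (delta * beta))`, so each read triggers a refresh with that probability. The sketch below evaluates it for an assumed recompute cost of 200 ms; `early_refresh_probability` is a hypothetical helper name.

```python
import math

def early_refresh_probability(gap_seconds, delta_seconds, beta=1.0):
    """P(gap < delta * beta * -ln(u)) for u uniform on (0, 1)."""
    if gap_seconds <= 0:
        return 1.0          # already at or past expiry
    scale = delta_seconds * beta
    if scale <= 0:
        return 0.0          # zero recompute cost never triggers early refresh
    return math.exp(-gap_seconds / scale)

# Recompute cost of 200 ms, evaluated at several distances from expiry.
for gap in (2.0, 1.0, 0.2, 0.05):
    p = early_refresh_probability(gap, delta_seconds=0.2)
    print(f"{gap:>5.2f}s before expiry: p = {p:.5f}")
```

The probability stays negligible until the remaining lifetime is within a few multiples of the recompute cost, which is why expensive keys start refreshing earlier than cheap ones. Raising `beta` widens that window.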

Combining CDC + XFetch + Mutex

The optimal strategy uses all three layers together:

| Layer | Handles | Mechanism |
|---|---|---|
| Debezium CDC | Data freshness | Binlog → Valkey push (primary invalidation) |
| XFetch | Hot-key protection | Probabilistic early refresh before TTL |
| Mutex lock | Hard-miss stampede | Only one caller recomputes on a full miss |

[Figure: Three-layer defense for cache reads. Read path: GET products:123 goes to L1 XFetch (return the cached value on a hit; spawn a background refresh near TTL), then on a hard miss to L2 mutex lock (SET lock:key NX EX 10; losers wait 50ms, the winner alone queries MySQL, so 1 query instead of 1,200). Background path: L3 Debezium CDC tails the binlog and pushes SET/DEL to Valkey ~50–200ms after commit, so L1 and L2 fire rarely and mainly cover the cases CDC can't (snapshots, restarts).]
Three-layer defense: XFetch absorbs TTL expiry, a mutex contains hard-miss stampede, and Debezium CDC keeps Valkey fresh so the first two layers rarely fire.
python
# Mutex-based recompute for hard cache misses
import time

def recompute_with_lock(valkey_client, key, recompute_fn, ttl):
    lock_key = f"lock:{key}"
    # SET NX EX: only one caller acquires the lock; it auto-expires after 10s
    acquired = valkey_client.set(lock_key, "1", nx=True, ex=10)

    if acquired:
        try:
            start = time.time()
            value = recompute_fn()
            # Record the recompute cost so XFetch can schedule early refreshes
            store_xfetch(valkey_client, key, value, ttl,
                         delta=time.time() - start)
            return value
        finally:
            valkey_client.delete(lock_key)
    else:
        # Another caller is recomputing: wait briefly, then retry the read
        time.sleep(0.05)
        return xfetch_get(valkey_client, key, recompute_fn, ttl)
Important

Set the mutex lock TTL (the ex=10 parameter) to at least 2x your worst-case recomputation time. If the lock holder crashes without releasing, the lock auto-expires and another caller can take over.
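One consequence of an auto-expiring lock is that a slow holder can outlive its own lock and then delete a lock that a second caller has since acquired. A common hedge is to store a random token and release with a compare-and-delete script. The sketch below assumes a client exposing `set(..., nx=True, ex=...)` and `eval(script, numkeys, *keys_and_args)` in the style of redis-py and valkey-py; `acquire_lock` and `release_lock` are hypothetical helper names.

```python
import uuid

# Lua runs atomically on the server: delete the lock only if we still own it.
RELEASE_SCRIPT = """
if redis.call('GET', KEYS[1]) == ARGV[1] then
    return redis.call('DEL', KEYS[1])
end
return 0
"""

def acquire_lock(client, lock_key, ttl_seconds=10):
    """Return an ownership token if the lock was acquired, else None."""
    token = uuid.uuid4().hex
    if client.set(lock_key, token, nx=True, ex=ttl_seconds):
        return token
    return None

def release_lock(client, lock_key, token):
    """Release the lock only if it still holds our token."""
    return client.eval(RELEASE_SCRIPT, 1, lock_key, token) == 1
```

A caller would keep the token returned by `acquire_lock` and pass it to `release_lock` in its `finally` block. If the lock expired and changed hands in the meantime, the release becomes a no-op instead of unlocking someone else's recompute.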

Kafka Connect Deployment for Multi-Service Architectures

The embedded engine works well for a single service, but if multiple services need the same CDC stream — or you need at-least-once delivery guarantees with replay — deploy Debezium as a Kafka Connect source connector.

Connector Configuration

json
{
  "name": "ecommerce-mysql-cdc",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname": "mysql-primary.internal",
    "database.port": "3306",
    "database.user": "debezium_cdc",
    "database.password": "strong_password_here",
    "database.server.id": "85744",
    "topic.prefix": "ecommerce",
    "table.include.list": "ecommerce.products,ecommerce.inventory",
    "schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
    "schema.history.internal.kafka.topic": "schema-changes.ecommerce",
    "snapshot.mode": "no_data",
    "transforms": "route",
    "transforms.route.type": "org.apache.kafka.connect.transforms.RegexRouter",
    "transforms.route.regex": "([^.]+)\\.([^.]+)\\.([^.]+)",
    "transforms.route.replacement": "cdc.$3"
  }
}
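The routing transform in this configuration is easy to check offline. The sketch below mirrors the regex and replacement in Python, assuming RegexRouter only rewrites a topic when the whole name matches the pattern; `route` is a hypothetical helper, and `\3` is the Python spelling of `$3`.

```python
import re

ROUTE_REGEX = r"([^.]+)\.([^.]+)\.([^.]+)"
ROUTE_REPLACEMENT = r"cdc.\3"          # Python spelling of "cdc.$3"

def route(topic):
    """Rename the topic only when the whole name matches the pattern."""
    if re.fullmatch(ROUTE_REGEX, topic):
        return re.sub(ROUTE_REGEX, ROUTE_REPLACEMENT, topic, count=1)
    return topic

# prefix.database.table topics collapse to cdc.<table>
print(route("ecommerce.ecommerce.products"))
print(route("ecommerce.ecommerce.inventory"))
# A topic without three dot-separated parts is left alone.
print(route("ecommerce"))
```

Dropping the database segment means two databases with a table of the same name would share a topic, so this mapping suits a single captured schema.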

Valkey Sink Consumer (Kafka Consumer Group)

java
// Kafka consumer that writes CDC events to Valkey
@KafkaListener(topics = "cdc.products", groupId = "valkey-cache-writer")
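public void onProductChange(ConsumerRecord<String, String> record) {
    if (record.value() == null) {
        return; // Tombstone that follows a delete; the "d" event already evicted the key
    }
    JsonObject event = JsonParser.parseString(record.value())
        .getAsJsonObject().getAsJsonObject("payload");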

    String op = event.get("op").getAsString();
    if ("d".equals(op)) {
        String id = event.getAsJsonObject("before").get("id").getAsString();
        valkey.del("products:" + id);
    } else {
        JsonObject after = event.getAsJsonObject("after");
        String id = after.get("id").getAsString();
        valkey.setex("products:" + id, 3600, after.toString());
    }
}

Performance Tuning and Production Considerations

Latency Budget

The end-to-end latency from MySQL commit to Valkey write depends on three components:

StageTypical LatencyTuning Lever
| Stage | Typical Latency | Tuning Lever |
|---|---|---|
| MySQL binlog flush | 1–5ms | sync_binlog=1 (don't change) |
| Debezium poll + transform | 10–100ms | poll.interval.ms, batch size |
| Valkey write | 0.1–1ms | Pipeline mode for batches |
| Total | ~50–200ms | |

For most read-heavy workloads, 50–200ms of eventual consistency is acceptable. If you need stronger guarantees, consider write-through caching for the critical-path tables and CDC for everything else.
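Whether that window is acceptable is easier to judge as a count of affected reads. A back-of-the-envelope sketch, using the 1,200 req/s hot key from the opening scenario and assuming reads arrive at a steady rate; `reads_during_lag` is a hypothetical helper name.

```python
def reads_during_lag(reads_per_second, lag_ms):
    """Expected number of reads served while a change is still in flight."""
    return reads_per_second * lag_ms / 1000.0

low = reads_during_lag(1200, 50)     # best case of the latency budget
high = reads_during_lag(1200, 200)   # worst case of the latency budget
print(low, high)
```

Under these assumptions, each price change on that key is followed by somewhere between 60 and 240 reads of the old value. For a product page that is usually tolerable; for a checkout total it may be the argument for write-through on that table.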

Offset Storage for Durability

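In production, use Kafka-based offset storage or a database-backed store rather than file-based offsets. If the offset file is lost (container restart, disk failure), Debezium loses its binlog position. Depending on snapshot.mode it then either re-snapshots the captured tables or, with the no_data setting used above, resumes from the current binlog position and skips every change made in the meantime: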

java
// Use Kafka offset storage in production
props.setProperty("offset.storage",
    "org.apache.kafka.connect.storage.KafkaOffsetBackingStore");
// Broker address for the offset topic (assumes the kafka:9092 cluster used elsewhere in this guide)
props.setProperty("bootstrap.servers", "kafka:9092");
props.setProperty("offset.storage.topic", "debezium-offsets");
props.setProperty("offset.storage.partitions", "1");
props.setProperty("offset.storage.replication.factor", "3");

Monitoring the Pipeline

sql
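-- MySQL has no replication slots; Debezium connects to the primary like a replica.
-- Confirm its binlog dump thread is running (look for Command = 'Binlog Dump'
-- or 'Binlog Dump GTID' owned by debezium_cdc). Lag itself is reported by
-- Debezium's MilliSecondsBehindSource metric, not by MySQL.
SHOW PROCESSLIST;

-- Confirm binlogs are retained long enough for Debezium to resume after downtime
SHOW BINARY LOGS;
SHOW VARIABLES LIKE 'binlog_expire_logs_seconds';

-- Valkey-side (run from a shell, not the MySQL client): check memory and key count
valkey-cli INFO memory
valkey-cli INFO keyspace
valkey-cli DBSIZE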
Tip

Export Debezium's JMX metrics (for the MySQL connector, the MBeans under debezium.mysql:type=connector-metrics) to Prometheus. Key alerts: MilliSecondsBehindSource > 5000 (CDC lag), NumberOfDisconnects > 0 (connection drops), and QueueRemainingCapacity < 100 (back-pressure).
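The three thresholds can be expressed as a small rule table, which is handy for unit-testing alert logic before wiring it into a real alerting system. This is a sketch only: the metric names and limits come from the tip above, while `ALERT_RULES`, `firing_alerts`, and the sample values are assumptions for illustration.

```python
# Threshold rules taken from the alerting guidance above.
ALERT_RULES = {
    "MilliSecondsBehindSource": lambda v: v > 5000,   # CDC lag
    "NumberOfDisconnects": lambda v: v > 0,           # connection drops
    "QueueRemainingCapacity": lambda v: v < 100,      # back-pressure
}

def firing_alerts(metrics):
    """Return the names of metrics whose threshold is breached."""
    return sorted(name for name, breached in ALERT_RULES.items()
                  if name in metrics and breached(metrics[name]))

# A hand-written sample scrape: lagging, but connected and not backed up.
sample = {
    "MilliSecondsBehindSource": 7200,
    "NumberOfDisconnects": 0,
    "QueueRemainingCapacity": 8192,
}
print(firing_alerts(sample))
```

Metrics missing from a scrape are skipped rather than treated as healthy or unhealthy, so a separate check for scrape failures is still needed.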

Cache Invalidation Strategy Comparison

| Strategy | Stale Reads | Stampede Risk | Infra Complexity | App Code Changes |
|---|---|---|---|---|
| TTL-only | Up to TTL duration | High | None | None |
| App-level invalidate-on-write | Race window (1–15ms) | High | None | Every write path |
| CDC + Debezium (this guide) | ~50–200ms | Low (with XFetch) | Debezium engine | None |
| Write-through + CDC hybrid | None (for critical keys) | None | Debezium + app logic | Critical paths only |

Key Takeaways
  • Use Debezium CDC to decouple cache invalidation from application code — the binlog is the single source of truth, not your app's write path.
  • Deploy the embedded AsyncEmbeddedEngine (Debezium 3.x) for single-service setups; use Kafka Connect when multiple consumers need the same change stream.
  • Always set binlog_format=ROW and binlog_row_image=FULL on MySQL — MINIMAL breaks cache value reconstruction.
  • Combine CDC push with XFetch probabilistic early refresh and mutex locking for a three-layer defense against stale reads and stampede.
  • Set a TTL on every CDC-written key as a safety net — if the pipeline stalls, keys expire rather than serving indefinitely stale data.
  • Monitor MilliSecondsBehindSource in Debezium JMX metrics and alert at >5 seconds to catch pipeline lag before it becomes user-visible.

Working with JusDB on Valkey and Debezium CDC

JusDB manages CDC pipelines and caching infrastructure for engineering teams who need production-grade cache consistency without the operational overhead. Our DBAs handle MySQL binlog configuration, Debezium connector deployment, Valkey cluster tuning, and 24/7 pipeline monitoring — so your team ships features instead of debugging stale cache entries at 2 AM.
