A SaaS team running MySQL behind a Valkey caching layer notices something alarming during a flash sale: a single product-price UPDATE invalidates the cache key, 1,200 concurrent requests miss the cache simultaneously, and every one of them hammers MySQL with the same SELECT. The primary's CPU spikes to 97%, p99 latency goes from 4ms to 2.8 seconds, and the checkout queue stalls. The root cause isn't the database or the cache — it's the gap between them. Application-driven cache invalidation is inherently racy: you can't guarantee the cache is updated atomically with the database commit. Debezium's Change Data Capture (CDC) engine closes that gap by streaming row-level changes directly from MySQL's binlog to Valkey, eliminating stale reads and stampede conditions without touching application code.
- Cache invalidation and cache stampede are two distinct but related problems — invalidation causes stale reads; stampede causes thundering-herd spikes on the database after invalidation.
- Application-level DELETE-on-write invalidation is racy: the window between DB commit and cache eviction allows stale reads and concurrent recomputation.
- Debezium's embedded engine captures MySQL binlog events and pushes them to Valkey in near-real-time (~50–200ms latency), removing application code from the invalidation path entirely.
- Combine CDC-driven invalidation with probabilistic early expiration (XFetch) or mutex locking to eliminate stampede on high-traffic keys.
- The Debezium 3.x AsyncEmbeddedEngine runs as a library inside your service, so no Kafka cluster is required for single-instance deployments.
- This pattern works with MySQL, PostgreSQL, and MongoDB as CDC sources, with Valkey or Redis as the cache target.
What Is Cache Invalidation and Why Is It Hard?
Cache invalidation is the process of removing or updating a cached entry when the underlying source data changes. Phil Karlton's famous quip — "There are only two hard things in Computer Science: cache invalidation and naming things" — endures because the problem is fundamentally about distributed consistency. Your database is the source of truth; the cache is a stale copy by definition. The question is how stale you can tolerate and how much infrastructure you'll build to close the gap.
The standard application-level pattern is cache-aside (lazy-loading): the app checks the cache on read, queries the database on miss, and writes the result back to cache. On write, the app either deletes the cache key (invalidate-on-write) or writes the new value to both the database and cache (write-through). Both approaches have a race condition window:
Timeline of a stale-read race:
T1: App instance A commits UPDATE price=19.99 to MySQL
T2: App instance B reads cache → still sees price=24.99 (stale)
T3: App instance A issues DEL product:123 to Valkey
T4: App instance B writes price=24.99 back to cache (!!!)
T5: All subsequent reads see price=24.99 until TTL expires
Between T1 and T3 (typically 1–15ms in well-tuned systems, but potentially seconds under load) every read returns stale data. Worse, as shown at T4, a concurrent reader can re-populate the cache with the old value after the invalidation, leaving a stale entry that survives until the TTL expires.
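One interleaving that produces the T4 write-back can be replayed with plain dictionaries. This is a minimal sketch, assuming instance B missed the cache and read the old price from MySQL just before the commit; the `db` and `cache` dicts are stand-ins for MySQL and Valkey, not real clients.

```python
# In-memory stand-ins for MySQL and Valkey (assumptions for illustration only).
db = {"product:123": 24.99}
cache = {}
key = "product:123"

# Instance B misses the cache and reads the old price from the database.
b_value = cache.get(key)      # None: cache miss
if b_value is None:
    b_value = db[key]         # 24.99, read before the commit

# T1: instance A commits the new price.
db[key] = 19.99

# T3: instance A invalidates the cache key (a no-op here, the key is absent).
cache.pop(key, None)

# T4: instance B finishes its cache-aside read and writes back the old value.
cache[key] = b_value

# T5: later readers hit the cache and see the stale price.
stale_price = cache[key]
fresh_price = db[key]
print(f"cache={stale_price} db={fresh_price}")
```

Nothing in the application code is wrong line by line; the staleness comes purely from the ordering of steps across two instances.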
What Is a Cache Stampede?
A cache stampede (also called thundering herd) occurs when a popular cache key expires or is invalidated, and many concurrent requests simultaneously attempt to regenerate it by hitting the database. If 500 threads find the key missing at the same instant, all 500 issue the same expensive query. The database gets 500x the expected load for that key, response times spike, connection pools saturate, and the resulting timeouts cascade upstream.
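The effect is easy to reproduce in miniature. The sketch below is illustrative only: it uses a dict as the cache, a counter in place of the database, and a `threading.Barrier` to force every request to observe the miss before any of them repopulates the key. The names `handle_request` and `expensive_query` are hypothetical.

```python
import threading

N_REQUESTS = 50
cache = {}
stats = {"db_queries": 0}
stats_lock = threading.Lock()
barrier = threading.Barrier(N_REQUESTS)

def expensive_query():
    # Stand-in for the real SELECT; only counts how often it is called.
    with stats_lock:
        stats["db_queries"] += 1
    return 19.99

def handle_request():
    cached = cache.get("product:123")
    # Every request sees the miss before any request writes the key back.
    barrier.wait()
    if cached is None:
        cache["product:123"] = expensive_query()

threads = [threading.Thread(target=handle_request) for _ in range(N_REQUESTS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"{stats['db_queries']} database queries for one key")
```

Every request pays for the same query even though a single recomputation would have served all of them.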
Three factors determine stampede severity:
| Factor | Low Risk | High Risk |
|---|---|---|
| Request concurrency per key | <10 req/s | >500 req/s |
| Query cost to regenerate | <5ms | >200ms (joins, aggregations) |
| TTL uniformity | Jittered TTLs | All keys expire at the same second |
Traditional mitigations — mutex locking, probabilistic early expiration, request coalescing — all operate at the application or cache layer. They reduce stampede impact but don't solve the root cause: the cache doesn't know when the database changed.
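The "TTL uniformity" row in the table above is the cheapest factor to address. A minimal sketch of jittered TTLs follows; `jittered_ttl` and its 10% default spread are illustrative assumptions, not values taken from any library.

```python
import random

def jittered_ttl(base_ttl_seconds, jitter_fraction=0.1):
    """Return the base TTL plus or minus up to jitter_fraction of it."""
    spread = base_ttl_seconds * jitter_fraction
    return int(base_ttl_seconds + random.uniform(-spread, spread))

# Keys written in the same second now expire across a 12-minute band
# instead of all at once.
ttls = [jittered_ttl(3600) for _ in range(1000)]
print(min(ttls), max(ttls))
```

Jitter spreads expirations out, but it does nothing for a single hot key that is invalidated by a write, which is where CDC comes in.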
How Debezium CDC Solves Both Problems
Change Data Capture flips the invalidation model from push-from-app to pull-from-binlog. Instead of the application telling the cache "I changed this row," Debezium reads MySQL's binary log (the authoritative change stream) and emits structured events for every INSERT, UPDATE, and DELETE. A lightweight consumer receives these events and writes or evicts the corresponding Valkey keys.
Architecture Overview
This architecture provides three guarantees that application-level invalidation cannot:
- No missed changes: Every committed transaction appears in the binlog — there is no code path that "forgets" to invalidate.
- Ordering: Binlog events arrive in commit order, so the cache always converges to the latest state.
- Decoupled latency: The application write path never touches cache invalidation logic — the CDC pipeline handles it asynchronously in ~50–200ms.
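The ordering guarantee can be illustrated by replaying a few change events against a dict. This is a sketch under stated assumptions: the payloads carry only the `op`, `source.table`, `before`, and `after` fields, every table is assumed to have an `id` primary key, and `apply_change_event` is a hypothetical helper rather than part of Debezium.

```python
import json

def apply_change_event(cache, payload):
    """Apply one Debezium-style change payload to a dict-based cache."""
    op = payload["op"]
    table = payload["source"]["table"]
    if op in ("c", "u"):
        # Insert or update: overwrite the cached row with the after-image.
        row = payload["after"]
        cache[f"{table}:{row['id']}"] = json.dumps(row, sort_keys=True)
    elif op == "d":
        # Delete: evict using the primary key from the before-image.
        row = payload["before"]
        cache.pop(f"{table}:{row['id']}", None)

src = {"table": "products"}
events = [
    {"op": "c", "source": src, "before": None, "after": {"id": 123, "price": 24.99}},
    {"op": "u", "source": src, "before": {"id": 123, "price": 24.99}, "after": {"id": 123, "price": 19.99}},
    {"op": "c", "source": src, "before": None, "after": {"id": 456, "price": 5.0}},
    {"op": "d", "source": src, "before": {"id": 456, "price": 5.0}, "after": None},
]

cache = {}
for event in events:  # applied in commit order, as the binlog delivers them
    apply_change_event(cache, event)
print(cache)
```

Because each event overwrites or evicts the key outright, replaying the stream in commit order always leaves the cache matching the last committed row.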
Why Valkey Instead of Redis?
Valkey is the Linux Foundation's community-driven fork of Redis, created after Redis Ltd. changed to a dual-license model in March 2024. Valkey 8.x is wire-compatible with Redis 7.2, supports the same data structures and commands, and is fully open-source under the BSD-3-Clause license. For new deployments that require an open-source cache, Valkey is the default choice — and AWS ElastiCache now defaults to Valkey as its engine.
Setting Up Debezium Embedded Engine with MySQL and Valkey
For single-instance deployments, Debezium's embedded engine runs as a library inside your JVM service — no Kafka cluster required. This is ideal for services that own their own cache layer and don't need cross-service event distribution.
Step 1: Add Maven Dependencies
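<properties>
  <debezium.version>3.1.0.Final</debezium.version>
</properties>

<dependencies>
  <dependency>
    <groupId>io.debezium</groupId>
    <artifactId>debezium-api</artifactId>
    <version>${debezium.version}</version>
  </dependency>
  <dependency>
    <groupId>io.debezium</groupId>
    <artifactId>debezium-embedded</artifactId>
    <version>${debezium.version}</version>
  </dependency>
  <dependency>
    <groupId>io.debezium</groupId>
    <artifactId>debezium-connector-mysql</artifactId>
    <version>${debezium.version}</version>
  </dependency>
  <dependency>
    <groupId>io.valkey</groupId>
    <artifactId>valkey-java</artifactId>
    <version>6.0.0</version>
  </dependency>
</dependencies>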
Step 2: Enable MySQL Binlog for CDC
Debezium requires MySQL's row-based binary logging and a dedicated replication user:
-- my.cnf (or SET GLOBAL for testing)
-- server-id = 1
-- log_bin = mysql-bin
-- binlog_format = ROW
-- binlog_row_image = FULL
-- gtid_mode = ON
-- enforce_gtid_consistency = ON
-- Verify binlog is active
SHOW VARIABLES LIKE 'log_bin'; -- ON
SHOW VARIABLES LIKE 'binlog_format'; -- ROW
-- Create a dedicated CDC user with replication privileges
CREATE USER 'debezium_cdc'@'%' IDENTIFIED BY 'strong_password_here';
GRANT SELECT, RELOAD, SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT
ON *.* TO 'debezium_cdc'@'%';
FLUSH PRIVILEGES;
Never use binlog_row_image = MINIMAL with Debezium: it omits unchanged columns from UPDATE events, breaking cache value reconstruction. Always use FULL.
Step 3: Configure and Start the Debezium Engine
import io.debezium.engine.ChangeEvent;
import io.debezium.engine.DebeziumEngine;
import io.debezium.engine.format.Json;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ValkeyCdcInvalidator {

    public static DebeziumEngine<ChangeEvent<String, String>> createEngine() {
        Properties props = new Properties();
        // Engine settings
        props.setProperty("name", "valkey-cache-invalidator");
        props.setProperty("connector.class",
            "io.debezium.connector.mysql.MySqlConnector");
        // Offset storage (file-based for single instance)
        props.setProperty("offset.storage",
            "org.apache.kafka.connect.storage.FileOffsetBackingStore");
        props.setProperty("offset.storage.file.filename",
            "/var/lib/debezium/offsets.dat");
        props.setProperty("offset.flush.interval.ms", "10000");
        // Schema history
        props.setProperty("schema.history.internal",
            "io.debezium.storage.file.history.FileSchemaHistory");
        props.setProperty("schema.history.internal.file.filename",
            "/var/lib/debezium/schema-history.dat");
        // MySQL connection
        props.setProperty("database.hostname", "mysql-primary.internal");
        props.setProperty("database.port", "3306");
        props.setProperty("database.user", "debezium_cdc");
        props.setProperty("database.password", "strong_password_here");
        props.setProperty("database.server.id", "85744");
        props.setProperty("topic.prefix", "ecommerce");
        // Capture only the tables you cache
        props.setProperty("table.include.list",
            "ecommerce.products,ecommerce.inventory,ecommerce.pricing");
        // Snapshot table schemas only (no row data); use when the cache is already warm
        props.setProperty("snapshot.mode", "no_data");
        return DebeziumEngine.create(Json.class)
            .using(props)
            .notifying(new ValkeyCacheConsumer())
            .build();
    }

    public static void main(String[] args) {
        // The engine is a Runnable; run it on a dedicated thread
        ExecutorService executor = Executors.newSingleThreadExecutor();
        executor.execute(createEngine());
    }
}
Step 4: Write the Valkey Cache Consumer
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import io.debezium.engine.ChangeEvent;
import io.debezium.engine.DebeziumEngine;
import io.valkey.JedisPooled;
import java.util.List;

public class ValkeyCacheConsumer implements
        DebeziumEngine.ChangeConsumer<ChangeEvent<String, String>> {

    private final JedisPooled valkey = new JedisPooled("valkey.internal", 6379);
    private static final int CACHE_TTL_SECONDS = 3600;

    @Override
    public void handleBatch(
            List<ChangeEvent<String, String>> records,
            DebeziumEngine.RecordCommitter<ChangeEvent<String, String>> committer)
            throws InterruptedException {
        for (ChangeEvent<String, String> record : records) {
            if (record.value() == null) {
                // Tombstone event (follows a hard delete): skip or handle
                committer.markProcessed(record);
                continue;
            }
            JsonObject payload = JsonParser.parseString(record.value())
                .getAsJsonObject().getAsJsonObject("payload");
            String operation = payload.get("op").getAsString();
            String table = payload.getAsJsonObject("source")
                .get("table").getAsString();
            switch (operation) {
                case "c": // INSERT: warm the cache proactively
                case "u": // UPDATE: refresh the cached value
                    JsonObject after = payload.getAsJsonObject("after");
                    String id = after.get("id").getAsString();
                    String cacheKey = table + ":" + id;
                    // Write the full row as JSON to Valkey
                    valkey.setex(cacheKey, CACHE_TTL_SECONDS,
                        after.toString());
                    break;
                case "d": // DELETE: evict from cache
                    JsonObject before = payload.getAsJsonObject("before");
                    String deletedId = before.get("id").getAsString();
                    valkey.del(table + ":" + deletedId);
                    break;
            }
            committer.markProcessed(record);
        }
        committer.markBatchFinished();
    }
}
Use setex (SET with TTL) instead of plain SET even for CDC-driven writes. The TTL acts as a safety net: if the CDC pipeline stalls, keys auto-expire rather than serving indefinitely stale data.
Preventing Cache Stampede with CDC + XFetch
CDC-driven invalidation shrinks the stale-read window to the pipeline latency (~50–200ms), but it doesn't fully prevent stampede on ultra-hot keys. Because Debezium overwrites the key in place, an UPDATE never leaves it empty. The key can still go missing, though: the safety-net TTL expires, Valkey evicts it under memory pressure, or a cold start leaves the cache unpopulated. For keys receiving 1,000+ reads/second, every such miss produces a stampede.
The solution is to combine CDC with probabilistic early expiration (XFetch):
XFetch Algorithm
# Pseudocode: XFetch with CDC-aware TTL
# decode_xfetch and refresh_in_background are application-specific helpers (not shown)
import math
import random
import time

def xfetch_get(valkey_client, key, recompute_fn, ttl=3600, beta=1.0):
    """
    Read from cache with probabilistic early recomputation.
    Beta controls eagerness: higher = more aggressive refresh.
    """
    raw = valkey_client.get(key)
    if raw is None:
        # Hard miss: single-caller recompute with mutex
        return recompute_with_lock(valkey_client, key, recompute_fn, ttl)
    # delta is how long the last recomputation took, in seconds
    value, delta, expiry = decode_xfetch(raw)  # stored as JSON envelope
    now = time.time()
    # Probabilistic early refresh: as we approach expiry,
    # the probability of triggering a background refresh increases
    gap = expiry - now
    # 1.0 - random.random() lies in (0, 1], which avoids math.log(0)
    threshold = delta * beta * -math.log(1.0 - random.random())
    if gap < threshold:
        # This request "volunteers" to refresh; others keep using the cached value
        refresh_in_background(valkey_client, key, recompute_fn, ttl)
    return value
Combining CDC + XFetch + Mutex
The optimal strategy uses all three layers together:
| Layer | Handles | Mechanism |
|---|---|---|
| Debezium CDC | Data freshness | Binlog → Valkey push (primary invalidation) |
| XFetch | Hot-key protection | Probabilistic early refresh before TTL |
| Mutex lock | Hard-miss stampede | Only one caller recomputes on a full miss |
# Mutex-based recompute for hard cache misses
def recompute_with_lock(valkey_client, key, recompute_fn, ttl):
    lock_key = f"lock:{key}"
    acquired = valkey_client.set(lock_key, "1", nx=True, ex=10)
    if acquired:
        try:
            value = recompute_fn()
            store_xfetch(valkey_client, key, value, ttl)
            return value
        finally:
            valkey_client.delete(lock_key)
    else:
        # Another caller is recomputing: wait briefly, then retry
        time.sleep(0.05)
        return xfetch_get(valkey_client, key, recompute_fn, ttl)
Set the mutex lock TTL (the ex=10 parameter) to at least 2x your worst-case recomputation time. If the lock holder crashes without releasing, the lock auto-expires and another caller can take over.
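The snippets above rely on a JSON envelope that stores the value together with its recompute time and expiry. One possible shape for that envelope is sketched below. The field names and the helpers `encode_xfetch` and `should_refresh_early` are assumptions for illustration; `decode_xfetch` matches the three-value return used in `xfetch_get`. The random draw is passed in as `u` so the decision can be checked deterministically.

```python
import json
import math
import time

def encode_xfetch(value, delta, ttl, now=None):
    """Wrap a value with its recompute time (delta) and absolute expiry."""
    now = time.time() if now is None else now
    return json.dumps({"value": value, "delta": delta, "expiry": now + ttl})

def decode_xfetch(raw):
    """Inverse of encode_xfetch: returns (value, delta, expiry)."""
    envelope = json.loads(raw)
    return envelope["value"], envelope["delta"], envelope["expiry"]

def should_refresh_early(now, expiry, delta, beta=1.0, u=0.5):
    """XFetch test: refresh when time left < delta * beta * -ln(u), u in (0, 1]."""
    return (expiry - now) < delta * beta * -math.log(u)

raw = encode_xfetch({"price": 19.99}, delta=0.2, ttl=3600, now=1000.0)
value, delta, expiry = decode_xfetch(raw)

far_from_expiry = should_refresh_early(now=1000.0, expiry=expiry, delta=delta)
near_expiry = should_refresh_early(now=4599.9, expiry=expiry, delta=delta)
print(value, expiry, far_from_expiry, near_expiry)
```

With a 200ms recompute time, early refresh only becomes likely in the last fraction of a second before expiry; slower queries (larger `delta`) or a larger `beta` widen that window.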
Kafka Connect Deployment for Multi-Service Architectures
The embedded engine works well for a single service, but if multiple services need the same CDC stream — or you need at-least-once delivery guarantees with replay — deploy Debezium as a Kafka Connect source connector.
Connector Configuration
{
  "name": "ecommerce-mysql-cdc",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname": "mysql-primary.internal",
    "database.port": "3306",
    "database.user": "debezium_cdc",
    "database.password": "strong_password_here",
    "database.server.id": "85744",
    "topic.prefix": "ecommerce",
    "table.include.list": "ecommerce.products,ecommerce.inventory",
    "schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
    "schema.history.internal.kafka.topic": "schema-changes.ecommerce",
    "snapshot.mode": "no_data",
    "transforms": "route",
    "transforms.route.type": "org.apache.kafka.connect.transforms.RegexRouter",
    "transforms.route.regex": "([^.]+)\\.([^.]+)\\.([^.]+)",
    "transforms.route.replacement": "cdc.$3"
  }
}
Valkey Sink Consumer (Kafka Consumer Group)
// Kafka consumer that writes CDC events to Valkey
@KafkaListener(topics = "cdc.products", groupId = "valkey-cache-writer")
public void onProductChange(ConsumerRecord<String, String> record) {
    if (record.value() == null) {
        // Tombstone that follows a delete; the "d" event already evicted the key
        return;
    }
    JsonObject event = JsonParser.parseString(record.value())
        .getAsJsonObject().getAsJsonObject("payload");
    String op = event.get("op").getAsString();
    if ("d".equals(op)) {
        String id = event.getAsJsonObject("before").get("id").getAsString();
        valkey.del("products:" + id);
    } else {
        JsonObject after = event.getAsJsonObject("after");
        String id = after.get("id").getAsString();
        valkey.setex("products:" + id, 3600, after.toString());
    }
}
Performance Tuning and Production Considerations
Latency Budget
The end-to-end latency from MySQL commit to Valkey write depends on three components:
| Stage | Typical Latency | Tuning Lever |
|---|---|---|
| MySQL binlog flush | 1–5ms | sync_binlog=1 (don't change) |
| Debezium poll + transform | 10–100ms | poll.interval.ms, batch size |
| Valkey write | 0.1–1ms | Pipeline mode for batches |
| Total | ~50–200ms | |
For most read-heavy workloads, 50–200ms of eventual consistency is acceptable. If you need stronger guarantees, consider write-through caching for the critical-path tables and CDC for everything else.
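To judge whether that window is acceptable for a given key, it helps to turn it into a count of stale reads. The back-of-the-envelope helper below is a sketch; `stale_reads_per_update` is a hypothetical name, and it assumes reads arrive at a steady rate.

```python
def stale_reads_per_update(reads_per_second, cdc_lag_ms):
    """Expected reads served from the old value while one change is in flight."""
    return reads_per_second * cdc_lag_ms / 1000.0

# A key receiving 1,000 reads/second, at both ends of the 50-200ms range.
best_case = stale_reads_per_update(1000, 50)
worst_case = stale_reads_per_update(1000, 200)
print(best_case, worst_case)
```

If a couple of hundred stale reads per price change is too many for a table, that table is a candidate for the write-through path.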
Offset Storage for Durability
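In production, use Kafka-based offset storage or a database-backed store rather than file-based offsets. If the offset file is lost (container restart, disk failure), Debezium no longer knows its binlog position and falls back to its snapshot.mode: it either re-snapshots the captured tables or, with no_data, resumes from the current binlog position and misses every change made in the meantime: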
// Use Kafka offset storage in production
props.setProperty("offset.storage",
    "org.apache.kafka.connect.storage.KafkaOffsetBackingStore");
// The Kafka-backed store needs to know where the brokers are
props.setProperty("bootstrap.servers", "kafka:9092");
props.setProperty("offset.storage.topic", "debezium-offsets");
props.setProperty("offset.storage.partitions", "1");
props.setProperty("offset.storage.replication.factor", "3");
Monitoring the Pipeline
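-- Debezium connects to MySQL as a replication client, so check its binlog dump thread on the primary
SHOW PROCESSLIST; -- look for a 'Binlog Dump' or 'Binlog Dump GTID' command owned by debezium_cdc
-- Compare the primary's current binlog position with the position in Debezium's stored offsets
SHOW MASTER STATUS; -- SHOW BINARY LOG STATUS on MySQL 8.4+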
-- Valkey-side: check memory and key count
valkey-cli INFO memory
valkey-cli INFO keyspace
valkey-cli DBSIZE
Export Debezium JMX metrics (debezium.metrics.*) to Prometheus. Key alerts: MilliSecondsBehindSource > 5000 (CDC lag), NumberOfDisconnects > 0 (connection drops), and QueueRemainingCapacity < 100 (back-pressure).
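The three thresholds can be expressed as a small rule table before they are translated into real alerting rules. This is a sketch only: it assumes the metrics have already been scraped into a plain dict keyed by metric name, and `firing_alerts` is a hypothetical helper.

```python
# Thresholds taken from the alert list above.
ALERT_RULES = {
    "MilliSecondsBehindSource": lambda v: v > 5000,  # CDC lag
    "NumberOfDisconnects": lambda v: v > 0,          # connection drops
    "QueueRemainingCapacity": lambda v: v < 100,     # back-pressure
}

def firing_alerts(metrics):
    """Return the names of all rules breached by the given metric snapshot."""
    return sorted(
        name
        for name, breached in ALERT_RULES.items()
        if name in metrics and breached(metrics[name])
    )

healthy = {"MilliSecondsBehindSource": 120, "NumberOfDisconnects": 0,
           "QueueRemainingCapacity": 8192}
lagging = {"MilliSecondsBehindSource": 7200, "NumberOfDisconnects": 0,
           "QueueRemainingCapacity": 40}
print(firing_alerts(healthy), firing_alerts(lagging))
```

Lag and a nearly full queue firing together usually point at a slow consumer rather than a slow source, which is a useful distinction when paging.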
Cache Invalidation Strategy Comparison
| Strategy | Stale Reads | Stampede Risk | Infra Complexity | App Code Changes |
|---|---|---|---|---|
| TTL-only | Up to TTL duration | High | None | None |
| App-level invalidate-on-write | Race window (1–15ms) | High | None | Every write path |
| CDC + Debezium (this guide) | ~50–200ms | Low (with XFetch) | Debezium engine | None |
| Write-through + CDC hybrid | None (for critical keys) | None | Debezium + app logic | Critical paths only |
- Use Debezium CDC to decouple cache invalidation from application code — the binlog is the single source of truth, not your app's write path.
- Deploy the embedded AsyncEmbeddedEngine (Debezium 3.x) for single-service setups; use Kafka Connect when multiple consumers need the same change stream.
- Always set binlog_format=ROW and binlog_row_image=FULL on MySQL; MINIMAL breaks cache value reconstruction.
- Combine CDC push with XFetch probabilistic early refresh and mutex locking for a three-layer defense against stale reads and stampede.
- Set a TTL on every CDC-written key as a safety net: if the pipeline stalls, keys expire rather than serving indefinitely stale data.
- Monitor MilliSecondsBehindSource in Debezium JMX metrics and alert at >5 seconds to catch pipeline lag before it becomes user-visible.
Working with JusDB on Valkey and Debezium CDC
JusDB manages CDC pipelines and caching infrastructure for engineering teams who need production-grade cache consistency without the operational overhead. Our DBAs handle MySQL binlog configuration, Debezium connector deployment, Valkey cluster tuning, and 24/7 pipeline monitoring — so your team ships features instead of debugging stale cache entries at 2 AM.