Three months ago, a fintech startup came to JusDB with a familiar crisis: their PostgreSQL primary was saturating at 95% CPU during market hours, and their engineering team had already provisioned the largest RDS instance available. The culprit was a dashboard that ran 14 analytical queries per page load, all hitting the primary because "that's where the data is." Their first instinct was to add Redis and cache everything. Their second instinct—after a senior engineer pushed back—was to add a read replica and route dashboard traffic to it. Both instincts were partially correct, and partially wrong in ways that would have caused different production incidents down the road.
We ended up deploying both, but the details of which queries went where and what invalidation strategy governed the cache were the difference between a system that worked and one that would have surfaced subtle data consistency bugs under load. The decision is not "read replica or cache." It is: for this specific query, at this freshness tolerance, with this read:write ratio, which layer does the least harm?
This guide gives you the framework to answer that question for your own system, with the concrete tradeoffs, failure modes, and configuration patterns you actually need in production.
- Read replicas serve live, consistent data at the cost of replication lag (typically 10ms–5s). Use them for reporting, analytics, and read-heavy OLTP when you can tolerate seconds of staleness and need full SQL expressiveness.
- Caches (Redis, Valkey, Memcached) serve pre-computed snapshots at sub-millisecond latency. Use them for high-read:write ratios (10:1+), expensive aggregations, and externally-sourced data unlikely to change mid-request.
- Cache invalidation is the hard problem. TTL is simple but causes bursts on expiry; write-through keeps data fresh but requires application coordination; cache-aside is flexible but risks thundering herd.
- Read replicas fail silently under write spikes: replica lag can balloon to minutes during bulk loads, causing reads that appear successful but return stale data.
- Caches fail noisily under load: cache stampede, thundering herd, and cache poisoning are the three failure modes to design against explicitly.
- For most production systems, the answer is both—but with clearly defined routing rules. Use a read replica as your first tier for SQL-dependent reads, and a cache as the second tier for results you can afford to serve stale.
Background: Why One Database Can't Do Everything
A typical relational database primary serves three fundamentally different workloads simultaneously: transactional writes that demand ACID guarantees, transactional reads that need the latest committed state, and analytical reads that aggregate large datasets. These workloads have opposing resource profiles. Transactional writes need fast sequential I/O and low-latency lock acquisition. Analytical reads need high parallel throughput and the freedom to perform full scans without blocking writers.
Running all three on a single PostgreSQL or MySQL primary works at small scale. It stops working the moment analytical queries begin competing with transaction commits for CPU, I/O, and lock manager resources. The textbook solution is to add read capacity. The choice between a read replica and a caching layer depends on what type of read capacity you actually need.
Read replicas extend your database: they serve the same data through the same query engine, with the same SQL semantics, under a replication lag constraint. Caches bypass your database for specific access patterns: they serve pre-computed results with no SQL semantics, under a staleness constraint you define. Neither is universally better. They solve different problems.
How Read Replicas Work
A read replica is a continuously-updated copy of your primary database, maintained through physical or logical replication. In PostgreSQL, streaming replication sends WAL (Write-Ahead Log) records from the primary to standby nodes, which replay them to stay current. In MySQL/MariaDB, binary log replication accomplishes the same thing. The replica runs in a read-only mode: all writes are rejected, but any SELECT query your primary can answer, the replica can also answer—with a lag.
Replication Lag: The Fundamental Constraint
Replication lag is the delay between when a transaction commits on the primary and when it becomes visible on the replica. Under normal conditions on a same-region replica, lag is 10–100ms. Cross-region replicas add network round-trip time, often pushing lag to 50–200ms baseline. During write spikes—large batch loads, bulk updates, schema migrations—lag can balloon to seconds or minutes as the replica's apply process falls behind the WAL stream.
-- Check replication lag on PostgreSQL primary (shows lag for each standby)
SELECT
client_addr,
state,
sent_lsn,
write_lsn,
flush_lsn,
replay_lsn,
(sent_lsn - replay_lsn) AS replication_lag_bytes,
write_lag,
flush_lag,
replay_lag
FROM pg_stat_replication;
-- Check lag from the replica itself
SELECT
now() - pg_last_xact_replay_timestamp() AS replication_lag;A read replica with 50ms of lag in steady state can accumulate 3–5 minutes of lag during a large batch import or migration. Queries routed to the replica during this window return stale data silently—no error, no warning, just wrong results. Monitor pg_stat_replication continuously and implement lag-aware routing that falls back to the primary when lag exceeds your application's freshness tolerance.
Query Routing with ProxySQL
Manually routing queries in application code is fragile. ProxySQL (for MySQL/MariaDB) and PgBouncer with PgCat or Odyssey (for PostgreSQL) provide transparent read/write splitting at the proxy layer. ProxySQL matches query patterns against routing rules and automatically directs SELECT statements to read replicas while sending writes to the primary.
-- ProxySQL: route all SELECT queries to the replica hostgroup (2)
-- while keeping writes on the primary hostgroup (1)
INSERT INTO mysql_query_rules (
rule_id, active, match_pattern, destination_hostgroup, apply
) VALUES
(1, 1, '^SELECT', 2, 1),
(2, 1, '.*', 1, 1);
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;
-- ProxySQL: configure the replica hostgroup with lag-aware routing
-- max_replication_lag = 10 seconds; replicas lagging more than this
-- are removed from the read pool automatically
UPDATE mysql_servers
SET max_replication_lag = 10
WHERE hostgroup_id = 2;
LOAD MYSQL SERVERS TO RUNTIME;Primary Use Cases for Read Replicas
- Analytics and reporting dashboards: Queries that aggregate large tables, join across many relations, or run for seconds benefit from a dedicated replica that doesn't compete with OLTP traffic.
- Read-heavy OLTP: Product catalog browsing, user profile reads, and content listings where writes are infrequent relative to reads (3:1 to 10:1 read:write ratios).
- ETL and data export: Long-running
SELECTqueries for data pipeline feeds, CSV exports, or reporting jobs that would otherwise hold locks on the primary. - Search indexes and materialized views: Refreshing Elasticsearch indexes, maintaining materialized views, or computing derived tables from base data without impacting write performance.
How Caching Works
A cache stores a copy of a query result, computed value, or serialized object in a fast-access store (typically RAM) and serves it directly to the application without touching the database. Redis and Valkey are the dominant in-memory stores for database-adjacent caching. Memcached remains viable for simple key-value caching without complex data structures.
The fundamental difference from a read replica: a cache has no awareness of your database schema. It stores whatever you put into it and returns it until it expires or is explicitly invalidated. This makes caches dramatically faster (sub-millisecond vs. 1–50ms for a replica query) but requires you to solve cache invalidation in application code.
Cache Invalidation Strategies
TTL (Time-to-Live)
The simplest strategy: every cache entry expires after a fixed duration. When the TTL elapses, the next request triggers a cache miss, fetches fresh data from the database, and repopulates the cache.
import redis
import json
import psycopg2
r = redis.Redis(host='localhost', port=6379, decode_responses=True)
def get_product_catalog(category_id: int) -> list:
cache_key = f"catalog:{category_id}"
# Try cache first
cached = r.get(cache_key)
if cached:
return json.loads(cached)
# Cache miss: fetch from DB
conn = psycopg2.connect(DATABASE_URL)
with conn.cursor() as cur:
cur.execute(
"SELECT id, name, price, stock FROM products WHERE category_id = %s AND active = true",
(category_id,)
)
rows = cur.fetchall()
result = [{"id": r[0], "name": r[1], "price": float(r[2]), "stock": r[3]} for r in rows]
# Cache for 60 seconds (TTL)
r.setex(cache_key, 60, json.dumps(result))
return resultWrite-Through Caching
On every write to the database, the application also writes the updated value to the cache. Cache and database are always in sync; there is no TTL-driven staleness.
def update_product_price(product_id: int, new_price: float) -> None:
conn = psycopg2.connect(DATABASE_URL)
with conn.cursor() as cur:
# Write to database
cur.execute(
"UPDATE products SET price = %s, updated_at = now() WHERE id = %s",
(new_price, product_id)
)
# Immediately update cache (write-through)
# Fetch the updated row to cache the full object
cur.execute(
"SELECT id, name, price, stock FROM products WHERE id = %s",
(product_id,)
)
row = cur.fetchone()
conn.commit()
# Update cache with fresh data
cache_key = f"product:{product_id}"
r.setex(cache_key, 300, json.dumps({
"id": row[0], "name": row[1], "price": float(row[2]), "stock": row[3]
}))Cache-Aside (Lazy Loading)
The most common pattern in practice. The application checks the cache first; on a miss, it fetches from the database and populates the cache. The cache is never written to directly on updates—entries are either invalidated (deleted) or allowed to expire via TTL.
def get_user_profile(user_id: str) -> dict:
cache_key = f"user_profile:{user_id}"
cached = r.get(cache_key)
if cached:
return json.loads(cached)
# Cache miss: load from DB and populate cache
profile = fetch_user_from_db(user_id)
r.setex(cache_key, 300, json.dumps(profile))
return profile
def update_user_profile(user_id: str, updates: dict) -> None:
write_user_to_db(user_id, updates)
# Invalidate cache entry; next read will repopulate
r.delete(f"user_profile:{user_id}")When Caching Fails: Three Patterns to Know
When a popular cache entry expires, dozens or hundreds of concurrent requests simultaneously detect the miss and all attempt to fetch from the database and repopulate the cache. The result is a sudden spike of N identical queries hitting your database at once—exactly when your cache is supposed to be protecting it. Mitigate with probabilistic early expiry, a distributed lock (SETNX in Redis) around cache population, or staggered TTLs.
import time
import random
def get_with_stampede_protection(cache_key: str, ttl: int, fetch_fn):
"""
Use a Redis lock to prevent cache stampede.
Only one caller populates the cache; others wait and then read.
"""
cached = r.get(cache_key)
if cached:
return json.loads(cached)
lock_key = f"lock:{cache_key}"
lock_acquired = r.set(lock_key, "1", nx=True, ex=5) # 5-second lock TTL
if lock_acquired:
try:
data = fetch_fn()
# Add jitter to TTL to avoid synchronized expiry across keys
jitter = random.randint(0, ttl // 10)
r.setex(cache_key, ttl + jitter, json.dumps(data))
return data
finally:
r.delete(lock_key)
else:
# Another process is populating; wait briefly and retry
time.sleep(0.05)
cached = r.get(cache_key)
return json.loads(cached) if cached else fetch_fn()If your cache population logic fetches data from a replica with significant replication lag, or from an API with partial failure, you can write incorrect data into the cache and serve it as authoritative for the duration of the TTL. This is worse than a cache miss: at least a miss triggers a fresh fetch. Poisoned cache entries serve wrong results silently. Validate data before caching and prefer fetching cache-population data from the primary or a trusted source.
Decision Framework
Use this matrix when deciding where to serve a given read workload. The three axes that matter most are: data freshness requirement, read:write ratio, and query complexity.
| Dimension | Read Replica | Redis/Valkey Cache | CDN / Edge Cache | Materialized View |
|---|---|---|---|---|
| Typical latency | 1–50ms (network + query) | 0.1–2ms (local Redis) | 1–10ms (nearest PoP) | 1–50ms (same as replica) |
| Staleness | Replication lag: 10ms–5s normal; minutes under spike | TTL-defined: 1s–24h (you control) | Cache-Control TTL: minutes to days | Refresh interval: seconds to hours (CONCURRENTLY) |
| Query expressiveness | Full SQL: joins, aggregates, CTEs, window functions | None (key-value / data structure) | None (HTTP response only) | Full SQL at refresh time; read is a simple table scan |
| Write invalidation | Automatic via replication | Manual: TTL, delete on write, or write-through | CDN purge API or Cache-Control | Manual REFRESH MATERIALIZED VIEW |
| Operational complexity | Medium (replica provisioning, lag monitoring, failover) | Medium (Redis cluster, eviction policy, serialization) | Low (managed CDN) | Low–Medium (Postgres-native, but refresh scheduling) |
| Best for | Complex SQL reads, analytics, reporting | High-QPS repeated reads, session data, computed aggregates | Public, mostly-static HTTP responses | Expensive aggregations refreshed on a schedule |
| Cost profile | Full database instance cost (VM + storage) | RAM-proportional; typically cheaper than a full replica | Bandwidth + CDN fees (often very low) | Storage for the materialized table; no extra compute |
Practical Decision Tree
Walk through these questions in order for each read workload you are trying to offload:
- Does the read require freshness within 1 second? If yes, skip both replica and cache; serve from the primary. Distributed systems cannot guarantee sub-second freshness across replicated or cached layers reliably.
- Does the read require full SQL (joins, aggregates, CTEs)? If yes and freshness of 1–30 seconds is acceptable, use a read replica. Caches cannot express SQL queries.
- Is the same result requested more than 10 times per minute? If yes and the result is acceptable stale for seconds to minutes, use a cache. High read:write ratios are where caches pay off most.
- Is the data expensive to compute but rarely changing? If yes, consider a materialized view refreshed on a schedule. This is Postgres-native and requires no extra infrastructure.
- Is the result a public HTTP response? If yes and it changes infrequently (product pages, public blog posts), a CDN is the right first layer before the database is involved at all.
When to Use Both
In production, read replicas and caches are not mutually exclusive—they form a tiered read path. The architecture is: application → cache (Redis) → read replica → primary. Each tier has a different latency, staleness tolerance, and failure mode.
def get_dashboard_metrics(org_id: str) -> dict:
"""
Tiered read: Redis cache -> read replica -> primary (fallback).
Dashboard metrics tolerate up to 5 minutes of staleness.
"""
cache_key = f"dashboard:{org_id}"
# Tier 1: Check Redis cache (sub-millisecond)
cached = r.get(cache_key)
if cached:
return json.loads(cached)
# Tier 2: Query read replica (10-50ms, may have replication lag)
try:
metrics = fetch_metrics_from_replica(org_id)
# Cache the result for 5 minutes
r.setex(cache_key, 300, json.dumps(metrics))
return metrics
except ReplicaLagException:
# Tier 3: Fallback to primary if replica lag is unacceptable
metrics = fetch_metrics_from_primary(org_id)
r.setex(cache_key, 30, json.dumps(metrics)) # shorter TTL on primary hit
return metricsNot all reads in your application have the same freshness contract. A real-time transaction confirmation must read from the primary. A monthly summary report can tolerate reading from a replica with 5 seconds of lag. A public leaderboard that changes hourly can be cached for 60 seconds. Define these contracts explicitly in your codebase and route accordingly—mixing them in a single connection pool is the source of subtle consistency bugs.
Specific Patterns Where Both Layers Work Together
- Analytics at scale: Complex aggregation queries (revenue by region, cohort retention) run on a read replica and their results are cached in Redis for the dashboard's cache TTL. The replica absorbs the analytical query cost; Redis absorbs the repeated dashboard requests.
- Search result caching: Full-text search queries run on a replica (eliminating primary load from
tsvectorscans). The top-10 results for popular search terms are cached in Redis with a 60-second TTL, eliminating even the replica round-trip for high-frequency queries. - Session and permission lookups: User permission checks that require a join across
users,roles, andpermissionstables run once per session establishment, are cached in Redis for the session duration, and the underlying data is served from a replica during cache population. The primary never sees a permission check query in steady state.
Key Takeaways
- Read replicas give you the full power of SQL at the cost of replication lag and infrastructure overhead. They are the right first tool for complex, SQL-dependent reads that tolerate seconds of staleness.
- Caches give you sub-millisecond latency at the cost of explicit invalidation logic and staleness you must manage yourself. They pay off when the same result is read many times between writes (read:write ratio above 10:1).
- Replication lag is not constant. Monitor it continuously and build lag-aware routing that degrades gracefully during write spikes rather than silently serving stale data.
- Cache stampede is the most common cache failure mode in high-traffic systems. Use distributed locks, probabilistic early expiry, or background refresh patterns to prevent it.
- The cache-aside pattern with explicit key deletion on write is more operationally reliable than write-through in most web application architectures. Write-through requires every write path to update the cache; missing one creates silent inconsistency.
- For most production systems, the optimal architecture is a tiered read path: Redis cache in front of a read replica, with a primary fallback for freshness-critical paths and during lag events.
Working with JusDB on Database Scaling
Designing a read scaling architecture that actually holds up in production requires more than picking between Redis and a read replica. It requires understanding your specific query mix, freshness contracts, write amplification patterns, and failure mode tolerances—and then configuring each layer to behave correctly under load, not just in the happy path.
JusDB's database consultants have helped engineering teams at SaaS companies, fintech platforms, and high-traffic consumer applications implement tiered read architectures, tune ProxySQL routing rules, design cache invalidation strategies, and instrument replica lag monitoring before it becomes a production incident.
PostgreSQL & Database Scaling Services Talk to a Database Engineer
Whether you are deciding between a read replica and Redis for the first time, or untangling a cache consistency bug in an existing system, we can help you reason through the architecture and implement it correctly.
Related Reading
- Redis vs Valkey: A Complete Guide to the Future of In-Memory Databases — choosing the right in-memory store and understanding the fork history before you commit to one
- ProxySQL MySQL Load Balancing Guide — configuring read/write splitting, replica routing, and lag-aware failover at the proxy layer
- Database Monitoring with Prometheus, Grafana, and PMM — instrumenting replication lag, cache hit rates, and query latency so you can detect problems before users do