Which Valkey eviction policy should I pick?

It depends on access pattern, not on what's 'modern'. Use allkeys-lru when traffic has clear skew (for example, the top 20% of keys absorb 80% of reads) and you want the cache to keep what's hot. Use volatile-ttl when keys carry meaningful TTLs and you want the keys closest to expiry (shortest remaining TTL) evicted first. Use noeviction when Valkey holds source-of-truth data and you'd rather reject writes than lose data. allkeys-random is rarely the right answer outside benchmarks.
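As a rough illustration, the rules of thumb above can be written as a small decision helper. This is a sketch, not a tool we ship: the Workload fields and the order in which the checks run are assumptions made for the example, and a real choice should be validated against traffic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Workload:
    """Hypothetical workload description; field names are illustrative only."""
    is_source_of_truth: bool   # Valkey holds data that exists nowhere else
    has_meaningful_ttls: bool  # TTLs reflect the real lifetime of each key
    has_skewed_reads: bool     # a small share of keys takes most of the reads


def suggest_eviction_policy(w: Workload) -> Optional[str]:
    """Map the rules of thumb onto a maxmemory-policy value.

    The precedence (data safety first, then TTLs, then skew) is an assumption.
    Returns None when none of the rules applies and measurement is needed.
    """
    if w.is_source_of_truth:
        return "noeviction"    # reject writes rather than lose data
    if w.has_meaningful_ttls:
        return "volatile-ttl"  # evict keys with the shortest remaining TTL
    if w.has_skewed_reads:
        return "allkeys-lru"   # keep the hot set in memory
    return None
```

For a pure cache with skewed reads and no meaningful TTLs, suggest_eviction_policy(Workload(False, False, True)) returns "allkeys-lru".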

AOF or RDB - what's the latency impact?

RDB snapshots fork the Valkey process; with a 32GB+ dataset, that fork() pauses request handling for tens of milliseconds while the kernel duplicates page tables. AOF with appendfsync everysec adds 1-2ms of p99 latency continuously but avoids the periodic snapshot spike (AOF rewrites also fork, so schedule those off-peak as well). For latency-sensitive workloads (sub-ms p99), prefer AOF everysec and take RDB backups only at off-peak. For batch workloads, RDB-only is cheaper.
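To get a feel for why fork() cost grows with dataset size, here is a back-of-the-envelope estimate of how much page-table data the kernel has to copy. It assumes 4 KiB pages and 8-byte page-table entries (typical for x86-64 without huge pages) and counts only the lowest page-table level, so treat it as a lower bound rather than a measurement.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def page_table_bytes(dataset_bytes: int, page_size: int = 4096, pte_size: int = 8) -> int:
    """Estimate leaf page-table size for a resident dataset.

    Assumptions: every page of the dataset is mapped, 4 KiB pages,
    8-byte entries. Upper page-table levels are ignored.
    """
    pages = -(-dataset_bytes // page_size)  # ceiling division
    return pages * pte_size


# A 32 GiB dataset maps 8,388,608 pages, so about 64 MiB of entries to copy.
print(page_table_bytes(32 * GIB) / MIB)  # 64.0
```

The copy itself is fast per entry, but it happens while the main thread is stalled, which is why the pause scales with dataset size rather than with write volume.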

How do I find hot keys causing latency spikes?

Three layers. (1) valkey-cli --hotkeys samples key access frequency via OBJECT FREQ (requires an LFU maxmemory-policy such as allkeys-lfu); MONITOR streams every command and should only be run in short off-peak bursts. (2) For longer-term analysis, enable the latency monitor (latency-monitor-threshold) and the slow log (slowlog-log-slower-than) with appropriate thresholds, then query them via LATENCY HISTORY and SLOWLOG GET. (3) In cluster mode, use CLUSTER COUNTKEYSINSLOT to find slots, and therefore shards, holding a disproportionate share of keys. Hot-key remediation is usually pipeline batching, client-side caching, or read-replica fan-out.
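If you do capture a short MONITOR burst, a few lines of Python are enough to rank keys by access count. The sketch below assumes the usual MONITOR line layout (timestamp, [db client-address], then double-quoted arguments) and treats the first argument after the command as the key, which is wrong for multi-key commands and for commands that take no key, so use it for a first look only.

```python
import re
from collections import Counter

# One double-quoted argument in a MONITOR line, allowing backslash escapes.
_QUOTED = re.compile(r'"((?:[^"\\]|\\.)*)"')


def count_key_access(monitor_lines):
    """Count how often each key appears as the first argument of a command."""
    counts = Counter()
    for line in monitor_lines:
        args = _QUOTED.findall(line)
        if len(args) >= 2:        # command plus at least one argument
            counts[args[1]] += 1  # assumption: first argument is the key
    return counts


# Illustrative sample lines, not real captured traffic.
SAMPLE = [
    '1700000000.000001 [0 10.0.0.5:50000] "GET" "user:42"',
    '1700000000.000002 [0 10.0.0.6:50001] "GET" "user:7"',
    '1700000000.000003 [0 10.0.0.5:50000] "HGETALL" "user:42"',
    '1700000000.000004 [0 10.0.0.7:50002] "PING"',
]
```

Calling count_key_access(SAMPLE).most_common(1) puts user:42 at the top with two accesses.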

What memory fragmentation ratio is acceptable?

Valkey reports mem_fragmentation_ratio in INFO memory. Healthy: 1.0 - 1.5. Above 1.5 means meaningful RSS overhead vs. allocated heap; above 2.0 you're wasting memory. Causes: heavy turnover (lots of expire+insert cycles), large key/value size variance, or jemalloc tuning that doesn't match workload. Remediation: enable active defragmentation (activedefrag yes), tune jemalloc with MALLOC_CONF, or rolling-restart replicas to reclaim RSS.
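A minimal sketch of how the thresholds above could be applied to raw INFO memory output. The band names are ours, chosen for the example; the parser relies only on the "field:value" line format that INFO uses.

```python
def parse_info(text: str) -> dict:
    """Parse INFO output into a dict, skipping blank lines and '# Section' headers."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


def classify_fragmentation(ratio: float) -> str:
    """Bucket mem_fragmentation_ratio using the thresholds quoted above."""
    if ratio > 2.0:
        return "wasteful"
    if ratio > 1.5:
        return "elevated"
    if ratio >= 1.0:
        return "healthy"
    return "below-range"  # under 1.0 is outside the bands discussed here


# Illustrative INFO excerpt, not output from a real server.
info_text = "# Memory\r\nused_memory:1048576\r\nmem_fragmentation_ratio:1.73\r\n"
ratio = float(parse_info(info_text)["mem_fragmentation_ratio"])
print(classify_fragmentation(ratio))  # elevated
```

Sampling this at a fixed interval and alerting on a sustained "elevated" reading is usually more useful than reacting to a single snapshot.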

What latency targets are realistic for Valkey?

Single-key GET/SET on a warm cache: p50 <0.5ms, p99 <2ms on a same-AZ network. Cross-AZ reads from replicas: p50 <1ms, p99 <5ms. Lua script with 1-5 operations: p99 <3ms. Anything above these on otherwise-idle hardware points to client-side GC pressure, network jitter, or memory fragmentation - not Valkey itself.
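To check measured latencies against targets like these, a nearest-rank percentile over client-side samples is enough. The sketch below hard-codes the same-AZ GET/SET targets from above; the function names and the sample data are illustrative.

```python
import math

# Same-AZ single-key GET/SET targets quoted above, in milliseconds.
TARGETS_MS = {"p50": 0.5, "p99": 2.0}


def percentile(samples, pct: int) -> float:
    """Nearest-rank percentile: the smallest sample covering pct% of the data."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def within_targets(samples_ms) -> dict:
    """Report, per percentile, whether the samples meet the target."""
    return {
        "p50": percentile(samples_ms, 50) < TARGETS_MS["p50"],
        "p99": percentile(samples_ms, 99) < TARGETS_MS["p99"],
    }
```

With 100 samples where 97 take 0.2ms and 3 take 3.0ms, p50 passes and p99 fails, which is the typical shape of a tail-latency problem.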

When do I need to scale Valkey horizontally vs vertically?

Scale up (bigger node) while working set fits comfortably in RAM and write QPS doesn't saturate one CPU core (Valkey is single-threaded for command execution). Scale out (Cluster mode) when working set exceeds single-node memory, or when sustained writes saturate the single primary. The break-even is usually around 64-128 GB working set or 100k+ sustained writes/sec.
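The same rule can be sketched as a function. The 75% RAM headroom that stands in for "fits comfortably" is an assumption made for this example, and the 100k writes/sec ceiling is the rough break-even quoted above, not a hard limit.

```python
def scaling_recommendation(working_set_gb: float, node_ram_gb: float,
                           writes_per_sec: float,
                           ram_headroom: float = 0.75,
                           write_ceiling: float = 100_000) -> str:
    """Return 'scale up' or 'scale out' from working-set size and write rate.

    ram_headroom is a hypothetical share of node RAM the working set may use;
    write_ceiling approximates where a single primary saturates.
    """
    fits_in_ram = working_set_gb <= node_ram_gb * ram_headroom
    writes_ok = writes_per_sec < write_ceiling
    return "scale up" if fits_in_ram and writes_ok else "scale out"
```

For example, a 40 GB working set on a 128 GB node at 20k writes/sec comes back as "scale up", while the same node at 150k writes/sec comes back as "scale out".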

▸ Memory fragmentation drift - mem_fragmentation_ratio climbing past 1.5; activedefrag tuning isn't reclaiming memory and you're force-restarting nodes during business hours.
▸ KEYS blocking the event loop - Operational scripts run KEYS at scale; the single-threaded event loop blocks for seconds, all reads and writes stall, and migrating those scripts to SCAN keeps breaking on edge cases.
▸ Eviction policy thrashing - allkeys-lru evicting the wrong keys because the policy doesn't match the access pattern; cache hit ratio collapses from 95% to 70% under peak load and you can't identify the culprit keys.

JusDB performance consultants resolve all three in days, with a written tuning playbook. Book a tuning scoping call →

Tactical engineering - not advisory

Valkey Performance Tuning

In short: Valkey performance tuning involves sizing maxmemory and selecting the right eviction policy per workload, choosing between RDB and AOF persistence for latency, detecting and remediating hot keys, batching with pipelines, scaling reads across replicas, and fixing memory fragmentation - to reach sub-millisecond p99 reads at scale.

Memory policies, eviction selection, persistence latency, pipeline batching, and hot-key remediation - the parameters that, in a representative engagement, moved p99 latency from 12ms to under 2ms and cut peak memory by 30-40%.

Where Valkey latency hides

Six tuning surfaces, each with a measurable before/after. We instrument first, then change, then validate - never in any other order.

Memory & Eviction

maxmemory sizing, eviction policy selection per workload, fragmentation ratio remediation, active defragmentation tuning.

Persistence Latency

RDB vs AOF trade-offs, fsync policy tuning (always / everysec / no), snapshot fork() impact analysis on large datasets.

Latency Profiling

LATENCY MONITOR threshold tuning, slow-log analysis, network-vs-server latency decomposition, p99 budget tracking.

Pipeline & Batching

Client-side pipelining for throughput, MGET/MSET for fan-out reads, server-side Lua to collapse round-trips on hot keys.
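The gain from batching is mostly round-trip arithmetic, which a short sketch makes concrete. It models network time only and ignores server execution time; the helper names are ours, and chunked() is the kind of splitter you would put in front of a pipeline or MGET call.

```python
import math


def chunked(keys, size: int):
    """Yield successive slices of at most `size` keys."""
    for start in range(0, len(keys), size):
        yield keys[start:start + size]


def round_trips(n_commands: int, batch_size: int = 1) -> int:
    """Number of network round-trips when commands are sent in batches."""
    return math.ceil(n_commands / batch_size)


def network_time_ms(n_commands: int, rtt_ms: float, batch_size: int = 1) -> float:
    """Time spent waiting on the network, ignoring server execution time."""
    return round_trips(n_commands, batch_size) * rtt_ms


# 1,000 GETs at a 0.5ms round-trip time: one by one vs. batches of 100.
print(network_time_ms(1000, 0.5))       # 500.0
print(network_time_ms(1000, 0.5, 100))  # 5.0
```

Batch size is a trade-off: larger batches cut round-trips further but hold the server on one client's work for longer, so it is worth tuning against p99 rather than throughput alone.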

Replica Read Scaling

Replica routing strategies, read-from-replica trade-offs (eventual consistency window), connection-pool sizing per replica.

Hot-Key & Skew Analysis

MONITOR-based hot-key sampling, slot distribution audit (cluster mode), client-side caching to absorb hot reads.

A typical Valkey tuning engagement

Week 1

Instrumentation baseline

Deploy latency monitor, slow-log thresholds, INFO sampling at 60s intervals, hot-key MONITOR sampling. Establish current p50/p95/p99 by command type and cluster shard.

Week 2

Memory & eviction pass

Working-set sizing, eviction-policy A/B against historical traffic replay, fragmentation ratio diagnosis, active defrag tuning.

Week 3

Persistence & client-side

RDB↔AOF trade-off recommendation, pipeline-batching client patches for top 3 hot endpoints, replica read routing where applicable.

Week 4

Validation & handoff

Before/after report with p99 deltas per workload, runbook for the on-call team, monitoring dashboards locked.

Tuning FAQ

Cut your Valkey p99 latency dramatically

Share an INFO dump and a 24h slow-log sample. We'll come back with the top 3 wins, ranked by effort vs. impact, before you commit to a tuning engagement.
