Cassandra Compaction Strategies

Cassandra's compaction strategy is one of the most impactful tuning decisions you'll make for a production cluster. Get it wrong and you're looking at read latency spikes, runaway disk usage, or cascading compaction backlogs that degrade the entire node. Compaction in Cassandra is not a background afterthought; it is the primary mechanism by which the storage engine maintains read performance and reclaims space. Most teams inherit the default strategy (STCS) and never revisit it, quietly paying a performance tax every day. Matching the strategy to your actual access pattern can substantially reduce read latency, lower space amplification, and eliminate many tombstone-related query timeouts.

TL;DR

Compaction merges SSTables, discards superseded row versions, and purges tombstones that are no longer needed, keeping the number of files a read must touch under control.
STCS (Size-Tiered) is the default — optimized for write-heavy workloads but causes high space amplification.
LCS (Leveled) keeps SSTables in size-bounded levels for predictable reads, at the cost of higher write I/O.
TWCS (Time-Window) is purpose-built for time-series data with TTL — it never mixes time buckets, enabling clean deletion with no read overhead.
UCS (Unified, Cassandra 5.0+) can behave like STCS, LCS, or something in between, selected with a single tuning knob.
You can change strategy live with ALTER TABLE. No downtime is required, but expect a burst of recompaction afterwards.

What Is Compaction and Why Does It Matter?

Cassandra uses a Log-Structured Merge (LSM) storage engine. Every write is appended to the commit log for durability and lands in an in-memory Memtable, which is eventually flushed to an immutable on-disk file called an SSTable (Sorted String Table). Because SSTables are immutable, updates and deletes do not overwrite data in place. Instead, Cassandra writes a new version of a row (for updates) or a tombstone marker (for deletes), which ends up in a new SSTable.

Over time, the number of SSTables on disk grows. A single logical row may now exist across dozens of SSTables at different timestamps. To read that row, Cassandra must open and merge every SSTable that might contain it, and this extra per-read work is known as read amplification. Compaction is the background process that merges SSTables, resolves conflicting versions (last write wins, by write timestamp), purges tombstones older than gc_grace_seconds (default 864000 seconds, or 10 days), and rewrites the result into a smaller set of files. Without compaction, read performance degrades indefinitely and tombstones accumulate, eventually causing queries to fail with TombstoneOverwhelmingException.
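To make the merge step concrete, here is a minimal Python sketch of what compaction does to the cells it reads. It uses a hypothetical `Cell` type with timestamps in seconds and one value per key, not Cassandra's real storage format. The newest timestamp wins (a tombstone wins a timestamp tie, as in Cassandra), and a tombstone is dropped once it is older than gc_grace_seconds. Real compaction also refuses to purge a tombstone while an SSTable outside the compaction may still hold data it shadows; the sketch skips that check.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cell:
    key: str
    value: Optional[str]  # None marks a tombstone in this simplified model
    timestamp: int        # write time in seconds; also used as the deletion time

    @property
    def is_tombstone(self) -> bool:
        return self.value is None


def compact(sstables, now, gc_grace_seconds=864000):
    """Merge cells from several SSTables into one sorted, deduplicated list."""
    winners = {}
    for sstable in sstables:
        for cell in sstable:
            best = winners.get(cell.key)
            # Last write wins; on a timestamp tie the tombstone (True > False) wins.
            if best is None or (cell.timestamp, cell.is_tombstone) > (
                best.timestamp,
                best.is_tombstone,
            ):
                winners[cell.key] = cell
    merged = []
    for key in sorted(winners):  # output is sorted by key, like an SSTable
        cell = winners[key]
        if cell.is_tombstone and cell.timestamp < now - gc_grace_seconds:
            continue  # tombstone outlived gc_grace_seconds: purge it
        merged.append(cell)
    return merged


now = 2_000_000
older = [Cell("a", "v1", 100), Cell("b", "x", 100)]
newer = [Cell("a", "v2", 200), Cell("b", None, 150), Cell("c", None, now - 10)]
print(compact([older, newer], now))
# "a" keeps v2, "b" vanishes (its old tombstone is purged), "c" keeps its recent tombstone
```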

Compaction strategies are usually compared along three kinds of amplification:

Read amplification — how many SSTables must be consulted for a single read.
Write amplification — how many times data is rewritten during compaction relative to the original write volume.
Space amplification — how much extra disk space is consumed by redundant data pending compaction.
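Written as ratios (one common way to express them for discussion; Cassandra does not report these numbers directly):

```latex
\begin{aligned}
\mathrm{RA} &= \text{number of SSTables consulted to answer one read} \\
\mathrm{WA} &= \frac{\text{bytes written to disk by flushes and compactions}}{\text{bytes written by clients}} \\
\mathrm{SA} &= \frac{\text{bytes on disk}}{\text{bytes of live (latest, non-deleted) data}}
\end{aligned}
```

For example, in a hypothetical table where clients write 100 GB and every byte is flushed once and then rewritten by five compactions on average, WA = (100 + 500) / 100 = 6.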

Different strategies make deliberate trade-offs across these three dimensions.

Size-Tiered Compaction Strategy (STCS)

STCS is Cassandra's default strategy and the easiest to understand. It groups SSTables into size tiers and triggers a compaction when enough SSTables of similar size accumulate in the same tier.

How It Works

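SSTables count as "similar size" when they fall within a band around a bucket's average size (bucket_low and bucket_high, defaults 0.5 and 1.5); SSTables smaller than min_sstable_size (default 50 MiB) are bucketed together. When a bucket reaches min_threshold SSTables (default: 4), Cassandra compacts them together into a single, larger SSTable. A single compaction never takes more than max_threshold SSTables (default: 32), which caps the size of any one merge. The output is a larger SSTable that eventually joins a bucket of similarly sized files one tier up.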
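The bucketing rule can be sketched in a few lines of Python. This is a simplified model, not Cassandra's implementation: sizes are plain byte counts, the parameter names mirror the STCS options bucket_low, bucket_high, min_sstable_size, min_threshold, and max_threshold, and real STCS additionally ranks candidate buckets by read hotness before picking which SSTables to merge (this sketch simply keeps the smallest).

```python
MiB = 1024 * 1024


def stcs_buckets(sizes, bucket_low=0.5, bucket_high=1.5, min_sstable_size=50 * MiB):
    """Group SSTable sizes (in bytes) into size-tiered buckets."""
    buckets = []
    for size in sorted(sizes):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            similar = bucket_low * avg < size < bucket_high * avg
            both_small = size < min_sstable_size and avg < min_sstable_size
            if similar or both_small:
                bucket.append(size)
                break
        else:  # no existing bucket matched: start a new tier
            buckets.append([size])
    return buckets


def compaction_candidates(buckets, min_threshold=4, max_threshold=32):
    """Buckets with at least min_threshold SSTables, capped at max_threshold each."""
    return [sorted(b)[:max_threshold] for b in buckets if len(b) >= min_threshold]


sizes = [10 * MiB, 12 * MiB, 8 * MiB, 11 * MiB, 190 * MiB, 200 * MiB, 210 * MiB, 1000 * MiB]
buckets = stcs_buckets(sizes)
print(len(buckets))                    # 3
print(compaction_candidates(buckets))  # only the four small SSTables qualify
```

With these sample sizes, the four small SSTables share a bucket and qualify for compaction, the three files around 200 MiB form a second bucket that is still one short of min_threshold, and the 1000 MiB file sits alone.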

text

-- Enable STCS (default, shown explicitly)
ALTER TABLE my_keyspace.events
WITH compaction = {
  'class': 'SizeTieredCompactionStrategy',
  'min_threshold': 4,
  'max_threshold': 32,
  -- min_sstable_size is in bytes: 52428800 = 50 MiB (a bare 50 would mean 50 bytes)
  'min_sstable_size': 52428800
};

When to Use STCS

Write-heavy workloads where raw ingest throughput matters most.
Workloads where data is written once and rarely re-read (event logging, audit trails).
Scenarios where disk space is abundant and space amplification is acceptable.

STCS Trade-offs

Warning

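A compaction writes its output SSTable before deleting its inputs, so it needs free space roughly equal to the total size of the SSTables being merged. When STCS compacts its largest tier (or when you run a major compaction), that can approach the size of the whole table, temporarily doubling its disk footprint. On a node that is already 80% full, this can fill the disk. Plan to keep around 50% free space when running STCS on large datasets.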
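The disk-space risk comes down to simple arithmetic. A rough pre-flight check might look like this, assuming the worst case where nothing is purged, so the output is as large as the inputs and both exist on disk until the inputs are deleted. The function name and the 10% safety margin are arbitrary assumptions for illustration.

```python
def compaction_fits(disk_size, used, input_size, safety_margin=0.10):
    """Worst-case check: can a compaction of `input_size` complete on this disk?

    All arguments share one unit (e.g. GB). Assumes the output SSTable is as
    large as its inputs and that both coexist until the inputs are deleted.
    """
    free = disk_size - used
    return free >= input_size * (1 + safety_margin)


# A 1000 GB disk that is 80% full cannot safely compact a 300 GB tier...
print(compaction_fits(1000, 800, 300))  # False
# ...but the same tier fits comfortably when the disk is 40% full.
print(compaction_fits(1000, 400, 300))  # True
```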

Pros: Lowest write amplification, highest ingest throughput, simple to tune.
Cons: High read amplification on cold data, high space amplification, tombstone accumulation between compaction cycles.

Leveled Compaction Strategy (LCS)

LCS was designed specifically to reduce read amplification. Instead of size tiers, it organizes SSTables into discrete levels (L0, L1, L2, ...) where each level is approximately 10x larger than the previous. From L1 upward, SSTables within a level have non-overlapping key ranges, which means a point read needs to check at most one SSTable per level, plus any SSTables still sitting in L0 (whose ranges can overlap).

How It Works

New SSTables land in L0. When L0 accumulates files, they are compacted into L1, where each SSTable is bounded to sstable_size_in_mb (default: 160 MB). L1 SSTables are compacted into L2 when L1 exceeds its size budget, and so on. Each level's budget is 10x the previous level: with the defaults, L1 holds about 10 SSTables (roughly 1.6 GB), L2 about 16 GB, and L3 about 160 GB.
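The level arithmetic is easy to check with a short sketch. It assumes the defaults above (160 MiB SSTables and a fanout of 10, which LCS exposes as fanout_size) and treats each level's budget as sstable size times fanout to the power of the level number. It finds the fewest levels whose combined budgets could hold a given data size, which in turn bounds a point read; the function names are hypothetical.

```python
MiB = 1024 ** 2


def lcs_level_budgets(total_bytes, sstable_size=160 * MiB, fanout=10):
    """Budgets for L1..Ln: the fewest levels whose combined budgets hold total_bytes."""
    budgets = []
    level = 1
    while sum(budgets) < total_bytes:
        budgets.append(sstable_size * fanout ** level)  # L1 = 10 SSTables, L2 = 100, ...
        level += 1
    return budgets


def worst_case_point_read(total_bytes, l0_sstables=0, **level_options):
    """SSTables a point read may touch: every L0 file plus one per level above it."""
    return l0_sstables + len(lcs_level_budgets(total_bytes, **level_options))


one_tib = 1024 ** 4
print(len(lcs_level_budgets(one_tib)))                # 4 levels for 1 TiB
print(worst_case_point_read(one_tib, l0_sstables=2))  # 6
```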

text

-- Enable LCS
ALTER TABLE my_keyspace.products
WITH compaction = {
  'class': 'LeveledCompactionStrategy',
  'sstable_size_in_mb': 160,
  'tombstone_threshold': 0.2,
  'tombstone_compaction_interval': 86400
};

When to Use LCS

Read-heavy workloads, especially those doing many random point reads or range scans.
Mixed read/write workloads where read latency SLOs are strict.
Tables with lots of updates to existing rows (LCS limits how many SSTables a single row's versions can be spread across).

Tip

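Above L0, LCS bounds read amplification: a point read checks at most one SSTable per level. A table whose data reaches L5 therefore needs at most five SSTable reads across L1 through L5, plus however many SSTables are waiting in L0. Because each level is 10x the previous one, the number of levels, and with it that bound, grows only logarithmically with data volume.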

LCS Trade-offs

Pros: Predictable, low read latency; excellent for read-heavy and update-heavy workloads; better tombstone expiry.
Cons: Significantly higher write I/O (write amplification can be 10x or more); sustained high compaction CPU usage; not suitable for ingest-heavy workloads.

Time-Window Compaction Strategy (TWCS)

TWCS is the recommended strategy for time-series data with TTL. It partitions SSTables into time buckets based on when the data was written, and only compacts SSTables within the same time window together. Once a time window is closed (i.e., no more writes will fall into it), the SSTables in that window are compacted into a single SSTable that will eventually be dropped wholesale when all its TTLs expire.

How It Works

The two primary knobs are compaction_window_unit and compaction_window_size, which together define the bucket duration. Within each open window, STCS-style compaction applies. Closed windows are compacted once and then left alone until expiry.

text

-- Enable TWCS for a metrics table with 30-day TTL
ALTER TABLE my_keyspace.metrics
WITH compaction = {
  'class': 'TimeWindowCompactionStrategy',
  'compaction_window_unit': 'HOURS',
  'compaction_window_size': 1,
  'tombstone_threshold': 0.1,
  'tombstone_compaction_interval': 3600
}
AND default_time_to_live = 2592000;
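The window logic can be sketched as follows. This is a simplified model with a hypothetical `SSTableMeta` record: timestamps are in seconds (Cassandra uses microseconds by default), windows are aligned by flooring each SSTable's newest write timestamp, and an SSTable is treated as droppable only after its last TTL has expired and gc_grace_seconds has also passed, which is our reading of how Cassandra decides. It ignores Cassandra's extra check for overlapping SSTables that still hold older data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SSTableMeta:  # hypothetical summary of one SSTable's metadata
    name: str
    max_timestamp: int   # newest write in the file, in seconds
    max_expiration: int  # latest TTL expiry of any cell, in seconds


def window_start(ts: int, window_seconds: int) -> int:
    """Align a timestamp down to the start of its window."""
    return ts - ts % window_seconds


def group_by_window(sstables, window_seconds=3600):
    """Bucket SSTables by the window of their newest write; only peers compact together."""
    windows = defaultdict(list)
    for sst in sstables:
        windows[window_start(sst.max_timestamp, window_seconds)].append(sst.name)
    return dict(windows)


def fully_expired(sstables, now, gc_grace_seconds=864000):
    """SSTables whose every cell expired more than gc_grace_seconds ago: drop whole files."""
    return [s.name for s in sstables if s.max_expiration < now - gc_grace_seconds]


ssts = [
    SSTableMeta("a", 5 * 3600 + 10, 100),
    SSTableMeta("b", 5 * 3600 + 3000, 10**9),
    SSTableMeta("c", 6 * 3600 + 1, 10**9),
]
print(group_by_window(ssts))           # {18000: ['a', 'b'], 21600: ['c']}
print(fully_expired(ssts, now=10**6))  # ['a']
```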

Warning

TWCS relies on the assumption that writes always land in the current time window. If your application writes data with timestamps from the past (out-of-order writes or late arrivals), TWCS will mix old data into new windows, preventing clean SSTable expiry and causing tombstone buildup. Use LCS or STCS instead if you have significant out-of-order writes.

When to Use TWCS

Time-series metrics, IoT sensor data, log ingestion — any table with a TTL and append-only writes.
Workloads where rows are written once and read in recent time windows.
Tables where you want zero-read-cost deletion (entire SSTables dropped at TTL expiry).

Sizing the Window

A good rule of thumb is to set the window size so that each closed window SSTable is between 1 GB and 5 GB. If your windows are too small, you create too many SSTables; too large, and tombstone expiry is delayed. Match the window size to your query lookback pattern: if users query the last hour, 1-hour windows mean those hot reads touch only the current and previous windows.
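The number of windows kept on disk is the TTL divided by the window size, which is worth checking against the metrics example earlier: a 30-day TTL with 1-hour windows leaves about 720 windows, each holding at least one SSTable. Some operators instead aim for a few dozen windows per TTL period; the hypothetical helper below treats that target as an input you choose (not a Cassandra setting) and rounds the window up to whole hours.

```python
def window_count(ttl_seconds, window_seconds):
    """Windows (and so at least that many SSTables) retained for one TTL period."""
    return -(-ttl_seconds // window_seconds)  # ceiling division on integers


def window_for_target(ttl_seconds, target_windows):
    """Smallest whole-hour window that keeps roughly target_windows windows."""
    hours = -(-ttl_seconds // (3600 * target_windows))
    return hours * 3600


ttl = 2592000  # 30 days, matching default_time_to_live in the metrics example
print(window_count(ttl, 3600))     # 720 one-hour windows
print(window_for_target(ttl, 30))  # 86400 seconds, i.e. 1-day windows
```

When the lookback rule and a window-count target disagree, check the resulting SSTable sizes against the 1 GB to 5 GB range.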

Unified Compaction Strategy (UCS) in Cassandra 5.0+

Introduced in Apache Cassandra 5.0, UCS is a single generalized strategy that subsumes STCS and LCS behavior under one tuning knob: scaling_parameters. At a high level, a negative scaling parameter approaches LCS behavior (aggressive leveling, low read amplification), while a positive value approaches STCS behavior (size-tiering, low write amplification). Setting it to zero produces a hybrid. Values are usually written in shorthand: L10 means leveled with a fanout of 10, T4 means tiered with a threshold of 4, and N means zero.

text

-- Enable UCS (Cassandra 5.0+)
ALTER TABLE my_keyspace.orders
WITH compaction = {
  'class': 'UnifiedCompactionStrategy',
  'scaling_parameters': 'L10',
  'target_sstable_size': '100MiB',
  'max_sstables_to_compact': 32
};

UCS also accepts a comma-separated list of scaling parameters, one per level, with the last value applying to all higher levels. That lets you use tiered (STCS-like) behavior on the small, recently flushed levels to absorb write bursts cheaply, and leveled (LCS-like) behavior on the larger levels that hold most of the data. For new Cassandra 5.0+ deployments, UCS is worth evaluating before committing to one of the legacy strategies.
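A small sketch of how the shorthand maps to behavior, based on our reading of the UCS design documentation (verify against your Cassandra version): Tf gives W = f - 2, Lf gives W = 2 - f, and N gives W = 0; the fanout is 2 + |W|, and a level compacts once it holds fanout SSTables when W >= 0, or just 2 when W < 0. The function names are hypothetical.

```python
def scaling_w(spec: str) -> int:
    """Convert one UCS scaling parameter (e.g. 'T4', 'L10', 'N', '-8') to W."""
    s = spec.strip().upper()
    if s == "N":
        return 0
    if s[0] in ("T", "L"):
        f = int(s[1:])
        return f - 2 if s[0] == "T" else 2 - f
    return int(s)  # already a plain integer W


def level_params(scaling_parameters: str, level: int):
    """Return (W, fanout, threshold) for a level; the last listed value repeats."""
    ws = [scaling_w(part) for part in scaling_parameters.split(",")]
    w = ws[min(level, len(ws) - 1)]
    fanout = 2 + abs(w)
    threshold = fanout if w >= 0 else 2  # tiered levels wait for `fanout` SSTables
    return w, fanout, threshold


print(level_params("L10", 0))          # (-8, 10, 2): leveled, fanout 10
print(level_params("T4, T4, L10", 0))  # (2, 4, 4): tiered at the smallest level
print(level_params("T4, T4, L10", 7))  # (-8, 10, 2): the last value repeats
```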

Choosing the Right Strategy

Once you classify your workload, the choice usually follows from this table:

Workload Pattern	Recommended Strategy	Primary Reason
Write-heavy, infrequent reads	STCS	Lowest write amplification
Read-heavy or update-heavy	LCS	Bounded read amplification per level
Time-series with TTL, append-only	TWCS	Whole-SSTable expiry, no tombstone I/O
Mixed / unknown, on Cassandra 5.0+	UCS	Single tunable spanning tiered and leveled behavior
Large cold dataset, rare reads	STCS	No compaction overhead on idle data

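Changing strategy on an existing table requires no downtime. The new strategy takes effect immediately for future compaction rounds, and existing SSTables are gradually rewritten to fit it. That rewrite is not free: switching from STCS to LCS, for example, can keep every node busy re-leveling its existing data, so make the change during a low-traffic period and watch pending compaction tasks afterwards.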

text

-- Switch from STCS to LCS with no downtime
ALTER TABLE my_keyspace.user_profiles
WITH compaction = {
  'class': 'LeveledCompactionStrategy',
  'sstable_size_in_mb': 160
};

Monitoring Compaction Health

The primary tool for inspecting compaction status is nodetool compactionstats. Run it on any node to see in-progress compactions, the number of pending tasks, and throughput estimates.

text

# Real-time compaction queue
nodetool compactionstats

# Example output (abbreviated):
# pending tasks: 47
# - keyspace=my_keyspace, table=metrics, completed=3.4 GB, total=12.1 GB, unit=bytes, progress=28.10%

# Per-table compaction history
nodetool compactionhistory | grep my_keyspace

Key metrics to monitor in production:

Pending compaction tasks: if this number grows continuously, compaction cannot keep up with write ingest. Consider raising the compaction throughput limit, increasing concurrent_compactors in cassandra.yaml, or throttling writes.
SSTable count per table: use nodetool tablestats and look at the SSTable count field. As rough heuristics, a count above 20 under STCS suggests compaction lag, and under LCS more than 4 SSTables in L0 is a warning sign.
Tombstones per read: nodetool tablestats reports Average tombstones per slice (last five minutes) and Maximum tombstones per slice (last five minutes). Cassandra logs a warning when a single query scans more than tombstone_warn_threshold tombstones (default 1,000) and aborts it with TombstoneOverwhelmingException above tombstone_failure_threshold (default 100,000). Values climbing toward those limits are often a compaction or data-model problem masquerading as a query problem.
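If you script these checks, a minimal sketch might parse the raw nodetool text. It assumes the output contains lines of the form `pending tasks: N` and `SSTable count: N`; exact formatting varies between Cassandra versions, and the "keeps growing" rule below is a hypothetical alerting heuristic, not a Cassandra feature.

```python
import re


def _first_int(pattern: str, text: str) -> int:
    match = re.search(pattern, text)
    if match is None:
        raise ValueError(f"pattern {pattern!r} not found in nodetool output")
    return int(match.group(1))


def pending_tasks(compactionstats_text: str) -> int:
    """Pending compaction tasks from `nodetool compactionstats` output."""
    return _first_int(r"pending tasks:\s*(\d+)", compactionstats_text)


def sstable_count(tablestats_text: str) -> int:
    """SSTable count from `nodetool tablestats <keyspace>.<table>` output."""
    return _first_int(r"SSTable count:\s*(\d+)", tablestats_text)


def backlog_growing(samples) -> bool:
    """Hypothetical alert rule: every successive pending-task sample is higher."""
    return len(samples) >= 2 and all(b > a for a, b in zip(samples, samples[1:]))


print(pending_tasks("pending tasks: 47\n"))    # 47
print(sstable_count("\tSSTable count: 23\n"))  # 23
print(backlog_growing([10, 25, 47]))           # True
```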

text

# SSTable count and tombstone metrics per table
nodetool tablestats my_keyspace.metrics

# Trigger a manual compaction (use sparingly in production)
nodetool compact my_keyspace metrics

Tip

You can throttle compaction throughput to protect write performance during peak hours using nodetool setcompactionthroughput <MB/s>. Setting it to 0 removes the throttle entirely. A value of 64 (64 MB/s) is a common starting point for balanced clusters.
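To reason about a throttle value, it helps to estimate how long the remaining work takes. The sketch below treats GB in the compactionstats example as GiB and assumes the throttle is the only bottleneck and a single compaction gets all of it; because the limit is shared by every compaction running on the node, treat the result as a lower bound. The function name is hypothetical.

```python
MiB = 1024 ** 2
GiB = 1024 ** 3


def drain_seconds(remaining_bytes, throughput_mib_per_s):
    """Lower-bound time to finish remaining compaction work at a given throttle."""
    if throughput_mib_per_s <= 0:
        # 0 disables the throttle, so the limit is the disk, not this setting.
        raise ValueError("throughput must be positive")
    return remaining_bytes / (throughput_mib_per_s * MiB)


# The metrics compaction from the compactionstats example: 12.1 GB total, 3.4 GB done.
remaining = (12.1 - 3.4) * GiB
print(round(drain_seconds(remaining, 64)))  # about 139 seconds at 64 MiB/s
```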

Tombstone Handling by Strategy

Each strategy handles tombstone expiry differently, and this has significant operational consequences:

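STCS: tombstones may sit in large SSTables for a long time before a compaction that also covers the data they shadow occurs. Cassandra can run single-SSTable compactions to drop them early: an SSTable qualifies when its estimated ratio of droppable tombstones exceeds tombstone_threshold (default 0.2) and it is older than tombstone_compaction_interval (default 86400 seconds).
LCS: because a partition lives in at most one SSTable per level above L0, a tombstone and the data it shadows meet in a compaction after relatively few merges. Tombstone expiry is much faster and more predictable than under STCS.
TWCS: TTL-expired data in closed windows is removed when the entire SSTable expires and is dropped. For workloads with TTL this is optimal; for workloads with explicit deletes of data in older windows, the tombstone lands in a newer window that is never compacted with the old one, so it can linger until the SSTable holding the shadowed data is itself dropped.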
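The single-SSTable tombstone compaction check can be sketched as a predicate. This mirrors our understanding of how Cassandra decides, using the common subproperties tombstone_threshold and tombstone_compaction_interval; it leaves out the extra overlap check Cassandra performs unless unchecked_tombstone_compaction is enabled, and the function and argument names are hypothetical.

```python
def tombstone_compaction_eligible(created_at, droppable_ratio, now,
                                  tombstone_threshold=0.2,
                                  tombstone_compaction_interval=86400):
    """Should one SSTable be compacted on its own to purge tombstones?

    created_at and now are in seconds; droppable_ratio is the estimated fraction
    of the SSTable's cells that are tombstones already past gc_grace_seconds.
    """
    old_enough = now - created_at >= tombstone_compaction_interval
    return old_enough and droppable_ratio > tombstone_threshold


print(tombstone_compaction_eligible(0, 0.35, now=90_000))  # True
print(tombstone_compaction_eligible(0, 0.35, now=3_600))   # False: too young
print(tombstone_compaction_eligible(0, 0.10, now=90_000))  # False: below threshold
```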

Key Takeaways

Cassandra's compaction strategy is the primary lever for balancing read amplification, write amplification, and space amplification — there is no universally "best" choice.
Use STCS for write-dominated workloads where disk space is plentiful and reads are infrequent or latency-tolerant.
Use LCS when your application demands consistent, low read latency and you can afford higher sustained write I/O.
Use TWCS for time-series tables with TTL and append-only writes — it delivers the most efficient tombstone cleanup by dropping entire SSTables at expiry.
Use UCS (Cassandra 5.0+) when you want a single strategy that can be tuned anywhere between tiered and leveled behavior via the scaling_parameters knob.
ALTER TABLE ... WITH compaction = {...} changes strategy live with zero downtime, but the switch can trigger heavy recompaction.
Monitor nodetool compactionstats and nodetool tablestats regularly — compaction lag is one of the most common root causes of unexpected read latency in production Cassandra clusters.
Match your TWCS window size to your query lookback window, and ensure your window SSTable sizes land between 1 GB and 5 GB for clean expiry behavior.

Optimize Your Cassandra Cluster with JusDB

Choosing the right compaction strategy is only one dimension of Cassandra performance tuning. JusDB provides deep, query-level visibility into your Cassandra clusters — surfacing compaction lag, tombstone hotspots, SSTable amplification per table, and misconfigured strategies before they become incidents. Instead of piecing together nodetool output across dozens of nodes, JusDB aggregates the signal you need in one place.

Whether you're migrating a legacy STCS table to LCS, diagnosing TWCS window misconfiguration in a metrics pipeline, or evaluating UCS for a new Cassandra 5.0 deployment, JusDB gives your team the data to make those decisions with confidence, and to verify the results after the change.

Start a free trial with JusDB and get Cassandra compaction health monitoring running in under 15 minutes.
