NoSQL Databases

Cassandra Explained: A Complete Guide for Always-On, Planet-Scale Data

Master Apache Cassandra for planet-scale, always-on applications. Learn ring architecture, consistency levels, data modeling, compaction strategies, and multi-datacenter replication.

JusDB Team
February 8, 2022
7 min read
4913 views

A ride-sharing company was running their driver location tracking on PostgreSQL with a hot-standby. Peak hours meant 180,000 writes per minute across 12 cities. The standby was falling behind, the primary was at 94% CPU, and they were 60 days away from a major market expansion. They evaluated Aurora, CockroachDB, and Cassandra. They chose Cassandra, and six months later were handling 1.2 million writes per minute across 28 cities with p99 write latency under 4ms — on a three-region active-active cluster.

Cassandra is the right answer to a specific question: how do I write at massive scale, across multiple regions, with no single point of failure, and predictable low latency? If that's your question, this guide explains how Cassandra actually works, when it beats every alternative, and the operational realities of running it.

TL;DR
  • Cassandra is a masterless, eventually-consistent distributed database optimized for write-heavy workloads at massive scale
  • Data is partitioned by a primary key — partition key choice is the single most important design decision
  • Consistency is tunable per-query: QUORUM for strong reads, ONE for maximum throughput
  • Cassandra has no joins, no foreign keys, no multi-table transactions — model your data around your queries, not normalization
  • Right fit: write-heavy time-series, IoT, event streams, geo-distributed apps. Wrong fit: complex queries, frequent schema changes, relational data

Architecture: Why Cassandra Has No Single Point of Failure

Unlike MySQL or PostgreSQL, Cassandra has no primary/replica distinction. Every node in a Cassandra cluster is identical — any node can accept reads or writes for any data. This is called a masterless (or leaderless) architecture.

Data is distributed across nodes using consistent hashing. Each row's partition key is hashed to a token, and each node owns a range of tokens. When you write a row, the coordinator node (whichever node the client connected to) routes the write to all replica nodes responsible for that token range.

# Visualize token distribution across a 6-node cluster
nodetool ring

# See which nodes own a specific partition key
nodetool getendpoints mykeyspace mytable "partition_key_value"

# Cluster topology overview
nodetool status
# Output shows: node addresses, data center, rack, load, token ranges, state (UN = Up/Normal)

Cassandra uses a replication factor (RF) to determine how many nodes hold a copy of each partition. RF=3 means every row exists on 3 nodes. Combined with quorum consistency, this allows any one node to go down without data loss or availability impact.


Data Modeling: Design Around Your Queries

The most common Cassandra mistake is treating it like a relational database and designing a normalized schema. Cassandra doesn't support joins — if you need data from two tables in one query, you must denormalize and store it together.

The rule: one table per query pattern.

-- Example: storing IoT sensor readings

-- Query 1: "get the last 24 hours of readings for sensor X"
CREATE TABLE sensor_readings_by_sensor (
    sensor_id   UUID,
    recorded_at TIMESTAMP,
    temperature FLOAT,
    humidity    FLOAT,
    PRIMARY KEY (sensor_id, recorded_at)  -- sensor_id = partition key, recorded_at = clustering key
) WITH CLUSTERING ORDER BY (recorded_at DESC);

-- Query 2: "get all sensors in a facility that had readings > 30°C today"
-- This requires a SEPARATE table (Cassandra can't filter on non-partition-key columns efficiently)
CREATE TABLE hot_sensors_by_facility (
    facility_id  UUID,
    recorded_at  TIMESTAMP,
    sensor_id    UUID,
    temperature  FLOAT,
    PRIMARY KEY ((facility_id, recorded_at), sensor_id)
) WITH CLUSTERING ORDER BY (sensor_id ASC);

-- Write to BOTH tables on every insert (application-level denormalization)
-- This is expected in Cassandra — it's the cost of masterless horizontal scale

Partition key design — the most critical decision

-- BAD: low-cardinality partition key (all data for a user_type on one node)
PRIMARY KEY (user_type, user_id)  -- 3-4 user_types = 3-4 hot partitions

-- BAD: unbounded partition (one user's events forever on the same partition = partition > 100MB)
PRIMARY KEY (user_id, event_time)  -- a power user with 10M events = one huge partition

-- GOOD: time-bucketed partition key (balance data per node, bounded partition size)
PRIMARY KEY ((user_id, bucket), event_time)
-- where bucket = DATE_TRUNC('month', event_time) — each month is a separate partition

-- Check partition sizes in production
nodetool tablehistograms mykeyspace mytable
-- Look for "Partition Size" — partitions > 100MB are problematic

Consistency Levels: Tunable Per Query

Cassandra's consistency model is one of its most powerful — and most misunderstood — features. Every read and write specifies a consistency level that determines how many replicas must acknowledge the operation:

Consistency Level Reads Writes Use Case
ONE1 replica1 replicaMaximum throughput, stale reads acceptable
QUORUMmajoritymajorityStrong consistency (most common production choice)
LOCAL_QUORUMlocal DC majoritylocal DC majorityMulti-DC: strong within DC, async cross-DC
ALLall replicasall replicasMaximum consistency, lowest availability
LOCAL_ONE1 local replica1 local replicaLowest latency reads in multi-DC setup

For RF=3: QUORUM requires 2 replicas. If you write at QUORUM and read at QUORUM, you're guaranteed to always read your own writes — this is the standard strong-consistency setup for most applications.

code
-- Set consistency level in cqlsh
CONSISTENCY QUORUM;

-- Per-statement in application code (Java driver example):
Statement stmt = QueryBuilder.selectFrom("sensor_readings_by_sensor")
    .all()
    .whereColumn("sensor_id").isEqualTo(literal(sensorId))
    .build()
    .setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM);

-- Python driver:
from cassandra import ConsistencyLevel
from cassandra.query import SimpleStatement
query = SimpleStatement(
    "SELECT * FROM sensor_readings_by_sensor WHERE sensor_id = %s",
    consistency_level=ConsistencyLevel.QUORUM
)

Write Path: Why Cassandra Writes Are So Fast

Cassandra writes are fast by design. When a write arrives:

  1. Written to the commit log (sequential append, near-zero latency on SSD)
  2. Written to the memtable (in-memory structure, immediately queryable)
  3. Acknowledged to the client — the write is done from the application's perspective
  4. Periodically, memtables are flushed to SSTables on disk (immutable files)
  5. Background compaction merges SSTables and removes tombstones (deleted data markers)
code
# Monitor write performance
nodetool tpstats | grep -E "MutationStage|ReadStage"
# Active, pending, completed, blocked, all time blocked counts

# Check compaction status
nodetool compactionstats

# Flush memtables manually (rarely needed, useful before restart)
nodetool flush mykeyspace

# View SSTable count per table (high count = compaction backlog)
nodetool tablehistograms mykeyspace sensor_readings_by_sensor

Production Operations

code
# Add a node to the cluster (Cassandra automatically streams data to new node)
# 1. Start Cassandra on the new node with correct seeds in cassandra.yaml
# 2. Watch streaming progress:
nodetool netstats

# Remove a node gracefully (streams its data to other nodes first)
nodetool decommission  # run on the node being removed

# Repair: re-synchronize replicas (run weekly on all nodes)
nodetool repair -full mykeyspace
# For large clusters, use Reaper (https://cassandra-reaper.io/) to coordinate repairs

# Backup: snapshot all SSTables
nodetool snapshot mykeyspace
# Snapshots are hard-links to SSTable files — fast, space-efficient

# Key cassandra.yaml settings for production:
# read_request_timeout_in_ms: 5000     (default: too low for analytical queries)
# write_request_timeout_in_ms: 2000
# concurrent_reads: 32                 # = 8 × number of data disks
# concurrent_writes: 32                # = 8 × number of CPU cores
# memtable_heap_space_in_mb: 2048      # tune based on available heap

When to Choose Cassandra (and When Not To)

Cassandra wins when:

  • Write throughput exceeds what a single primary can handle (>100K writes/sec)
  • You need active-active multi-region with no single point of failure
  • Data model fits time-series or event stream patterns with clear partition keys
  • You can pre-define all query patterns and denormalize accordingly

Don't use Cassandra when:

  • You need ad-hoc queries, JOINs, or complex aggregations — use PostgreSQL or ClickHouse
  • Your schema changes frequently — Cassandra DDL changes require careful coordination across all nodes
  • You need strong multi-row ACID transactions — Cassandra has lightweight transactions (LWT) but they're expensive and don't cover multi-partition operations
  • Your total dataset fits on a single well-provisioned PostgreSQL or MySQL instance — distributed overhead isn't worth it

Working with JusDB on Cassandra

The hardest part of Cassandra isn't the technology — it's the data modeling. Getting partition key design wrong means hot partitions, uneven cluster load, and query performance that never matches expectations. And unlike relational databases where you can add indexes after the fact, Cassandra data model changes often require table rewrites and dual-write periods.

We help teams evaluate whether Cassandra fits their workload, design partition key strategies for their specific query patterns, and operate Cassandra clusters in production through our database consulting practice. If you're evaluating Cassandra for a high-write workload or troubleshooting an existing cluster, reach out.

Related reading: Aerospike Explained | Redis vs Valkey | TiDB for Distributed SQL

Share this article

JusDB Team

Official JusDB content team

Deeper Reading

Curated companion guides for readers who want to go deeper on this topic.