Cassandra Explained: Architecture, Data Modeling & When to Use It

A ride-sharing company was running their driver location tracking on PostgreSQL with a hot standby. Peak hours meant 180,000 writes per minute across 12 cities. The standby was falling behind, the primary was at 94% CPU, and they were 60 days away from a major market expansion. They evaluated Aurora, CockroachDB, and Cassandra. They chose Cassandra, and six months later were handling 1.2 million writes per minute across 28 cities with p99 write latency under 4 ms on a three-region active-active cluster.

Cassandra is the right answer to a specific question: how do I write at massive scale, across multiple regions, with no single point of failure, and predictable low latency? If that's your question, this guide explains how Cassandra actually works, when it beats every alternative, and the operational realities of running it.

TL;DR

Cassandra is a masterless, eventually-consistent distributed database optimized for write-heavy workloads at massive scale
Data is distributed by partition key, the first component of the primary key; choosing it is the single most important design decision
Consistency is tunable per-query: QUORUM for strong reads, ONE for maximum throughput
Cassandra has no joins, no foreign keys, no multi-table transactions — model your data around your queries, not normalization
Right fit: write-heavy time-series, IoT, event streams, geo-distributed apps. Wrong fit: complex queries, frequent schema changes, relational data

Architecture: Why Cassandra Has No Single Point of Failure

Unlike MySQL or PostgreSQL, Cassandra has no primary/replica distinction. Every node in a Cassandra cluster is identical — any node can accept reads or writes for any data. This is called a masterless (or leaderless) architecture.

Data is distributed across nodes using consistent hashing. Each row's partition key is hashed to a token, and each node owns a range of tokens. When you write a row, the coordinator node (whichever node the client connected to) routes the write to all replica nodes responsible for that token range.


# Visualize token distribution across a 6-node cluster
nodetool ring

# See which nodes own a specific partition key
nodetool getendpoints mykeyspace mytable "partition_key_value"

# Cluster topology overview
nodetool status
# Output shows, grouped by data center: state (UN = Up/Normal), address, load, token count, ownership, host ID, rack

Cassandra uses a replication factor (RF) to determine how many nodes hold a copy of each partition. RF=3 means every row exists on 3 nodes. Combined with quorum consistency, this allows any one node to go down without data loss or availability impact.
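To make the token ring and replication factor concrete, here is a minimal Python sketch of how a partition key could map to replica nodes. It is a simplification under stated assumptions: MD5 stands in for Cassandra's default Murmur3 partitioner, each node owns a single token (no virtual nodes), and replicas are simply the next nodes clockwise on the ring, ignoring racks and data centers. The `Ring` class and `token_for` function are hypothetical names, not part of any Cassandra API.

```python
import bisect
import hashlib


def token_for(partition_key: str) -> int:
    """Hash a partition key to a 64-bit token (MD5 as a stand-in for Murmur3)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class Ring:
    def __init__(self, node_tokens: dict[str, int], replication_factor: int):
        # Sort (token, node) pairs so the ring can be walked clockwise.
        self.ring = sorted((token, node) for node, token in node_tokens.items())
        self.tokens = [token for token, _ in self.ring]
        self.rf = replication_factor

    def replicas(self, partition_key: str) -> list[str]:
        # The first node whose token is >= the key's token owns the key;
        # past the largest token we wrap around to the start of the ring.
        start = bisect.bisect_left(self.tokens, token_for(partition_key)) % len(self.ring)
        count = min(self.rf, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]


# Six evenly spaced nodes, RF=3: every key lands on three consecutive nodes.
ring = Ring({f"node{i}": i * (2**64 // 6) for i in range(6)}, replication_factor=3)
print(ring.replicas("sensor-42"))
```

Because the same key always hashes to the same token, any coordinator can compute the replica set locally without asking a central directory, which is what removes the single point of failure.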

Data Modeling: Design Around Your Queries

The most common Cassandra mistake is treating it like a relational database and designing a normalized schema. Cassandra doesn't support joins — if you need data from two tables in one query, you must denormalize and store it together.

The rule: one table per query pattern.


-- Example: storing IoT sensor readings

-- Query 1: "get the last 24 hours of readings for sensor X"
CREATE TABLE sensor_readings_by_sensor (
    sensor_id   UUID,
    recorded_at TIMESTAMP,
    temperature FLOAT,
    humidity    FLOAT,
    PRIMARY KEY (sensor_id, recorded_at)  -- sensor_id = partition key, recorded_at = clustering key
) WITH CLUSTERING ORDER BY (recorded_at DESC);

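-- Query 2: "get all sensors in a facility that had readings > 30°C today"
-- This requires a SEPARATE table (Cassandra can't filter on non-partition-key columns efficiently).
-- The partition key uses a day bucket so "today" is one partition, and temperature is the
-- first clustering column so the > 30 range can be served from within that partition.
CREATE TABLE hot_sensors_by_facility (
    facility_id  UUID,
    day          DATE,
    temperature  FLOAT,
    sensor_id    UUID,
    recorded_at  TIMESTAMP,
    PRIMARY KEY ((facility_id, day), temperature, sensor_id, recorded_at)
) WITH CLUSTERING ORDER BY (temperature DESC, sensor_id ASC, recorded_at ASC);
-- SELECT sensor_id, temperature FROM hot_sensors_by_facility
--   WHERE facility_id = ? AND day = ? AND temperature > 30;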

-- Write to BOTH tables on every insert (application-level denormalization)
-- This is expected in Cassandra — it's the cost of masterless horizontal scale
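The "one table per query pattern" rule can be illustrated without a cluster. The sketch below uses two in-memory dictionaries as stand-ins for the two tables; it is not driver code, and the names `record_reading` and `hot_sensors` are hypothetical. It also assumes the facility view is bucketed by calendar day, which is a choice of this sketch.

```python
from collections import defaultdict
from datetime import datetime, timezone

# In-memory stand-ins for two query-specific tables (hypothetical names).
readings_by_sensor = defaultdict(list)        # partition key: sensor_id
readings_by_facility_day = defaultdict(list)  # partition key: (facility_id, day)


def record_reading(sensor_id, facility_id, recorded_at, temperature):
    """Write one reading into every table that serves a query (dual write)."""
    readings_by_sensor[sensor_id].append((recorded_at, temperature))
    day = recorded_at.date()  # assumed day bucket for the facility view
    readings_by_facility_day[(facility_id, day)].append((sensor_id, temperature))


def hot_sensors(facility_id, day, threshold):
    """Answer query 2 by reading exactly one partition of the facility view."""
    rows = readings_by_facility_day.get((facility_id, day), [])
    return sorted({sensor for sensor, temp in rows if temp > threshold})


now = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)
record_reading("s1", "f1", now, 31.5)
record_reading("s2", "f1", now, 22.0)
print(hot_sensors("f1", now.date(), 30.0))
```

Each query touches a single partition of a table built for it; the price is that every write has to update both structures.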

Partition key design — the most critical decision


-- BAD: low-cardinality partition key (all data for a user_type on one node)
PRIMARY KEY (user_type, user_id)  -- 3-4 user_types = 3-4 hot partitions

-- BAD: unbounded partition (one user's events forever on the same partition = partition > 100MB)
PRIMARY KEY (user_id, event_time)  -- a power user with 10M events = one huge partition

-- GOOD: time-bucketed partition key (balance data per node, bounded partition size)
PRIMARY KEY ((user_id, bucket), event_time)
-- where bucket is computed by the application before the write (e.g. '2024-03' for the month of event_time); each month is a separate partition

-- Check partition sizes in production
nodetool tablehistograms mykeyspace mytable
-- Look for "Partition Size" — partitions > 100MB are problematic
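The bucket value has to be supplied by the application on every write and every read. One way to derive a monthly bucket is sketched below in Python; the 'YYYY-MM' text format, the UTC normalization, and the function names are assumptions of this sketch, and it expects timezone-aware datetimes.

```python
from datetime import datetime, timezone


def month_bucket(event_time: datetime) -> str:
    """Return a 'YYYY-MM' bucket in UTC for use as part of the partition key."""
    return event_time.astimezone(timezone.utc).strftime("%Y-%m")


def partition_key(user_id: str, event_time: datetime) -> tuple[str, str]:
    """Build the composite (user_id, bucket) partition key for one event."""
    return (user_id, month_bucket(event_time))


def buckets_between(start: datetime, end: datetime) -> list[str]:
    """List every monthly bucket that a time-range query has to read."""
    s = start.astimezone(timezone.utc)
    e = end.astimezone(timezone.utc)
    year, month = s.year, s.month
    buckets = []
    while (year, month) <= (e.year, e.month):
        buckets.append(f"{year:04d}-{month:02d}")
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return buckets


start = datetime(2023, 11, 20, tzinfo=timezone.utc)
end = datetime(2024, 2, 3, tzinfo=timezone.utc)
print(partition_key("user-1", start))
print(buckets_between(start, end))
```

The trade-off of bucketing is visible in `buckets_between`: a query spanning several months becomes one query per bucket, so the bucket size should match the time window most queries ask for.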

Consistency Levels: Tunable Per Query

Cassandra's consistency model is one of its most powerful — and most misunderstood — features. Every read and write specifies a consistency level that determines how many replicas must acknowledge the operation:

Consistency Level	Reads	Writes	Use Case
`ONE`	1 replica	1 replica	Maximum throughput, stale reads acceptable
`QUORUM`	majority	majority	Strong consistency (most common production choice)
`LOCAL_QUORUM`	local DC majority	local DC majority	Multi-DC: strong within DC, async cross-DC
`ALL`	all replicas	all replicas	Maximum consistency, lowest availability
`LOCAL_ONE`	1 local replica	1 local replica	Lowest latency reads in multi-DC setup

For RF=3, QUORUM requires 2 replicas (floor(RF/2) + 1). If you write at QUORUM and read at QUORUM, the read set and the write set always share at least one replica (2 + 2 > 3), so a read sees the latest acknowledged write. This is the standard strong-consistency setup for most applications.
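The quorum arithmetic is easy to check in a few lines of Python. This is a sketch of the general rule that reads see the latest write when read replicas plus write replicas exceed the replication factor; the function names are hypothetical.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas in a majority: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def overlaps(read_replicas: int, write_replicas: int, replication_factor: int) -> bool:
    """True when every read set must intersect every write set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor


rf = 3
q = quorum(rf)                 # 2
print(overlaps(q, q, rf))      # QUORUM write + QUORUM read: True
print(overlaps(1, 1, rf))      # ONE write + ONE read: False, stale reads possible
print(overlaps(1, rf, rf))     # ALL write + ONE read: True, but writes need every replica up
```

The same check explains the trade-off in the table above: ONE/ONE maximizes throughput but gives no overlap guarantee, while QUORUM/QUORUM keeps the guarantee and still tolerates one replica being down at RF=3.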


-- Set consistency level in cqlsh
CONSISTENCY QUORUM;

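// Per-statement in application code (Java driver 4.x example):
import static com.datastax.oss.driver.api.querybuilder.QueryBuilder.literal;
import com.datastax.oss.driver.api.core.ConsistencyLevel;
import com.datastax.oss.driver.api.core.cql.SimpleStatement;
import com.datastax.oss.driver.api.querybuilder.QueryBuilder;

SimpleStatement stmt = QueryBuilder.selectFrom("sensor_readings_by_sensor")
    .all()
    .whereColumn("sensor_id").isEqualTo(literal(sensorId))
    .build()
    .setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM);

# Python driver (session is an already-connected cassandra.cluster.Session):
from cassandra import ConsistencyLevel
from cassandra.query import SimpleStatement
query = SimpleStatement(
    "SELECT * FROM sensor_readings_by_sensor WHERE sensor_id = %s",
    consistency_level=ConsistencyLevel.QUORUM
)
rows = session.execute(query, (sensor_id,))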

Write Path: Why Cassandra Writes Are So Fast

Cassandra writes are fast by design. When a write arrives at a replica:

1. It is appended to the commit log (a sequential append, so no random disk I/O)
2. It is written to the memtable (an in-memory structure, immediately queryable)
3. The replica acknowledges the write; once enough replicas for the consistency level have acknowledged, the coordinator responds to the client and the write is done from the application's perspective
4. Periodically, memtables are flushed to SSTables on disk (immutable files)
5. Background compaction merges SSTables and removes tombstones (deleted data markers) once they are older than gc_grace_seconds
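The steps above can be sketched as a toy single-node model in Python. This is an illustration of the log-structured idea, not Cassandra's implementation: the `ToyNode` class is hypothetical, the flush runs inline instead of in the background, and tombstones are dropped at the first compaction instead of waiting for gc_grace_seconds.

```python
TOMBSTONE = object()  # marker written in place of a deleted value


class ToyNode:
    def __init__(self, memtable_limit: int = 2):
        self.commit_log = []   # append-only log used for crash recovery
        self.memtable = {}     # in-memory table, newest value per key
        self.sstables = []     # immutable flushed tables, oldest first
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))   # sequential append
        self.memtable[key] = value             # in-memory update
        if len(self.memtable) >= self.memtable_limit:
            self.flush()                       # inline here; background in a real node
        return "ack"

    def delete(self, key):
        # A delete is just a write of a tombstone marker.
        return self.write(key, TOMBSTONE)

    def flush(self):
        if self.memtable:
            self.sstables.append(dict(sorted(self.memtable.items())))
            self.memtable = {}
            self.commit_log.clear()            # flushed data no longer needs the log

    def compact(self):
        merged = {}
        for table in self.sstables:            # newer tables override older ones
            merged.update(table)
        live = {k: v for k, v in merged.items() if v is not TOMBSTONE}
        self.sstables = [live] if live else []

    def read(self, key):
        if key in self.memtable:
            value = self.memtable[key]
        else:
            value = None
            for table in reversed(self.sstables):  # check newest SSTable first
                if key in table:
                    value = table[key]
                    break
        return None if value is TOMBSTONE else value


node = ToyNode(memtable_limit=2)
node.write("a", 1)
node.write("b", 2)     # memtable full: flushed to the first SSTable
node.write("a", 10)
node.delete("b")       # tombstone written, second SSTable flushed
node.compact()         # merge SSTables, keep newest values, drop tombstones
print(node.read("a"), node.read("b"))
```

Note that the write path never reads existing data or modifies a file in place, which is why write cost stays flat as the dataset grows; the work is deferred to reads (which may consult several SSTables) and to compaction.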


# Monitor write performance
nodetool tpstats | grep -E "MutationStage|ReadStage"
# Active, pending, completed, blocked, all time blocked counts

# Check compaction status
nodetool compactionstats

# Flush memtables manually (rarely needed, useful before restart)
nodetool flush mykeyspace

# View SSTable count per table (high count = compaction backlog)
nodetool tablestats mykeyspace.sensor_readings_by_sensor

Production Operations


# Add a node to the cluster (Cassandra automatically streams data to new node)
# 1. Start Cassandra on the new node with correct seeds in cassandra.yaml
# 2. Watch streaming progress:
nodetool netstats

# Remove a node gracefully (streams its data to other nodes first)
nodetool decommission  # run on the node being removed

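# Repair: re-synchronize replicas (run on every node at least once per gc_grace_seconds,
# 10 days by default; a weekly schedule is a common way to stay inside that window)
nodetool repair -full mykeyspace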
# For large clusters, use Reaper (https://cassandra-reaper.io/) to coordinate repairs

# Backup: snapshot all SSTables
nodetool snapshot mykeyspace
# Snapshots are hard-links to SSTable files — fast, space-efficient

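# Key cassandra.yaml settings for production
# (names as used before Cassandra 4.1; later versions use renamed forms such as read_request_timeout):
# read_request_timeout_in_ms: 5000     # the default; raise only if slow range or analytical reads time out
# write_request_timeout_in_ms: 2000    # the default
# concurrent_reads: 32                 # rule of thumb: 16 × number of data disks
# concurrent_writes: 32                # rule of thumb: 8 × number of CPU cores
# memtable_heap_space_in_mb: 2048      # tune based on available heap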

When to Choose Cassandra (and When Not To)

Cassandra wins when:

Write throughput exceeds what a single primary can handle (>100K writes/sec)
You need active-active multi-region with no single point of failure
Data model fits time-series or event stream patterns with clear partition keys
You can pre-define all query patterns and denormalize accordingly

Don't use Cassandra when:

You need ad-hoc queries, JOINs, or complex aggregations — use PostgreSQL or ClickHouse
Your schema changes frequently — Cassandra DDL changes require careful coordination across all nodes
You need strong multi-row ACID transactions — Cassandra has lightweight transactions (LWT) but they're expensive and don't cover multi-partition operations
Your total dataset fits on a single well-provisioned PostgreSQL or MySQL instance — distributed overhead isn't worth it

Working with JusDB on Cassandra

The hardest part of Cassandra isn't the technology — it's the data modeling. Getting partition key design wrong means hot partitions, uneven cluster load, and query performance that never matches expectations. And unlike relational databases where you can add indexes after the fact, Cassandra data model changes often require table rewrites and dual-write periods.

We help teams evaluate whether Cassandra fits their workload, design partition key strategies for their specific query patterns, and operate Cassandra clusters in production through our database consulting practice. If you're evaluating Cassandra for a high-write workload or troubleshooting an existing cluster, reach out.

Related reading: Aerospike Explained | Redis vs Valkey | TiDB for Distributed SQL


JusDB Team
