ClickHouse MergeTree Engine: Storage Architecture & Optimization Guide

ClickHouse's MergeTree engine is why ClickHouse queries 1 billion rows in 0.3 seconds. Understanding how MergeTree stores, merges, and queries data isn't optional if you're running ClickHouse in production — the wrong table design multiplies your storage costs and kills query performance simultaneously.

TL;DR

MergeTree stores data in sorted parts that merge asynchronously — writes are always fast, queries use sparse indexes
The PRIMARY KEY controls data sorting and pruning; ORDER BY and PRIMARY KEY are often the same but can differ
ReplicatedMergeTree adds automatic replication without a separate replication layer
Choose the right MergeTree variant: SummingMergeTree for pre-aggregation, ReplacingMergeTree for deduplication

How MergeTree Storage Works

Parts and Merges

Every INSERT to a MergeTree table creates an immutable data part directory. ClickHouse merges these parts in the background, sorting rows by the ORDER BY key. The sparse primary index marks granule boundaries (every 8192 rows by default), enabling range queries to skip irrelevant granules entirely.

sql

-- Basic MergeTree table
CREATE TABLE events
(
    event_date   Date,
    event_ts     DateTime64(3),
    user_id      UInt64,
    event_type   LowCardinality(String),
    revenue_usd  Decimal(10, 2),
    properties   Map(String, String)
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(event_date)
ORDER BY (event_date, user_id, event_ts)
SETTINGS index_granularity = 8192;

Choosing the ORDER BY Key

The ORDER BY key is the most critical design decision in ClickHouse. Queries that filter on high-cardinality columns in the ORDER BY prefix can skip most of the data. Queries that filter on columns NOT in the ORDER BY require full scans or secondary indexes.

sql

-- GOOD: queries filter on event_date and user_id (first two ORDER BY columns)
SELECT count(), sum(revenue_usd)
FROM events
WHERE event_date BETWEEN '2025-01-01' AND '2025-01-31'
  AND user_id = 12345;
-- ClickHouse skips all granules outside this date+user range

-- BAD: filtering on a column not in ORDER BY prefix
SELECT count() FROM events
WHERE event_type = 'purchase';  -- full scan required
-- Fix: add event_type to ORDER BY, or use a secondary index

Secondary Indexes (Skipping Indexes)

sql

-- Add a bloom filter index for low-selectivity columns
ALTER TABLE events
ADD INDEX idx_event_type (event_type)
TYPE bloom_filter(0.01)
GRANULARITY 4;

-- minmax index for range queries on non-key columns
ALTER TABLE events
ADD INDEX idx_revenue (revenue_usd)
TYPE minmax
GRANULARITY 1;

-- Materialize the index on existing data
ALTER TABLE events MATERIALIZE INDEX idx_event_type;

MergeTree Variants for Common Patterns

ReplicatedMergeTree

sql

-- Replicated table (uses ZooKeeper/ClickHouse Keeper for coordination)
CREATE TABLE events ON CLUSTER my_cluster
(
    event_date   Date,
    user_id      UInt64,
    event_type   String,
    revenue_usd  Decimal(10,2)
)
ENGINE = ReplicatedMergeTree(
    '/clickhouse/tables/{shard}/events',  -- ZooKeeper path
    '{replica}'                            -- Replica name (from macros)
)
PARTITION BY toYYYYMM(event_date)
ORDER BY (event_date, user_id);

SummingMergeTree for Pre-Aggregation

sql

-- Pre-aggregate revenue by user and day
CREATE TABLE user_daily_revenue
(
    event_date   Date,
    user_id      UInt64,
    revenue_usd  Decimal(10,2),  -- summed during merge
    event_count  UInt64          -- summed during merge
)
ENGINE = SummingMergeTree()
ORDER BY (event_date, user_id);

-- INSERT raw data; ClickHouse sums during background merge
INSERT INTO user_daily_revenue VALUES ('2025-01-15', 42, 19.99, 1);
INSERT INTO user_daily_revenue VALUES ('2025-01-15', 42, 49.99, 1);
-- After merge: single row with revenue=69.98, count=2

-- Query must use sum() to handle pre-merge state
SELECT user_id, sum(revenue_usd), sum(event_count)
FROM user_daily_revenue
WHERE event_date = '2025-01-15'
GROUP BY user_id;

Warning

Merges in ClickHouse happen asynchronously. Until parts merge, SummingMergeTree can have duplicate rows. Always use sum() aggregation in queries against SummingMergeTree — never SELECT *.

Partition Management

sql

-- Drop a partition (instant, no row-by-row delete)
ALTER TABLE events DROP PARTITION 202412;

-- Detach (move to separate storage, reattach later)
ALTER TABLE events DETACH PARTITION 202411;

-- Move partition to cheaper storage tier
ALTER TABLE events MOVE PARTITION 202409 TO VOLUME 'cold_storage';

-- Check partition sizes
SELECT
    partition,
    count() AS parts,
    sum(rows) AS total_rows,
    formatReadableSize(sum(bytes_on_disk)) AS size
FROM system.parts
WHERE table = 'events' AND active = 1
GROUP BY partition
ORDER BY partition DESC;

Key Takeaways

The ORDER BY key controls data sorting and query pruning — it is the single most important design decision for a ClickHouse table.
Secondary (skipping) indexes with bloom filters accelerate queries on columns not in the ORDER BY prefix.
SummingMergeTree and ReplacingMergeTree eliminate the need for external aggregation and deduplication pipelines.
Partition management with DROP/DETACH PARTITION is instant — use daily or monthly partitions to enable cheap data retention policies.

Working with JusDB on ClickHouse

JusDB designs ClickHouse table schemas, tunes MergeTree configurations, and optimizes ORDER BY key selection for engineering teams building analytics pipelines. We ensure your ClickHouse deployment delivers sub-second query performance at TB scale without over-provisioning hardware.

Explore JusDB Analytics Services → | Talk to a DBA

Related reading:

ClickHouse MergeTree Engine: Storage Architecture and Query Optimization

How MergeTree Storage Works

Parts and Merges

Choosing the ORDER BY Key

Secondary Indexes (Skipping Indexes)

MergeTree Variants for Common Patterns

ReplicatedMergeTree

SummingMergeTree for Pre-Aggregation

Partition Management

Working with JusDB on ClickHouse

Share this article

JusDB Team

Keep reading

TimescaleDB Hypertables, Continuous Aggregates & Compression (2026 Production Guide)

ClickHouse Explained (2026): MergeTree, Distributed Engine & Real-Time OLAP

StarRocks vs ClickHouse: Architecture, Table Models & When to Choose Each (2026)

Need Expert Help?

MySQL Performance Tuning

MySQL Consulting

MySQL Migration

MySQL High Availability

MySQL on Kubernetes

MySQL Support