ClickHouse vs BigQuery in 2026: Architecture, Performance, Cost, and Operations

In early 2025, a growth-stage fintech company came to JusDB stuck between two choices for their new real-time analytics platform. Their data team had prototyped the event pipeline in BigQuery because every data scientist already knew SQL and the serverless model meant zero infrastructure to manage on day one. Their infrastructure team had benchmarked ClickHouse on a single 32-core box and seen sub-100ms query times over 2 billion rows — results their BigQuery prototype couldn't match at any price point they were willing to pay. Both teams were right, and neither was talking to the other. The real question wasn't which database was faster in a benchmark — it was which one fit how their team would actually operate it in production, at the data volumes and query patterns they had today and would have in eighteen months.

This guide breaks down the architectural differences, performance profiles, cost models, and operational realities of ClickHouse and BigQuery in 2026, so you can make that call with real information instead of vendor marketing.

TL;DR

ClickHouse is a self-hosted (or ClickHouse Cloud) columnar database built on the MergeTree engine family — extremely fast on hot data, but requires you to own infrastructure sizing, sharding, and replication topology.
BigQuery is a fully serverless columnar warehouse on Google Cloud — no infrastructure to manage, but cold-start latency and per-TB scan pricing create unpredictable costs as data volumes grow.
ClickHouse consistently delivers sub-second query latency on hot datasets; BigQuery's slot-based compute model introduces variable latency that worsens under concurrent load on on-demand pricing.
BigQuery's on-demand model charges $6.25/TB scanned — a single poorly partitioned query on a 10 TB table costs $62.50 per run, every run.
ClickHouse integrates natively with Kafka, dbt, Airbyte, and Grafana; BigQuery's deepest integrations are Looker, dbt, and the broader Google Cloud ecosystem.
Choose ClickHouse for real-time operational analytics with tight latency SLAs; choose BigQuery for ad-hoc business intelligence and data science workloads where operational simplicity outweighs cost predictability.

Background

ClickHouse was open-sourced by Yandex in 2016 after being built internally to power Yandex.Metrica, which at the time processed over 25 billion rows of clickstream data per day. The design goal was unambiguous: columnar storage optimized for analytical reads over immutable or append-only data, with the ability to run aggregation queries over billions of rows in milliseconds on commodity hardware. Yandex.Metrica needed to answer "how many unique visitors from Germany used the search bar on a mobile device in the last 7 days?" in under a second, against a dataset that grew by hundreds of millions of rows per hour. ClickHouse was the answer.

BigQuery launched in 2011 as Google's commercial offering of Dremel, the internal query engine described in a 2010 research paper. Dremel's architectural insight was separating storage from compute entirely — data sits in Google's distributed storage layer (Colossus), and compute (Dremel execution nodes) is allocated on-demand per query. This meant Google could offer a query engine where you never provision a cluster, never tune memory allocation, and pay only for what you scan. The tradeoff is that compute allocation has latency, and that latency shows up as query warm-up time.

In 2026, both systems have matured significantly. ClickHouse Inc. (the commercial entity behind ClickHouse Cloud) has added managed cloud hosting, S3-backed shared storage, and a marketplace of integrations. Google has iterated BigQuery with BigQuery Omni (cross-cloud queries), BigQuery ML, and capacity-based slot reservations (BigQuery editions) as an alternative to on-demand pricing for predictable workloads. The fundamental architectural differences, however, remain.

Architecture Differences

ClickHouse: MergeTree and the Column Store

ClickHouse stores data in a family of table engines collectively called MergeTree. When rows are inserted, ClickHouse writes them in sorted order into immutable parts on disk. Background merge threads continuously combine smaller parts into larger ones, similar to LSM-tree compaction but optimized for analytical reads rather than write throughput. Each part contains per-column files and a sparse primary key index, plus any data-skipping indexes defined on the table (bloom filter, minmax, set). These indexes let a query skip entire granules (blocks of 8,192 rows by default) without reading the underlying column data.

The practical result: a query like SELECT count(), sum(revenue) FROM events WHERE user_country = 'DE' AND event_date >= '2025-01-01' over 10 billion rows can execute in under 500ms on a single ClickHouse node with adequate RAM, provided event_date leads the table's ORDER BY key and a bloom filter skip index exists on user_country. ClickHouse reads only the user_country, event_date, and revenue columns (three out of potentially 50), skips granules eliminated by the sparse primary index on event_date, and uses the user_country bloom filter to skip additional granules before reading the revenue column.
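
To make the granule-skipping idea concrete, here is a minimal Python sketch of a sparse primary index. It is a toy model and not ClickHouse's implementation: the function names are hypothetical, a single integer sort key stands in for event_date, and the only detail taken from ClickHouse is the default granule size of 8,192 rows.

```python
from bisect import bisect_left, bisect_right

GRANULE_SIZE = 8192  # ClickHouse's default index_granularity


def build_sparse_index(sorted_keys, granule_size=GRANULE_SIZE):
    """Keep one mark per granule: the sort-key value of the granule's first row."""
    return [sorted_keys[i] for i in range(0, len(sorted_keys), granule_size)]


def granules_to_read(marks, lo, hi):
    """Return the granule numbers that could hold a key in [lo, hi]."""
    # The last granule that starts below `lo` may still end with rows >= lo.
    first = max(bisect_left(marks, lo) - 1, 0)
    # Granules whose first key is already above `hi` cannot match.
    last = bisect_right(marks, hi)
    return range(first, last)


# Toy table (an assumption for illustration): 100 "days" numbered 0..99,
# 10,000 rows per day, stored sorted by day.
ROWS_PER_DAY = 10_000
keys = [day for day in range(100) for _ in range(ROWS_PER_DAY)]
marks = build_sparse_index(keys)

# Filter equivalent to "event_date within the last 10 days".
selected = granules_to_read(marks, lo=90, hi=99)
print(f"read {len(selected)} of {len(marks)} granules")
```

Only the list of marks is consulted when choosing granules, so the lookup cost depends on the number of granules rather than the number of rows. The selection is deliberately conservative: it may include a boundary granule that turns out to hold only a few matching rows, but it never drops a granule that holds one.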

Horizontal scaling in ClickHouse uses shards and replicas. A shard is a horizontal partition of data — rows are distributed across shards by a sharding key you define (typically a hash of the primary key or a tenant ID). Each shard can have one or more replicas for redundancy. Queries fan out to all shards in parallel, and each shard processes its local data independently. ClickHouse Keeper (or ZooKeeper) coordinates replication and distributed DDL. This architecture gives you near-linear read scalability, but it requires you to design the sharding key carefully — a poor choice creates hot shards that bottleneck the entire cluster.

Warning

Choosing the wrong ClickHouse sharding key is the most common scaling mistake. If your primary query filter is tenant_id but you shard by user_id, every query touches every shard and you lose all locality benefits. Design sharding keys to match your dominant query pattern, not your insert pattern.
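
The effect of the sharding key on locality can be sketched with a few lines of Python. This is an illustration under stated assumptions: the four-shard cluster, the toy rows, and the helper names are hypothetical, and crc32 stands in for whatever sharding expression a real Distributed table would use.

```python
import zlib

NUM_SHARDS = 4  # hypothetical cluster size


def shard_for(key, num_shards=NUM_SHARDS):
    """Deterministic hash routing; crc32 is a stand-in for a real sharding expression."""
    return zlib.crc32(key.encode("utf-8")) % num_shards


# Toy dataset: 3 tenants with 200 users each, one row per user.
rows = [
    {"tenant_id": f"tenant-{t}", "user_id": f"user-{t}-{u}"}
    for t in range(3)
    for u in range(200)
]


def shards_holding_tenant(rows, shard_key, tenant):
    """Set of shards that store at least one row for `tenant` when sharding by `shard_key`."""
    return {shard_for(r[shard_key]) for r in rows if r["tenant_id"] == tenant}


by_tenant = shards_holding_tenant(rows, "tenant_id", "tenant-1")
by_user = shards_holding_tenant(rows, "user_id", "tenant-1")
print("sharded by tenant_id:", len(by_tenant), "shard(s) hold tenant-1")
print("sharded by user_id:  ", len(by_user), "shard(s) hold tenant-1")
```

When the sharding key matches the filter, all of a tenant's rows land on one shard, so a tenant-scoped query has useful work on only that shard. When the key is user_id, the same tenant's rows are scattered and every shard has to participate. The flip side, as noted above, is that a very large tenant can turn its shard into a hot spot.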

BigQuery: Serverless Slots and Capacitor Storage

BigQuery stores data in Capacitor, Google's proprietary columnar format optimized for nested and repeated fields (a reflection of Protocol Buffer data models prevalent inside Google). Storage is completely decoupled from compute — data lives in Colossus indefinitely at $0.02/GB/month, and compute (called slots) is allocated per query from a shared pool. A slot is approximately one vCPU of compute. Complex queries can consume thousands of slots simultaneously, which is why BigQuery can scan terabytes in seconds when the slot pool is available.

On the on-demand pricing model, BigQuery allocates slots from Google's shared pool on a best-effort basis. Under low concurrency, query start-to-first-byte latency is typically 1–5 seconds regardless of data size (this is the cold-start overhead of slot allocation). Under high concurrency, if the shared pool is exhausted, queries queue. This is the key BigQuery latency characteristic that surprises teams coming from ClickHouse: it is not data volume that determines latency, it is slot availability. A 100 GB query and a 10 TB query can have the same 3-second start latency if both get similar slot allocations from the pool.

BigQuery's capacity-based pricing (BigQuery editions, which replaced the older flat-rate reservations in 2023) lets you pre-purchase slot capacity in Standard, Enterprise, and Enterprise Plus tiers at different per-slot-hour rates. With reserved slots, you eliminate queue latency and get predictable cost, but you now own capacity planning, which begins to resemble the operational burden you were trying to avoid by choosing a serverless system.

Tip

If your BigQuery workload consumes more than roughly 120 slot-hours per month on the on-demand tier, a slot reservation will almost certainly be cheaper. Use the BigQuery slot recommender in the Google Cloud console to calculate the crossover point for your actual workload.

Feature Comparison

SQL Dialect

BigQuery uses Standard SQL (now branded GoogleSQL), which is ANSI SQL-compliant with Google extensions for nested/repeated fields, geographic functions, and ML inference. Teams familiar with PostgreSQL or Redshift SQL will find BigQuery Standard SQL comfortable. ClickHouse has its own SQL dialect that differs from ANSI in several meaningful ways: array functions, lambda expressions, materialized views with TO syntax, ClickHouse-specific aggregate functions such as uniqCombined, quantileTDigest, and groupArray, and aggregate function combinators such as -If, -State, and -Merge. ClickHouse SQL is expressive and powerful, but it has a real learning curve for analysts accustomed to standard SQL.

Real-Time Ingestion

ClickHouse has native Kafka integration via the Kafka table engine and MaterializedView — you can ingest from Kafka topics directly into MergeTree tables with sub-second latency, no external pipeline required. This makes ClickHouse a natural fit for event-driven architectures where data must be queryable within seconds of being produced. BigQuery's native streaming insert API exists but has historically been expensive ($0.01/200MB inserted) and has data freshness of up to 90 seconds for some query types. BigQuery's recommended real-time pattern today uses Pub/Sub + Dataflow (or Kafka Connect + Storage Write API), which adds operational components and seconds of latency.

Integrations

Both systems have mature integration ecosystems. ClickHouse has first-party connectors for Kafka, dbt (via dbt-clickhouse), Airbyte, Grafana (with the ClickHouse data source plugin), and Superset. BigQuery's deepest integrations are Looker (Google announced the acquisition in 2019 and completed it in 2020; Looker is now the recommended BigQuery BI layer), dbt (via dbt-bigquery), Fivetran, Airbyte, and the broader Google Cloud ecosystem (Vertex AI, Dataflow, Looker Studio).

Performance

Raw benchmark comparisons between ClickHouse and BigQuery are widely available (ClickBench is the canonical benchmark for columnar analytical systems), but production performance depends heavily on workload characteristics, not synthetic benchmarks. The key differences in practice:

ClickHouse on hot data — when the working set fits in the OS page cache (or ClickHouse's own mark cache), query latency on aggregations over hundreds of millions of rows is typically in the 50–500ms range. ClickHouse is genuinely designed for interactive analytical queries where users expect sub-second responses.

BigQuery on ad-hoc queries — the 1–5 second cold-start overhead is unavoidable on on-demand pricing. For batch reporting (nightly dashboards, weekly business reviews), this overhead is irrelevant. For interactive dashboards where a user clicks a filter and expects immediate results, 3 seconds of slot allocation latency before the query even starts is a user experience problem.

ClickHouse under write pressure — heavy concurrent inserts on a single ClickHouse node create merge pressure. If inserts outpace the background merge process, you accumulate too many parts, which slows queries and eventually triggers the "too many parts" error. This requires insert buffering, insert batching, or the Buffer table engine. BigQuery's serverless model never exposes this problem — storage and compute are always separate, and inserts are handled by Google's infrastructure.

Important

ClickHouse's "too many parts" error (DB::Exception: Too many parts (N). Merges are processing significantly slower than inserts) is a production-stopper if you are inserting millions of small batches. Always batch inserts to at least 1,000–10,000 rows and never insert row-by-row from application code into a MergeTree table.
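
The batching advice above can be sketched as a small client-side buffer. This is a minimal illustration, not a ClickHouse client API: InsertBatcher and flush_fn are hypothetical names, and flush_fn stands for whatever function in your application performs one bulk INSERT.

```python
import time


class InsertBatcher:
    """Buffer rows in memory and hand them to `flush_fn` in large batches."""

    def __init__(self, flush_fn, max_rows=10_000, max_wait_s=1.0):
        self.flush_fn = flush_fn      # hypothetical callable that performs one bulk INSERT
        self.max_rows = max_rows
        self.max_wait_s = max_wait_s
        self._buffer = []
        self._oldest = None           # monotonic time of the oldest buffered row

    def add(self, row):
        if not self._buffer:
            self._oldest = time.monotonic()
        self._buffer.append(row)
        too_big = len(self._buffer) >= self.max_rows
        too_old = time.monotonic() - self._oldest >= self.max_wait_s
        if too_big or too_old:
            self.flush()

    def flush(self):
        """Send whatever is buffered as a single batch (no-op when empty)."""
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self.flush_fn(batch)


# Demo: record batch sizes instead of talking to a database.
batch_sizes = []
batcher = InsertBatcher(lambda batch: batch_sizes.append(len(batch)),
                        max_rows=10_000, max_wait_s=60)
for i in range(25_000):
    batcher.add({"event_id": i})
batcher.flush()  # drain the tail on shutdown
print(batch_sizes)
```

In this sketch the age limit is checked only when a new row arrives, and nothing is thread-safe. A production version would likely add a background timer and a lock, and would decide what to do with a batch when flush_fn raises.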

Cost Model

Cost is where the two systems diverge most dramatically at scale. BigQuery on-demand pricing is $6.25 per TB of data scanned. A single aggregation query over an unpartitioned 10 TB events table costs $62.50 — and if that query runs 500 times per month from a BI dashboard, the monthly bill for that one query is $31,250. BigQuery's partitioning and clustering features are not optional niceties; they are economic necessities. Partition pruning and clustering-based block skipping are the primary tools for keeping scan volumes (and costs) under control.
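
The arithmetic in the paragraph above, plus the effect of partition pruning, can be worked through in a few lines of Python. The $6.25/TB rate, the 10 TB table, and the 500 runs per month come from the text; the daily partitioning scheme with 365 equally sized partitions and a 7-day filter is an assumption added for illustration.

```python
ON_DEMAND_USD_PER_TB = 6.25  # on-demand rate quoted in this article


def query_cost_usd(tb_scanned, rate=ON_DEMAND_USD_PER_TB):
    """Cost of one query that scans `tb_scanned` terabytes."""
    return tb_scanned * rate


def monthly_cost_usd(tb_scanned, runs_per_month, rate=ON_DEMAND_USD_PER_TB):
    """Cost of running the same query `runs_per_month` times."""
    return query_cost_usd(tb_scanned, rate) * runs_per_month


TABLE_TB = 10
RUNS_PER_MONTH = 500

# Unpartitioned table: every run scans all 10 TB.
full_scan_monthly = monthly_cost_usd(TABLE_TB, RUNS_PER_MONTH)

# Assumption: the same table partitioned by day, holding 365 equally sized days,
# with the dashboard filter restricted to the latest 7 partitions.
pruned_tb = TABLE_TB * 7 / 365
pruned_monthly = monthly_cost_usd(pruned_tb, RUNS_PER_MONTH)

print(f"full scan: ${full_scan_monthly:,.2f}/month")
print(f"pruned:    ${pruned_monthly:,.2f}/month")
```

Under these assumptions the same dashboard query drops from tens of thousands of dollars per month to a few hundred, purely because each run scans about 2% of the table. Real tables are rarely this evenly distributed, so treat the ratio as an order-of-magnitude guide.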

ClickHouse's cost model is infrastructure cost: compute (CPU, RAM) and storage. On ClickHouse Cloud, pricing is compute-seconds consumed plus storage at $0.023/GB/month. On self-hosted ClickHouse on AWS or GCP, you pay for EC2/GCE instances and EBS/Persistent Disk. A three-node ClickHouse cluster on r6i.4xlarge instances (16 vCPU, 128 GB RAM each) costs approximately $2,200/month at AWS on-demand rates in us-east-1 (about $1.01 per instance-hour), before storage, and that cluster can handle hundreds of concurrent analytical queries over hundreds of billions of rows. There is no per-query scan charge.

The economic crossover point varies by query pattern, but a practical rule of thumb: if you are scanning more than 500 GB per month in BigQuery on-demand, price a ClickHouse cluster. At 5 TB/month scanned, ClickHouse infrastructure is almost certainly cheaper. At 50 TB/month, the cost difference is often 5–10x in favor of ClickHouse.

Five-Way Comparison Table

Criterion	ClickHouse	BigQuery	Snowflake	StarRocks	Redshift
Storage model	MergeTree columnar; local or S3-backed (ClickHouse Cloud)	Capacitor columnar; Google Colossus	Micro-partition columnar; S3/Azure/GCS	Columnar (primary key table) + row store hybrid	Columnar slices on S3 (RA3) or local (DC2)
Compute model	Always-on nodes; serverless on ClickHouse Cloud	Serverless slot allocation per query	Virtual warehouses (always-on, auto-suspend)	Always-on BE nodes	Always-on nodes or serverless (Redshift Serverless)
Query latency (hot, 1B rows)	50–500ms	3–15s (inc. slot start)	1–10s	50–300ms	2–20s
Real-time ingestion	Native Kafka engine, sub-second	Storage Write API, seconds of latency	Snowpipe Streaming, seconds	Native Kafka, sub-second	Kinesis/Kafka via connectors, seconds
SQL compatibility	ClickHouse dialect (non-ANSI in places)	Standard SQL (ANSI-compliant)	Standard SQL (ANSI-compliant)	MySQL-compatible SQL	PostgreSQL-based SQL
Cost model	Infrastructure or compute-seconds (Cloud)	$6.25/TB scanned (on-demand)	Credits per virtual warehouse hour	Infrastructure cost	Instance cost or per-RPU (Serverless)
Operational complexity	Medium (self-hosted) / Low (Cloud)	Very low (fully serverless)	Low (fully managed)	Medium	Low–Medium
Best fit	Real-time operational analytics, observability, product analytics	Ad-hoc BI, data science, ML workloads	Enterprise data warehousing, governed BI	HTAP, real-time dashboards	Existing AWS ecosystem, BI at scale

When to Choose Each

Choose ClickHouse when:

Your use case requires sub-second query latency for end-user-facing analytics (product dashboards, usage meters, observability platforms).
You are ingesting high-velocity event streams from Kafka and need data queryable within seconds of production.
Your monthly BigQuery scan volume has crossed the point where infrastructure costs less than on-demand billing — typically above 1–2 TB/month of regular queries.
You have engineering capacity to own cluster operations, or you are willing to use ClickHouse Cloud to offload that responsibility.
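Your data model is append-only or insert-heavy with rare updates. ClickHouse's ReplacingMergeTree handles upserts, but it deduplicates rows only during background merges, so queries can see duplicate versions until a merge runs unless they use FINAL or aggregate explicitly.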

Choose BigQuery when:

Your team needs to be productive on day one without infrastructure expertise — data scientists, analysts, and engineers can all write Standard SQL against BigQuery without learning a new system.
Your query pattern is ad-hoc and unpredictable — BI exploration, one-off analyses, ML training data preparation — where serverless auto-scaling is genuinely valuable.
You are already deep in Google Cloud (Vertex AI, Dataflow, Pub/Sub, Looker) and want native integrations without connectors.
Your data has complex nested structures (JSON, Protocol Buffers, Avro) that benefit from BigQuery's native STRUCT and ARRAY support.
Query latency above 2–5 seconds is acceptable for your users (batch reporting, overnight pipelines, data science notebooks).

Tip

ClickHouse and BigQuery are not mutually exclusive. Many production architectures use ClickHouse for real-time operational metrics (sub-second dashboards) and BigQuery for historical ad-hoc analysis and ML pipelines — with Airbyte or Fivetran syncing enriched data between them. Let the latency requirement drive the routing decision.

Key Takeaways

ClickHouse's MergeTree engine delivers sub-second analytical query latency on hot data; BigQuery's serverless slot model introduces 1–5 seconds of unavoidable cold-start overhead per query.
BigQuery's $6.25/TB on-demand pricing scales linearly with data scanned — partition pruning and clustering are economic requirements, not optional performance optimizations.
ClickHouse's native Kafka engine enables true real-time ingestion with sub-second data freshness; BigQuery's streaming path adds seconds of latency and a per-byte insert cost.
BigQuery's Standard SQL and zero-infrastructure model make it the right choice for ad-hoc analytics teams; ClickHouse's custom SQL dialect and operational requirements raise the engineering bar but deliver materially better interactive performance.
Evaluate the economic crossover point early: at meaningful query volumes (multi-TB/month), ClickHouse infrastructure almost always costs less than BigQuery on-demand; the break-even is usually reached faster than expected.
Hybrid architectures — ClickHouse for real-time, BigQuery for historical and ML — are common in production and often outperform either system used exclusively.

Working with JusDB on Analytics Databases

JusDB manages ClickHouse, BigQuery, and hybrid analytical architectures for engineering teams who need production-grade analytics without the operational overhead. Whether you are choosing a stack for the first time, migrating from BigQuery to ClickHouse to control costs, or building a real-time analytics pipeline on top of Kafka and ClickHouse, our DBAs handle architecture design, cluster provisioning, sharding configuration, query optimization, and 24/7 incident response — so your team ships features instead of debugging merge pressure or diagnosing slot quota issues.

We have helped teams reduce their BigQuery bills by 70–85% by migrating hot query workloads to ClickHouse, and we have helped teams avoid ClickHouse pitfalls (MergeTree engine selection, sharding key design, insert batching requirements) that would otherwise surface as production incidents months after initial deployment.

