JSON in MySQL: Power and Pitfall
MySQL's JSON column type (introduced in 5.7) lets you store flexible, schema-less data alongside structured columns. But JSON queries are often slower than column queries, and many teams don't realize why — or what to do about it.
How JSON Storage Works
MySQL stores JSON in a binary format that allows direct path access without parsing the entire document. However, there's no built-in way to index arbitrary JSON paths directly — you must use functional indexes (MySQL 5.7.8+ for generated columns, MySQL 8.0 for functional indexes directly).
CREATE TABLE events (
id BIGINT PRIMARY KEY AUTO_INCREMENT,
payload JSON NOT NULL,
created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);
-- Query without index — full table scan
SELECT * FROM events WHERE payload->>'$.user_id' = '42';Functional Indexes for JSON Paths
-- MySQL 8.0: functional index directly on JSON path
ALTER TABLE events
ADD INDEX idx_user_id ((CAST(payload->>'$.user_id' AS UNSIGNED)));
-- Verify the index is used
EXPLAIN SELECT * FROM events
WHERE CAST(payload->>'$.user_id' AS UNSIGNED) = 42;
-- Alternative: generated column + index (works in 5.7)
ALTER TABLE events
ADD COLUMN user_id BIGINT GENERATED ALWAYS AS (payload->>'$.user_id') VIRTUAL,
ADD INDEX idx_user_id (user_id);JSON_TABLE: SQL Over JSON Arrays
JSON_TABLE converts JSON arrays into relational rows, enabling JOINs and aggregations:
SELECT
e.id,
jt.tag
FROM events e,
JSON_TABLE(
e.payload,
'$.tags[*]' COLUMNS (
tag VARCHAR(100) PATH '$'
)
) AS jt
WHERE jt.tag = 'critical';Performance Comparison
-- Benchmark: JSON vs dedicated column (10M rows)
-- JSON path query (no functional index):
-- SELECT * FROM events WHERE payload->>'$.user_id' = '42'
-- Result: 4,200ms (full table scan)
-- JSON path query (with functional index):
-- SELECT * FROM events WHERE CAST(payload->>'$.user_id' AS UNSIGNED) = 42
-- Result: 1.2ms (index range scan)
-- Dedicated integer column with regular index:
-- SELECT * FROM events WHERE user_id = 42
-- Result: 0.8ms (index range scan)
-- Takeaway: functional indexes close most of the gap,
-- but dedicated columns are slightly faster and cleanerWhen to Use JSON Columns
| Good fit for JSON | Bad fit for JSON |
|---|---|
| Truly variable structure (different keys per row) | Fields you query or filter frequently |
| Configuration/settings blobs | Fields used in JOINs |
| Audit logs with flexible metadata | Fields with frequent updates (rewrites entire JSON) |
| External API responses stored as-is | Fields that should be normalized for reporting |
Migrating JSON to Columns
-- Step 1: Add new columns
ALTER TABLE events
ADD COLUMN user_id BIGINT,
ADD COLUMN event_type VARCHAR(50);
-- Step 2: Backfill
UPDATE events
SET
user_id = CAST(payload->>'$.user_id' AS UNSIGNED),
event_type = payload->>'$.type'
WHERE user_id IS NULL
LIMIT 10000; -- Run in batches
-- Step 3: Add indexes
ALTER TABLE events
ADD INDEX idx_user_id (user_id),
ADD INDEX idx_event_type (event_type);
-- Step 4: Update queries, then optionally drop JSON columnJSON Update Performance
Updating a single field in a JSON column rewrites the entire JSON document. For frequently-updated fields, this is expensive:
-- Expensive: rewrites entire payload blob
UPDATE events
SET payload = JSON_SET(payload, '$.processed', true)
WHERE id = 12345;
-- Better: dedicated boolean column for frequently-updated state
ALTER TABLE events ADD COLUMN processed BOOLEAN DEFAULT FALSE;
UPDATE events SET processed = true WHERE id = 12345;Key Takeaways
- JSON columns require functional indexes for fast path queries
- Functional indexes close most of the performance gap vs dedicated columns
- Avoid JSON for frequently-updated fields or fields used in JOINs
- Use
JSON_TABLEto query JSON arrays relationally - Consider migrating hot JSON paths to dedicated columns as query patterns solidify