Database Performance

Save Space on MySQL Data with Column Compression: InnoDB Page Compression and ROW_FORMAT

MySQL InnoDB supports multiple compression strategies — from ROW_FORMAT=COMPRESSED to transparent page compression using punch holes. Choosing the right approach can cut storage by 40-70% for text-heavy workloads.

JusDB Team
October 6, 2022
8 min read
142 views

Cloud storage bills have a way of sneaking up on engineering teams — a 200 GB MySQL database becomes 500 GB after two years of growth, and suddenly you are paying $60/month just for storage on a single RDS instance before accounting for replicas, backups, and snapshots. A 500 GB InnoDB table storing structured log records, user-generated content, or JSON payloads is often reducible to 180–220 GB with the right compression strategy — a 55–65% reduction that directly lowers your storage bill and can also improve I/O throughput by fitting more data in the buffer pool. MySQL InnoDB supports three distinct compression approaches with meaningfully different trade-offs: the legacy ROW_FORMAT=COMPRESSED page format, transparent page compression using filesystem punch holes, and application-level compression using the COMPRESS() function. Understanding which to use — and when not to use compression at all — is what separates a thoughtful storage optimization from one that quietly degrades write throughput.

TL;DR
  • InnoDB ROW_FORMAT=COMPRESSED compresses 16 KB pages to a configurable smaller size (typically 8 KB) using zlib — good compression ratios but adds CPU overhead on every page read and write, and requires innodb_file_per_table=ON.
  • Transparent page compression (PAGE_COMPRESSED=1) compresses pages in memory then punches holes in the file on disk — requires a filesystem that supports sparse files (ext4, XFS, Btrfs, or NTFS) and is generally faster than ROW_FORMAT=COMPRESSED for read-heavy workloads.
  • The COMPRESS() function compresses individual column values at the application or SQL layer — flexible but opaque to the query planner, making indexed lookups on compressed values impossible.
  • Text-heavy workloads (JSON payloads, HTML content, logs, descriptions) typically achieve 50–70% compression; numeric-heavy or already-compressed data (images, BLOBs) compresses poorly.
  • Compression always trades CPU cycles for reduced I/O — measure both before and after on production-representative data.

MySQL Compression Options Overview

InnoDB does not offer a single compression toggle. Instead, it provides compression at different layers of the storage stack, each with different scope, compatibility requirements, and performance characteristics. The choice depends on your MySQL version, filesystem, cloud provider, and whether the bottleneck you are solving is storage cost, I/O throughput, or buffer pool efficiency.

Before evaluating specific methods, it helps to understand what MySQL is actually compressing. InnoDB stores data in 16 KB pages (configurable via innodb_page_size, set at initialization). Each page holds multiple rows from a single table or index. Compression works at the page level for the InnoDB methods — entire pages are compressed — not at the row or column level individually. This means compression effectiveness depends heavily on how much entropy (randomness) and redundancy exists within a 16 KB chunk of data. A page full of repetitive text compresses dramatically. A page full of UUIDs or encrypted values compresses almost not at all.

Key Questions Before Choosing a Method

  • What is the data type composition of the table — mostly text, mostly numeric, mixed?
  • What is the read-to-write ratio? High read workloads tolerate decompression overhead more easily than high write workloads.
  • What filesystem is MySQL running on? Transparent page compression requires sparse file support.
  • Is the table accessed via index range scans or mostly full table scans? Compression interacts differently with each access pattern.
  • What MySQL version is in use? Transparent page compression requires MySQL 5.7.8+ and the appropriate storage engine settings.

ROW_FORMAT=COMPRESSED (Legacy Page Compression)

ROW_FORMAT=COMPRESSED is InnoDB's original compression mechanism, available since MySQL 5.5. It operates by dividing each 16 KB InnoDB page into a compressed page of a smaller configured size (KEY_BLOCK_SIZE, typically 4 or 8 KB) stored on disk. When a page is read from disk, InnoDB decompresses it into the buffer pool at its full 16 KB size. When a page is modified and must be written back to disk, it is compressed again. This means every write involves a compression pass, and every cold read involves a decompression pass — CPU costs that are always paid regardless of cache hit rate.

Creating a Compressed Table

text
-- Enable per-table tablespaces (required for ROW_FORMAT=COMPRESSED)
SET GLOBAL innodb_file_per_table = ON;

-- Create a new table with compression
CREATE TABLE user_events (
  id          BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  user_id     BIGINT UNSIGNED NOT NULL,
  event_type  VARCHAR(64)     NOT NULL,
  payload     JSON,
  created_at  DATETIME        NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (id),
  INDEX idx_user_created (user_id, created_at)
) ENGINE=InnoDB
  ROW_FORMAT=COMPRESSED
  KEY_BLOCK_SIZE=8;
text
-- Convert an existing table to compressed format (requires table rebuild)
ALTER TABLE user_events
  ROW_FORMAT=COMPRESSED
  KEY_BLOCK_SIZE=8;
-- Note: This locks the table in MySQL 5.7 without pt-online-schema-change
-- In MySQL 8.0, use ALGORITHM=INPLACE if supported, or pt-osc / gh-ost for large tables

KEY_BLOCK_SIZE Selection

The KEY_BLOCK_SIZE value controls the target compressed page size. Valid values are 1, 2, 4, 8, and 16 KB — but only values smaller than innodb_page_size (default 16 KB) actually compress anything. Setting KEY_BLOCK_SIZE=8 tells InnoDB to target 8 KB compressed pages, expecting roughly 2:1 compression. If the compressed page does not fit within KEY_BLOCK_SIZE bytes, InnoDB splits the page and stores overflow, which can actually worsen performance compared to no compression at all. Always measure the actual compression ratio after enabling compression:

text
-- Check actual compression ratio via INFORMATION_SCHEMA
SELECT
  table_name,
  ROUND(data_length / 1024 / 1024, 1)              AS data_mb,
  ROUND(index_length / 1024 / 1024, 1)             AS index_mb,
  ROUND((data_length + index_length) / 1024 / 1024, 1) AS total_mb
FROM information_schema.tables
WHERE table_schema = 'your_database'
  AND table_name = 'user_events';

-- Monitor compression success and failure rates
SELECT
  table_name,
  compress_ops,
  compress_ops_ok,
  ROUND(compress_ops_ok / compress_ops * 100, 1)   AS compress_success_pct,
  uncompress_ops
FROM information_schema.innodb_cmp_per_index
WHERE database_name = 'your_database'
ORDER BY compress_ops DESC;

Compression Advisor: Finding the Right KEY_BLOCK_SIZE

MySQL 5.7+ includes a compression advisor in the INFORMATION_SCHEMA that tracks how often compressed pages fail to fit within the configured KEY_BLOCK_SIZE. A high failure rate means the data is not compressing well at the chosen block size and InnoDB is paying the split cost repeatedly. The general guideline: if compress_success_pct is below 95%, try a larger KEY_BLOCK_SIZE.

Warning: CPU Overhead on Write-Heavy Tables

ROW_FORMAT=COMPRESSED uses zlib compression on every dirty page write. For a table receiving more than 500 writes per second, this compression cost is measurable — typically 10–25% additional CPU utilization on the database server. On RDS and Aurora, where you are renting CPU-hours, this translates directly to additional cost or degraded performance for concurrent queries. Benchmark write throughput on a staging environment before enabling ROW_FORMAT=COMPRESSED on any write-heavy production table.

InnoDB Transparent Page Compression (Punch Holes)

Introduced in MySQL 5.7.8, transparent page compression is a fundamentally different approach. Instead of changing the on-disk page format, InnoDB compresses each 16 KB page in memory, writes the compressed bytes to disk, and then tells the filesystem to punch a hole — a sparse region — in the file where the trailing zeroed bytes would have been stored. The filesystem does not allocate physical storage for punched holes, so the file's logical size (as reported by ls -l) remains 16 KB per page, but the actual allocated blocks on disk are smaller. This is why transparent page compression is sometimes called "sparse file compression."

Filesystem Requirements

Transparent page compression requires a filesystem that supports fallocate() with FALLOC_FL_PUNCH_HOLE. On Linux, this means ext4, XFS, Btrfs, or ZFS with the appropriate mount options. On AWS RDS and Aurora, transparent page compression is supported on instances using gp3 or io1/io2 volumes. On older gp2 volumes or instance store, verify support before enabling. On macOS (APFS) and Windows NTFS with appropriate block size, it also works.

text
-- Enable transparent page compression on a new table
CREATE TABLE audit_log (
  id          BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  entity_type VARCHAR(64)     NOT NULL,
  entity_id   BIGINT UNSIGNED NOT NULL,
  action      VARCHAR(32)     NOT NULL,
  actor_id    BIGINT UNSIGNED,
  details     JSON,
  created_at  DATETIME(3)     NOT NULL DEFAULT CURRENT_TIMESTAMP(3),
  PRIMARY KEY (id),
  INDEX idx_entity (entity_type, entity_id, created_at),
  INDEX idx_actor (actor_id, created_at)
) ENGINE=InnoDB
  COMPRESSION='zlib';
  -- Also supports: COMPRESSION='lz4' (MySQL 8.0+, lower CPU, slightly less compression)
  -- Or:            COMPRESSION='none' to disable explicitly
text
-- Convert existing table to transparent page compression
ALTER TABLE audit_log COMPRESSION='zlib';
-- Note: ALTER TABLE only sets the flag. Pages are compressed as they are written.
-- To immediately compress all pages, follow with OPTIMIZE TABLE:
OPTIMIZE TABLE audit_log;
-- On large tables, use pt-online-schema-change to avoid locking:
-- pt-online-schema-change --alter "COMPRESSION='zlib'" D=your_db,t=audit_log --execute

Transparent Compression vs ROW_FORMAT=COMPRESSED

Transparent page compression has a key performance advantage over ROW_FORMAT=COMPRESSED: hot pages in the InnoDB buffer pool are stored in their uncompressed 16 KB form, and compression only occurs when dirty pages are flushed to disk. For read-heavy workloads where most data is served from the buffer pool, transparent compression adds minimal overhead — decompression only happens on a cache miss. By contrast, ROW_FORMAT=COMPRESSED keeps both compressed and uncompressed copies of hot pages in the buffer pool (the "compression pad"), consuming more buffer pool memory.

LZ4 compression (available in MySQL 8.0 via COMPRESSION='lz4') is significantly faster than zlib at both compression and decompression — typically 3–5x — with a modest reduction in compression ratio (roughly 10–15% less compression than zlib). For CPU-sensitive write workloads, LZ4 is almost always the right choice over zlib for transparent page compression.

Verifying Actual Space Savings on Disk

text
-- Logical file size (always shows uncompressed size for sparse files)
-- Use du to see actual allocated disk blocks:
-- du -sh /var/lib/mysql/your_database/audit_log.ibd

-- In MySQL 8.0, check via performance_schema:
SELECT
  file_name,
  ROUND(total_file_bytes / 1024 / 1024, 1)     AS logical_mb,
  ROUND(allocated_file_bytes / 1024 / 1024, 1) AS physical_mb,
  ROUND((1 - allocated_file_bytes / total_file_bytes) * 100, 1) AS space_savings_pct
FROM performance_schema.file_summary_by_instance
WHERE file_name LIKE '%audit_log%';
When Transparent Page Compression Works Best

Transparent page compression delivers the best results when: (1) the table contains large VARCHAR, TEXT, or JSON columns with repetitive or structured content; (2) the read-to-write ratio is above 5:1, so decompression cost on cache misses is infrequent; (3) the workload is I/O-bound rather than CPU-bound — compression reduces I/O at the cost of CPU, so it helps most when disk is the bottleneck; (4) you are on MySQL 8.0 and can use LZ4 to minimize the CPU trade-off.

Column-Level Compression with COMPRESS()

MySQL provides the COMPRESS() and UNCOMPRESS() functions for application-level compression of individual column values. This approach is fundamentally different from InnoDB page compression: individual column values are compressed using zlib and stored as binary blobs (VARBINARY or BLOB) in the table. The storage engine never knows it is storing compressed data — from InnoDB's perspective, these are just byte strings.

Using COMPRESS() in Practice

text
-- Store compressed content in a dedicated column
CREATE TABLE content_pages (
  id           INT UNSIGNED NOT NULL AUTO_INCREMENT,
  slug         VARCHAR(255) NOT NULL,
  title        VARCHAR(512),
  body_compressed MEDIUMBLOB,  -- stores COMPRESS(body_html)
  word_count   INT UNSIGNED,
  created_at   DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
  updated_at   DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (id),
  UNIQUE KEY uq_slug (slug)
) ENGINE=InnoDB;

-- Insert with compression
INSERT INTO content_pages (slug, title, body_compressed, word_count)
VALUES (
  'getting-started',
  'Getting Started Guide',
  COMPRESS('...full page HTML content here...'),
  1250
);

-- Read with decompression
SELECT
  id,
  slug,
  title,
  UNCOMPRESS(body_compressed)  AS body_html,
  word_count
FROM content_pages
WHERE slug = 'getting-started';
text
-- Check compression ratio for a sample of rows
SELECT
  id,
  slug,
  LENGTH(body_compressed)                          AS compressed_bytes,
  UNCOMPRESSED_LENGTH(body_compressed)             AS original_bytes,
  ROUND(
    (1 - LENGTH(body_compressed) / UNCOMPRESSED_LENGTH(body_compressed)) * 100,
    1
  )                                                AS compression_pct
FROM content_pages
ORDER BY RAND()
LIMIT 20;

Limitations of COMPRESS()

Column-level compression with COMPRESS() has significant constraints that make it unsuitable for many use cases. First, you cannot index compressed columns directly — a query filtering on the compressed content requires a full table scan and decompression of every row, making it appropriate only for content retrieval by primary key or a separate indexed column. Second, the query planner cannot use statistics about the compressed data for optimization. Third, the compression is always zlib with no algorithm choice. Fourth, the return type of UNCOMPRESS() is always VARBINARY, so application code must convert to string. Despite these limitations, COMPRESS() is useful for specific patterns: large HTML or text content stored and retrieved by primary key, archival tables where the content is write-once and read-rarely, or cases where you want to compress specific columns while leaving others uncompressed and indexable.

Comparison: MySQL Compression Methods

Method Compression Scope Algorithm CPU Overhead Indexing Buffer Pool Impact Filesystem Req. MySQL Version
ROW_FORMAT=COMPRESSED InnoDB page (16 KB) zlib High (every read & write) Full — indexes compressed too Stores 2 copies of hot pages None (innodb_file_per_table) 5.5+
Transparent Page Compression InnoDB page (16 KB) zlib or LZ4 Medium (flush time only) Full — indexes compressed too Stores 1 uncompressed copy Sparse file support (ext4, XFS) 5.7.8+
COMPRESS() function Individual column value zlib Low (explicit calls only) None on compressed data Normal (treated as BLOB) None 4.1+
No compression (baseline) N/A None None Full Normal None All

Measuring Compression Effectiveness

Before enabling compression in production, run a controlled measurement on a sample of your actual data. The compression ratio varies enormously by data type: structured HTML or JSON compresses 60–75%, repetitive log lines compress 70–80%, UUID or hash columns compress under 10%, and already-compressed binary data (JPEG, PNG, encrypted bytes) may actually expand slightly after a compression attempt.

Estimating Compression Ratio Before Migration

text
-- Sample actual row content to estimate compression ratio
-- Run this on a representative sample before altering the full table
SELECT
  COUNT(*)                                              AS sampled_rows,
  ROUND(AVG(LENGTH(payload)), 0)                        AS avg_payload_bytes,
  ROUND(AVG(LENGTH(COMPRESS(payload))), 0)              AS avg_compressed_bytes,
  ROUND(
    (1 - AVG(LENGTH(COMPRESS(payload))) / AVG(LENGTH(payload))) * 100,
    1
  )                                                     AS estimated_compression_pct
FROM user_events
WHERE created_at >= NOW() - INTERVAL 7 DAY
LIMIT 10000;
text
-- Full table size before and after (run before ALTER, record results)
SELECT
  table_schema,
  table_name,
  ROUND(data_length / 1024 / 1024 / 1024, 2)       AS data_gb,
  ROUND(index_length / 1024 / 1024 / 1024, 2)      AS index_gb,
  ROUND((data_length + index_length) / 1024 / 1024 / 1024, 2) AS total_gb,
  table_rows,
  ROUND(data_length / table_rows, 0)                AS avg_row_bytes
FROM information_schema.tables
WHERE table_schema = 'your_database'
  AND table_name = 'user_events';

Monitoring Compression in Production

text
-- Monitor InnoDB compression stats globally
SHOW STATUS LIKE 'Innodb_compress%';
-- Key metrics:
--   Innodb_compress_time          — total time spent compressing (microseconds)
--   Innodb_uncompress_time        — total time spent decompressing
--   Innodb_num_open_files         — watch for file handle pressure with many compressed tables

-- Per-table compression stats (requires innodb_cmp_per_index_enabled = ON)
SET GLOBAL innodb_cmp_per_index_enabled = ON;

SELECT
  database_name,
  table_name,
  index_name,
  compress_ops,
  compress_ops_ok,
  ROUND(compress_ops_ok / NULLIF(compress_ops, 0) * 100, 1) AS compress_success_pct,
  uncompress_ops
FROM information_schema.innodb_cmp_per_index
ORDER BY compress_ops DESC
LIMIT 20;

When NOT to Use Compression

Compression is not universally beneficial. There are specific scenarios where enabling InnoDB compression can hurt performance more than the storage savings justify, and recognizing these patterns early saves you from a painful rollback on a production table.

High-Write, Latency-Sensitive Tables

Any table that is the hot path of your application write flow — order inserts, session updates, real-time event ingestion — is a poor candidate for ROW_FORMAT=COMPRESSED. The per-write compression cost adds latency directly to your P95 and P99 write response times. If your application has an SLA of under 10 ms for writes, adding 2–5 ms of zlib compression per dirty page flush is unacceptable. Transparent page compression with LZ4 is a better option here since compression happens asynchronously during buffer pool flushing, but even that adds measurable overhead at very high write rates (above 10,000 dirty page flushes per second).

Mostly-Numeric Data

Tables composed primarily of integer, BIGINT, DECIMAL, FLOAT, or DOUBLE columns do not compress well. A 16 KB page of InnoDB data containing 200 rows of an order header table (order_id, customer_id, status_code, total_amount, timestamps) will typically achieve less than 20% compression — not enough to justify the overhead. Run the pre-migration estimation query above before committing to any schema change.

Already-Compressed Column Values

If your application stores JPEG images, audio data, encrypted ciphertext, or Gzip-compressed payloads in MySQL BLOBs, InnoDB page compression will provide near-zero benefit and may slightly increase CPU load for no storage gain. Compression is entropy reduction — data with high entropy cannot be compressed further. Use the COMPRESS() sampling query to confirm the ratio before enabling page compression on such tables.

When Buffer Pool Hit Rate Is Critical

ROW_FORMAT=COMPRESSED keeps both compressed (disk-size) and uncompressed (16 KB) copies of hot pages in the buffer pool — effectively consuming more buffer pool memory per hot page than an uncompressed table would. If your workload has a very high buffer pool hit rate that you depend on for performance, and the buffer pool is already near capacity, adding compressed tables can evict hot uncompressed pages and degrade overall performance. Transparent page compression does not have this problem since it stores only the uncompressed copy in the buffer pool.

Warning: Test Compression on Staging First

Never enable table compression on a large production table without first: (1) measuring the compression ratio on sampled data; (2) benchmarking write throughput and latency on a staging clone; (3) verifying your filesystem supports sparse files if using transparent page compression; (4) planning the schema change execution with pt-online-schema-change or gh-ost to avoid locking. A compression change on a 500 GB table using a plain ALTER TABLE in MySQL 5.7 will lock the table for hours and likely cause an outage.

Key Takeaways
  • Use ROW_FORMAT=COMPRESSED with KEY_BLOCK_SIZE=8 for cold or read-heavy tables with text-heavy content — it is the most broadly supported option but carries the highest CPU overhead on writes.
  • Prefer transparent page compression (COMPRESSION='lz4' in MySQL 8.0) for tables with mixed read/write workloads — compression happens at flush time, not on every write, and LZ4 is substantially faster than zlib.
  • Reserve COMPRESS() function for large content blobs retrieved by primary key where indexing compressed data is not required — it is the most targeted option but completely opaque to the query planner.
  • Always sample real data to estimate compression ratio before committing to a schema change — a 500 GB table that only achieves 10% compression is probably not worth the CPU trade-off.
  • Watch compress_ops_ok / compress_ops in information_schema.innodb_cmp_per_index after enabling compression — a ratio below 95% means KEY_BLOCK_SIZE is too small for your data and performance will suffer.
  • For write-heavy, latency-sensitive tables, compression is almost never the right first optimization — focus on indexing, query tuning, and buffer pool sizing before touching table format.
  • Use pt-online-schema-change or gh-ost for any ALTER TABLE that changes ROW_FORMAT or COMPRESSION on tables larger than 10 GB to avoid production locks.

Working with JusDB on MySQL Storage Optimization

JusDB helps engineering teams reduce MySQL storage costs through compression, table partitioning, and archival strategies. Our DBAs evaluate compression ratios, CPU overhead trade-offs, and storage tier optimization — saving thousands in cloud storage costs.

Explore JusDB MySQL Management →  |  Talk to a DBA

Related reading:

Share this article