Database Engineering

UUIDs as Primary Keys: How We Destroyed Database Performance

UUIDs as primary keys can destroy database performance. Learn why 50M rows went from 0.5ms to 2.3ms inserts, and how to fix it with hybrid approaches.

JusDB Team
June 6, 2023
10 min read
340 views

In the world of distributed systems and microservices architecture, UUIDs (Universally Unique Identifiers) have become the go-to solution for generating unique primary keys across multiple services. The appeal is obvious: generate globally unique identifiers without coordinating with a central database, simplify data merging across regions, and avoid exposing sequential business intelligence through your APIs.

But here's the uncomfortable truth we've witnessed across dozens of client deployments: UUIDs as primary keys can silently devastate your database performance, especially at scale.

The Allure of UUIDs

Before diving into the problems, let's understand why UUIDs became so popular:

Global Uniqueness: Generate IDs across distributed services without database round trips. Perfect for microservices architectures where multiple applications need to create records independently.

Data Merging Simplified: Multi-region deployments become easier when merging data from different databases doesn't require ID conflict resolution.

Security Through Obscurity: Sequential IDs like /users/12345 expose business metrics. UUIDs obscure this information.

These benefits are real, which makes the performance trade-offs easy to overlook—until your database starts grinding to a halt.

The Performance Catastrophe: Real Numbers

Let me share some eye-opening metrics from production systems we've analyzed:

Storage Overhead: The 4x to 20x Problem

A standard auto-incrementing integer (INT or BIGINT) consumes 32-64 bits per value. A UUID, even in optimized binary format (BINARY(16)), consumes 128 bits—4x larger. But most implementations use human-readable CHAR(36) format, consuming up to 288 bits, roughly 9x larger than a BIGINT.

In one client's 50-million-row table:

  • Integer primary key: ~240 MB index size

  • UUID (CHAR(36)) primary key: ~1.7 GB index size

That's a 7x increase in storage just for the primary key index. And remember, secondary indexes in InnoDB store the primary key value as a pointer, multiplying this overhead across all indexes.

The B+ Tree Balancing Nightmare

Here's where things get truly painful. MySQL and PostgreSQL store indexes in B+ tree structures, optimized for sequential insertions. When you insert sequential integers (1, 2, 3, 4...), the database simply appends new values to the rightmost leaf page of the tree. InnoDB fills pages to about 94% capacity before creating a new page.

With random UUIDs, chaos ensues:

  1. Random Insertion Points: A UUID like 9b9b9b9b-9b9b-9b9b-9b9b-9b9b9b9b9b9b might need to be inserted between existing values 550e8400-e29b-41d4-a716-446655440000 and a9a9a9a9-a9a9-a9a9-a9a9-a9a9a9a9a9a9

  2. Page Splitting: The database must split existing pages to make room, then rebalance the entire tree structure

  3. Poor Page Utilization: Instead of 94% utilization, random UUIDs result in 50-70% page utilization, nearly doubling storage requirements

  4. Constant Reorganization: Every insert potentially triggers rebalancing across multiple levels of the tree

On a high-volume OLTP system processing 10,000 inserts per second, this constant rebalancing creates significant I/O overhead and lock contention.

Real-World Performance Degradation

A recent client case study:

Before (Auto-increment INT):

  • INSERT operations: ~0.5ms average

  • SELECT on indexed column: ~0.1ms

  • Buffer pool hit ratio: 99.8%

After (UUID v4 as CHAR(36)):

  • INSERT operations: ~2.3ms average (460% slower)

  • SELECT on indexed column: ~0.8ms (700% slower)

  • Buffer pool hit ratio: 92.1% (7.7% more disk I/O)

  • Storage costs increased by 340%

Why UUIDs Kill Cache Efficiency

Modern databases rely heavily on caching frequently accessed pages in memory (InnoDB buffer pool, PostgreSQL shared buffers). With sequential keys, recently inserted rows cluster together in the same index pages. A query for recent records hits just a few cached pages.

With random UUIDs, temporal locality vanishes:

  • Records inserted at the same time scatter across the entire index

  • The entire index must stay in cache for good performance

  • Cache miss rates skyrocket as the table grows

  • More disk I/O means slower queries and higher latency

As one of our clients discovered, what worked fine with 1 million rows became unusably slow at 50 million rows—not because of scale, but because their working set exceeded available memory.

The String Comparison Tax

Beyond storage and indexing issues, UUIDs introduce computational overhead:

String operations are slower than integer operations, especially for:

  • WHERE clause comparisons

  • JOIN operations on foreign keys

  • Index lookups

  • Sorting and grouping

A query like:

WHERE id = '550e8400-e29b-41d4-a716-446655440000'

is significantly slower than:

WHERE id = 12345

At scale, across millions of queries per day, these microseconds compound into real performance degradation.

The JOIN Performance Penalty

In relational databases, foreign keys reference primary keys. When you use UUIDs as primary keys, your foreign keys also become UUIDs, propagating the performance problems:

SELECT o.order_id, c.customer_name 
FROM orders o 
JOIN customers c ON o.customer_id = c.customer_id 
WHERE o.order_date = '2024-01-01';

With UUID foreign keys, this join:

  • Scans larger indexes (4-9x more data)

  • Performs slower string comparisons

  • Requires more buffer pool memory

  • Generates more disk I/O

Multiply this across complex queries with multiple joins, and performance degrades rapidly.

Choosing the Right Integer Type: BIGINT vs BIGINT UNSIGNED

Before diving into solutions, an important note about integer primary keys: always use UNSIGNED for auto-incrementing primary keys.

Why UNSIGNED?

INT vs INT UNSIGNED:

  • INT: -2,147,483,648 to 2,147,483,647 (~2.1 billion positive values)

  • INT UNSIGNED: 0 to 4,294,967,295 (~4.3 billion values)

BIGINT vs BIGINT UNSIGNED:

  • BIGINT: -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807

  • BIGINT UNSIGNED: 0 to 18,446,744,073,709,551,615 (18+ quintillion values)

Since auto-increment values are never negative, using UNSIGNED effectively doubles your capacity without any storage overhead—both use the same 8 bytes for BIGINT.

INT UNSIGNED vs BIGINT UNSIGNED: The Modern Choice

Historically, developers used INT (4 bytes) to save space over BIGINT (8 bytes). While this 50% reduction can save 10-20% on index sizes in heavily normalized databases, modern best practices favor BIGINT UNSIGNED:

Why BIGINT UNSIGNED is now standard:

  1. Future-proofing: 4.3 billion rows might seem like a lot, but high-volume systems can hit this limit

  2. No migration headaches: Migrating from INT to BIGINT on a billion-row table is painful

  3. Storage is cheap: The performance difference is negligible on modern hardware

  4. Industry standard: Rails 5.1+ (2017) made BIGINT the default for all primary keys

When to use INT UNSIGNED:

  • Tables guaranteed to never exceed 4.3 billion rows

  • Heavily normalized schemas with dozens of foreign keys where index size truly matters

  • Legacy applications where schema changes are prohibitively expensive

Our recommendation: Default to BIGINT UNSIGNED for all new projects unless you have specific constraints.

The Right Solution: Hybrid Approaches

The good news? You don't have to choose between performance and the benefits of UUIDs. Here are battle-tested approaches we recommend to clients:

CREATE TABLE users (
    id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    uuid BINARY(16) UNIQUE NOT NULL DEFAULT (UUID_TO_BIN(UUID())),
    email VARCHAR(255) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    INDEX idx_uuid (uuid)
) ENGINE=InnoDB;

Benefits:

  • Fast inserts and queries using integer primary key

  • Global uniqueness via UUID for API exposure and data merging

  • 18+ quintillion possible IDs (you'll never run out)

  • Best of both worlds with minimal overhead

Usage pattern:

  • Internal queries use id

  • External API uses uuid

  • Data sync between regions uses uuid

Strategy 2: Ordered UUIDs (UUIDv7)

UUIDv7 incorporates timestamps in the most significant bits, providing natural ordering:

-- Using UUIDv7 (timestamp-based, ordered)
CREATE TABLE events (
    id BINARY(16) PRIMARY KEY DEFAULT (UUID_TO_BIN(gen_uuid_v7())),
    event_type VARCHAR(50),
    payload JSON,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB;

Performance characteristics:

  • Much better than random UUIDs (v4)

  • Maintains sequential-like insertion patterns

  • Reduces page splitting and fragmentation

  • Still larger than integers (128 vs 64 bits)

Note: MySQL native UUIDv7 support is coming in future versions. Meanwhile, use user-defined functions or application-level generation.

Strategy 3: Composite Keys for Distributed Systems

For truly distributed scenarios:

CREATE TABLE distributed_events (
    shard_id SMALLINT UNSIGNED NOT NULL,
    sequence_id BIGINT UNSIGNED AUTO_INCREMENT,
    uuid BINARY(16) UNIQUE NOT NULL,
    event_data JSON,
    PRIMARY KEY (shard_id, sequence_id)
) ENGINE=InnoDB;

This approach:

  • Maintains sequential ordering within shards

  • Scales horizontally across up to 65,535 shards

  • Provides UUIDs for cross-shard uniqueness

  • Optimizes for common query patterns

Migration Strategy: From UUID to Hybrid

If you're already using UUIDs, here's our recommended migration path:

Step 1: Add Integer Primary Key

ALTER TABLE users 
    ADD COLUMN id_new BIGINT UNSIGNED AUTO_INCREMENT UNIQUE,
    ADD INDEX idx_id_new (id_new);

Step 2: Rename Columns

ALTER TABLE users 
    DROP PRIMARY KEY,
    ADD PRIMARY KEY (id_new),
    CHANGE COLUMN uuid uuid_external BINARY(16),
    CHANGE COLUMN id_new id BIGINT UNSIGNED;

Step 3: Update Foreign Keys This requires careful planning and testing, potentially using dual-write patterns during transition.

Step 4: Update Application Layer

  • Internal queries use new integer id

  • External APIs continue using uuid_external

  • Implement gradual rollout with monitoring

Storage Optimization: BINARY(16) vs CHAR(36)

If you must use UUIDs, the storage format matters significantly:

The Storage Breakdown

CHAR(36):

  • 36 bytes with CHARSET latin1 or CHARSET ascii

  • 108-144 bytes with CHARSET utf8mb4 (default in MySQL 8.0+)

    • 2 bytes VARCHAR overhead

    • 36 characters × 3-4 bytes per character

  • Human-readable in queries and dumps

  • Simple to work with in application code

BINARY(16):

  • 16 bytes - the actual UUID size

  • 55% smaller than CHAR(36) with ASCII

  • 89% smaller than CHAR(36) with utf8mb4

  • Requires conversion functions (UUID_TO_BIN(), BIN_TO_UUID())

  • Not human-readable in raw queries

Performance Impact

The storage difference multiplies across:

  • Primary key index

  • All secondary indexes (which include the primary key in InnoDB)

  • Buffer pool memory utilization

  • Disk I/O for index operations

Example with 50 million rows:

  • CHAR(36) with utf8mb4: ~6.8 GB just for primary key index

  • BINARY(16): ~800 MB for primary key index

  • 8.5x difference in index size

Conversion Functions (MySQL 8.0+)

-- Create table with BINARY(16)
CREATE TABLE users (
    id BINARY(16) PRIMARY KEY DEFAULT (UUID_TO_BIN(UUID())),
    name VARCHAR(255) NOT NULL
) ENGINE=InnoDB;

-- Insert with conversion
INSERT INTO users (id, name) 
VALUES (UUID_TO_BIN('550e8400-e29b-41d4-a716-446655440000'), 'John Doe');

-- Query with conversion
SELECT BIN_TO_UUID(id) AS id, name FROM users;

-- Where clause requires conversion
SELECT * FROM users WHERE id = UUID_TO_BIN('550e8400-e29b-41d4-a716-446655440000');

The Trade-offs

Use BINARY(16) when:

  • Working with large tables (millions of rows)

  • Performance and storage efficiency are critical

  • Your application layer can handle conversion

  • You're comfortable with conversion overhead in debugging

Use CHAR(36) when:

  • Small tables (<1 million rows) where overhead is negligible

  • Development/debugging convenience is prioritized

  • Your ORM doesn't handle binary UUIDs well

  • Legacy systems where changing would break integrations

Recommendation

For production systems at scale: Use BINARY(16) with proper conversion functions. The 55-89% storage savings compounds across indexes and dramatically improves cache efficiency.

For small applications or rapid prototyping: CHAR(36) is acceptable if the table will remain small. But plan migration paths if the application grows.

Best of both worlds: Use our recommended hybrid approach (BIGINT UNSIGNED primary key + BINARY(16) UUID as secondary unique key) where the UUID can be BINARY for storage efficiency while the primary key remains simple and performant.

Monitoring and Benchmarking

Track these metrics when evaluating UUID performance impact:

Storage Metrics:

  • Index size growth rate

  • Page fill factor (should be >85%)

  • Tablespace utilization

Performance Metrics:

  • Average query response time (especially JOINs)

  • INSERT throughput and latency

  • Buffer pool hit ratio

  • Disk I/O rates

Monitoring queries:

-- Index fragmentation check
SELECT 
    table_name,
    index_name,
    ROUND(data_length/1024/1024, 2) AS data_mb,
    ROUND(index_length/1024/1024, 2) AS index_mb
FROM information_schema.tables 
WHERE table_schema = 'your_database'
ORDER BY index_length DESC;

-- Page fill monitoring (requires sys schema)
SELECT 
    object_schema,
    object_name,
    index_name,
    ROUND(avg_leaf_pages_per_segment, 2) AS avg_leaf_pages
FROM sys.x$schema_index_statistics
WHERE object_schema = 'your_database';

When UUIDs Make Sense

Despite the performance concerns, UUIDs are appropriate in specific scenarios:

Small to Medium Tables: Under 10 million rows, the performance impact is often negligible

Truly Distributed Writes: Systems with no central coordination point where multiple services must generate IDs independently

Append-Only Event Logs: When using UUIDv7 or similar ordered variants for event sourcing

Security-Critical Systems: Where exposing sequential IDs creates genuine security risks (combined with secondary UUID approach)

The Bottom Line

UUIDs as primary keys create a performance tax that compounds at scale. For most applications, especially those with:

  • High transaction volumes (>1000 inserts/sec)

  • Large datasets (>10 million rows)

  • Complex queries with multiple joins

  • Memory-constrained environments

The hybrid approach (auto-increment primary key + UUID secondary key) provides the best balance of performance and functionality.

JusDB's Recommendations

Based on our experience managing 100+ MySQL fleets in production:

  1. Default to BIGINT UNSIGNED for auto-increment primary keys—never plain BIGINT

  2. Add UUIDs as secondary unique keys when global uniqueness is required

  3. Use BINARY(16) for UUID storage in production systems at scale (use CHAR(36) only for small tables or development)

  4. Use UUIDv7 if you must use UUIDs as primary keys

  5. Monitor index fragmentation and page utilization metrics

  6. Benchmark before migrating existing systems

  7. Use INT UNSIGNED only when you're absolutely certain a table won't exceed 4.3 billion rows and need the space savings

Remember: premature optimization is the root of all evil, but so is premature pessimization. Choose your primary key strategy based on actual requirements, not assumptions.


Need help optimizing your database architecture? JusDB specializes in MySQL performance tuning, database migrations, and reliability engineering. Contact us at contact@jusdb.com for a free database health assessment.

Working with JusDB on Schema Performance

UUID primary key performance problems are one of the most common schema issues we see in production — and one of the easiest to design around upfront. We review primary key strategies as part of schema design work. If you're migrating from UUID to BIGINT or evaluating UUIDv7, reach out.

Related reading: Database Schema Design Fundamentals | MySQL Indexes Explained | Zero-Downtime Schema Migrations

Share this article