Database Architecture

Vitess: MySQL Sharding for Kubernetes-Native Workloads

Deploy Vitess for horizontal MySQL sharding — VSchema design, vtgate routing, and migration from single-instance MySQL

JusDB Team
May 22, 2023
12 min read
144 views

When YouTube's MySQL infrastructure began buckling under billions of daily queries in 2010, the team didn't migrate away from MySQL — they built Vitess to scale it horizontally. Today, Vitess is a CNCF graduated project powering some of the largest MySQL deployments on the planet, including PlanetScale's managed offering. If you're running a MySQL-backed service that's hitting the ceiling of vertical scaling, Vitess is the sharding layer that keeps you on MySQL without rewriting your application stack. This post is a practitioner's guide to deploying Vitess on Kubernetes, designing your VSchema, and migrating from a single-instance MySQL setup.

TL;DR
  • Vitess is a connection-pooling, query-routing, and horizontal sharding layer built on top of MySQL — born at YouTube, now a CNCF graduated project.
  • Core components: vtgate (query router), vttablet (per-MySQL sidecar), vtctld (control plane), and a topology server (etcd or ZooKeeper).
  • VSchema defines how keyspaces, vindexes, and tables map to shards — get this right before you cut over.
  • Migration from single-instance MySQL uses MoveTables and Reshard workflows via vreplication, with zero-downtime cutovers.
  • Cross-shard queries work but are expensive — design your sharding key to minimize scatter-gather operations.

What Is Vitess?

Vitess is a database clustering system for horizontal scaling of MySQL. It wraps MySQL instances in a lightweight sidecar process (vttablet), routes queries through a smart proxy (vtgate), and manages cluster topology through a consistent key-value store. The result is a system where your application connects to Vitess exactly as it would connect to a single MySQL server — using the MySQL wire protocol — but queries are transparently distributed across dozens or hundreds of MySQL shards.

The YouTube origin story matters here: the engineering team needed to scale MySQL without abandoning it, because their entire application was built around relational semantics. Vitess preserved those semantics while adding connection pooling, query rewriting, and horizontal sharding. In 2018, Vitess graduated as a CNCF project, and PlanetScale was founded to offer it as a managed service. The open-source project and the managed offering have cross-pollinated significantly since then.

Vitess versus Galera Cluster is a question that comes up often. Galera provides synchronous multi-primary replication — every write is committed to all nodes before returning success. This gives you high availability with no replica lag, but it caps horizontal write scalability because the synchronous certification protocol creates contention. Vitess takes a different approach: it shards your data so that writes to different shards are completely independent. You get linear write scalability at the cost of giving up cross-shard ACID transactions. For most high-traffic web workloads, sharding wins. For workloads with heavy cross-entity transactions, Galera (or a distributed SQL database) is worth serious consideration.

Architecture Deep Dive

Understanding Vitess requires understanding its four core components and how they compose into a cluster.

vtgate is the stateless query router that your application connects to. It accepts MySQL protocol connections, parses SQL, inspects the VSchema, and routes queries to the correct shard or shards. It also handles connection pooling — a vtgate instance can multiplex thousands of application connections into a much smaller pool of connections to vttablets. This is one of Vitess's most immediate wins even before you shard: connection pooling at the proxy layer prevents the MySQL max_connections wall from destroying your application during traffic spikes.

vttablet runs as a sidecar alongside each MySQL instance. It manages the MySQL process, handles tablet-level connection pooling, enforces query timeouts and row limits, collects metrics, and participates in replication management. Every MySQL instance in your cluster — primary and replicas — has its own vttablet.

vtctld is the control plane server that exposes a web UI and API for cluster management operations. It communicates with the topology server to read and write cluster state, and it orchestrates workflows like reparenting, resharding, and schema changes.

The topology server is a consistent key-value store — typically etcd in Kubernetes environments, or ZooKeeper in legacy deployments — that stores cluster metadata: which tablets exist, which is the primary for each shard, what the VSchema looks like, and replication graph information. This is the source of truth that all Vitess components consult.

Warning

The topology server is a critical dependency. If etcd becomes unavailable, vtgate can continue serving queries from cached routing information, but vtctld operations and failovers will be blocked. Run etcd with at least three nodes and treat it with the same care you'd give your primary MySQL.

VSchema: The Heart of Your Sharding Design

The VSchema (Vitess Schema) is a JSON document that tells vtgate how to route queries. It defines keyspaces, tables, vindexes (sharding functions), and the mapping between them. Getting VSchema design right is the most important architectural decision you'll make — it determines which queries are fast single-shard lookups and which are expensive scatter-gather operations.

A minimal VSchema for a sharded keyspace looks like this:

text
{
  "sharded": true,
  "vindexes": {
    "hash": {
      "type": "hash"
    },
    "lookup_unique": {
      "type": "consistent_lookup_unique",
      "params": {
        "table": "user_lookup",
        "from": "email",
        "to": "user_id"
      },
      "owner": "users"
    }
  },
  "tables": {
    "users": {
      "column_vindexes": [
        {
          "column": "user_id",
          "name": "hash"
        },
        {
          "column": "email",
          "name": "lookup_unique"
        }
      ]
    },
    "orders": {
      "column_vindexes": [
        {
          "column": "user_id",
          "name": "hash"
        }
      ]
    },
    "products": {
      "type": "reference"
    }
  }
}

Breaking this down: the hash vindex uses a consistent hashing function on user_id to determine which shard a row lives on. The consistent_lookup_unique vindex on email maintains a separate lookup table that maps email addresses to user_id values, enabling single-shard lookups by email without scatter-gather. The orders table is co-sharded with users on user_id, which means queries joining users and orders for a single user can be served from a single shard. The products table is marked as a reference table — Vitess replicates it to every shard so it's always available locally.

Tip

Co-shard entities that are always queried together. If your application constantly joins users and orders, use user_id as the sharding key for both tables. This eliminates cross-shard joins for the most common query patterns and keeps your most frequent operations fast.

Sharding Strategies

Vitess supports several sharding approaches through its vindex system.

Hash-based sharding (the hash vindex) applies a hashing function to the sharding key and maps the result to a keyspace ID, then to a shard. This distributes data evenly across shards but makes range queries expensive — a range scan on user_id will scatter across all shards because the hash destroys ordering. Use this when your primary access pattern is point lookups by ID.

Range-based sharding assigns contiguous keyspace ID ranges to each shard. This preserves ordering and makes range scans efficient, but can lead to hotspots if your IDs are monotonically increasing (all inserts hit the last shard). UUID v4 or randomized IDs pair well with range-based sharding.

Lookup vindexes solve the problem of querying by non-sharding-key columns. A consistent_lookup_unique vindex maintains a backing table that maps an alternate key (like email) to the sharding key (like user_id). When vtgate sees WHERE email = ?, it first queries the lookup table to find the user_id, then routes to the correct shard. This adds a round-trip but avoids a full scatter.

Cross-shard queries are unavoidable in some workloads. When vtgate cannot determine a single target shard from the query and VSchema, it fans the query out to all shards in parallel (a scatter-gather operation) and merges the results. This works correctly but the latency is proportional to the slowest shard, and the load is proportional to the number of shards. Monitor your scatter query rate — a high scatter rate is a sign that your VSchema or sharding key needs rethinking.

Kubernetes Deployment

Vitess is designed for Kubernetes. The recommended deployment pattern uses the Vitess Operator, which manages VitessCluster, VitessKeyspace, VitessShard, and VitessCell custom resources.

A minimal cluster definition for a two-shard deployment:

text
apiVersion: planetscale.com/v2
kind: VitessCluster
metadata:
  name: example
spec:
  cells:
    - name: zone1
      gateway:
        replicas: 2
        resources:
          requests:
            cpu: "1"
            memory: 256Mi
  keyspaces:
    - name: commerce
      turndownPolicy: Immediate
      partitionings:
        - equal:
            parts: 2
            shardTemplate:
              databaseInitScriptSecret:
                name: example-cluster-config
                key: init_db.sql
              tabletPools:
                - cell: zone1
                  type: replica
                  replicas: 2
                  mysqld:
                    resources:
                      requests:
                        cpu: "1"
                        memory: 1Gi
                  dataVolumeClaimTemplate:
                    accessModes: [ReadWriteOnce]
                    resources:
                      requests:
                        storage: 10Gi

The operator handles pod scheduling, tablet registration with the topology server, and primary election. vtgate pods are deployed as a Deployment (stateless), while vttablet pods are managed through StatefulSets to preserve stable network identities and persistent volume bindings.

Tip

Configure vtgate's connection pool size carefully. The --queryserver-config-pool-size flag on vttablet controls how many MySQL connections each tablet maintains. Multiply this by the number of tablets to understand your total MySQL connection count. Start conservatively — 16 to 32 connections per tablet is enough for most workloads — and tune upward based on observed query latency under load.

Migrating from Single-Instance MySQL

Vitess's vreplication engine makes it possible to migrate from a single MySQL instance to a sharded Vitess cluster with minimal downtime. The migration happens in stages using two workflow commands: MoveTables and Reshard.

Stage 1: Stand up an unsharded Vitess cluster. Your first step is to deploy Vitess in front of your existing MySQL without sharding. Create a single-shard keyspace with your existing MySQL as the backing instance. This lets you validate that your application works correctly through vtgate before introducing sharding complexity. You immediately gain connection pooling and Vitess's query management features.

text
# Apply VSchema for unsharded keyspace
vtctldclient ApplyVSchema \
  --vschema='{"sharded": false, "tables": {"users": {}, "orders": {}}}' \
  commerce

# Verify routing
vtctldclient GetVSchema commerce

Stage 2: Move tables with MoveTables. Use MoveTables to migrate tables from a source keyspace to a target keyspace while keeping both in sync via vreplication. vreplication performs an initial copy of all existing rows, then tails the MySQL binary log to apply ongoing changes. The source remains writable during this process.

text
# Start MoveTables workflow
vtctldclient MoveTables \
  --workflow=commerce_to_sharded \
  --source-keyspace=commerce \
  --target-keyspace=commerce_sharded \
  create

# Monitor copy progress
vtctldclient MoveTables \
  --workflow=commerce_to_sharded \
  --target-keyspace=commerce_sharded \
  show

# When replication is caught up, verify with VDiff
vtctldclient VDiff \
  --workflow=commerce_to_sharded \
  --target-keyspace=commerce_sharded \
  create

Stage 3: Cut over. Once vreplication is caught up and VDiff confirms data consistency, initiate the cutover. Vitess atomically switches reads and writes to the new target, briefly blocking writes to flush in-flight transactions.

text
vtctldclient MoveTables \
  --workflow=commerce_to_sharded \
  --target-keyspace=commerce_sharded \
  switchtraffic

vtctldclient MoveTables \
  --workflow=commerce_to_sharded \
  --target-keyspace=commerce_sharded \
  complete

Stage 4: Reshard for horizontal scale. Once you're on a sharded Vitess keyspace, adding more shards uses the Reshard workflow. This works identically to MoveTables but splits or merges shard ranges rather than moving tables between keyspaces. The same vreplication mechanism — initial copy followed by binlog tailing — keeps source and target in sync until you're ready to cut over.

Warning

VDiff can be slow on large tables — it performs a full table scan on both source and target. On tables with hundreds of millions of rows, schedule the VDiff run during off-peak hours and monitor it with VDiff show. Do not skip this step: an undetected data divergence before cutover is significantly harder to remediate than delaying the migration.

Key Takeaways

Key Takeaways
  • Vitess wraps standard MySQL with connection pooling, query routing, and horizontal sharding — your application connects via the standard MySQL protocol and needs minimal changes.
  • The four core components (vtgate, vttablet, vtctld, topology server) each have a distinct role; understanding their failure modes is essential for operating a production cluster.
  • VSchema design is your most critical architectural decision. Co-shard frequently joined entities, use lookup vindexes for alternate key access patterns, and use reference tables for small read-heavy lookup datasets.
  • Cross-shard queries work but are expensive. Track your scatter query rate as a key operational metric and use it to validate your VSchema design.
  • Migration from single-instance MySQL is a staged process: unsharded Vitess first, then MoveTables, then Reshard — with VDiff validation before each cutover.
  • Vitess beats Galera for horizontal write scalability; Galera beats Vitess for cross-entity ACID transactions. Choose based on your dominant workload pattern.
  • If you need managed Vitess without operating the control plane yourself, PlanetScale is the reference implementation by the original Vitess creators.

Scale MySQL Without Leaving It Behind

Vitess solves the problem that sends many teams away from MySQL entirely: the ceiling on vertical scaling. By adding a sharding layer that speaks the MySQL protocol, you keep your existing tooling, your team's SQL expertise, and your application code largely intact — while gaining the horizontal scalability of a distributed system.

The operational complexity is real. Running etcd, managing vtctld, designing VSchema, and orchestrating vreplication workflows is meaningfully more work than running a single MySQL instance. But for teams whose MySQL infrastructure is genuinely at the limit of vertical scaling, the alternative is a full migration to a different database — with all the risk and cost that implies.

At JusDB, we track the operational patterns, performance benchmarks, and architectural decisions that matter for teams scaling relational databases in production. Whether you're evaluating Vitess, comparing it against alternatives like CockroachDB or PlanetScale's managed offering, or working through a migration plan, our database comparison and benchmarking tools give you the data you need to make an informed decision without running the experiments yourself.

Explore the Vitess profile on JusDB for benchmarks, operator experience ratings, and comparisons against MySQL Galera, CockroachDB, and other horizontally scalable SQL systems.

Share this article