Database SRE

Database Capacity Planning: From Metrics to Growth Projections

Build a data-driven capacity planning process using historical metrics, growth trends, and resource modeling

JusDB Team
February 21, 2023
11 min read
148 views

At 2:47 AM, your on-call engineer gets paged because the production database ran out of disk space. The storage volume hit 100% utilization, writes are failing, and customers are seeing errors. You add more disk, the incident resolves, and everyone breathes again — but the post-mortem reveals the same uncomfortable truth: the data to predict this was always there. Nobody was watching it. Database capacity planning failures are almost never surprises to the infrastructure — they are surprises to the people responsible for it. The metrics told the story weeks in advance; the team just was not listening. Building a systematic, data-driven capacity planning process means you catch these inflection points before they become incidents, and you give your organization the lead time needed to provision resources without firefighting.

TL;DR
  • Collect CPU, IOPS, connection counts, and storage growth rate as your four foundational capacity metrics.
  • Use pg_stat_bgwriter and pg_stat_database in PostgreSQL (or SHOW STATUS in MySQL) to extract baseline growth data directly from the engine.
  • Model growth with both linear and exponential projections, and plan to headroom targets — aim to keep utilization below 70% of peak capacity.
  • Set alerting thresholds at 60% (warn) and 80% (critical) to give yourself actionable lead time before hitting the ceiling.
  • For RDS, CloudWatch metrics like FreeStorageSpace, DatabaseConnections, and ReadIOPS/WriteIOPS replace most manual polling.
  • A simple spreadsheet model updated weekly is often more actionable than a complex automated system that nobody trusts.

What is Database Capacity Planning?

Database capacity planning is the practice of measuring current resource consumption, modeling how that consumption will grow over time, and provisioning additional resources before existing capacity becomes a constraint on availability or performance. The goal is not to predict the future with perfect accuracy — it is to maintain sufficient headroom so that growth never becomes an emergency.

Capacity planning applies across four primary resource dimensions: compute (CPU and memory), I/O throughput (IOPS and latency), network bandwidth, and storage. Each dimension has different growth characteristics, different failure modes when exhausted, and different lead times for remediation. Storage exhaustion kills writes immediately. CPU saturation degrades latency gradually. Running out of available connections causes new client requests to fail while existing ones continue. Understanding these distinctions shapes how aggressively you plan for each.

For senior DBAs and SREs, capacity planning sits at the intersection of infrastructure engineering and business forecasting. A database serving a SaaS product with predictable monthly subscriber growth requires a different model than one backing a marketplace with lumpy, event-driven transaction spikes. The planning process must account for the nature of your workload, not just raw historical averages.

How Capacity Planning Works

The capacity planning process follows a repeatable cycle: collect baseline metrics, establish growth trends, model future resource requirements, define thresholds, and take action before those thresholds are breached. The cycle runs continuously — not as a quarterly exercise, but as an ongoing operational discipline.

The Four Foundational Metrics

Before you can model growth, you need a reliable baseline. Four metrics form the foundation of any database capacity plan:

  • CPU utilization — Average and peak CPU across your database host. Average matters for trend analysis; peak matters for headroom. A database running at 40% average but hitting 95% during nightly batch jobs is closer to its limit than the average suggests.
  • IOPS (read and write separately) — I/O operations per second, broken out by reads and writes. Write IOPS growth often tracks data ingestion rates directly. Read IOPS growth may indicate missing indexes, cache hit rate degradation, or genuine query volume growth.
  • Active connections — The number of connections actively using the database at any given moment, as well as peak connection count. Many databases have hard connection limits; hitting them causes immediate client-facing failures.
  • Storage growth rate — The rate at which your data volume is increasing, measured in GB per day or GB per week. This is your most predictable metric and your most catastrophic failure mode if ignored.

Linear vs. Exponential Growth Modeling

Most teams default to linear growth modeling: if you grew by 10 GB last month, you will grow by 10 GB next month. This is appropriate for mature, stable products with consistent user behavior. For products in growth phases, linear models systematically underestimate future consumption.

Exponential growth modeling applies a compound rate to current consumption. If your data volume grew 8% month-over-month for the past six months, projecting that rate forward gives you a dramatically different capacity curve than linear extrapolation:

  • Linear model: Current size + (monthly growth rate × months ahead)
  • Exponential model: Current size × (1 + monthly growth rate)months ahead

In practice, build both projections and use the exponential model as your planning scenario. When reality falls between the two curves, you have headroom. When reality tracks the exponential curve, you are still prepared.

Extracting Metrics from PostgreSQL and MySQL

PostgreSQL: pg_stat_bgwriter and pg_stat_database

PostgreSQL's system catalog views expose a wealth of capacity-relevant data without requiring external agents. The two most useful for capacity planning are pg_stat_bgwriter and pg_stat_database.

pg_stat_bgwriter tracks buffer writes, checkpoint behavior, and I/O pressure from the background writer process. High buffers_clean relative to buffers_checkpoint indicates the background writer is working hard to keep up — an early signal of I/O capacity constraints.

sql
-- Checkpoint and buffer write activity (run periodically and record)
SELECT
  checkpoints_timed,
  checkpoints_req,
  buffers_checkpoint,
  buffers_clean,
  buffers_backend,
  buffers_alloc,
  ROUND(
    100.0 * buffers_clean / NULLIF(buffers_checkpoint + buffers_clean + buffers_backend, 0),
    2
  ) AS bgwriter_write_pct,
  stats_reset
FROM pg_stat_bgwriter;

pg_stat_database gives you per-database transaction throughput, tuple access rates, and temporary file usage — all critical for baseline modeling.

sql
-- Database-level throughput and temp file usage
SELECT
  datname,
  xact_commit + xact_rollback          AS total_transactions,
  tup_inserted,
  tup_updated,
  tup_deleted,
  tup_fetched,
  temp_files,
  temp_bytes,
  blks_read,
  blks_hit,
  ROUND(
    100.0 * blks_hit / NULLIF(blks_read + blks_hit, 0),
    2
  ) AS cache_hit_ratio
FROM pg_stat_database
WHERE datname NOT IN ('template0', 'template1')
ORDER BY total_transactions DESC;

For storage growth, query the actual relation sizes across your largest tables on a scheduled basis and store the results in a metrics table:

sql
-- Top 20 tables by total size (data + indexes)
SELECT
  schemaname,
  tablename,
  pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename)) AS total_size,
  pg_total_relation_size(schemaname || '.' || tablename)                 AS total_bytes
FROM pg_tables
ORDER BY total_bytes DESC
LIMIT 20;

MySQL: SHOW STATUS and Information Schema

MySQL exposes capacity-relevant data through SHOW GLOBAL STATUS and the information_schema. The most useful variables for capacity planning are:

sql
-- Key MySQL status variables for capacity tracking
SHOW GLOBAL STATUS WHERE Variable_name IN (
  'Threads_connected',
  'Threads_running',
  'Max_used_connections',
  'Innodb_buffer_pool_reads',
  'Innodb_buffer_pool_read_requests',
  'Innodb_os_log_written',
  'Innodb_data_reads',
  'Innodb_data_writes',
  'Created_tmp_disk_tables',
  'Handler_read_rnd_next'
);

-- Storage consumption per database
SELECT
  table_schema                                    AS database_name,
  ROUND(SUM(data_length + index_length) / 1024 / 1024 / 1024, 2) AS size_gb,
  ROUND(SUM(data_free) / 1024 / 1024 / 1024, 2)  AS free_gb
FROM information_schema.tables
GROUP BY table_schema
ORDER BY size_gb DESC;
Tip

Schedule these queries to run hourly and write the results to a dedicated capacity_metrics table (or push to your TSDB of choice). Raw snapshots are useless — you need the time series to model growth. Even a simple PostgreSQL table with a timestamptz column and a JSONB column for results will get you started in under an hour.

RDS and CloudWatch Metrics

If you are running on Amazon RDS, CloudWatch provides the majority of the metrics you need without any custom instrumentation. The key metrics to track and alert on:

  • FreeStorageSpace — Remaining disk in bytes. This is the most critical RDS metric. Convert to a percentage of allocated storage and alert at 30% remaining (70% utilization).
  • DatabaseConnections — Active connection count. Compare against max_connections for your instance class.
  • ReadIOPS / WriteIOPS — I/O operations per second. Compare against your provisioned IOPS limit (for io1/io2 volumes) or the burst bucket ceiling (for gp2/gp3).
  • CPUUtilization — Percentage CPU. Track the 95th percentile, not just the average.
  • ReadLatency / WriteLatency — Latency in seconds. Rising latency under constant IOPS indicates approaching I/O saturation.
  • FreeableMemory — Available RAM. When this trends toward zero, your buffer pool is being evicted, which will spike read IOPS.
bash
# Example: Query FreeStorageSpace for the last 7 days via AWS CLI
aws cloudwatch get-metric-statistics \
  --namespace AWS/RDS \
  --metric-name FreeStorageSpace \
  --dimensions Name=DBInstanceIdentifier,Value=your-db-instance \
  --start-time $(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
  --period 3600 \
  --statistics Average \
  --query 'Datapoints[*].[Timestamp,Average]' \
  --output text | sort
Warning

RDS gp2 storage volumes use a burst credit model for IOPS. Volumes under 1 TB can burst to 3,000 IOPS but only sustain 3 IOPS/GB. If your instance is 200 GB, your sustained IOPS ceiling is 600 — far below what most production workloads need. Depleted burst credits cause dramatic, sudden IOPS collapses. Always monitor BurstBalance for gp2 volumes and migrate to gp3 or io1 before you hit sustained workloads.

Best Practices for Database Capacity Planning

Use the 70% Utilization Ceiling

The standard recommendation across infrastructure engineering is to treat 70% utilization as the operational ceiling — not the point at which you start planning, but the point at which additional resources should already be provisioned and ready. This gives you 30% headroom to absorb unexpected spikes, accommodate the lag between identifying a need and completing a provisioning process, and avoid the performance cliff effects that appear as utilization approaches 100%.

Define Alerting Thresholds That Provide Lead Time

Alerting at 90% storage utilization gives you hours of lead time. Alerting at 70% gives you days to weeks. Structure your thresholds accordingly:

  • 60% utilization — Warning: Acknowledge and review growth projections. Determine if provisioning is needed within the next 30 days.
  • 75% utilization — High: Begin provisioning process. For cloud databases, initiate storage scaling. For on-premises, raise a hardware request.
  • 85% utilization — Critical: Escalate. Something in the provisioning process has stalled. Treat as an active risk.
Important

Connection limits behave differently than storage — there is no warning gradient. A PostgreSQL instance at 98% of max_connections may be functioning normally until the next application deployment scales out and immediately saturates the remaining two slots. Always reserve a dedicated superuser connection slot (PostgreSQL's superuser_reserved_connections parameter handles this) and alert at 80% of non-reserved connections.

The Capacity Planning Spreadsheet

For most teams, a well-maintained spreadsheet updated weekly outperforms a complex automated system that the team does not trust or understand. A minimal but effective capacity planning sheet includes:

  • One row per week per metric (storage GB, peak CPU %, peak IOPS, peak connections)
  • A column for 30-day linear projected value
  • A column for 90-day exponential projected value
  • A column for the capacity ceiling (provisioned maximum)
  • A column for headroom percentage (ceiling minus projection, divided by ceiling)
  • A conditional format that highlights rows where headroom drops below 30%

This simple structure, maintained consistently, surfaces capacity risks weeks before they become incidents. It also creates an institutional record of growth history that informs future infrastructure decisions.

Normalize Growth by Business Metric

Raw metric growth is harder to validate than growth normalized against a business driver. If your data volume grows by 12% while your user base grows by 12%, the growth is proportional and expected. If data volume grows 35% while users grow 8%, something changed — a new feature, a logging change, or a runaway process. Tracking both the infrastructure metric and its business-metric ratio makes anomalies visible before they become capacity crises.

Tip

Establish a per-user storage budget at your current baseline. If you currently store 4 MB per active user and you add 50,000 users per quarter, you have a tractable growth model. When the per-user storage footprint starts rising, you catch it immediately rather than treating storage growth as an inexplicable black box.

Key Takeaways

Key Takeaways
  • The four foundational capacity metrics are CPU utilization, IOPS (read and write), active connection count, and storage growth rate — collect all four before building any growth model.
  • PostgreSQL's pg_stat_bgwriter and pg_stat_database give you I/O pressure, cache hit ratios, and transaction throughput natively; MySQL's SHOW GLOBAL STATUS provides equivalent coverage.
  • Build both linear and exponential growth projections; use the exponential model as your planning scenario and treat the difference as your uncertainty buffer.
  • Target 70% utilization as your operational ceiling — resources should be provisioned before you reach that level, not after.
  • Set warning alerts at 60%, high alerts at 75%, and critical alerts at 85% to ensure actionable lead time at each threshold.
  • For RDS, monitor FreeStorageSpace, DatabaseConnections, ReadIOPS/WriteIOPS, and BurstBalance (for gp2 volumes) as your baseline CloudWatch metric set.
  • Normalize raw growth metrics against a business driver (users, transactions, events) to catch anomalies early and build credible forecasts for infrastructure investment.
  • A weekly-updated spreadsheet beats an automated system your team does not trust — start simple, then automate what is proven valuable.

Scale Your Database Capacity With Confidence on JusDB

Capacity planning is only half the battle — acting on those projections quickly and without operational risk is the other half. JusDB gives senior DBAs and SREs the managed infrastructure layer to provision, scale, and monitor PostgreSQL and MySQL instances without the friction of DIY infrastructure management.

  • Automated storage scaling — JusDB monitors your storage growth rate and scales volumes automatically before you breach thresholds, with no downtime.
  • Built-in capacity dashboards — CPU, IOPS, connection counts, and storage utilization are tracked and visualized out of the box, with configurable alerting thresholds.
  • Growth trend reports — Weekly capacity reports surface your growth trajectory against provisioned headroom, so your team has the data to plan infrastructure investments before they become urgent.
  • Right-sized instance recommendations — JusDB analyzes your historical workload and recommends instance class changes when your current configuration is over- or under-provisioned.

Stop discovering capacity problems in the middle of the night. Start a free JusDB trial and bring your capacity planning process into a managed, data-driven environment built for production databases.

Share this article

Need Expert Help?

Need expert help? Our specialists can assist with these related services: