Database Performance

Database Connection Pool Sizing: The Formula, the Variables, and the Common Mistakes

How to size your database connection pool correctly — the HikariCP formula, the variables that determine optimal pool size, and the mistakes that cause connection exhaustion or underutilization.

JusDB Team
August 7, 2023
9 min read
187 views

A startup engineering team once set their PostgreSQL max_connections to 500 and their HikariCP pool to match — 500 connections per application pod, with eight pods running in production. Within hours of a traffic spike, the database server ran out of memory and crashed, taking the entire product offline for three hours. Each idle PostgreSQL connection consumes 5–10 MB of RAM just for bookkeeping, and 4,000 simultaneous connections will exhaust 20–40 GB before a single query runs. Getting connection pool sizing right is not a micro-optimization — it is one of the highest-leverage tuning decisions you will make for any application backed by MySQL or PostgreSQL.

TL;DR
  • PostgreSQL connections cost ~5–10 MB RAM each; MySQL threads cost ~1–2 MB — idle connections still consume those resources.
  • The practical PostgreSQL pool formula is (core_count * 2) + effective_spindle_count; for HikariCP use Tn * (Cm - 1) + 1.
  • Your application pool size and the database's max_connections are two separate knobs — conflating them is the most common mistake.
  • Use a connection pooler (PgBouncer, ProxySQL) in front of the database so the database sees a small, stable number of backend connections.
  • Monitor pg_stat_activity (PostgreSQL) or SHOW PROCESSLIST (MySQL) to see idle vs. active connections in real time.
  • A pool that is too large is usually more dangerous than one that is too small — size for throughput, not for peak concurrency.

Background: Why Connection Management Is a Bottleneck

Every database connection represents real operating system resources: a process or thread on the server side, file descriptors on both sides, memory buffers for sort operations and query results, and TLS session state if encryption is enabled. Establishing a new connection also requires a full TCP handshake, TLS negotiation, and authentication round-trips — work that can take 10–50 ms on a network hop. For an application handling 500 requests per second, paying that cost on every request would add 5–25 seconds of latency per second of throughput, which is simply not viable.

Connection pools solve this by keeping a set of pre-authenticated connections open and lending them to threads or coroutines for the duration of a query or transaction. The pool absorbs connection establishment latency and lets the database focus on query execution rather than session setup. The question is not whether to pool — the question is how many connections to keep in that pool.

The wrong mental model is: more connections = more throughput. Databases are I/O and CPU bound. Adding connections beyond what the hardware can service in parallel does not increase throughput — it increases queue depth, context switching overhead, and lock contention, while consuming memory that could otherwise go to the shared buffer cache.

The Hidden Cost of Connections

Before arriving at formulas, it is worth internalizing the per-connection cost at each database engine.

PostgreSQL

PostgreSQL uses a process-per-connection model. Each client connection forks a backend process that inherits the postmaster's memory layout. The baseline per-connection overhead is:

  • ~5–10 MB RAM per connection (shared memory excluded) — this covers the process stack, local sort buffers, and executor state.
  • Work memory (work_mem) is allocated per sort operation per connection, not per connection, but a busy connection can multiply this many times over.
  • File descriptor limits: each connection consumes at least two file descriptors on the server side.
Warning: work_mem compounds the RAM math. If work_mem is set to 64 MB and you have 200 active connections each running a sort, PostgreSQL may attempt to allocate 12.8 GB for sort buffers alone — on top of shared_buffers and connection overhead. Always factor work_mem into capacity planning alongside connection count.

MySQL (InnoDB)

MySQL uses a thread-per-connection model (or a thread pool in Percona Server / MySQL Enterprise). Thread overhead is lower than process overhead, but still meaningful:

  • ~1–2 MB RAM per thread (stack size plus per-thread buffers like sort_buffer_size, read_buffer_size, and join_buffer_size).
  • With default buffer settings (sort_buffer_size=256K, read_buffer_size=128K, join_buffer_size=256K), 1,000 connections add roughly 640 MB of per-thread buffer allocation on top of the InnoDB buffer pool.
  • max_connections in MySQL defaults to 151 — a conservative value that many teams raise without understanding the implications.
Key distinction: MySQL's max_connections is a hard ceiling enforced by the server. If your application pool attempts to open more connections than this limit, new connection attempts will receive "Too many connections" errors and fail — not queue. Always set max_connections higher than the total connections any single pool can open, with headroom for administrative access.

Pool Sizing Formula

The Percona / PostgreSQL Wiki Formula

The PostgreSQL wiki and Percona both recommend this heuristic as a starting point for the number of connections the database should serve simultaneously:

pool_size = (core_count * 2) + effective_spindle_count

Where:

  • core_count — physical CPU cores on the database server (not vCPUs if hyperthreading is enabled; use physical cores).
  • effective_spindle_count — the number of independent I/O units. For NVMe SSD, this is typically 1; for a RAID-10 array of spinning disks, count the number of physical drives serving reads.

Example: an 8-core database server on NVMe SSD → (8 * 2) + 1 = 17 backend connections. This feels shockingly small to most engineers who are used to seeing pools set to 100+. But Percona's benchmarks consistently show that this range maximizes throughput — additional connections beyond this point increase latency without increasing queries-per-second.

HikariCP Formula

HikariCP's documentation provides a formula focused on the application side — specifically for calculating the pool size needed to sustain a given request concurrency without thread starvation:

pool_size = Tn * (Cm - 1) + 1

Where:

  • Tn — the maximum number of threads in your application that will simultaneously need a database connection.
  • Cm — the ratio of total request time to database wait time (i.e., connection multiplier). If a request takes 100 ms total and spends 20 ms waiting for the database, Cm = 100/20 = 5.

Example: 50 request-handling threads, Cm = 5 → 50 * (5 - 1) + 1 = 201 connections. In practice this is a ceiling; you should tune downward by measuring idle connection time in production and shrinking the pool until p99 latency begins to rise.

Tip: reconcile both formulas. Use the Percona formula to determine the maximum connections the database can absorb, and the HikariCP formula to determine the minimum the application needs. If they conflict — which is common for multi-pod deployments — introduce PgBouncer or ProxySQL as a multiplexing layer between the application tier and the database. The pooler sees many application connections and maintains a small, stable set of backend connections.

Per-Language and Framework Settings

Once you have a target pool size, here is how to apply it across common stacks.

HikariCP (Java / Spring Boot)

sql
-- Verify active vs. idle connections in PostgreSQL:
SELECT state, count(*) FROM pg_stat_activity GROUP BY state;
# application.properties
spring.datasource.hikari.maximum-pool-size=20
spring.datasource.hikari.minimum-idle=5
spring.datasource.hikari.idle-timeout=600000
spring.datasource.hikari.connection-timeout=30000
spring.datasource.hikari.max-lifetime=1800000

maximum-pool-size is the primary lever. Set minimum-idle equal to maximum-pool-size if you want a fixed-size pool (HikariCP recommends this for most production workloads to avoid the latency spike of growing the pool under load).

PgBouncer

[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 20
reserve_pool_size = 5
reserve_pool_timeout = 3
server_idle_timeout = 600

In transaction mode, PgBouncer releases the server connection back to the pool after each transaction completes. This allows 1,000 application connections to share 20 backend connections with high efficiency — as long as transactions are short and no session-level state (prepared statements, advisory locks, SET LOCAL) is required across transactions.

Django (Python)

# settings.py
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'CONN_MAX_AGE': 60,          # seconds; 0 = new connection per request
        'OPTIONS': {
            'pool': True,            # Django 4.2+ persistent connections
        }
    }
}

Django's built-in connection persistence (CONN_MAX_AGE) is not a true pool — each worker process holds at most one connection. For real pooling in Django, deploy PgBouncer at the infrastructure layer or use django-db-geventpool / psycopg3's native async pool.

Node.js (pg / mysql2)

// PostgreSQL — node-postgres
const { Pool } = require('pg');
const pool = new Pool({
  max: 20,            // maximum pool size
  min: 5,             // minimum idle connections
  idleTimeoutMillis: 30000,
  connectionTimeoutMillis: 2000,
});

// MySQL — mysql2
const mysql = require('mysql2/promise');
const pool = mysql.createPool({
  host: 'localhost',
  database: 'mydb',
  connectionLimit: 20,   // equivalent to max pool size
  waitForConnections: true,
  queueLimit: 0,
});

Common Mistakes

Mistake 1: Setting Pool Size Equal to max_connections

This is the most destructive anti-pattern. If max_connections = 200 and every application pod sets its pool to 200, ten pods will attempt to open 2,000 connections — 10x the database limit. The result is either mass connection refusals or, if the DBA raised max_connections to accommodate, severe memory exhaustion.

Rule of thumb: The sum of maximum-pool-size * pod_count across all services sharing the database must be less than max_connections - 10 (preserving headroom for superuser administrative connections).

Mistake 2: No Pool — New Connection Per Request

Common in serverless functions, CGI scripts, and early-stage applications. At low traffic, the 10–50 ms connection overhead is invisible. At 100 RPS, you are spending 1–5 seconds of CPU time per second just on TCP and TLS handshakes, and the database's authentication subsystem becomes a bottleneck.

Mistake 3: Pool Size = CPU Cores * 10 (or Any Multiplier)

This rule of thumb has no theoretical basis. Application CPU cores and database backend connection capacity are unrelated quantities. The correct reference point is the database server's core count and I/O capacity — not the application server's resources.

Mistake 4: Ignoring Idle Connection Accumulation

Pools that open connections eagerly but close them lazily accumulate idle connections during traffic lulls. On PostgreSQL, each idle connection still holds its backend process in memory. Monitor with:

sql
-- PostgreSQL: see idle connections and how long they have been idle
SELECT
  pid,
  usename,
  application_name,
  state,
  now() - state_change AS idle_duration,
  wait_event_type,
  wait_event
FROM pg_stat_activity
WHERE state = 'idle'
ORDER BY idle_duration DESC;
sql
-- MySQL: check current connections and their state
SHOW PROCESSLIST;

-- Or for a count summary:
SELECT command, count(*) AS cnt
FROM information_schema.processlist
GROUP BY command
ORDER BY cnt DESC;

Set idle-timeout (HikariCP) or server_idle_timeout (PgBouncer) aggressively — 60–600 seconds — to reclaim idle connections between traffic peaks.

Key Takeaways

Key Takeaways
  • PostgreSQL connection overhead (~5–10 MB/connection) and MySQL thread overhead (~1–2 MB/connection) make oversized pools a direct cause of OOM crashes — right-size the pool, not the server.
  • Start with the Percona formula (core_count * 2) + effective_spindle_count for the number of backend connections the database should serve; use HikariCP's formula Tn * (Cm - 1) + 1 to size the application pool.
  • When deploying multiple application pods, the total connections across all pods must stay below max_connections with headroom — use PgBouncer or ProxySQL to multiplex if you cannot reduce per-pod pool sizes enough.
  • Monitor pg_stat_activity (PostgreSQL) and SHOW PROCESSLIST (MySQL) continuously; alert when idle connections exceed 30% of your pool for more than five minutes.
  • Set aggressive idle timeouts on both the pool and the database to reclaim connections between traffic peaks — a connection that is idle is not free.
  • Test pool sizing changes under synthetic load before deploying to production; measure p50, p95, and p99 query latency — not just throughput — as you scale pool size up and down.

Working with JusDB on Database Optimization

Getting connection pool sizing right requires understanding your specific workload profile — query duration distribution, concurrency patterns, memory headroom, and whether your deployment uses read replicas or connection poolers. At JusDB's MySQL and PostgreSQL managed services, we audit connection pool configurations as part of every engagement, correlating application-level pool metrics with server-side pg_stat_activity and InnoDB status to identify over-provisioned or under-provisioned pools before they cause incidents.

If you are experiencing connection exhaustion errors, OOM crashes after traffic spikes, or unexplained query latency increases during peak hours, reach out to the JusDB team for a database performance review. We work across HikariCP, PgBouncer, ProxySQL, and the full range of application frameworks to design a pooling architecture matched to your scale.

Share this article