Memory management is one of the least visible yet highest-impact levers in PostgreSQL performance tuning. Most DBAs focus on shared_buffers, work_mem, and query plans — but overlook the hardware-level memory mapping that underpins all of it. Configuring Linux huge pages for PostgreSQL is a kernel-level change that can deliver 5–20% throughput gains on OLTP workloads and up to 30% on OLAP queries — without touching a single SQL statement. If you are running PostgreSQL on a server with tens of gigabytes of RAM and have not configured huge pages, you are leaving measurable performance on the table.
- Standard Linux uses 4KB memory pages; huge pages use 2MB — requiring far fewer TLB entries for the same memory footprint.
- PostgreSQL's shared_buffers is a single large shared memory segment accessed by all backend processes, which makes it an ideal fit for huge pages.
- Enable static huge pages via vm.nr_hugepages in /etc/sysctl.conf and set huge_pages = on in postgresql.conf.
- Disable Transparent Huge Pages (THP): it is not equivalent to static huge pages and causes latency spikes in PostgreSQL.
- Use huge_pages = try in staging first; switch to on only after confirming successful allocation.
What are Huge Pages?
Every modern operating system manages physical RAM through a system of pages — fixed-size chunks of memory mapped between virtual addresses (what your process sees) and physical addresses (where the data actually lives in RAM). On Linux, the default page size is 4 kilobytes. This default has served well for decades, but as server RAM has grown into the tens and hundreds of gigabytes, 4KB pages create a structural problem at scale.
The CPU maintains a hardware cache called the Translation Lookaside Buffer (TLB) — a small, extremely fast cache that stores recent virtual-to-physical address mappings. When a process accesses memory, the CPU first checks the TLB. If the mapping is there (a TLB hit), access is nearly instantaneous. If not (a TLB miss), the CPU must perform a full page table walk through multiple levels of kernel data structures — a process that can take dozens to hundreds of nanoseconds per miss.
Here is the core problem at scale:
- With 4KB pages, a 64GB shared_buffers spans approximately 16 million pages, far beyond any CPU's TLB capacity (typically 1,000–4,000 entries on modern hardware).
- With 2MB huge pages, the same 64GB spans only 32,768 pages. That is still more than the TLB holds, but each entry now covers 512 times as much memory, so the TLB reaches a far larger share of the buffer pool.
Linux supports two practical huge page sizes: 2MB (standard huge pages, available on virtually all x86_64 systems) and 1GB (gigantic pages, requiring explicit CPU and kernel support). For most PostgreSQL deployments, 2MB huge pages are the right choice. The result of proper huge page configuration: fewer TLB misses, faster memory access, and meaningfully reduced CPU overhead for all memory-intensive database operations.
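As a rough way to see which sizes an x86_64 machine advertises, the sketch below looks for the pse (2MB pages) and pdpe1gb (1GB pages) CPU flags, which are the commonly used indicators. The function name supported_huge_page_sizes is hypothetical, and it takes the flags as an argument so it can be tried without touching the system.

```shell
#!/bin/sh
# Hypothetical helper: report the huge page sizes implied by x86_64 CPU flags.
# $1 = space-separated flags string, e.g. the "flags" line of /proc/cpuinfo.
supported_huge_page_sizes() (
    sizes=""
    # 'pse' is the usual indicator for 2MB page support
    case " $1 " in *" pse "*) sizes="2MB" ;; esac
    # 'pdpe1gb' indicates 1GB (gigantic) page support
    case " $1 " in *" pdpe1gb "*) sizes="${sizes:+$sizes }1GB" ;; esac
    echo "${sizes:-none}"
)

# Demo with a made-up flags string
supported_huge_page_sizes "fpu vme de pse tsc msr pae pdpe1gb"
```

On a live system the flags could be passed in with something like supported_huge_page_sizes "$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)". The padded-space match is deliberate, so that a flag such as pse36 is not mistaken for pse.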
Why PostgreSQL Benefits from Huge Pages
TLB Pressure from the Shared Buffer Pool
PostgreSQL's memory architecture is built around a single, large shared memory segment called shared_buffers. Every PostgreSQL process — backend connections, autovacuum workers, the background writer, the WAL writer, the checkpointer — maps and accesses this same segment continuously. On a busy server, this segment is accessed millions of times per second across dozens of concurrent processes.
Without huge pages, every access that is not in the TLB triggers a page table walk. At the scale of a multi-gigabyte buffer pool, TLB misses are essentially constant. The CPU spends a disproportionate fraction of its time doing address translation instead of executing actual database work. This overhead shows up as elevated CPU utilization, increased query latency, and reduced throughput — all without any obvious query-level cause.
With 2MB huge pages, the same buffer pool is covered by 512 times fewer page mappings. TLB hit rates rise sharply, and far less CPU time goes to page table walks. This matters most on workloads that access large portions of shared_buffers randomly: high-concurrency OLTP, analytical queries, and index scans across wide tables all benefit significantly.
shared_buffers Interaction
The larger your shared_buffers, the greater the benefit from huge pages. For a typical recommendation of 25% of system RAM, the numbers look like this:
- 32GB shared_buffers → ~8 million 4KB page TLB entries vs. 16,384 huge page entries
- 64GB shared_buffers → ~16 million 4KB page TLB entries vs. 32,768 huge page entries
- 128GB shared_buffers → ~32 million 4KB page TLB entries vs. 65,536 huge page entries
This is why huge pages and shared_buffers tuning are inseparable. Setting shared_buffers = 128GB without huge pages creates enormous TLB pressure that caps your achievable throughput. The two settings must be configured together for the full performance benefit to materialize. Beyond the TLB, huge pages also reduce the overhead of page table management, kernel memory allocator calls, and the cost of mapping the shared segment into each new backend process at connection time.
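The page counts listed above follow from simple division, which can be sketched as below. The helper name pages_for is hypothetical; sizes are treated as binary units (1 GB = 1024 * 1024 kB).

```shell
#!/bin/sh
# Hypothetical helper: number of pages needed to map a memory region.
# $1 = region size in GB, $2 = page size in kB
pages_for() (
    size_kb=$(( $1 * 1024 * 1024 ))
    # Ceiling division, so a partial page still counts as one page
    echo $(( (size_kb + $2 - 1) / $2 ))
)

for gb in 32 64 128; do
    printf '%4d GB: %9d x 4KB pages, %6d x 2MB pages\n' \
        "$gb" "$(pages_for "$gb" 4)" "$(pages_for "$gb" 2048)"
done
```

The ratio between the two columns is always 512, which is the number of 4KB pages that fit in one 2MB huge page.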
Enabling Huge Pages on Linux
Step-by-Step Kernel Configuration
Step 1: Calculate how many huge pages you need
The most accurate method is to inspect the running PostgreSQL process's peak virtual memory requirement directly from the kernel:
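Turning a VmPeak reading into a huge page count is a ceiling division by the huge page size. The sketch below assumes both values are in kB, as they appear in /proc; the helper name hugepages_from_vmpeak and the sample figure are illustrative only.

```shell
#!/bin/sh
# Hypothetical helper: huge pages needed to cover a VmPeak value.
# $1 = VmPeak in kB (from /proc/<pid>/status)
# $2 = huge page size in kB (defaults to 2048, i.e. 2MB pages)
hugepages_from_vmpeak() (
    hugepage_kb=${2:-2048}
    # Round up so the last partial huge page is still allocated
    echo $(( ($1 + hugepage_kb - 1) / hugepage_kb ))
)

# Example figure only: a VmPeak of 6490428 kB with 2MB pages
hugepages_from_vmpeak 6490428 2048
```

On PostgreSQL 15 and later, the server can also report this number itself through the computed shared_memory_size_in_huge_pages parameter, read with postgres -C while the server is stopped. Treat that as an alternative to check against your version's documentation.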
# Check current PostgreSQL shared memory requirement
# (the postmaster PID is the first line of postmaster.pid in the data directory)
postgres_pid=$(sudo head -1 /var/lib/postgresql/17/main/postmaster.pid)
grep ^VmPeak /proc/$postgres_pid/status
# Or calculate from shared_buffers directly:
# shared_buffers = 32GB → need 32*1024/2 = 16384 huge pages (with 2MB pages)
# Add roughly 10% headroom for the rest of shared memory: 18000
# Check current huge page configuration
grep -i huge /proc/meminfo
# HugePages_Total: 0 ← need to set this
# HugePages_Free: 0
# Hugepagesize: 2048 kB ← 2MB pages available
Step 2: Configure huge pages in the kernel
# Set huge pages count (persists across reboots)
echo 'vm.nr_hugepages = 18000' | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
# Verify allocation
cat /proc/meminfo | grep HugePages_Total
# HugePages_Total: 18000
# If HugePages_Free drops to 0 and Total did not increase fully,
# you may need to set NUMA-aware huge pages per node:
echo 18000 | sudo tee /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
Use numactl --hardware to inspect your topology before configuring.
Step 3: Set kernel shared memory limits
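The arithmetic behind these two limits can be sketched as a small helper. It assumes 10% headroom on top of shared_buffers and a 4KB base page for kernel.shmall; the function name shm_limits is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: derive kernel.shmmax (bytes) and kernel.shmall (pages)
# from a shared_buffers size, adding 10% headroom.
# $1 = shared_buffers in GB, $2 = base page size in bytes (defaults to 4096)
shm_limits() (
    page_bytes=${2:-4096}
    shmmax=$(( $1 * 1024 * 1024 * 1024 * 110 / 100 ))
    # shmall is counted in base pages, so round up to a whole page
    shmall=$(( (shmmax + page_bytes - 1) / page_bytes ))
    echo "kernel.shmmax = $shmmax"
    echo "kernel.shmall = $shmall"
)

shm_limits 32
```

The two output lines are in sysctl.conf syntax, so they can be reviewed and then appended by hand. Compare them with the current values from sysctl kernel.shmmax kernel.shmall first, since recent kernels often default to limits far larger than this.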
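# Calculate: shared_buffers in bytes + 10%
# For 32GB shared_buffers: 32*1024*1024*1024 = 34359738368 bytes; +10% = 37795712204
# kernel.shmall is counted in 4KB pages: 37795712204 / 4096, rounded up = 9227469
# Recent kernels ship with very large defaults; check the current values first so you do not lower them
echo 'kernel.shmmax = 37795712204' | sudo tee -a /etc/sysctl.conf
echo 'kernel.shmall = 9227469' | sudo tee -a /etc/sysctl.conf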
sudo sysctl -p
Configuring PostgreSQL for Huge Pages
postgresql.conf Settings
# postgresql.conf
shared_buffers = 32GB # Your target value
huge_pages = on # 'on', 'off', or 'try'
# 'try' will not fail if huge pages are unavailable
Use huge_pages = try During Initial Configuration
Set huge_pages = try (not on) during initial configuration. If huge pages are unavailable, because the count was miscalculated or allocation failed, huge_pages = on will prevent PostgreSQL from starting entirely. Switch to on only after confirming it works in staging.
Verifying Huge Pages are Active
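From the OS side, the relevant counters live in /proc/meminfo. A minimal sketch for summarizing them follows; the helper name hugepage_summary is hypothetical, the sample counters are made up, and it reads from stdin so it can be tried on saved output as well as on the live file.

```shell
#!/bin/sh
# Hypothetical helper: summarize huge page usage from /proc/meminfo-style
# text supplied on stdin.
hugepage_summary() {
    awk '
        /^HugePages_Total:/ { total = $2 }
        /^HugePages_Free:/  { free  = $2 }
        /^HugePages_Rsvd:/  { rsvd  = $2 }
        END {
            # Pages already faulted in by some process
            printf "in_use=%d\n", total - free
            # Free pages that are not promised to any process
            printf "unreserved=%d\n", free - rsvd
        }
    '
}

# Demo with made-up counters
hugepage_summary <<'EOF'
HugePages_Total:   18000
HugePages_Free:    17800
HugePages_Rsvd:    16184
HugePages_Surp:        0
EOF
```

Against a running server this would be invoked as hugepage_summary < /proc/meminfo. Reserved pages are still counted as free by the kernel, which is why the second figure subtracts HugePages_Rsvd from HugePages_Free.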
-- Verify huge pages are active after PostgreSQL restart
SHOW huge_pages;
-- huge_pages
-- ----------
-- on
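-- SHOW huge_pages reports the configured value, so with huge_pages = try it cannot confirm that allocation succeeded.
-- On PostgreSQL 17 and later, huge_pages_status reports what the server actually obtained:
SHOW huge_pages_status;
-- The configured value is also visible through pg_settings
SELECT name, setting, unit FROM pg_settings WHERE name = 'huge_pages';
# Check from OS side: HugePages_Rsvd rises at startup, and HugePages_Free falls as pages are touched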
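grep HugePages /proc/meminfo
# (illustrative values shortly after startup)
# HugePages_Total: 18000
# HugePages_Free: 17800 ← still includes reserved pages; falls as PostgreSQL touches its buffers
# HugePages_Rsvd: 16184 ← a large value here means PostgreSQL has claimed its pages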
# HugePages_Surp: 0
The HugePages_Rsvd counter shows pages reserved but not yet faulted in. After PostgreSQL starts and begins loading data into shared_buffers under real workload, this count will decrease and HugePages_Free will drop correspondingly. A large HugePages_Rsvd immediately after restart is normal and expected.
Transparent Huge Pages (THP): The Common Mistake
Linux also provides a feature called Transparent Huge Pages (THP), which attempts to automatically promote groups of 4KB pages into 2MB huge pages at runtime — without manual configuration. On the surface, this sounds like a convenient alternative. In practice, it is one of the most common sources of intermittent performance problems in production PostgreSQL deployments, and the two mechanisms should not be confused.
Here is why THP hurts PostgreSQL specifically: the kernel daemon khugepaged scans memory at runtime, identifies candidate page groups, and collapses them into huge pages dynamically. This collapse operation requires briefly stopping normal memory access to the affected region — a stop-the-world pause at the kernel level. These pauses are unpredictable, can last multiple seconds under memory pressure or during memory defragmentation, and manifest as sudden latency spikes in query execution times. Because they are triggered by kernel scheduling decisions rather than database activity, they are notoriously difficult to diagnose from PostgreSQL-side monitoring alone.
Static huge pages (allocated before PostgreSQL starts) carry none of this overhead. They are pre-allocated at boot time, PostgreSQL requests them at startup via a single mmap call, and they remain stable and contiguous for the lifetime of the process. No background daemon, no runtime collapsing, no stop-the-world events.
Never enable Transparent Huge Pages (THP) thinking it is equivalent to static huge pages: it causes intermittent multi-second pause events in PostgreSQL as the kernel defragments memory. Always disable THP explicitly: echo never > /sys/kernel/mm/transparent_hugepage/enabled.
# Check THP status
cat /sys/kernel/mm/transparent_hugepage/enabled
# [always] madvise never ← 'always' is the dangerous default on many distros
# DISABLE THP for PostgreSQL (causes latency spikes and memory fragmentation)
echo never | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
echo never | sudo tee /sys/kernel/mm/transparent_hugepage/defrag
# Make it persistent across reboots
# (append through sudo tee; a plain >> redirect would run without root privileges)
sudo tee -a /etc/rc.local > /dev/null << 'EOF'
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
EOF
On systemd-based distributions, the /etc/rc.local approach may not execute reliably. A more robust method is to create a dedicated systemd service unit that runs at boot before PostgreSQL starts, or to use the GRUB kernel parameter transparent_hugepage=never to disable THP at the kernel level entirely.
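A dedicated unit of that kind might look like the sketch below. The function thp_unit is a hypothetical helper that only prints the unit text, and the Before=postgresql.service line assumes your PostgreSQL unit carries that name, which varies by distribution.

```shell
#!/bin/sh
# Hypothetical helper: print a oneshot systemd unit that disables THP.
# Assumption: the PostgreSQL service is named postgresql.service.
thp_unit() {
    cat <<'EOF'
[Unit]
Description=Disable Transparent Huge Pages (THP)
DefaultDependencies=no
After=sysinit.target local-fs.target
Before=postgresql.service

[Service]
Type=oneshot
ExecStart=/bin/sh -c 'echo never > /sys/kernel/mm/transparent_hugepage/enabled'
ExecStart=/bin/sh -c 'echo never > /sys/kernel/mm/transparent_hugepage/defrag'

[Install]
WantedBy=basic.target
EOF
}

# Print the unit for review; nothing is written to disk here
thp_unit
```

After review, the output could be saved under /etc/systemd/system/ (for example as disable-thp.service, a name chosen here for illustration), followed by systemctl daemon-reload and systemctl enable --now on that unit.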
Benchmarking and Validating the Impact
Never deploy a kernel-level configuration change to production without measuring its actual impact in your environment first. Use pgbench to establish a baseline before enabling huge pages and compare results after a clean PostgreSQL restart.
# Initialize pgbench schema (scale factor 100 = ~1.4GB of data)
pgbench -i -s 100 myapp
# Baseline benchmark BEFORE enabling huge pages
# 50 clients, 4 threads, 60-second duration
pgbench -c 50 -j 4 -T 60 myapp
# Enable huge pages (steps above), then restart PostgreSQL
sudo systemctl restart postgresql
# Benchmark AFTER enabling huge pages
pgbench -c 50 -j 4 -T 60 myapp
# Count transactions slower than 100 ms and print progress every 5 seconds
pgbench -c 50 -j 4 -T 60 --latency-limit=100 --progress=5 myapp
# Measure TLB misses directly at hardware level, across all PostgreSQL processes
sudo perf stat -e dTLB-load-misses,dTLB-loads -p "$(pgrep -d, -x postgres)" sleep 30
The following table shows representative results from a PostgreSQL 16 instance with 32GB shared_buffers on a bare-metal server with 128GB RAM, running pgbench at scale factor 500:
| Metric | Without Huge Pages | With Huge Pages (2MB) | Improvement |
|---|---|---|---|
| TLB misses per second | ~4.2 million | ~18,000 | 99.6% reduction |
| shared_buffer hit latency (avg) | ~380 ns | ~95 ns | 75% reduction |
| pgbench TPS (read-write, 50 clients) | 14,220 | 17,850 | +25.5% |
| pgbench TPS (read-only, 50 clients) | 31,400 | 36,800 | +17.2% |
| p99 query latency | 12.4 ms | 8.1 ms | 34.7% reduction |
Results vary by workload, hardware architecture, and the ratio of working dataset to buffer pool size. OLAP workloads with large sequential scans tend to see higher gains than simple OLTP patterns with hot data. Always measure in your own environment before drawing conclusions.
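When comparing your own before and after runs, the Improvement column can be reproduced with a one-line calculation. The sketch below uses a hypothetical pct_change helper and feeds it two pairs of figures taken from the table above.

```shell
#!/bin/sh
# Hypothetical helper: percentage change from a baseline to a new value,
# printed with one decimal place.
# $1 = value before the change, $2 = value after the change
pct_change() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", (after - before) / before * 100 }'
}

pct_change 14220 17850   # read-write TPS from the table
pct_change 12.4 8.1      # p99 latency in ms from the table
```

This prints 25.5 and -34.7, matching the table. A negative result indicates a reduction, which is the desired direction for latency and TLB misses.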
Cloud and managed PostgreSQL notes:
- AWS EC2: Huge pages are supported on most instance types. Nitro-based instances (c5, r5, m5 and later generations) provide the best results. Configure via the standard Linux sysctl approach above.
- Amazon RDS PostgreSQL: Huge pages are managed automatically. RDS sets huge_pages = try by default on larger instance classes (db.r5, db.r6g, and above). You do not have kernel-level access to configure page allocation directly.
- Google Cloud SQL: Huge pages are not user-configurable. Google manages kernel settings transparently on your behalf.
- Azure Database for PostgreSQL Flexible Server: Huge pages are enabled automatically on memory-optimized compute tiers.
Key Takeaways
- Root cause matters: TLB misses from 4KB pages are a real, measurable bottleneck on large-RAM PostgreSQL servers — not a theoretical concern. They manifest as elevated CPU utilization and higher query latency under concurrent load.
- Calculate before you configure: Use VmPeak from the PostgreSQL postmaster's /proc entry or calculate directly from your target shared_buffers size. Always add a 10% buffer to the huge page count to avoid allocation shortfalls.
- Static huge pages only: Configure vm.nr_hugepages in /etc/sysctl.conf and explicitly disable THP. Static huge pages and THP are opposites in behavior; enabling THP while using static huge pages provides no additional benefit and adds latency risk.
- Start with huge_pages = try: This setting allows PostgreSQL to start even if huge page allocation failed. Switch to on only after verifying correct allocation in staging with /proc/meminfo and SHOW huge_pages.
- NUMA awareness required on multi-socket servers: Allocate huge pages per NUMA node to avoid cross-node memory penalties. Use numactl to inspect topology before configuring.
- Benchmark everything: Use pgbench with a realistic scale factor and client count matching your production concurrency. Measure TLB misses with perf stat -e dTLB-load-misses before and after to quantify the kernel-level improvement independently.
Working with JusDB on PostgreSQL Memory Tuning
Huge pages are one component of a broader PostgreSQL memory tuning strategy. Getting the configuration right — especially on large instances, NUMA systems, or cloud environments with opaque kernel configurations — requires understanding how shared_buffers, work_mem, huge_pages, and OS-level settings interact under your specific workload. A misconfigured kernel.shmmax, an incorrect huge page count, or THP enabled alongside static huge pages can all produce results worse than the default configuration.
At JusDB, we work with PostgreSQL deployments ranging from single-node production databases to distributed multi-region clusters. Our memory tuning engagements cover the full stack: kernel huge page configuration, PostgreSQL parameter tuning, wait event analysis, and workload-specific benchmarking to validate changes before they reach production.
If you are seeing unexplained latency spikes, high CPU utilization despite modest query load, or TLB-related metrics surfacing in your performance monitoring, get in touch — these are well-understood, solvable problems with the right diagnostic approach.