AWS RDS PostgreSQL Cost Optimization
By JusDB — practical, battle-tested ways to cut Amazon RDS for PostgreSQL spend without hurting reliability or performance.
TL;DR checklist
- Right-size on Graviton (db.t4g / db.m7g / db.r7g) before anything else.
- Pick the cheapest HA that fits RTO/RPO: Single-AZ → Multi-AZ DB Instance (2 nodes) → Multi-AZ DB Cluster (3 nodes).
- Use gp3 storage; provision only the IOPS/throughput you need; enable Storage autoscaling.
- Tune autovacuum early to avoid bloat (wasted storage + I/O).
- Stop non-prod nightly; schedule automatic stop/start.
- Buy Reserved Instances for steady prod (1–3 years).
- Trim snapshots & keep log retention sane (CloudWatch costs add up).
- Use pgBouncer or RDS Proxy to prevent connection storms from forcing oversized instances.
- Monitor replication slots & long transactions—both can inflate WAL/storage costs.
1) Right-size the instance (start with Graviton)
- Prefer Graviton families (db.t4g, db.m7g, db.r7g) for better price/performance; use the burstable t4g family for light, spiky dev/test only.
- Match memory to workload: PostgreSQL benefits from RAM (shared buffers + OS cache). If queries spill to disk, try more RAM before more vCPUs.
- Re-evaluate size after fixing slow queries and bloat; many clusters run 1–2 sizes too big due to avoidable I/O.
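If you want to ground the sizing decision in data first, a minimal boto3 sketch like the one below pulls a week of hourly CloudWatch datapoints. The my-prod-db identifier is a placeholder; the metric names are the standard AWS/RDS CloudWatch metrics.

```python
import datetime

import boto3

cloudwatch = boto3.client("cloudwatch")

DB_INSTANCE_ID = "my-prod-db"  # hypothetical instance identifier


def weekly_stats(metric_name, stat="Average"):
    """Return (min, p95) over 7 days of hourly averages for an AWS/RDS metric."""
    end = datetime.datetime.now(datetime.timezone.utc)
    datapoints = cloudwatch.get_metric_statistics(
        Namespace="AWS/RDS",
        MetricName=metric_name,
        Dimensions=[{"Name": "DBInstanceIdentifier", "Value": DB_INSTANCE_ID}],
        StartTime=end - datetime.timedelta(days=7),
        EndTime=end,
        Period=3600,            # hourly datapoints
        Statistics=[stat],
    )["Datapoints"]
    values = sorted(p[stat] for p in datapoints)
    if not values:
        return None
    return values[0], values[max(0, int(len(values) * 0.95) - 1)]


# For CPU/IOPS the p95 is the interesting number; for FreeableMemory it's the minimum.
for metric in ("CPUUtilization", "FreeableMemory", "ReadIOPS", "WriteIOPS"):
    stats = weekly_stats(metric)
    if stats is None:
        print(f"{metric}: no data")
    else:
        lo, p95 = stats
        print(f"{metric}: min={lo:,.1f}  p95={p95:,.1f}")
```

Compare those numbers against what the current instance class provides before committing to a smaller size or a Graviton equivalent.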
2) Choose the right HA option (don’t overpay)
- Single-AZ: cheapest, no automatic failover—fine for dev/test and low-impact internal tools.
- Multi-AZ DB Instance: primary + standby in another AZ (roughly 2× instance hours); automatic failover.
- Multi-AZ DB Cluster: 1 writer + 2 readers (3 instances) with faster failover; higher cost, higher availability/perf.
Tip: Reserve Multi-AZ/Cluster for prod OLTP. Many staging and internal apps don’t need it.
3) Storage: use gp3 and size IOPS deliberately
- gp3 fits most workloads; decouples capacity and performance.
- Set explicit IOPS/throughput based on metrics; avoid overprovisioning “just in case”.
- io1/io2 only when you have sustained high I/O that gp3 can’t satisfy.
- Enable storage autoscaling to expand capacity without emergency downtime.
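The gp3 switch, explicit IOPS/throughput, and the autoscaling ceiling can all be set in one modify_db_instance call. A minimal boto3 sketch; the identifier and sizing values are placeholders you should derive from your own metrics.

```python
import boto3

rds = boto3.client("rds")

# Hypothetical identifier and sizing values; set IOPS/throughput from measured demand,
# not "just in case".
rds.modify_db_instance(
    DBInstanceIdentifier="my-prod-db",
    StorageType="gp3",
    Iops=6000,                  # explicit provisioned IOPS
    StorageThroughput=500,      # MiB/s
    MaxAllocatedStorage=1000,   # GiB ceiling that enables storage autoscaling
    ApplyImmediately=False,     # take the change at the next maintenance window
)
```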
4) Backups, snapshots, and WAL
- Automated backup storage is typically free up to DB size in-region; manual snapshots & longer retention are billed.
- Prune old manual snapshots; copy cross-region only for DR requirements.
- Watch WAL accumulation: stalled logical slots or long transactions keep WAL around → higher storage & I/O.
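Pruning old manual snapshots is easy to automate. A minimal boto3 sketch, assuming a hypothetical 90-day retention policy for manual snapshots:

```python
import datetime

import boto3

rds = boto3.client("rds")
RETENTION_DAYS = 90  # hypothetical retention policy for manual snapshots
cutoff = datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(days=RETENTION_DAYS)

for page in rds.get_paginator("describe_db_snapshots").paginate(SnapshotType="manual"):
    for snap in page["DBSnapshots"]:
        if snap["Engine"] == "postgres" and snap["SnapshotCreateTime"] < cutoff:
            print(f"Deleting {snap['DBSnapshotIdentifier']} ({snap['SnapshotCreateTime']:%Y-%m-%d})")
            rds.delete_db_snapshot(DBSnapshotIdentifier=snap["DBSnapshotIdentifier"])
```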
5) Autovacuum & bloat control (direct cost impact)
- Tune autovacuum_vacuum_scale_factor / autovacuum_analyze_scale_factor per table (lower for hot tables) to prevent bloat that wastes storage and IOPS.
- Avoid long transactions; they block vacuum progress and inflate table/WAL size.
- Use pg_repack (an extension supported on RDS) to reclaim space without long locks when tables are already bloated.
- Partition very large, write-heavy tables to keep vacuum/maintenance localized and cheaper.
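A minimal psycopg2 sketch of the per-table tuning plus a long-transaction check; the endpoint, table names, and thresholds are assumptions, so pick your own from pg_stat_user_tables and pg_stat_activity.

```python
import psycopg2

# Hypothetical endpoint and table list; the password is read from the PGPASSWORD env var.
conn = psycopg2.connect(
    host="my-prod-db.xxxxx.us-east-1.rds.amazonaws.com",
    dbname="app",
    user="app_admin",
)
conn.autocommit = True

HOT_TABLES = ["orders", "events"]  # assumed write-heavy tables

with conn.cursor() as cur:
    for table in HOT_TABLES:
        # Vacuum after ~2% dead tuples and analyze after ~1% changes
        # (the defaults of 20% and 10% are far too lazy for hot tables).
        cur.execute(
            f"ALTER TABLE {table} SET ("
            "autovacuum_vacuum_scale_factor = 0.02, "
            "autovacuum_analyze_scale_factor = 0.01)"
        )

    # Long transactions hold back vacuum and keep WAL around; list anything over 15 minutes.
    cur.execute("""
        SELECT pid, state, now() - xact_start AS xact_age, left(query, 60) AS query
        FROM pg_stat_activity
        WHERE xact_start IS NOT NULL
          AND now() - xact_start > interval '15 minutes'
        ORDER BY xact_age DESC
    """)
    for row in cur.fetchall():
        print(row)

conn.close()
```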
6) Stop idle environments
- For dev/stage/sandbox: stop RDS nightly/weekends. You still pay for storage/backups, but not instance hours.
- Automate stop/start via EventBridge + Lambda or SSM Maintenance Windows (note that RDS automatically restarts a stopped instance after 7 days, so the automation must re-stop it).
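A minimal sketch of the Lambda side, assuming a hypothetical AutoStop=true tag convention and EventBridge schedules that pass {"action": "stop"} at night and {"action": "start"} in the morning. Multi-AZ DB clusters and instances with read replicas need different handling.

```python
import boto3

rds = boto3.client("rds")


def handler(event, context):
    """Stop or start every opted-in instance, depending on the scheduled action."""
    action = event.get("action", "stop")
    for page in rds.get_paginator("describe_db_instances").paginate():
        for db in page["DBInstances"]:
            tags = {t["Key"]: t["Value"] for t in db.get("TagList", [])}
            if tags.get("AutoStop") != "true":
                continue  # only touch instances explicitly opted in via the AutoStop tag
            name = db["DBInstanceIdentifier"]
            if action == "stop" and db["DBInstanceStatus"] == "available":
                rds.stop_db_instance(DBInstanceIdentifier=name)
            elif action == "start" and db["DBInstanceStatus"] == "stopped":
                rds.start_db_instance(DBInstanceIdentifier=name)
```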
7) Commit for steady prod: Reserved Instances
- Buy RDS Reserved Instances (1 or 3 years) for 24×7 prod to capture meaningful discounts over On-Demand.
- Mix Convertible and Standard to balance flexibility and savings.
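Before buying, you can list the offerings that match your footprint. A quick boto3 sketch; the instance class, term, and offering type below are placeholders.

```python
import boto3

rds = boto3.client("rds")

# Placeholders: match these to the instance class and term you actually run 24×7.
offerings = rds.describe_reserved_db_instances_offerings(
    ProductDescription="postgresql",
    DBInstanceClass="db.r7g.xlarge",
    Duration="31536000",        # 1 year in seconds ("94608000" for 3 years)
    OfferingType="No Upfront",
    MultiAZ=True,
)
for o in offerings["ReservedDBInstancesOfferings"]:
    print(o["ReservedDBInstancesOfferingId"], o["OfferingType"],
          o["FixedPrice"], o["RecurringCharges"])
```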
8) Observability costs (keep them lean)
- Performance Insights: 7-day retention is free; longer retention costs per vCPU—set the minimum that suits your investigations.
- CloudWatch Logs: keep log_min_duration_statement high enough for signal; set sane retention to cap costs.
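Both knobs can be set programmatically. A minimal boto3 sketch, assuming logs are already exported to CloudWatch under the default /aws/rds/instance/&lt;id&gt;/postgresql log group and that the instance uses a custom parameter group (names are placeholders):

```python
import boto3

logs = boto3.client("logs")
rds = boto3.client("rds")

DB_INSTANCE_ID = "my-prod-db"        # hypothetical instance identifier
PARAMETER_GROUP = "my-prod-db-pg16"  # hypothetical custom parameter group

# Cap retention on the exported PostgreSQL log group (the default is "never expire").
logs.put_retention_policy(
    logGroupName=f"/aws/rds/instance/{DB_INSTANCE_ID}/postgresql",
    retentionInDays=14,
)

# Only log statements slower than 1 second to keep log volume (and CloudWatch cost) down.
rds.modify_db_parameter_group(
    DBParameterGroupName=PARAMETER_GROUP,
    Parameters=[{
        "ParameterName": "log_min_duration_statement",
        "ParameterValue": "1000",    # milliseconds
        "ApplyMethod": "immediate",  # dynamic parameter, no reboot required
    }],
)
```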
9) Connection management: pgBouncer vs RDS Proxy
- pgBouncer (session or transaction pooling) is lightweight and inexpensive; deploy it on ECS/EC2 for app tiers that open many short-lived connections.
- RDS Proxy adds managed pooling/failover help and can be worth it if it lets you run a smaller DB or stabilizes serverless/Lambda spikes.
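Not sure whether pooling will pay off? A quick psycopg2 sketch that counts backends by state (endpoint and credentials are placeholders); a large idle-to-active ratio is the classic sign that a pooler would let the same workload fit on a smaller instance.

```python
import psycopg2

# Hypothetical endpoint; the password is read from the PGPASSWORD env var.
conn = psycopg2.connect(
    host="my-prod-db.xxxxx.us-east-1.rds.amazonaws.com",
    dbname="app",
    user="app_admin",
)

with conn.cursor() as cur:
    # Every idle backend still costs server memory, so lots of idle vs. active
    # connections means the instance is sized for connections, not for work.
    cur.execute("""
        SELECT state, count(*)
        FROM pg_stat_activity
        WHERE backend_type = 'client backend'
        GROUP BY state
        ORDER BY count(*) DESC
    """)
    for state, count in cur.fetchall():
        print(f"{state or 'no state'}: {count}")

conn.close()
```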
10) Data transfer & topology
- Co-locate app and DB in the same AZ to avoid cross-AZ data transfer charges.
- Use read replicas only when they offset primary load; turn off unused ones.
PostgreSQL-specific quick wins
Action | Effort | Typical impact | When to do it |
---|---|---|---|
Switch to Graviton | Low–Med | Lower $/perf | Maintenance window or blue/green |
Move gp2/PIOPS → gp3 | Low | Lower storage $ | Most OLTP workloads |
Enable storage autoscaling | Low | Avoid emergency resizes | Immediately |
Tune autovacuum on hot tables | Low | Cut IOPS & storage bloat | Week 1 |
Reclaim bloat with pg_repack | Med | Lower storage $ | After metrics show bloat |
Partition largest write-heavy tables | Med | Cheaper maintenance | When tables exceed tens of GB |
Stop non-prod nightly/weekends | Low | ~50–70% instance $ off-hours | Dev/test/sandbox |
Buy RIs for prod | Low | Deep discount | Stable 24×7 workloads |
Cap PI/CloudWatch retention | Low | Lower observability $ | Immediately |
Add pgBouncer / RDS Proxy | Med | Smaller DB possible | Connection storms, serverless |
Monitor logical slots & WAL | Low | Prevent runaway storage | Always-on check |
JusDB’s recommended rollout (safe & simple)
- Measure for a week: CPU, RAM, read/write IOPS, throughput, connections, slow queries, bloat (via the pg_stat views), WAL growth, and autovacuum activity.
- Right-size on Graviton; switch storage to gp3 with explicit IOPS/throughput; enable storage autoscaling.
- Tune autovacuum for hot tables; fix long transactions; monitor logical replication slots (see the slot/WAL check after this list); repack if already bloated.
- Trim snapshots & set CloudWatch/PI retention; schedule stop/start for all non-prod.
- Buy RIs for steady prod once stable; consider pgBouncer or RDS Proxy if connection churn is the bottleneck.
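For the slot/WAL check referenced above, a minimal psycopg2 sketch (endpoint is a placeholder) that reports how much WAL each replication slot is pinning:

```python
import psycopg2

# Hypothetical endpoint; the password is read from the PGPASSWORD env var.
conn = psycopg2.connect(
    host="my-prod-db.xxxxx.us-east-1.rds.amazonaws.com",
    dbname="app",
    user="app_admin",
)

with conn.cursor() as cur:
    # Inactive or lagging slots pin WAL on disk, which shows up as steadily growing storage.
    cur.execute("""
        SELECT slot_name,
               slot_type,
               active,
               pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
        FROM pg_replication_slots
        ORDER BY pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) DESC
    """)
    for slot_name, slot_type, active, retained_wal in cur.fetchall():
        flag = "" if active else "  <-- inactive, investigate or drop"
        print(f"{slot_name} ({slot_type}): {retained_wal} of WAL retained{flag}")

conn.close()
```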
Need hands-on help? JusDB can audit your fleet, implement the above safely, and track savings.