AWS RDS PostgreSQL Cost Optimization
By JusDB — practical, battle-tested ways to cut Amazon RDS for PostgreSQL spend without hurting reliability or performance.
TL;DR checklist
- Right-size on Graviton (db.t4g / db.m7g / db.r7g) before anything else.
- Pick the cheapest HA that fits RTO/RPO: Single-AZ → Multi-AZ DB Instance (2 nodes) → Multi-AZ DB Cluster (3 nodes).
- Use gp3 storage; provision only the IOPS/throughput you need; enable Storage autoscaling.
- Tune autovacuum early to avoid bloat (wasted storage + I/O).
- Stop non-prod nightly; schedule automatic stop/start.
- Buy Reserved Instances for steady prod (1–3 years).
- Trim snapshots & keep log retention sane (CloudWatch costs add up).
- Use pgBouncer or RDS Proxy to prevent connection storms from forcing oversized instances.
- Monitor replication slots & long transactions—both can inflate WAL/storage costs.
1) Right-size the instance (start with Graviton)
- Prefer Graviton families (db.t4g, db.m7g, db.r7g) for better price/performance; use the burstable t4g family for light, spiky dev/test only.
- Match memory to workload: PostgreSQL benefits from RAM (shared buffers + OS cache). If queries spill to disk, try more RAM before more vCPUs.
- Re-evaluate size after fixing slow queries and bloat; many clusters run 1–2 sizes too big due to avoidable I/O.
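If you want to ground the sizing decision in data first, a minimal boto3 sketch like the one below pulls a week of hourly CloudWatch datapoints. The my-prod-db identifier is a placeholder; the metric names are the standard AWS/RDS CloudWatch metrics.

```python
import datetime

import boto3

cloudwatch = boto3.client("cloudwatch")

DB_INSTANCE_ID = "my-prod-db"  # hypothetical instance identifier


def weekly_stats(metric_name, stat="Average"):
    """Return (min, p95) over 7 days of hourly averages for an AWS/RDS metric."""
    end = datetime.datetime.now(datetime.timezone.utc)
    datapoints = cloudwatch.get_metric_statistics(
        Namespace="AWS/RDS",
        MetricName=metric_name,
        Dimensions=[{"Name": "DBInstanceIdentifier", "Value": DB_INSTANCE_ID}],
        StartTime=end - datetime.timedelta(days=7),
        EndTime=end,
        Period=3600,            # hourly datapoints
        Statistics=[stat],
    )["Datapoints"]
    values = sorted(p[stat] for p in datapoints)
    if not values:
        return None
    return values[0], values[max(0, int(len(values) * 0.95) - 1)]


# For CPU/IOPS the p95 is the interesting number; for FreeableMemory it's the minimum.
for metric in ("CPUUtilization", "FreeableMemory", "ReadIOPS", "WriteIOPS"):
    stats = weekly_stats(metric)
    if stats is None:
        print(f"{metric}: no data")
    else:
        lo, p95 = stats
        print(f"{metric}: min={lo:,.1f}  p95={p95:,.1f}")
```

Compare those numbers against what the current instance class provides before committing to a smaller size or a Graviton equivalent.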
2) Choose the right HA option (don’t overpay)
- Single-AZ: cheapest, no automatic failover—fine for dev/test and low-impact internal tools.
- Multi-AZ DB Instance: primary + standby in another AZ (roughly 2× instance hours); automatic failover.
- Multi-AZ DB Cluster: 1 writer + 2 readers (3 instances) with faster failover; higher cost, higher availability/perf.
Tip: Reserve Multi-AZ/Cluster for prod OLTP. Many staging and internal apps don’t need it.
3) Storage: use gp3 and size IOPS deliberately
- gp3 fits most workloads; decouples capacity and performance.
- Set explicit IOPS/throughput based on metrics; avoid overprovisioning “just in case”.
- io1/io2 only when you have sustained high I/O that gp3 can’t satisfy.
- Enable storage autoscaling to expand capacity without emergency downtime.
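The gp3 switch, explicit IOPS/throughput, and the autoscaling ceiling can all be set in one modify_db_instance call. A minimal boto3 sketch; the identifier and sizing values are placeholders you should derive from your own metrics.

```python
import boto3

rds = boto3.client("rds")

# Hypothetical identifier and sizing values; set IOPS/throughput from measured demand,
# not "just in case".
rds.modify_db_instance(
    DBInstanceIdentifier="my-prod-db",
    StorageType="gp3",
    Iops=6000,                  # explicit provisioned IOPS
    StorageThroughput=500,      # MiB/s
    MaxAllocatedStorage=1000,   # GiB ceiling that enables storage autoscaling
    ApplyImmediately=False,     # take the change at the next maintenance window
)
```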
4) Backups, snapshots, and WAL
- Automated backup storage is typically free up to DB size in-region; manual snapshots & longer retention are billed.
- Prune old manual snapshots; copy cross-region only for DR requirements.
- Watch WAL accumulation: stalled logical slots or long transactions keep WAL around → higher storage & I/O.
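Pruning old manual snapshots is easy to automate. A minimal boto3 sketch, assuming a hypothetical 90-day retention policy for manual snapshots:

```python
import datetime

import boto3

rds = boto3.client("rds")
RETENTION_DAYS = 90  # hypothetical retention policy for manual snapshots
cutoff = datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(days=RETENTION_DAYS)

for page in rds.get_paginator("describe_db_snapshots").paginate(SnapshotType="manual"):
    for snap in page["DBSnapshots"]:
        if snap["Engine"] == "postgres" and snap["SnapshotCreateTime"] < cutoff:
            print(f"Deleting {snap['DBSnapshotIdentifier']} ({snap['SnapshotCreateTime']:%Y-%m-%d})")
            rds.delete_db_snapshot(DBSnapshotIdentifier=snap["DBSnapshotIdentifier"])
```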
5) Autovacuum & bloat control (direct cost impact)
- Tune autovacuum_vacuum_scale_factor / autovacuum_analyze_scale_factor per table (lower for hot tables) to prevent bloat that wastes storage and IOPS.
- Avoid long transactions; they block vacuum progress and inflate table/WAL size.
- Use pg_repack (an extension supported on RDS) to reclaim space without long locks when tables are already bloated.
- Partition very large, write-heavy tables to keep vacuum/maintenance localized and cheaper.
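A minimal psycopg2 sketch of the per-table tuning plus a long-transaction check; the endpoint, table names, and thresholds are assumptions, so pick your own from pg_stat_user_tables and pg_stat_activity.

```python
import psycopg2

# Hypothetical endpoint and table list; the password is read from the PGPASSWORD env var.
conn = psycopg2.connect(
    host="my-prod-db.xxxxx.us-east-1.rds.amazonaws.com",
    dbname="app",
    user="app_admin",
)
conn.autocommit = True

HOT_TABLES = ["orders", "events"]  # assumed write-heavy tables

with conn.cursor() as cur:
    for table in HOT_TABLES:
        # Vacuum after ~2% dead tuples and analyze after ~1% changes
        # (the defaults of 20% and 10% are far too lazy for hot tables).
        cur.execute(
            f"ALTER TABLE {table} SET ("
            "autovacuum_vacuum_scale_factor = 0.02, "
            "autovacuum_analyze_scale_factor = 0.01)"
        )

    # Long transactions hold back vacuum and keep WAL around; list anything over 15 minutes.
    cur.execute("""
        SELECT pid, state, now() - xact_start AS xact_age, left(query, 60) AS query
        FROM pg_stat_activity
        WHERE xact_start IS NOT NULL
          AND now() - xact_start > interval '15 minutes'
        ORDER BY xact_age DESC
    """)
    for row in cur.fetchall():
        print(row)

conn.close()
```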
6) Stop idle environments
- For dev/stage/sandbox: stop RDS nightly/weekends. You still pay for storage/backups, but not instance hours.
- Automate stop/start via EventBridge + Lambda or SSM Maintenance Windows (note that RDS automatically restarts a stopped instance after 7 days, so the automation must re-stop it).
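A minimal sketch of the Lambda side, assuming a hypothetical AutoStop=true tag convention and EventBridge schedules that pass {"action": "stop"} at night and {"action": "start"} in the morning. Multi-AZ DB clusters and instances with read replicas need different handling.

```python
import boto3

rds = boto3.client("rds")


def handler(event, context):
    """Stop or start every opted-in instance, depending on the scheduled action."""
    action = event.get("action", "stop")
    for page in rds.get_paginator("describe_db_instances").paginate():
        for db in page["DBInstances"]:
            tags = {t["Key"]: t["Value"] for t in db.get("TagList", [])}
            if tags.get("AutoStop") != "true":
                continue  # only touch instances explicitly opted in via the AutoStop tag
            name = db["DBInstanceIdentifier"]
            if action == "stop" and db["DBInstanceStatus"] == "available":
                rds.stop_db_instance(DBInstanceIdentifier=name)
            elif action == "start" and db["DBInstanceStatus"] == "stopped":
                rds.start_db_instance(DBInstanceIdentifier=name)
```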
7) Commit for steady prod: Reserved Instances
- Buy RDS Reserved Instances (1 or 3 years) for 24×7 prod to capture meaningful discounts over On-Demand.
- Mix Convertible and Standard to balance flexibility and savings.
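Before buying, you can list the offerings that match your footprint. A quick boto3 sketch; the instance class, term, and offering type below are placeholders.

```python
import boto3

rds = boto3.client("rds")

# Placeholders: match these to the instance class and term you actually run 24×7.
offerings = rds.describe_reserved_db_instances_offerings(
    ProductDescription="postgresql",
    DBInstanceClass="db.r7g.xlarge",
    Duration="31536000",        # 1 year in seconds ("94608000" for 3 years)
    OfferingType="No Upfront",
    MultiAZ=True,
)
for o in offerings["ReservedDBInstancesOfferings"]:
    print(o["ReservedDBInstancesOfferingId"], o["OfferingType"],
          o["FixedPrice"], o["RecurringCharges"])
```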
8) Observability costs (keep them lean)
- Performance Insights: 7-day retention is free; longer retention costs per vCPU—set the minimum that suits your investigations.
- CloudWatch Logs: keep log_min_duration_statement high enough for signal; set sane retention to cap costs.
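Both knobs can be set programmatically. A minimal boto3 sketch, assuming logs are already exported to CloudWatch under the default /aws/rds/instance/&lt;id&gt;/postgresql log group and that the instance uses a custom parameter group (names are placeholders):

```python
import boto3

logs = boto3.client("logs")
rds = boto3.client("rds")

DB_INSTANCE_ID = "my-prod-db"        # hypothetical instance identifier
PARAMETER_GROUP = "my-prod-db-pg16"  # hypothetical custom parameter group

# Cap retention on the exported PostgreSQL log group (the default is "never expire").
logs.put_retention_policy(
    logGroupName=f"/aws/rds/instance/{DB_INSTANCE_ID}/postgresql",
    retentionInDays=14,
)

# Only log statements slower than 1 second to keep log volume (and CloudWatch cost) down.
rds.modify_db_parameter_group(
    DBParameterGroupName=PARAMETER_GROUP,
    Parameters=[{
        "ParameterName": "log_min_duration_statement",
        "ParameterValue": "1000",    # milliseconds
        "ApplyMethod": "immediate",  # dynamic parameter, no reboot required
    }],
)
```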
9) Connection management: pgBouncer vs RDS Proxy
- pgBouncer (session or transaction pooling) is lightweight and inexpensive; deploy it on ECS/EC2 for app tiers that open many short-lived connections.
- RDS Proxy adds managed pooling/failover help and can be worth it if it lets you run a smaller DB or stabilizes serverless/Lambda spikes.
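Not sure whether pooling will pay off? A quick psycopg2 sketch that counts backends by state (endpoint and credentials are placeholders); a large idle-to-active ratio is the classic sign that a pooler would let the same workload fit on a smaller instance.

```python
import psycopg2

# Hypothetical endpoint; the password is read from the PGPASSWORD env var.
conn = psycopg2.connect(
    host="my-prod-db.xxxxx.us-east-1.rds.amazonaws.com",
    dbname="app",
    user="app_admin",
)

with conn.cursor() as cur:
    # Every idle backend still costs server memory, so lots of idle vs. active
    # connections means the instance is sized for connections, not for work.
    cur.execute("""
        SELECT state, count(*)
        FROM pg_stat_activity
        WHERE backend_type = 'client backend'
        GROUP BY state
        ORDER BY count(*) DESC
    """)
    for state, count in cur.fetchall():
        print(f"{state or 'no state'}: {count}")

conn.close()
```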
10) Data transfer & topology
- Co-locate app and DB in the same AZ to avoid cross-AZ data transfer charges.
- Use read replicas only when they offset primary load; turn off unused ones.
PostgreSQL-specific quick wins
Action | Effort | Typical impact | When to do it |
---|---|---|---|
Switch to Graviton | Low–Med | Lower $/perf | Maintenance window or blue/green |
Move gp2/PIOPS → gp3 | Low | Lower storage $ | Most OLTP workloads |
Enable storage autoscaling | Low | Avoid emergency resizes | Immediately |
Tune autovacuum on hot tables | Low | Cut IOPS & storage bloat | Week 1 |
Reclaim bloat with pg_repack | Med | Lower storage $ | After metrics show bloat |
Partition largest write-heavy tables | Med | Cheaper maintenance | When tables exceed tens of GB |
Stop non-prod nightly/weekends | Low | ~50–70% instance $ off-hours | Dev/test/sandbox |
Buy RIs for prod | Low | Deep discount | Stable 24×7 workloads |
Cap PI/CloudWatch retention | Low | Lower observability $ | Immediately |
Add pgBouncer / RDS Proxy | Med | Smaller DB possible | Connection storms, serverless |
Monitor logical slots & WAL | Low | Prevent runaway storage | Always-on check |
JusDB’s recommended rollout (safe & simple)
- Measure for a week: CPU, RAM, read/write IOPS, throughput, connections, slow queries, bloat (via the pg_stat views), WAL growth, and autovacuum activity.
- Right-size on Graviton; switch storage to gp3 with explicit IOPS/throughput; enable storage autoscaling.
- Tune autovacuum for hot tables; fix long transactions; monitor logical replication slots (see the slot/WAL check after this list); repack if already bloated.
- Trim snapshots & set CloudWatch/PI retention; schedule stop/start for all non-prod.
- Buy RIs for steady prod once stable; consider pgBouncer or RDS Proxy if connection churn is the bottleneck.
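For the slot/WAL check referenced above, a minimal psycopg2 sketch (endpoint is a placeholder) that reports how much WAL each replication slot is pinning:

```python
import psycopg2

# Hypothetical endpoint; the password is read from the PGPASSWORD env var.
conn = psycopg2.connect(
    host="my-prod-db.xxxxx.us-east-1.rds.amazonaws.com",
    dbname="app",
    user="app_admin",
)

with conn.cursor() as cur:
    # Inactive or lagging slots pin WAL on disk, which shows up as steadily growing storage.
    cur.execute("""
        SELECT slot_name,
               slot_type,
               active,
               pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
        FROM pg_replication_slots
        ORDER BY pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) DESC
    """)
    for slot_name, slot_type, active, retained_wal in cur.fetchall():
        flag = "" if active else "  <-- inactive, investigate or drop"
        print(f"{slot_name} ({slot_type}): {retained_wal} of WAL retained{flag}")

conn.close()
```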
Need hands-on help? JusDB can audit your fleet, implement the above safely, and track savings.