
AWS DMS: Database Migration Service Guide for MySQL and PostgreSQL

A production guide to AWS Database Migration Service — replication instance sizing, endpoint configuration, Full Load vs CDC vs Full Load+CDC modes, performance tuning, and DMS vs Debezium vs pglogical.

JusDB Team
April 17, 2023
12 min read

At 2:47 AM on a Tuesday, a Series C fintech's lead engineer opened a Slack incident channel: their primary RDS PostgreSQL 14 instance had just failed, and the read replica had not yet been promoted. Twenty-two minutes later, after frantic manual intervention, writes were back. The post-mortem revealed the real cost: not just 22 minutes of downtime, but $340,000 in delayed payment processing, a compliance flag from their banking partner, and a mandatory architectural review. Their PostgreSQL instance was on a single-AZ RDS deployment with a manually managed standby. Three weeks later, they were on Amazon Aurora PostgreSQL. The next failure — a host replacement triggered by AWS six months later — lasted 28 seconds. Their on-call engineer slept through it.

This post covers what makes Aurora PostgreSQL architecturally different from standard RDS PostgreSQL, which features matter in production, how to configure it correctly, and when Aurora's complexity and cost premium are genuinely justified.

TL;DR
  • Aurora PostgreSQL replaces the traditional storage layer with a distributed, self-healing volume replicated six ways across three Availability Zones — writes go to storage directly, not through a single-node I/O path.
  • Failover to an Aurora Replica takes approximately 30 seconds. Failover on a standard RDS PostgreSQL Multi-AZ deployment typically takes 1–3 minutes, because the standby must finish crash recovery (WAL replay) and the instance's DNS record must be repointed.
  • Aurora Serverless v2 scales compute in fine-grained Aurora Capacity Units (ACUs) within seconds, making it cost-effective for workloads with spiky or unpredictable traffic patterns.
  • Aurora Global Database enables cross-region replication with sub-second lag, supporting active-passive disaster recovery or low-latency reads in a second region.
  • Aurora-specific functions such as aurora_stat_activity() and aurora_global_db_instance_status() are essential for operational monitoring and should be part of every runbook.
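  • Aurora PostgreSQL costs roughly 20% more per vCPU-hour than equivalent RDS PostgreSQL. However, storage is billed per GB consumed rather than provisioned, which can make the total bill lower when RDS storage would otherwise be heavily over-provisioned. Aurora Standard also bills per I/O request, so model I/O-heavy workloads before assuming savings.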

Background: Why Aurora Exists

Amazon RDS PostgreSQL is managed PostgreSQL running on a conventional architecture. A primary EC2 instance writes to EBS storage, a Multi-AZ standby is kept in sync through synchronous storage-level replication, and read replicas follow the primary through asynchronous streaming WAL replication. This model works well, but it has a fundamental constraint: the I/O path runs through a single host, and storage is tied to a single EBS volume per instance. Network I/O between compute and storage, EBS throughput limits, and the cost of replicating every write to a standby all compound at scale.

Aurora was Amazon's answer to these constraints. It decouples compute from storage entirely, replacing the EBS-backed model with a purpose-built, distributed storage service. From a PostgreSQL client's perspective, the database speaks standard PostgreSQL wire protocol and accepts standard SQL. Under the hood, almost everything about how writes reach durable storage is different.

Tip

Aurora PostgreSQL is compatible with PostgreSQL 14, 15, and 16 as of 2026. Extensions available on Aurora are a subset of what is available on self-managed PostgreSQL — always verify extension availability in the Aurora PostgreSQL extension list before migrating.

Aurora Architecture: How the Storage Layer Works

Six-Way Replication Across Three AZs

When a write is committed on Aurora, the primary instance does not write to a local disk. Instead, it sends log records to Aurora's distributed storage fleet. That fleet writes the records to six storage nodes, two in each of three Availability Zones, and acknowledges the write to the primary once four of the six nodes confirm receipt.

This design has two consequences. Aurora can lose an entire AZ (two nodes) and still complete writes. It can also lose an entire AZ plus one additional storage node without losing data, because the three surviving copies still form a read quorum from which the missing copies are rebuilt. In that second case, however, writes pause until a write quorum of four is restored.
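The quorum arithmetic behind these failure cases fits in a few lines of Python. This is a simplified model, not AWS code. It assumes the quorum sizes from Aurora's published design: six copies, a write quorum of four, and a read quorum of three. The paragraph above does not state the read quorum; it comes from that design. The model counts only failed storage nodes, and the class and function names are illustrative.

```python
from dataclasses import dataclass

# Assumed quorum sizes from Aurora's published design (simplified model).
COPIES = 6
WRITE_QUORUM = 4
READ_QUORUM = 3

# Any read quorum overlaps any write quorum (3 + 4 > 6), so a quorum read
# always includes a copy holding the latest acknowledged write. Any two
# write quorums also overlap (4 + 4 > 6), so conflicting writes cannot
# both be acknowledged.
assert READ_QUORUM + WRITE_QUORUM > COPIES
assert 2 * WRITE_QUORUM > COPIES


@dataclass(frozen=True)
class QuorumState:
    healthy_nodes: int
    writes_available: bool
    data_recoverable: bool


def quorum_state(failed_nodes: int) -> QuorumState:
    """Write availability and data safety after `failed_nodes` storage losses."""
    if not 0 <= failed_nodes <= COPIES:
        raise ValueError(f"failed_nodes must be between 0 and {COPIES}")
    healthy = COPIES - failed_nodes
    return QuorumState(
        healthy_nodes=healthy,
        writes_available=healthy >= WRITE_QUORUM,
        data_recoverable=healthy >= READ_QUORUM,
    )


if __name__ == "__main__":
    # Each AZ holds two copies, so losing one AZ removes two nodes.
    for label, failed in [("one AZ", 2), ("one AZ + one node", 3), ("two AZs", 4)]:
        print(label, quorum_state(failed))
```

Running it shows that losing one AZ leaves writes available. Losing one AZ plus one node leaves a read quorum to rebuild from, but no write quorum.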

The storage layer is also self-healing. Storage nodes continuously gossip with each other. If a node detects that a peer is behind, it requests the missing log segments and repairs itself in the background. The primary instance never needs to re-send the data; the storage fleet handles its own consistency. This is a fundamentally different durability model than streaming WAL replication, where the primary must retransmit to a replica that fell behind.
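A toy version of this peer-to-peer gap filling makes the idea concrete. The sketch assumes that each storage node holds a map from log sequence number (LSN) to log record. It also assumes that a lagging node simply copies whatever its peer has and it lacks. Aurora's actual gossip protocol is internal to AWS and is not modeled here, and the function name is hypothetical.

```python
def fill_gaps(local: dict[int, bytes], peer: dict[int, bytes]) -> list[int]:
    """Copy log records that `peer` has but `local` lacks; return their LSNs.

    Toy model only: real Aurora storage nodes repair through an internal,
    undocumented gossip protocol.
    """
    missing = sorted(peer.keys() - local.keys())
    for lsn in missing:
        local[lsn] = peer[lsn]
    return missing


if __name__ == "__main__":
    node_a = {1: b"r1", 2: b"r2", 3: b"r3", 4: b"r4"}
    node_b = {1: b"r1", 2: b"r2", 4: b"r4"}  # node_b missed LSN 3
    print(fill_gaps(node_b, node_a))  # [3]; the primary is never involved
    print(sorted(node_b))             # [1, 2, 3, 4]
```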

Write Path: No Local I/O

On standard PostgreSQL, a write follows this path: WAL write to disk on the primary → fsync → WAL shipped to replica → replica applies WAL → replica fsyncs. On Aurora, the primary writes log records to the distributed storage layer and receives the quorum acknowledgment. There is no local WAL fsync on the primary. The primary does not maintain a local data directory in the traditional sense — it reads pages on demand from the storage layer and caches them in shared_buffers.

This architecture has a concrete performance implication. At high I/O rates, Aurora can sustain more write throughput than an equivalent RDS PostgreSQL instance. The primary ships only log records over the network to a storage fleet built for that purpose, rather than pushing WAL and data pages through a single EBS volume's IOPS limit.
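A useful consequence of the quorum write path is that a commit waits only for the fourth-fastest of six storage acknowledgments. One or two slow storage nodes therefore do not hold it up. The sketch below models only that rule. It assumes independent per-node acknowledgment times and ignores network jitter, group commit, and everything else in the real commit path. The function name is illustrative.

```python
WRITE_QUORUM = 4  # acknowledgments required out of six copies


def commit_latency_ms(ack_latencies_ms: list[float], quorum: int = WRITE_QUORUM) -> float:
    """Time until `quorum` storage nodes have acknowledged a log record."""
    if len(ack_latencies_ms) < quorum:
        raise ValueError("not enough acknowledgments to reach the write quorum")
    # The commit completes when the quorum-th fastest acknowledgment arrives.
    return sorted(ack_latencies_ms)[quorum - 1]


if __name__ == "__main__":
    # Two straggling nodes (40 ms and 250 ms) do not affect commit latency.
    acks = [1.2, 0.9, 40.0, 1.1, 250.0, 1.0]
    print(commit_latency_ms(acks))  # 1.2
```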

Storage Autoscaling

Aurora storage grows automatically in 10 GB increments, up to 128 TiB. There is no need to provision storage capacity in advance or modify the instance to resize the volume. Storage is billed per GB consumed (not allocated), which means a database that uses 40 GB pays for 40 GB — not the 500 GB you might pre-provision on RDS to avoid a resize operation during peak traffic.
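The difference between the two billing models is simple arithmetic. The sketch assumes a database that starts at 40 GB and grows over a year, while the RDS volume stays pre-provisioned at 500 GB. It uses one hypothetical flat price per GB-month for both models; real prices vary by region and storage type. It also ignores Aurora Standard's per-request I/O charges.

```python
# Hypothetical price for illustration only; check current AWS pricing.
PRICE_PER_GB_MONTH = 0.10


def consumed_cost(monthly_usage_gb: list[float], price: float = PRICE_PER_GB_MONTH) -> float:
    """Consumption billing (Aurora-style): pay for each month's actual usage."""
    return round(sum(gb * price for gb in monthly_usage_gb), 2)


def provisioned_cost(provisioned_gb: float, months: int, price: float = PRICE_PER_GB_MONTH) -> float:
    """Provisioned billing (RDS-style): pay for the allocated volume every month."""
    return round(provisioned_gb * months * price, 2)


if __name__ == "__main__":
    usage = [40 + 10 * m for m in range(12)]  # 40 GB, growing 10 GB per month
    print(consumed_cost(usage))        # 114.0
    print(provisioned_cost(500, 12))   # 600.0
```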

Warning

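Deleting rows does not shrink the Aurora volume. PostgreSQL reuses the freed space inside tables for new data, but the billed allocation stays at its high-water mark. On current Aurora PostgreSQL versions, dynamic resizing releases space when you drop or truncate tables, indexes, or databases. On older engine versions, recovering storage billing requires a logical dump and restore into a new Aurora cluster. Plan data lifecycle and archiving strategies before reaching multi-terabyte scale.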

Aurora Reader Endpoints

An Aurora cluster exposes two built-in endpoints: a writer endpoint (always routes to the primary) and a reader endpoint (load-balances across all available Aurora Replicas). Aurora Replicas receive page updates from the shared storage layer with typical replica lag of single-digit milliseconds — significantly lower than streaming WAL replication on standard RDS, where replica lag depends on WAL volume and network throughput.
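Applications usually take advantage of the two endpoints by sending writes and transactional reads to the writer endpoint and read-only queries to the reader endpoint. A minimal routing sketch follows. The hostnames are hypothetical placeholders in the usual Aurora endpoint shape. The keyword check is deliberately crude: it assumes single statements, treats anything it cannot classify as a write, and cannot tell that a SELECT calls a function that writes. Replica lag is small but not zero, so reads that must see a write the same request just made should also go to the writer.

```python
import re

# Hypothetical endpoints; copy the real ones from your cluster's details.
WRITER_ENDPOINT = "my-aurora-cluster.cluster-abc123.us-east-1.rds.amazonaws.com"
READER_ENDPOINT = "my-aurora-cluster.cluster-ro-abc123.us-east-1.rds.amazonaws.com"

# Row-locking clauses need the writer even though the statement starts with SELECT.
_LOCKING = re.compile(
    r"\bFOR\s+(UPDATE|NO\s+KEY\s+UPDATE|SHARE|KEY\s+SHARE)\b", re.IGNORECASE
)


def endpoint_for(sql: str, needs_read_your_writes: bool = False) -> str:
    """Pick an endpoint for one SQL statement (crude, conservative heuristic)."""
    statement = sql.lstrip().upper()
    is_plain_read = statement.startswith("SELECT") and not _LOCKING.search(sql)
    if is_plain_read and not needs_read_your_writes:
        return READER_ENDPOINT
    return WRITER_ENDPOINT  # writes, locking reads, and anything unrecognized
```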

Reader Replicas on Aurora can be promoted to primary in approximately 30 seconds. During promotion, the new primary simply starts accepting writes to the same shared storage volume — it does not need to replay a backlog of WAL, because all replicas are already reading from the same storage layer. This is the architectural basis for Aurora's fast failover.
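Fast promotion only helps if clients reconnect promptly. Connections to the writer endpoint drop during failover, so the application should retry until the endpoint resolves to the new primary. The sketch below retries an injected `connect` callable with capped exponential backoff for up to about a minute. The timing values are assumptions. The default `retry_on=(OSError,)` is also an assumption; pass your driver's connection error instead (for psycopg, `psycopg.OperationalError`).

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[], T],
    deadline_s: float = 60.0,
    initial_delay_s: float = 0.5,
    max_delay_s: float = 5.0,
    retry_on: tuple[type[BaseException], ...] = (OSError,),
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> T:
    """Call `connect` until it succeeds or roughly `deadline_s` elapses."""
    start = clock()
    delay = initial_delay_s
    while True:
        try:
            return connect()
        except retry_on:
            if clock() - start + delay > deadline_s:
                raise  # give up: failover is taking longer than budgeted
            sleep(delay)
            delay = min(delay * 2, max_delay_s)  # capped exponential backoff
```

The `sleep` and `clock` parameters exist only so the retry schedule can be tested without waiting. In production, the defaults are used.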

Key Features Worth Understanding

Aurora Global Database

Aurora Global Database adds a second level of replication: from the primary AWS region to one or more secondary regions. The secondary region receives storage-level redo log updates, bypassing the compute layer entirely. Typical replication lag is under one second even to geographically distant regions.

Common use cases: active-passive disaster recovery with sub-second RPO, compliance requirements mandating cross-region data residency, and serving low-latency reads to users in a second continent. In a failover scenario, the secondary region can be promoted to primary in under a minute.

sql
-- Check replication lag across Global Database instances
-- Run on the primary cluster
SELECT
    server_id,
    session_id,
    aws_region,
    durable_lsn,
    highest_lsn_rcvd,
    feedback_epoch,
    visibility_lag_in_msec
FROM aurora_global_db_instance_status();
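To turn a lag query like this into an alert, compare each secondary region's lag against an RPO budget. The helper below assumes you have already fetched the rows into (region, lag in milliseconds) pairs. The one-second budget and the function name are illustrative choices, not AWS defaults.

```python
from typing import Iterable


def regions_over_budget(
    lag_rows: Iterable[tuple[str, float]], budget_ms: float = 1000.0
) -> list[str]:
    """Return regions whose replication lag exceeds the budget, worst first."""
    worst: dict[str, float] = {}
    for region, lag_ms in lag_rows:
        # Several instances may report per region; keep the largest lag.
        worst[region] = max(lag_ms, worst.get(region, 0.0))
    breaching = [region for region, lag in worst.items() if lag > budget_ms]
    return sorted(breaching, key=lambda region: worst[region], reverse=True)


if __name__ == "__main__":
    rows = [("us-east-1", 0.0), ("eu-west-1", 180.0), ("ap-southeast-2", 2400.0)]
    print(regions_over_budget(rows))  # ['ap-southeast-2']
```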

Aurora Serverless v2

Aurora Serverless v2 replaces the fixed instance class model with Aurora Capacity Units (ACUs). One ACU provides approximately 2 GB of memory and proportional CPU. You set a minimum and maximum ACU range, and Aurora scales the compute tier within that range within seconds in response to load.

Serverless v2 is not the same as the original Aurora Serverless v1, which had multi-second scale-up latency and could not support read replicas or Global Database. Serverless v2 supports standard Aurora features, including reader endpoints and Global Database.

Tip

Serverless v2 minimum ACU of 0.5 is suitable for dev/test environments — it keeps the cluster warm at minimal cost. For production, set a minimum of at least 2 ACUs to avoid latency spikes on the first queries after a quiet period.

# Example Aurora Serverless v2 scaling range via AWS CLI
aws rds modify-db-cluster \
  --db-cluster-identifier my-aurora-cluster \
  --serverless-v2-scaling-configuration MinCapacity=2,MaxCapacity=64 \
  --apply-immediately
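Before settling on a range like the 2 to 64 ACU example above, it helps to translate it into memory. The sketch uses the approximate 2 GB-per-ACU figure given earlier and assumes capacity is set in 0.5 ACU steps. It does not check AWS's current minimum and maximum limits, which have changed over time, and the function name is hypothetical.

```python
GB_PER_ACU = 2.0  # approximate memory per ACU
ACU_STEP = 0.5    # assumed granularity for MinCapacity / MaxCapacity


def memory_range_gb(min_acu: float, max_acu: float) -> tuple[float, float]:
    """Approximate memory at the bottom and top of a Serverless v2 range."""
    for value in (min_acu, max_acu):
        if value < 0 or not (value / ACU_STEP).is_integer():
            raise ValueError(f"{value} is not a non-negative multiple of {ACU_STEP} ACU")
    if min_acu > max_acu:
        raise ValueError("min_acu must not exceed max_acu")
    return min_acu * GB_PER_ACU, max_acu * GB_PER_ACU


if __name__ == "__main__":
    print(memory_range_gb(2, 64))  # (4.0, 128.0)
```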

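Parallel Query on Aurora PostgreSQL

Aurora Parallel Query, which pushes filtering and aggregation down to the storage layer, is an Aurora MySQL feature. It is not available on Aurora PostgreSQL, and Aurora PostgreSQL parameter groups have no aurora_parallel_query setting. For large sequential scans, which are common in reporting workloads, Aurora PostgreSQL relies on PostgreSQL's native parallel query instead. The planner splits a scan across background worker processes on the instance and combines their partial results in a Gather node. The number of workers per query is capped by max_parallel_workers_per_gather, which can be set per session or in the parameter group.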

sql
-- Allow up to 4 parallel workers per Gather node for this session
-- (actual workers are also limited by max_parallel_workers)
SET max_parallel_workers_per_gather = 4;

-- Verify that the query plan actually uses parallel workers
EXPLAIN SELECT region, SUM(order_total)
FROM orders
WHERE order_date >= '2025-01-01'
GROUP BY region;
-- Look for a Gather (or Gather Merge) node above a Parallel Seq Scan in the plan output
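
Warning

The planner does not parallelize every query. It will not plan a parallel scan of a table smaller than min_parallel_table_scan_size. It generates no parallel plan for queries that write data or lock rows, or that call functions marked PARALLEL UNSAFE. It also scans temporary tables only in the leader process. Always verify with EXPLAIN, or with EXPLAIN ANALYZE (which reports "Workers Launched"), that parallel execution is actually occurring. The planner silently falls back to a serial plan when parallelism is not possible or not estimated to be cheaper.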
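On Aurora PostgreSQL, a plan that runs in parallel shows a Gather or Gather Merge node in the text-format EXPLAIN output. EXPLAIN ANALYZE additionally reports "Workers Launched", which can be 0 when no background workers are free. When checking many reporting queries, a small helper can flag plans that did not actually run in parallel. The sketch assumes the default text plan format (not JSON), and the function name is hypothetical.

```python
import re

_GATHER = re.compile(r"\bGather(?: Merge)?\b")
_LAUNCHED = re.compile(r"Workers Launched:\s*(\d+)")


def plan_is_parallel(plan_text: str) -> bool:
    """True if the plan has a Gather node and, when EXPLAIN ANALYZE output
    is given, at least one worker was actually launched."""
    if not _GATHER.search(plan_text):
        return False
    launched = [int(n) for n in _LAUNCHED.findall(plan_text)]
    # Plain EXPLAIN has no "Workers Launched" line, so trust the planned Gather.
    return not launched or any(n > 0 for n in launched)
```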

Configuration Best Practices

Parameter Groups: Aurora-Specific Settings

Aurora uses a two-level parameter group system: cluster parameter groups (applied to all instances in the cluster) and instance parameter groups (applied per instance). Aurora-specific parameters only appear in Aurora parameter groups and are not available on standard RDS PostgreSQL.

# Key Aurora PostgreSQL cluster parameter group settings

# Default IAM role used by the aws_ml extension to call Amazon Comprehend
# (Aurora machine learning integration; the account ID is a placeholder)
aws_default_comprehend_role = arn:aws:iam::123456789012:role/AuroraMLRole

Connection Pooling with RDS Proxy

Aurora PostgreSQL connections are expensive to establish — each connection creates a backend process. For serverless applications, Lambda functions, or microservices that open many short-lived connections, RDS Proxy sits between the application and Aurora and multiplexes client connections to a smaller pool of backend connections. This prevents the connection storm problem that is common when Lambda functions scale from zero to hundreds of concurrent executions.

# RDS Proxy typical configuration via AWS console or CloudFormation
# Target: Aurora PostgreSQL cluster endpoint
# Idle client connection timeout: 1800 seconds
# Connection borrow timeout: 120 seconds
# Max connections percent: 100  (proxy manages the pool; set per workload)
# InitQuery: SET search_path = myapp, public;
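The multiplexing that RDS Proxy performs can be illustrated with a toy pool. In the pool, many client tasks share a fixed set of backend connections. A request that cannot borrow one within a timeout fails, which is the role the connection borrow timeout plays. This is a conceptual model only: the class, its parameters, and the string stand-ins for connections are hypothetical, and they say nothing about how RDS Proxy is implemented internally.

```python
import queue
import threading
from contextlib import contextmanager
from typing import Iterator


class ToyPool:
    """Fixed pool of backend 'connections' shared by many clients."""

    def __init__(self, size: int, borrow_timeout_s: float) -> None:
        self._idle: queue.Queue = queue.Queue()
        for i in range(size):
            self._idle.put(f"backend-{i}")  # stand-in for a real DB connection
        self._borrow_timeout_s = borrow_timeout_s

    @contextmanager
    def borrow(self) -> Iterator[str]:
        try:
            conn = self._idle.get(timeout=self._borrow_timeout_s)
        except queue.Empty:
            raise TimeoutError("no backend connection free within borrow timeout") from None
        try:
            yield conn
        finally:
            self._idle.put(conn)  # hand the backend to the next waiting client


if __name__ == "__main__":
    pool = ToyPool(size=3, borrow_timeout_s=2.0)
    used: set = set()
    lock = threading.Lock()

    def client() -> None:
        with pool.borrow() as conn:
            with lock:
                used.add(conn)

    threads = [threading.Thread(target=client) for _ in range(50)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(f"50 clients served by {len(used)} backend connections")
```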

Monitoring Aurora-Specific Activity

Aurora adds functions that expose cluster-level information not available in standard PostgreSQL system catalogs.

sql
-- Aurora's extended activity function (pg_stat_activity-style rows plus Aurora session metadata)
SELECT
    pid,
    usename,
    application_name,
    client_addr,
    state,
    wait_event_type,
    wait_event,
    query_start,
    now() - query_start AS query_age,
    LEFT(query, 120) AS query_preview
FROM aurora_stat_activity()
WHERE state != 'idle'
ORDER BY query_start;

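-- Per-table buffer cache hit ratio (standard PostgreSQL view; on Aurora, blocks missing from shared_buffers are read from the storage layer)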
SELECT
    relname,
    heap_blks_read,
    heap_blks_hit,
    ROUND(100.0 * heap_blks_hit / NULLIF(heap_blks_hit + heap_blks_read, 0), 2) AS cache_hit_pct
FROM pg_statio_user_tables
ORDER BY heap_blks_read DESC
LIMIT 20;
Important

aurora_stat_activity() is a function, not a view, and requires explicit parentheses. On Aurora, it returns additional columns not present in pg_stat_activity, including internal Aurora session metadata. Do not substitute one for the other in runbooks — the column sets differ between Aurora versions and standard PostgreSQL.
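If you alert on the cache_hit_pct column from the second query, apply the same NULL-safe arithmetic in application code so that the numbers match. The sketch mirrors ROUND(100.0 * hit / NULLIF(hit + read, 0), 2). It returns None for tables with no block activity and rounds halves away from zero, as PostgreSQL's numeric ROUND does. The function name is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Optional


def cache_hit_pct(heap_blks_hit: int, heap_blks_read: int) -> Optional[Decimal]:
    """Mirror ROUND(100.0 * hit / NULLIF(hit + read, 0), 2) from the SQL above."""
    total = heap_blks_hit + heap_blks_read
    if total == 0:
        return None  # NULLIF(..., 0) turns a zero denominator into NULL
    pct = Decimal(100) * heap_blks_hit / Decimal(total)
    # PostgreSQL's numeric ROUND rounds half away from zero.
    return pct.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```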

Backup and Point-in-Time Recovery

Aurora continuously backs up to S3 at the storage layer. There is no I/O penalty for enabling automated backups, unlike RDS PostgreSQL where a backup window can cause measurable I/O contention. The default backup retention is 1 day; increase it to 7–35 days for production workloads that need extended PITR windows.
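Raising retention can be scripted with boto3's RDS client, whose modify_db_cluster call accepts BackupRetentionPeriod and ApplyImmediately. The sketch takes the client as a parameter so it can be exercised without AWS credentials. The 1 to 35 day bounds follow Aurora's allowed retention range, and the helper's name is hypothetical.

```python
from typing import Any, Protocol


class RdsClient(Protocol):
    """The single boto3 RDS client method this sketch relies on."""

    def modify_db_cluster(self, **kwargs: Any) -> dict: ...


def set_backup_retention(client: RdsClient, cluster_id: str, days: int) -> dict:
    """Set the automated backup (PITR) retention window for an Aurora cluster."""
    if not 1 <= days <= 35:
        raise ValueError("Aurora backup retention must be between 1 and 35 days")
    return client.modify_db_cluster(
        DBClusterIdentifier=cluster_id,
        BackupRetentionPeriod=days,
        ApplyImmediately=True,
    )
```

With real credentials, a call would look like `set_backup_retention(boto3.client("rds"), "prod-aurora-cluster", 14)`.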

bash
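# Restore Aurora cluster to a point in time (AWS CLI)
aws rds restore-db-cluster-to-point-in-time \
  --source-db-cluster-identifier prod-aurora-cluster \
  --db-cluster-identifier prod-aurora-restored-20260120 \
  --restore-to-time 2026-01-20T03:00:00Z \
  --restore-type full-copy \
  --vpc-security-group-ids sg-0abc123def456 \
  --db-subnet-group-name aurora-subnet-group

# The restore creates the cluster (storage volume) only; add a writer instance
# before connecting. The instance identifier and class below are examples.
aws rds create-db-instance \
  --db-instance-identifier prod-aurora-restored-20260120-writer \
  --db-cluster-identifier prod-aurora-restored-20260120 \
  --engine aurora-postgresql \
  --db-instance-class db.r6g.large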

Aurora vs RDS PostgreSQL vs Self-Managed EC2

The right deployment option depends on your team's operational capacity, traffic patterns, budget, and availability requirements. Here is an honest comparison:

| Criteria | Aurora PostgreSQL | RDS PostgreSQL | Self-Managed EC2 |
|---|---|---|---|
| Failover time | ~30 seconds (replica promotion) | 1–3 minutes (Multi-AZ standby promotion, crash recovery, DNS update) | Manual or custom automation; typically 5+ minutes |
| Storage durability model | 6-way replication, 3 AZs, self-healing | EBS Multi-AZ mirroring (single volume) | Depends entirely on your EBS/RAID configuration |
| Storage billing | Per GB consumed; autoscales to 128 TiB | Per GB provisioned; must manually resize | Per GB provisioned EBS |
| Write throughput | Higher; no local fsync bottleneck | Limited by EBS IOPS provisioning | Configurable; depends on instance and EBS type |
| Read scaling | Up to 15 Aurora Replicas; reader endpoint; millisecond-level replica lag | Up to 5 read replicas; standard WAL replication lag | Unlimited replicas; full operational responsibility |
| Serverless compute | Aurora Serverless v2 (ACU-based, per-second billing) | Not available | Not applicable |
| Cross-region DR | Aurora Global Database (<1 s lag, <1 min RTO) | Cross-region read replica (WAL-based, higher lag) | Custom solution required |
| Compute cost vs RDS | ~20% higher per instance-hour | Baseline | Lower instance cost; higher operational cost |
| Extension support | Subset of PostgreSQL extensions | Broader subset; slightly more than Aurora | All extensions; full superuser access |
| Best fit | Production workloads needing HA, fast failover, autoscaling storage | Cost-sensitive workloads; standard HA requirements | Workloads needing full OS/Postgres control or unsupported extensions |
Tip

For teams migrating from RDS PostgreSQL to Aurora, AWS Database Migration Service supports homogeneous migrations (PostgreSQL to Aurora PostgreSQL) with minimal downtime using logical replication. For most standard workloads, a pg_dump / pg_restore into a new Aurora cluster followed by a brief cutover window is simpler and less risky than DMS for databases under 500 GB.
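For the pg_dump / pg_restore route, a directory-format dump lets both tools run several parallel jobs. The sketch only builds the two command lines, so they can be reviewed and tested without a database. The hostnames, database name, dump path, user, and job count are hypothetical. Passwords are expected to come from PGPASSWORD or a .pgpass file rather than the command line. Roles and other cluster-wide objects are not part of a per-database dump and must be created on the target separately.

```python
import shlex


def dump_restore_commands(
    source_host: str,
    target_host: str,
    dbname: str,
    dump_dir: str,
    user: str = "postgres",  # hypothetical; use your master or migration user
    jobs: int = 4,
) -> tuple[list[str], list[str]]:
    """Build pg_dump and pg_restore argv lists for a parallel directory-format copy."""
    dump = [
        "pg_dump",
        f"--host={source_host}",
        f"--username={user}",
        "--format=directory",  # required for --jobs on the dump side
        f"--jobs={jobs}",
        f"--file={dump_dir}",
        dbname,
    ]
    restore = [
        "pg_restore",
        f"--host={target_host}",
        f"--username={user}",
        f"--dbname={dbname}",  # the target database must already exist
        f"--jobs={jobs}",
        "--no-owner",  # restored objects are owned by the connecting user
        dump_dir,
    ]
    return dump, restore


if __name__ == "__main__":
    dump, restore = dump_restore_commands(
        "prod-rds.abc123.us-east-1.rds.amazonaws.com",
        "my-aurora-cluster.cluster-abc123.us-east-1.rds.amazonaws.com",
        "appdb",
        "/tmp/appdb.dump",
    )
    print(shlex.join(dump))
    print(shlex.join(restore))
```

Each list can be passed directly to `subprocess.run(cmd, check=True)`. Building argv lists rather than shell strings avoids quoting problems with hostnames and paths.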

Key Takeaways

  • Choose Aurora PostgreSQL when fast automated failover (about 30 seconds), millisecond-level read replica lag, or storage autoscaling are hard requirements; these advantages are structural, not configuration-dependent.
  • Use Aurora Serverless v2 for workloads with unpredictable or spiky traffic; set a minimum ACU of 2 or higher in production to avoid cold-start latency.
  • Monitor replication health across Global Database instances with aurora_global_db_instance_status() and integrate it into your alerting pipeline — do not wait for a DR event to discover replication lag.
  • Enable RDS Proxy for any application tier that creates many short-lived connections (Lambda, containerized microservices) — it prevents connection exhaustion during traffic spikes without Aurora configuration changes.
  • Verify parallel execution with EXPLAIN for every analytical query you plan to optimize. Aurora PostgreSQL uses PostgreSQL's native parallel query (Aurora Parallel Query is an Aurora MySQL feature), and the planner does not always choose a parallel plan, so confirm actual behavior before assuming the performance benefit.
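  • Account for storage high-water-mark behavior in your data lifecycle planning. Deleting rows does not reduce billed Aurora storage; only dropping or truncating objects (on engine versions with dynamic resizing) releases it. Archive or drop old data before it accumulates to sizes that are expensive to retain long-term.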

Working with JusDB on Aurora PostgreSQL

JusDB manages Aurora PostgreSQL for engineering teams that need production-grade reliability without the operational overhead of owning the cluster themselves. Our DBAs handle initial cluster design, parameter group tuning, Aurora Serverless v2 capacity planning, Global Database setup, RDS Proxy configuration, and 24/7 incident response — so your team ships features instead of spending weekends reading Aurora runbooks.

We have helped fintech teams migrate from multi-minute RDS failovers to sub-30-second Aurora promotions, configured Parallel Query for analytics workloads that were saturating read replicas, and designed Global Database topologies that meet cross-region RPO requirements for SOC 2 Type II compliance reviews.

