Cloud Databases

Aurora PostgreSQL: Architecture, Best Practices, and When It Beats RDS

A production guide to Amazon Aurora PostgreSQL — the shared storage architecture, fast failover mechanics, Aurora Serverless v2 ACU scaling, Global Database, Parallel Query, and an honest comparison against standard RDS.

JusDB Team
February 7, 2023
13 min read
149 views

Last year, an e-commerce company running Oracle 19c on-premises reached out to JusDB three weeks before a planned AWS migration. Their DBA team had stood up an AWS DMS replication instance, configured source and target endpoints, and launched a Full Load + CDC task — only to watch it stall at 14% with foreign key constraint violations, then restart from scratch twice more before the maintenance window closed. The migration was a 900GB OLTP schema with 340 tables, dozens of XMLTYPE columns, and a binlog retention policy of 12 hours that expired mid-load. They had a hard deadline: the data center lease ended in 47 days.

AWS Database Migration Service is powerful, but it surfaces its complexity precisely when you can least afford it — during a live cutover window. This guide covers every component of DMS in production depth: replication instance sizing, endpoint configuration, Full Load vs CDC behavior, LOB handling, parallel load tuning, transformation rules, and the failure modes that catch teams off guard. It also compares DMS against Debezium and pglogical for teams evaluating open-source CDC alternatives.

TL;DR
  • AWS DMS has three core components: a replication instance (compute), source/target endpoints (connection config), and a migration task (the actual job).
  • Full Load + CDC is the standard pattern for near-zero-downtime migrations; CDC-only is used for ongoing replication after a manual bulk load.
  • LOB columns (BLOB, CLOB, XMLTYPE) require Full LOB mode or carefully tuned Limited LOB mode — the wrong setting causes silent data truncation.
  • Parallel load settings and table-mapping rules are the primary performance levers; a single-threaded task on a 500GB schema will take 30+ hours.
  • Foreign key constraints on the target must be disabled during Full Load or the task will fail on rows that arrive before their referenced parents.
  • DMS works best for heterogeneous migrations (Oracle → Aurora PostgreSQL, SQL Server → MySQL); for homogeneous migrations, pglogical or native replication are usually faster and more reliable.

Background: What AWS DMS Is and When to Use It

AWS Database Migration Service is a fully managed replication service that moves data between a source database and a target database, handling schema conversion, type mapping, and change capture under the hood. It was built primarily to accelerate migrations off commercial databases (Oracle, SQL Server, IBM Db2) onto AWS-managed engines like Aurora, RDS PostgreSQL, RDS MySQL, and Redshift.

DMS supports a wide source and target matrix. On the source side: Oracle, SQL Server, MySQL, MariaDB, PostgreSQL, MongoDB, SAP ASE (Sybase), IBM Db2, Azure SQL, and several others. On the target side: all of the above plus Amazon Redshift, Amazon S3, Amazon Kinesis, Amazon Kafka (MSK), OpenSearch, DynamoDB, and DocumentDB. The engine-specific behavior differs significantly — Oracle CDC uses LogMiner or Binary Reader; MySQL CDC reads the binary log; PostgreSQL CDC requires logical replication slots; MongoDB CDC uses the oplog.

DMS is the right tool when you need a managed, GUI-driven migration with built-in monitoring, CloudWatch metrics, and AWS Support backing. It is not the right tool when you need sub-second replication latency for operational use cases, or when you need complex transformation logic that exceeds what DMS table-mapping rules can express.

Tip

For homogeneous migrations — MySQL to MySQL, PostgreSQL to PostgreSQL — consider native tools first: mysqldump + binary log replay, or pglogical for PostgreSQL-to-PostgreSQL logical replication. DMS adds cost and complexity that is only justified when you are crossing engine boundaries or need the managed orchestration layer.


DMS Architecture

Replication Instance

The replication instance is an EC2-backed managed server that DMS provisions inside your VPC. It connects to your source endpoint, reads data, applies transformations, and writes to your target endpoint. Sizing it correctly is the most important performance decision you will make before starting a task.

Instance classes follow standard EC2 naming: dms.t3.medium, dms.r5.xlarge, dms.r5.4xlarge, etc. The r5 family (memory-optimized) is almost always the right choice for production migrations — DMS buffers LOB data and CDC events in memory, and running out of memory during a Full Load on a t3 causes task restarts that waste hours.

Rule of thumb for sizing: start with an instance whose RAM is at least 10% of your total dataset size. For a 500GB schema, a dms.r5.2xlarge (64 GB RAM) is a reasonable baseline. Enable Multi-AZ on the replication instance for any production migration where you cannot afford the task to restart due to instance failure mid-load.

bash
# Create a replication instance via AWS CLI
aws dms create-replication-instance \
  --replication-instance-identifier prod-migration-01 \
  --replication-instance-class dms.r5.2xlarge \
  --allocated-storage 200 \
  --vpc-security-group-ids sg-0abc123def456789 \
  --replication-subnet-group-identifier my-dms-subnet-group \
  --multi-az \
  --publicly-accessible false \
  --no-auto-minor-version-upgrade

Endpoints

An endpoint is a named connection configuration: engine type, host, port, credentials, and engine-specific extra connection attributes. Source and target endpoints are separate objects. You create them independently and then reference them from a task.

Engine-specific extra connection attributes (ECAs) are where most subtle bugs live. For Oracle sources, you must specify whether to use LogMiner or Binary Reader for CDC. For MySQL sources, you must ensure binlog_format=ROW and set the server ID. For PostgreSQL sources, you must pre-create a replication slot and specify it in the ECA.

bash
# Create a PostgreSQL source endpoint
aws dms create-endpoint \
  --endpoint-identifier pg-source-prod \
  --endpoint-type source \
  --engine-name postgres \
  --username dms_user \
  --password 'REPLACE_WITH_SECRET' \
  --server-name pg-prod.internal \
  --port 5432 \
  --database-name appdb \
  --extra-connection-attributes "pluginName=pglogical;slotName=dms_slot;heartbeatEnable=true;heartbeatFrequency=5"
bash
# Create an Aurora MySQL target endpoint
aws dms create-endpoint \
  --endpoint-identifier aurora-mysql-target \
  --endpoint-type target \
  --engine-name aurora \
  --username dms_target_user \
  --password 'REPLACE_WITH_SECRET' \
  --server-name aurora-cluster.cluster-xxxxx.us-east-1.rds.amazonaws.com \
  --port 3306 \
  --database-name appdb

Migration Tasks

A task ties together a replication instance, a source endpoint, and a target endpoint, and defines what to migrate and how. Tasks have three migration types: Full Load only, CDC only, and Full Load + CDC. Within a task you define table-mapping rules (which schemas and tables to include or exclude) and optional transformation rules (renaming, filtering, column expressions).

Warning

A task's LOB settings, parallel load configuration, and table-mapping rules cannot be changed while the task is running. Stopping a Full Load task and restarting it from a table-level checkpoint is possible, but stopping and reconfiguring requires a full restart. Plan your task settings before launch.


Full Load vs CDC

Full Load Only

Full Load reads all rows from selected source tables and writes them to the target in bulk. DMS uses parallel worker threads to read from multiple tables simultaneously. During Full Load, the target table is written without constraint checking (if you configure DMS to disable them) and DMS does not track source changes — any INSERTs, UPDATEs, or DELETEs happening on the source during the load are lost. Full Load only is appropriate for one-time bulk loads where the source goes read-only during the migration, or for initial seeding before a separate CDC mechanism takes over.

Full Load + CDC

This is the standard pattern for near-zero-downtime migrations. DMS first performs a Full Load of all selected tables. While the Full Load is running, DMS simultaneously captures change events from the source (via LogMiner, binary log, WAL, or oplog) and buffers them. Once Full Load completes for all tables, DMS applies the buffered changes to the target in order. The result is a target that is caught up to the source at the point Full Load completed, plus any changes that occurred during the load. Your application cutover window is reduced to the CDC replication lag — typically seconds to a few minutes, not hours.

CDC Only

CDC-only tasks replicate changes from the source to the target continuously without performing an initial load. This mode is used when you have already loaded data via another mechanism (native database export, Snowball, DMS Full Load in a prior run) and only need ongoing change replication. It is also used to establish a "warm standby" target before a cutover, or to feed CDC events into streaming targets like Kinesis or Kafka.

Important

When using Full Load + CDC on a MySQL source, the source's binary log must be retained long enough to cover the entire Full Load duration. If your binlog retention (expire_logs_days or binlog_expire_logs_seconds) is set to 24 hours and your Full Load takes 28 hours, DMS will error when it tries to apply CDC events from log files that have already been purged. Set binlog retention to at least 3× your expected Full Load duration before starting the migration.


Setting Up a Migration Task

Step 1: Prepare the Source Database

For MySQL/MariaDB: set binlog_format = ROW and binlog_row_image = FULL in my.cnf. Create a DMS user with REPLICATION SLAVE, REPLICATION CLIENT, and SELECT privileges. Increase binlog retention to cover the migration window.

# my.cnf additions for DMS source
[mysqld]
binlog_format           = ROW
binlog_row_image        = FULL
binlog_expire_logs_seconds = 604800   # 7 days
log_bin                 = mysql-bin
server_id               = 1
bash
# Grant DMS user privileges (MySQL)
mysql -u root -p <<'SQL'
CREATE USER 'dms_user'@'%' IDENTIFIED BY 'StrongPassword123!';
GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'dms_user'@'%';
FLUSH PRIVILEGES;
SQL

For PostgreSQL: enable logical replication (wal_level = logical, max_replication_slots = 5, max_wal_senders = 5). Create a replication slot before creating the DMS endpoint.

bash
# Create logical replication slot on PostgreSQL source
psql -U postgres -d appdb -c \
  "SELECT pg_create_logical_replication_slot('dms_slot', 'pglogical');"

Step 2: Prepare the Target Database

Disable foreign key checks and triggers on the target for the duration of Full Load. On Aurora MySQL:

bash
# Disable FK checks for DMS load (Aurora MySQL parameter group)
# Set in a custom parameter group, NOT session-level, so DMS connections inherit it
aws rds modify-db-cluster-parameter-group \
  --db-cluster-parameter-group-name aurora-dms-target-params \
  --parameters "ParameterName=foreign_key_checks,ParameterValue=0,ApplyMethod=immediate"

For Aurora PostgreSQL targets, disable triggers on all target tables:

-- Run on target Aurora PostgreSQL before starting Full Load
SET session_replication_role = 'replica';
-- This disables FK checks and triggers for the current session
-- DMS connections will need this set via endpoint extra connection attributes

Step 3: Define Table-Mapping Rules

Table-mapping rules use a JSON document to select which schemas and tables to migrate, and optionally rename or transform them. The rules are set in the task configuration.

{
  "rules": [
    {
      "rule-type": "selection",
      "rule-id": "1",
      "rule-name": "include-appdb",
      "object-locator": {
        "schema-name": "appdb",
        "table-name": "%"
      },
      "rule-action": "include",
      "filters": []
    },
    {
      "rule-type": "selection",
      "rule-id": "2",
      "rule-name": "exclude-audit-logs",
      "object-locator": {
        "schema-name": "appdb",
        "table-name": "audit_%"
      },
      "rule-action": "exclude"
    },
    {
      "rule-type": "transformation",
      "rule-id": "3",
      "rule-name": "lowercase-schema",
      "rule-action": "convert-lowercase",
      "rule-target": "schema",
      "object-locator": {
        "schema-name": "%"
      }
    }
  ]
}

Step 4: Configure LOB Handling

LOB columns (BLOB, CLOB, NCLOB, XMLTYPE, TEXT, IMAGE) require special handling in DMS. There are two modes: Limited LOB mode and Full LOB mode.

Limited LOB mode uses an inline read path — LOB data up to the configured max size is read in the same SELECT as the row. Set LobMaxSize in kilobytes. Any LOB exceeding this size is truncated silently. Use this when your LOBs are small and uniform in size (e.g., product images under 1MB).

Full LOB mode reads LOB data in a separate pass after the row is written. This is slower but handles LOBs of any size correctly. Use this when LOB sizes are variable or you have any CLOB/XMLTYPE columns.

Warning

Limited LOB mode with an undersized LobMaxSize will truncate LOB data without raising an error. Always validate a sample of LOB columns on the target after Full Load if you are using Limited LOB mode. Set LobMaxSize to the 99th percentile of your actual LOB column sizes, not the average.

bash
# Create task with Full LOB mode enabled
aws dms create-replication-task \
  --replication-task-identifier app-migration-task \
  --source-endpoint-arn arn:aws:dms:us-east-1:123456789:endpoint:pg-source-prod \
  --target-endpoint-arn arn:aws:dms:us-east-1:123456789:endpoint:aurora-mysql-target \
  --replication-instance-arn arn:aws:dms:us-east-1:123456789:rep:prod-migration-01 \
  --migration-type full-load-and-cdc \
  --table-mappings file://table-mappings.json \
  --replication-task-settings '{
    "TargetMetadata": {
      "TargetSchema": "",
      "SupportLobs": true,
      "FullLobMode": true,
      "LobChunkSize": 64
    },
    "FullLoadSettings": {
      "TargetTablePrepMode": "DO_NOTHING",
      "CreatePkAfterFullLoad": false,
      "StopTaskCachedChangesApplied": false,
      "StopTaskCachedChangesNotApplied": false,
      "MaxFullLoadSubTasks": 8,
      "TransactionConsistencyTimeout": 600,
      "CommitRate": 50000
    },
    "Logging": {
      "EnableLogging": true,
      "LogComponents": [
        {"Id": "SOURCE_UNLOAD", "Severity": "LOGGER_SEVERITY_DEFAULT"},
        {"Id": "TARGET_LOAD", "Severity": "LOGGER_SEVERITY_DEFAULT"},
        {"Id": "TASK_MANAGER", "Severity": "LOGGER_SEVERITY_DEFAULT"}
      ]
    }
  }'

Performance Tuning

Parallel Load Settings

MaxFullLoadSubTasks controls how many tables DMS loads in parallel. The default is 8. For a schema with hundreds of small tables, push this to 32 or 49 (the maximum). For a schema with a few very large tables, keep it lower and instead use table segmentation to parallelize within a single large table.

Table segmentation splits a large table into ranges and loads each range with a separate sub-task. This is the single most impactful tuning lever for large-table migrations.

{
  "rules": [
    {
      "rule-type": "table-settings",
      "rule-id": "10",
      "rule-name": "parallel-load-orders",
      "object-locator": {
        "schema-name": "appdb",
        "table-name": "orders"
      },
      "parallel-load": {
        "type": "ranges",
        "columns": ["order_id"],
        "boundaries": [
          ["1000000"],
          ["5000000"],
          ["10000000"],
          ["20000000"]
        ]
      }
    }
  ]
}

Commit Rate and Buffer Tuning

The CommitRate setting (default 50,000) controls how many rows DMS commits to the target per transaction during Full Load. Higher values reduce commit overhead but increase memory pressure on the replication instance. For targets with fast SSDs (Aurora, RDS io1), increase to 100,000–500,000. For targets with slower storage, keep at 50,000.

Tip

Monitor the DMS CloudWatch metric CDCLatencySource during the CDC phase. If this metric grows continuously rather than staying near zero, your replication instance is not keeping up with source change volume. Scale up the instance class or reduce the number of concurrent Full Load sub-tasks to free CPU for the CDC thread.

Common Failures and Fixes

Foreign key constraint violations during Full Load: DMS loads tables in parallel without respecting FK relationships. A child row can arrive before its parent. Fix: disable FK checks on the target before the task starts (see Step 2 above), re-enable after Full Load completes.

Unsupported data types: Oracle XMLTYPE, INTERVAL, RAW, SQL Server hierarchyid, geography, and geometry are not natively supported. DMS will skip these columns or error, depending on settings. Use the AWS Schema Conversion Tool (SCT) to pre-convert unsupported types before running the DMS task.

Replication slot bloat on PostgreSQL sources: If the DMS task pauses for any reason, the PostgreSQL replication slot continues to accumulate WAL. A paused slot for 6+ hours on a busy source can consume hundreds of gigabytes of disk. Set max_slot_wal_keep_size in PostgreSQL 13+ to cap slot WAL retention, and monitor pg_replication_slots for confirmed_flush_lsn lag.

bash
# Check replication slot lag on PostgreSQL source
psql -U postgres -d appdb -c "
SELECT slot_name,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)) AS wal_lag,
       active
FROM pg_replication_slots
WHERE slot_name = 'dms_slot';"

DMS vs Debezium vs pglogical for CDC

For teams evaluating CDC options beyond DMS, the comparison comes down to managed convenience vs. operational flexibility vs. latency.

Criteria AWS DMS Debezium pglogical
Source support Oracle, SQL Server, MySQL, PostgreSQL, MongoDB, and more MySQL, PostgreSQL, Oracle, SQL Server, MongoDB, Cassandra, and more via community connectors PostgreSQL only
Target support RDS, Aurora, Redshift, S3, Kinesis, MSK, DynamoDB, OpenSearch Any Kafka consumer downstream (Kafka Connect sink connectors); direct JDBC targets PostgreSQL only
Replication latency Seconds to minutes; higher under load Sub-second to seconds; near-real-time Kafka delivery Sub-second; uses native PostgreSQL WAL streaming
Infrastructure overhead Fully managed; no infrastructure to operate Requires Kafka cluster + Kafka Connect workers; significant operational burden Extension-level; minimal overhead if already on PostgreSQL
Schema evolution Limited; DDL changes often require task restart Handles DDL propagation with schema registry; best-in-class Limited DDL replication; column additions are manual
Cost model Per-instance-hour + data transfer; $0.18/hr for dms.r5.large Open source; cost is the Kafka cluster and EC2/EKS for Connect workers Open source; no additional cost if already on PostgreSQL
Best for Heterogeneous one-time migrations; teams wanting managed orchestration Event streaming architectures; real-time data pipelines; multi-consumer CDC PostgreSQL-to-PostgreSQL logical replication; version upgrades with minimal downtime

The migration company from the opening scenario ultimately used DMS for the Oracle-to-Aurora PostgreSQL Full Load, then switched to Debezium on Kafka for ongoing CDC after cutover — DMS for the migration, Debezium for the operational data pipeline. This is a common and effective pattern.

Tip

If your target is Kafka rather than a database, Debezium almost always wins over DMS. DMS's Kinesis and MSK targets work, but the event format is less rich than Debezium's envelope format, and schema registry integration requires custom work. See our deep-dive on Debezium CDC for MySQL and PostgreSQL into Kafka.


Key Takeaways

Key Takeaways
  • Size the replication instance at memory ≥10% of your dataset; use dms.r5 class with Multi-AZ enabled for any production migration.
  • Set binlog retention (MySQL) or WAL retention (PostgreSQL) to at least 3× your expected Full Load duration before starting a Full Load + CDC task.
  • Disable foreign key constraints and triggers on the target before Full Load begins; re-enable them after Full Load completes, not after CDC starts.
  • Use Full LOB mode for any schema containing BLOB, CLOB, or XMLTYPE columns; validate LOB column data post-load before declaring the migration complete.
  • Apply table segmentation rules on large tables (>50GB) to parallelize the load within a single table; MaxFullLoadSubTasks alone only parallelizes across tables.
  • Choose Debezium over DMS for event-streaming targets or when sub-second CDC latency is required; choose pglogical for PostgreSQL-to-PostgreSQL homogeneous migrations.

Working with JusDB on Database Migration

JusDB manages end-to-end database migrations for engineering teams who need to move fast without downtime risk. Our DBAs have run Oracle-to-Aurora, SQL Server-to-PostgreSQL, MySQL-to-Aurora, and MongoDB-to-DocumentDB migrations across production schemas ranging from 50GB to 12TB. We handle DMS task configuration, SCT schema conversion, binlog and WAL retention setup, cutover window planning, and post-migration validation — including LOB integrity checks and CDC lag monitoring through the transition period.

We also implement Debezium and pglogical for teams who need ongoing CDC pipelines after migration, not just a one-time bulk move.

Explore JusDB Database Migration Services → Talk to a Migration DBA

Related reading:

Share this article