RDS Multi-AZ vs Read Replicas: HA and Scaling Patterns

Understand RDS Multi-AZ for HA and Read Replicas for read scaling — failover behavior, replication lag, and costs

JusDB Team
January 21, 2026

When your RDS instance goes down at 2 AM, the difference between a two-minute outage and a two-hour war room comes down to one architectural decision you made weeks earlier. AWS gives you two distinct mechanisms — Multi-AZ deployments and Read Replicas — that are frequently confused for each other, yet serve fundamentally different purposes. Engineers who treat them as interchangeable end up with databases that are either over-engineered and expensive, or one hardware failure away from extended downtime. Understanding exactly what each feature does, and how they interact, is the foundation of a production-grade RDS strategy.

TL;DR
  • Multi-AZ uses synchronous replication to a standby in a second AZ — it exists purely for high availability and automatic failover (typically under 2 minutes), not for read scaling.
  • Read Replicas use asynchronous replication and can lag behind the primary — they exist for read scaling, reporting workloads, and cross-region redundancy.
  • You can and often should use both together: Multi-AZ for HA, Read Replicas for offloading reads.
  • Multi-AZ roughly doubles your instance cost; Read Replicas add the cost of additional instance hours.
  • Aurora Multi-AZ works differently from standard RDS Multi-AZ — the storage layer is shared across all instances.

Multi-AZ Deployments Explained

When you enable Multi-AZ on an RDS instance, AWS provisions a standby replica in a different Availability Zone within the same AWS Region. Every write to your primary instance is synchronously replicated to this standby before the write is acknowledged to your application. This is not eventual consistency — your application does not receive an acknowledgment until both the primary and the standby have committed the transaction.

The practical consequence of synchronous replication is that the standby is always current. There is no replication lag. If the primary instance, its underlying host, or the AZ itself experiences a failure, RDS automatically promotes the standby to primary and updates the DNS record for your DB endpoint. This failover process typically completes in under 2 minutes, though the exact duration depends on the transaction load at the moment of failure and how quickly your application's connection pool recovers.

A critical point that catches engineers off guard: you cannot read from the Multi-AZ standby. It does not have a separate endpoint. It is a hot spare that accepts no connections while operating as a standby. All reads and writes go through the primary endpoint. If your goal is to reduce read load on your primary, Multi-AZ does nothing for you.

RDS Multi-AZ Cluster (MySQL 8 and PostgreSQL 13+)

AWS introduced a newer Multi-AZ option, officially named the Multi-AZ DB cluster and referred to here as the RDS Multi-AZ Cluster, for MySQL 8.0.28+ and PostgreSQL 13+. Instead of one primary and one standby, this configuration provisions one writer and two readable standby instances across three AZs. The standbys use semi-synchronous replication and can serve read traffic, addressing the "no reads from standby" limitation of classic Multi-AZ. Failover in this configuration typically completes in under 35 seconds. This is a meaningfully different product from classic Multi-AZ and is worth evaluating if you need both HA and read scaling without the full complexity of Aurora.

Aurora Multi-AZ vs. RDS Multi-AZ

Aurora's approach to Multi-AZ is architecturally distinct. Rather than replicating the database engine state between instances, Aurora uses a distributed, shared storage layer that spans three AZs with six copies of your data (two per AZ). Aurora Replicas read from the same underlying storage as the primary — there is no traditional replication. Failover in Aurora is typically under 30 seconds because the new primary does not need to replay a transaction log; it simply starts writing to the same shared storage volume. Aurora Replicas can also serve reads at near-zero replication lag, which blurs the line between HA standby and Read Replica in the Aurora model.


Read Replicas Explained

Read Replicas are separate RDS instances that receive changes from a source database through asynchronous replication. Unlike Multi-AZ standbys, each Read Replica has its own endpoint and can serve SELECT queries independently. You point your reporting tools, analytics dashboards, and read-heavy application components at Read Replica endpoints, leaving your primary instance to handle writes and latency-sensitive reads.

Because replication is asynchronous, Read Replicas can and do lag behind the primary. Under normal conditions the lag is milliseconds. Under heavy write load, after a large batch operation, or during a network event, lag can grow to seconds or even minutes. The ReplicaLag CloudWatch metric (in seconds) is your primary instrument for monitoring this. Set an alarm on it. If your application reads data that was just written and expects to see it immediately, reading from a lagging replica will produce stale results — a subtle bug that only surfaces under load.
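As a rough illustration of the alarm recommended above, the Python sketch below builds the parameters for a CloudWatch alarm on ReplicaLag. The keys follow CloudWatch's PutMetricAlarm API; the replica identifier, the 30-second threshold, the three-period evaluation window, and the alarm name format are assumptions chosen for the example, not recommendations from AWS.

```python
def replica_lag_alarm_params(replica_id, threshold_seconds=30, sns_topic_arn=None):
    """Build keyword arguments for a CloudWatch alarm on a replica's ReplicaLag."""
    params = {
        "AlarmName": f"{replica_id}-replica-lag",  # hypothetical naming scheme
        "Namespace": "AWS/RDS",
        "MetricName": "ReplicaLag",  # reported in seconds
        "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": replica_id}],
        "Statistic": "Maximum",
        "Period": 60,            # evaluate one-minute datapoints
        "EvaluationPeriods": 3,  # assumption: alarm after 3 consecutive breaches
        "Threshold": float(threshold_seconds),
        "ComparisonOperator": "GreaterThanThreshold",
    }
    if sns_topic_arn:
        # Notify an SNS topic when the alarm fires.
        params["AlarmActions"] = [sns_topic_arn]
    return params


# "mydb-replica-1" is a placeholder instance identifier.
params = replica_lag_alarm_params("mydb-replica-1", threshold_seconds=30)
print(params["AlarmName"], params["Threshold"])
```

With boto3 installed and credentials configured, the dictionary could be passed as `boto3.client("cloudwatch").put_metric_alarm(**params)`. Pick the threshold from the staleness your application can actually tolerate.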

Warning

Never route write-then-read sequences to a Read Replica without accounting for replication lag. A user who submits a form and is immediately redirected to a confirmation page reading from a lagging replica may see their own submission appear missing. Design your application to route post-write reads to the primary or implement read-your-writes consistency at the application layer.
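One simple way to approximate read-your-writes at the application layer is to pin a session's reads to the primary for a short window after it writes. The sketch below is a minimal illustration in Python; the class name, the session identifiers, and the 5-second window are all hypothetical, and a fixed window is a heuristic rather than a guarantee.

```python
import time


class ReadYourWritesRouter:
    """Send a session's reads to the primary for a short window after it writes."""

    def __init__(self, pin_seconds=5.0, clock=time.monotonic):
        self.pin_seconds = pin_seconds
        self.clock = clock       # injectable so the logic can be tested
        self._last_write = {}    # session_id -> time of most recent write

    def record_write(self, session_id):
        """Call this whenever the session performs a write on the primary."""
        self._last_write[session_id] = self.clock()

    def target_for_read(self, session_id):
        """Return 'primary' while the session is pinned, otherwise 'replica'."""
        last = self._last_write.get(session_id)
        if last is not None and self.clock() - last < self.pin_seconds:
            return "primary"
        return "replica"


router = ReadYourWritesRouter(pin_seconds=5.0)
router.record_write("session-42")
print(router.target_for_read("session-42"))  # primary: this session just wrote
print(router.target_for_read("session-7"))   # replica: no recent write
```

The window only helps if it is longer than the lag you actually observe, so it is worth sizing it from your ReplicaLag history. A real deployment would also expire old entries from the write map.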

RDS supports up to 15 Read Replicas per source instance for MySQL, MariaDB, and PostgreSQL, and Aurora supports up to 15 Aurora Replicas per cluster; Oracle and SQL Server are limited to five. On engines that support cascaded replication, Read Replicas can themselves have Read Replicas, though lag accumulates at each hop.

Connection String Strategy for Read Routing

A simple read-routing pattern that works with most PostgreSQL and MySQL drivers is to maintain two connection strings in your application configuration and choose between them per query or per request:

# Primary (writes + latency-sensitive reads)
DATABASE_URL=postgresql://user:pass@primary.cluster.region.rds.amazonaws.com:5432/mydb

# Read Replica (reporting, analytics, non-critical reads)
DATABASE_READ_URL=postgresql://user:pass@replica.cluster.region.rds.amazonaws.com:5432/mydb

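For Aurora clusters, the reader endpoint automatically load-balances connections across all available Aurora Replicas, simplifying this pattern. For standard RDS, each replica has its own endpoint, so you distribute reads yourself: in application code, with DNS (for example, Route 53 weighted records), or through a proxy layer such as ProxySQL or Pgpool-II. Note that on standard RDS, RDS Proxy can be associated only with the writer instance, not with a Read Replica.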
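For standard RDS with more than one replica, managing endpoints in application code could look like the round-robin sketch below. It is a minimal illustration in Python: the `ReplicaPool` class and the hostnames are hypothetical, and it does no health checking or lag awareness.

```python
import itertools


class ReplicaPool:
    """Round-robin read-only traffic across replica URLs; writes use the primary."""

    def __init__(self, primary_url, replica_urls):
        self.primary_url = primary_url
        replicas = list(replica_urls)
        # With no replicas configured, every query falls back to the primary.
        self._cycle = itertools.cycle(replicas) if replicas else None

    def url_for(self, readonly):
        """Return the connection URL to use for a read-only or read-write query."""
        if readonly and self._cycle is not None:
            return next(self._cycle)
        return self.primary_url


# Placeholder hostnames; real values would come from DATABASE_URL and
# one DATABASE_READ_URL-style setting per replica.
pool = ReplicaPool(
    "postgresql://primary.example.internal:5432/mydb",
    [
        "postgresql://replica-1.example.internal:5432/mydb",
        "postgresql://replica-2.example.internal:5432/mydb",
    ],
)
print(pool.url_for(readonly=True))   # first replica
print(pool.url_for(readonly=True))   # second replica
print(pool.url_for(readonly=False))  # primary
```

Falling back to the primary when no replica is configured keeps the same code path usable in development environments that run a single instance.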

Cross-Region Read Replicas

Read Replicas can be created in a different AWS Region from the source. This enables two scenarios: serving reads closer to geographically distributed users, and providing a warm standby in another Region for disaster recovery. Cross-region Read Replicas incur data transfer costs for the replication traffic across Regions, and they can be promoted to standalone primary instances in a DR event — a process that is manual and takes several minutes, unlike the automatic failover of Multi-AZ.

Promoting a Read Replica

Promoting a Read Replica breaks replication and turns it into an independent, writable RDS instance. This is used deliberately during DR failover, regional migration, or when you want to branch your database for a blue/green deployment. After promotion, the new instance is no longer a replica — it will not receive further changes from the original source and must be treated as a standalone primary. Promotion typically takes a few minutes; during this time the instance is unavailable.

Tip

Before promoting a Read Replica in a DR scenario, check the ReplicaLag metric. If lag is high, promoting immediately means your new primary is behind. In some cases it is worth waiting for lag to drain to zero before promoting, accepting a longer failover time in exchange for zero data loss.
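The trade-off in this tip can be written down as a small decision rule for a runbook. The sketch below is a Python illustration only: the function name, the 300-second waiting budget, and the three return values are assumptions. It also assumes the MySQL and MariaDB behavior in which ReplicaLag reports -1 when replication is not active, which it treats as a signal to investigate rather than as zero lag.

```python
def promotion_decision(lag_seconds, waited_seconds,
                       max_wait_seconds=300, target_lag_seconds=0):
    """Decide whether to promote a replica now or keep waiting for lag to drain.

    lag_seconds: latest ReplicaLag reading for the replica.
    waited_seconds: how long the operator has already waited.
    """
    if lag_seconds < 0:
        # Assumption: a negative reading means replication is not running,
        # so the true lag is unknown.
        return "investigate"
    if lag_seconds <= target_lag_seconds:
        return "promote"  # replica has caught up; no data loss expected
    if waited_seconds >= max_wait_seconds:
        return "promote"  # waiting budget exhausted; accept the remaining lag
    return "wait"


print(promotion_decision(lag_seconds=12, waited_seconds=30))   # wait
print(promotion_decision(lag_seconds=0, waited_seconds=45))    # promote
print(promotion_decision(lag_seconds=12, waited_seconds=300))  # promote
```

The waiting budget is effectively your RTO talking and the target lag is your RPO, so both numbers belong in the runbook before the incident, not during it.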


Multi-AZ vs. Read Replicas: Side-by-Side Comparison

| Attribute | Multi-AZ | Read Replica |
| --- | --- | --- |
| Primary purpose | High availability / automatic failover | Read scaling / reporting / DR |
| Replication type | Synchronous | Asynchronous |
| Replication lag | None (writes blocked until standby confirms) | Variable; monitor the ReplicaLag CloudWatch metric |
| Readable? | No (classic Multi-AZ); Yes (Multi-AZ Cluster, Aurora Replicas) | Yes, with a dedicated endpoint per replica |
| Automatic failover | Yes: under 2 min (classic), ~35s (Multi-AZ Cluster), ~30s (Aurora) | No; promotion is manual |
| Cross-region support | No (same Region only) | Yes |
| Use for DR? | Within-Region AZ failure only | Yes, cross-region via promotion |
| Cost | ~2x instance cost (standby is same size) | Per-instance cost for each replica |
| Engines | All RDS engines + Aurora | MySQL, MariaDB, PostgreSQL, Oracle, SQL Server (limited), Aurora |

Combining Multi-AZ and Read Replicas

These two features are not mutually exclusive — in production environments they are typically deployed together. A common production architecture looks like this: your primary RDS instance runs with Multi-AZ enabled (ensuring automatic failover if the primary or its AZ fails), and one or more Read Replicas are created from the primary to absorb read traffic.

When Multi-AZ failover occurs, the standby is promoted to primary. Your Read Replicas automatically reattach to the new primary and resume replication. During the failover window, Read Replicas may fall briefly behind, but they will catch up once the new primary is established. The replica endpoints continue to serve reads throughout this process, though results returned during failover may be briefly stale.

For a reporting use case specifically: creating a dedicated Read Replica for your reporting team or BI tool is a best practice that pays for itself quickly. Reporting queries are often long-running full-table scans that compete directly with transactional workloads on the primary. Routing them to a replica — potentially a larger instance class tuned for analytics — keeps your primary responsive for application traffic.


Failover Behavior in Detail

When RDS initiates a Multi-AZ failover, the sequence is: the primary is taken out of service, the standby is promoted, and the DNS record for your primary endpoint is updated to point to the new primary. Because DNS changes are involved, your application must be using the RDS endpoint (not a hardcoded IP), and your database driver's connection timeout and retry settings matter enormously.

Applications that open a connection and hold it indefinitely will see that connection drop during failover. Applications with aggressive connection retry logic and short TCP keepalive intervals will reconnect within the 2-minute window. Applications that cache DNS aggressively may miss the endpoint update — ensure your JVM or connection pool respects TTL on DNS records (RDS endpoints use a 5-second TTL).
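A minimal sketch of the retry behavior described above, in Python, is shown below. It retries a caller-supplied `connect` function with exponential backoff and jitter until a deadline sized to the failover window. The function name, the default delays, and the choice of `OSError` as the retryable exception are assumptions; with a real driver you would pass that driver's connection-error class instead.

```python
import random
import time


def connect_with_retry(connect, deadline_seconds=120.0, base_delay=0.5,
                       max_delay=10.0, retry_on=(OSError,),
                       sleep=time.sleep, clock=time.monotonic):
    """Call connect() until it succeeds or the deadline would be exceeded.

    connect: zero-argument callable that opens a fresh connection by hostname,
             so each attempt re-resolves the RDS endpoint.
    """
    start = clock()
    attempt = 0
    while True:
        try:
            return connect()
        except retry_on:
            attempt += 1
            # Exponential backoff capped at max_delay, with full jitter so
            # many clients do not reconnect in lockstep after failover.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            delay = random.uniform(0, ceiling)
            if clock() - start + delay > deadline_seconds:
                raise  # give up and surface the last connection error
            sleep(delay)
```

The 120-second default mirrors the classic Multi-AZ failover window discussed in this section. Because `connect` is called again on every attempt, the sketch only helps if nothing underneath it has cached the old IP address.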

Tip

RDS Proxy can significantly improve failover behavior for applications with connection-intensive workloads. RDS Proxy maintains a connection pool to the database and reconnects to the new primary itself during failover events, so application instances keep their connection to the proxy and typically observe a much shorter interruption.

You can also manually trigger a Multi-AZ failover from the RDS console or CLI using the reboot with failover option. This is useful for testing your application's failover handling before it matters in production. Run this drill during low-traffic periods and measure actual downtime observed by your monitoring stack.
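To put a number on such a drill, one option is to run a simple probe (for example, `SELECT 1` once per second) while the failover is in progress and then compute the longest gap from the recorded results. The Python sketch below covers only that calculation; the probe data is made up for illustration. The failover itself can be triggered with `aws rds reboot-db-instance --force-failover` or boto3's `reboot_db_instance(..., ForceFailover=True)`.

```python
def longest_outage(probes):
    """Return the longest observed outage in seconds.

    probes: iterable of (timestamp_seconds, ok) pairs in time order, where ok
            is True if the probe query succeeded.
    An outage runs from the first failed probe to the next successful one;
    an outage still open at the end of the data is not counted.
    """
    longest = 0.0
    outage_start = None
    for timestamp, ok in probes:
        if not ok and outage_start is None:
            outage_start = timestamp  # first failure of a new outage
        elif ok and outage_start is not None:
            longest = max(longest, float(timestamp - outage_start))
            outage_start = None       # recovered
    return longest


# Made-up probe log: failures begin at t=5 and the database answers again at t=70.
drill = [(0, True), (5, False), (10, False), (65, False), (70, True), (75, True)]
print(longest_outage(drill))
```

The result is only as precise as the probe interval, so a one-second probe is a reasonable default for a failover that is expected to take tens of seconds.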


Cost Considerations

The cost implications are straightforward but worth being explicit about. Multi-AZ deployments cost approximately twice the price of a single-AZ instance because AWS provisions a full standby instance of the same size. There is no sharing or discounting on the standby. Storage costs are also doubled because the standby maintains its own copy of your data.

Read Replicas cost the same as any other RDS instance of the same class. A single Read Replica adds one instance's worth of compute and storage. If you run two replicas, you pay for two additional instances. Cross-region replicas additionally incur inter-region data transfer fees on the replication stream, which can be meaningful at high write volumes.
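The arithmetic in the two paragraphs above can be sketched as a small estimator. The Python below is a back-of-the-envelope illustration only: the $0.20 hourly price is a hypothetical placeholder rather than an AWS list price, the 2x Multi-AZ multiplier is this article's approximation, and storage, I/O, and data transfer charges are left out.

```python
HOURS_PER_MONTH = 730  # common approximation for monthly estimates


def monthly_instance_cost(hourly_price, multi_az=False, replica_count=0,
                          replica_hourly_price=None):
    """Estimate monthly compute cost for a primary plus optional replicas."""
    if replica_hourly_price is None:
        # Assume replicas use the same instance class as the primary.
        replica_hourly_price = hourly_price
    primary = hourly_price * (2 if multi_az else 1)  # standby roughly doubles cost
    replicas = replica_count * replica_hourly_price
    return (primary + replicas) * HOURS_PER_MONTH


price = 0.20  # hypothetical on-demand price per instance-hour, in USD
for label, kwargs in [
    ("single-AZ", {}),
    ("Multi-AZ", {"multi_az": True}),
    ("Multi-AZ + 2 replicas", {"multi_az": True, "replica_count": 2}),
]:
    print(f"{label}: ${monthly_instance_cost(price, **kwargs):,.2f}/month")
```

Plugging in your own instance prices makes the comparison with a Multi-AZ Cluster or Aurora concrete, provided the omitted storage and transfer charges are added back for the final number.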

The cost question is not "is Multi-AZ worth it?" for anything customer-facing — the answer is almost always yes. The real cost analysis is whether a Multi-AZ Cluster or Aurora provides better economics than classic Multi-AZ plus separate Read Replicas for your specific read-to-write ratio and HA requirements. For workloads with more than 5:1 read-to-write ratios, Aurora's shared storage model frequently wins on both cost and operational simplicity.


Key Takeaways

  • Multi-AZ is for availability, not performance. It provides automatic failover in under 2 minutes using synchronous replication. You cannot read from a classic Multi-AZ standby.
  • Read Replicas are for read scaling and reporting. They use asynchronous replication and can lag — monitor the ReplicaLag CloudWatch metric and set alarms.
  • The RDS Multi-AZ Cluster (MySQL 8, PostgreSQL 13+) gives you readable standbys with faster failover (~35s) — a strong middle ground before committing to Aurora.
  • Aurora's architecture is fundamentally different: shared storage across AZs means faster failover, near-zero replica lag, and up to 15 readable replicas.
  • Use both together: Multi-AZ on your primary for HA, plus Read Replicas to offload reporting queries and protect your primary's capacity for transactional workloads.
  • Cross-region Read Replicas can be promoted manually for DR, but this is not automatic — test your runbook before you need it.
  • Multi-AZ costs roughly 2x a single-AZ instance. Plan your architecture budget accordingly and evaluate whether Aurora's pricing model makes sense at scale.
  • Ensure your connection pool respects DNS TTL and implements retry logic — both are prerequisites for recovering cleanly within the Multi-AZ failover window.

Let JusDB Handle the Architecture Decisions

Choosing between Multi-AZ, Read Replicas, RDS Multi-AZ Clusters, and Aurora — and configuring them correctly for your workload — requires understanding your read/write ratios, latency requirements, RPO, and budget constraints simultaneously. It is the kind of decision where a misconfiguration costs you either in downtime or in an unnecessarily large cloud bill.

JusDB's database experts work with AWS RDS environments daily. We help engineering teams design HA architectures that match their actual reliability requirements, right-size their replica configurations, and build the monitoring and runbook scaffolding needed to recover fast when failures occur. Whether you are migrating to RDS, tuning an existing deployment, or building a DR strategy from scratch, we bring the operational experience to get it right the first time.

Talk to a JusDB engineer about your RDS architecture — we'll help you build something that actually holds up at 2 AM.
