DynamoDB Streams and Lambda

DynamoDB Streams turns your database into an event bus, firing an ordered sequence of item-level changes that downstream consumers can react to in near real time. At Netflix, Airbnb, and other high-scale shops, this capability powers everything from cache invalidation to cross-region replication without a single line of polling code. The combination of DynamoDB Streams and AWS Lambda is one of the most operationally lightweight event-driven patterns in the AWS ecosystem — no brokers to manage, no servers to provision. If you are building on DynamoDB and not yet leveraging Streams, you are leaving a powerful primitive on the table.

TL;DR

DynamoDB Streams captures a 24-hour, ordered log of every item-level change (INSERT, MODIFY, REMOVE) in a DynamoDB table.
Four view types control how much data appears in each stream record: KEYS_ONLY, NEW_IMAGE, OLD_IMAGE, and NEW_AND_OLD_IMAGES.
Lambda's event source mapping polls shards automatically — configure batch size, parallelization factor, and bisect-on-error to tune throughput and resilience.
For retention beyond 24 hours, switch to Kinesis Data Streams for DynamoDB (up to 365 days).
Idempotency is your responsibility: the same record can be delivered more than once, so design consumers defensively.

What Are DynamoDB Streams?

When you enable DynamoDB Streams on a table, DynamoDB begins writing a time-ordered, deduplicated log of every item change to a set of shards that mirror the table's partition structure. Each shard holds a sequence of stream records, and the records within a single shard are strictly ordered by time — guaranteeing that you always see the before and after state of an item in the correct sequence.

The stream is fully managed. You do not create shards, scale them, or worry about their lifecycle. DynamoDB splits and merges stream shards in tandem with the underlying table shards. The only operational constraint you must respect is the 24-hour retention window: stream records expire after 24 hours regardless of whether they have been consumed. If your consumer falls more than 24 hours behind — due to an outage, a runaway error loop, or an underprovisioned Lambda concurrency limit — those records are gone permanently.

Warning

Stream records are not guaranteed to be delivered exactly once. Under certain failure conditions (shard re-sharding, Lambda retry storms) the same record can appear multiple times. Design every consumer to be idempotent from day one — not as an afterthought.

Stream Record View Types

Before enabling Streams, you must choose a StreamViewType. This controls what data is included in each stream record and has a direct impact on your Lambda payload size, your data transfer costs, and what your consumer can actually do with the record.

KEYS_ONLY — Only the primary key attributes (partition key and, if present, sort key) are written to the stream. Use this when your consumer only needs to know which item changed, not what changed — for example, a cache invalidation function that re-fetches the item from the table anyway.
NEW_IMAGE — The entire item state after the change is included. Useful for downstream projections and read-model updates where you only care about the current state.
OLD_IMAGE — The entire item state before the change is included. Less common, but valuable for audit logging and undo workflows.
NEW_AND_OLD_IMAGES — Both before and after images are included in every record. This is the most powerful option and the only one that enables true change-data-capture (CDC) delta computation, event sourcing, and conflict detection in multi-writer patterns.

Tip

Start with NEW_AND_OLD_IMAGES during development. You can always downgrade to a narrower view type in production to reduce payload size and Lambda invocation duration, but you cannot retroactively add the old image to records that were already emitted.

The view type is set at the table level and applies to all consumers of that stream. If different consumers need different views, they must share the same stream — you cannot create per-consumer view configurations. This is a common source of confusion when onboarding new use cases to an existing table.

Lambda Integration via Event Source Mapping

Lambda integrates with DynamoDB Streams through an event source mapping (ESM) — a Lambda-managed polling layer that reads batches of records from each shard and invokes your function synchronously. You do not write any polling code; you configure the ESM and Lambda handles the rest.

Key ESM parameters to understand:

json

{
  "EventSourceArn": "arn:aws:dynamodb:us-east-1:123456789012:table/Orders/stream/2024-01-15T00:00:00.000",
  "StartingPosition": "TRIM_HORIZON",
  "BatchSize": 100,
  "MaximumBatchingWindowInSeconds": 5,
  "ParallelizationFactor": 4,
  "BisectBatchOnFunctionError": true,
  "DestinationConfig": {
    "OnFailure": {
      "Destination": "arn:aws:sqs:us-east-1:123456789012:dynamodb-stream-dlq"
    }
  },
  "MaximumRetryAttempts": 3,
  "MaximumRecordAgeInSeconds": 3600
}

BatchSize (1–10,000, default 100) controls how many records are delivered per Lambda invocation. Larger batches improve throughput but increase the blast radius of a failure — if one record in a batch causes an error, the entire batch is retried by default.

MaximumBatchingWindowInSeconds (0–300) introduces an intentional delay so Lambda can accumulate a fuller batch before invoking. Set this to a non-zero value only when your workload is bursty and you want to reduce invocation count at the cost of slightly higher latency.

ParallelizationFactor (1–10) is one of the most impactful settings for high-throughput tables. By default, Lambda processes each shard with a single concurrent invocation, which caps throughput at one batch per shard per polling cycle. Increasing this factor allows Lambda to run multiple concurrent invocations per shard, dramatically increasing fan-out capacity without requiring more table partitions.

python

import json
import boto3
from typing import Any

dynamodb = boto3.resource("dynamodb")
idempotency_table = dynamodb.Table("StreamProcessorIdempotency")

def handler(event: dict, context: Any) -> None:
    for record in event["Records"]:
        sequence_number = record["dynamodb"]["SequenceNumber"]

        # Idempotency check — skip records already processed
        try:
            idempotency_table.put_item(
                Item={"sequence_number": sequence_number},
                ConditionExpression="attribute_not_exists(sequence_number)"
            )
        except dynamodb.meta.client.exceptions.ConditionalCheckFailedException:
            print(f"Skipping already-processed record: {sequence_number}")
            continue

        event_name = record["eventName"]  # INSERT | MODIFY | REMOVE

        if event_name == "INSERT":
            handle_insert(record["dynamodb"]["NewImage"])
        elif event_name == "MODIFY":
            handle_modify(
                old=record["dynamodb"].get("OldImage"),
                new=record["dynamodb"]["NewImage"]
            )
        elif event_name == "REMOVE":
            handle_remove(record["dynamodb"]["OldImage"])

def handle_insert(new_image: dict) -> None:
    print(f"New item created: {json.dumps(new_image)}")

def handle_modify(old: dict, new: dict) -> None:
    print(f"Item updated. Before: {old}, After: {new}")

def handle_remove(old_image: dict) -> None:
    print(f"Item deleted: {json.dumps(old_image)}")

Stream Event Types: INSERT, MODIFY, REMOVE

Every stream record contains an eventName field that tells your consumer what happened to the item:

INSERT — A new item was written to the table. Only NewImage is populated (there is no prior state). This is the trigger for "new order placed" or "user registered" workflows.
MODIFY — An existing item was updated. With NEW_AND_OLD_IMAGES, both OldImage and NewImage are available, enabling delta computation — for example, detecting when an order's status field transitions from PENDING to SHIPPED.
REMOVE — An item was deleted from the table (including TTL-triggered deletions). Only OldImage is populated. Note that TTL deletions may appear with a delay of up to 48 hours after the item's TTL expires, and they are distinguishable via the userIdentity.type == "Service" field.

Warning

TTL-expired items generate REMOVE stream records, but with unpredictable timing. Never rely on TTL removals for time-sensitive downstream actions — use a scheduled process or a separate expiration signal instead.

Use Cases: Cross-Region Replication and Event Sourcing

Cross-Region Replication

Before DynamoDB Global Tables existed, teams built cross-region replication manually using DynamoDB Streams and Lambda. The pattern is straightforward: a Lambda function in region A reads the stream and writes changed items to a table in region B. This remains relevant today for scenarios where Global Tables are too coarse-grained — for example, replicating only a filtered subset of items, or transforming data during replication.

Understanding this pattern also demystifies how Global Tables work internally. Global Tables are essentially a managed, AWS-operated version of the same Streams-and-Lambda replication loop, with conflict resolution based on last-writer-wins using wall-clock timestamps. When you enable a Global Table, DynamoDB provisions hidden stream consumers in each region that propagate changes bidirectionally. The aws:rep:updateregion attribute that appears on replicated items is the fingerprint of this internal mechanism — it prevents replication loops by marking which region originated a write.

Event Sourcing and CQRS

DynamoDB Streams is a natural fit for Command Query Responsibility Segregation (CQRS) architectures. The write model (commands) lands in DynamoDB; the stream drives projection functions that build read models in ElasticSearch, RDS, or another DynamoDB table optimized for query patterns. This decouples write throughput from query complexity and allows read models to be rebuilt from the stream at any time — within the 24-hour window, or from a longer-lived archive.

Kinesis Data Streams for DynamoDB: Longer Retention

The 24-hour stream retention limit is a hard constraint that catches teams off guard. If your use case requires replay windows longer than 24 hours — disaster recovery, audit archives, ML training pipelines — you need Kinesis Data Streams for DynamoDB.

This feature, enabled separately from native DynamoDB Streams, routes item-level changes to a Kinesis Data Stream that you own and configure. Kinesis supports retention of up to 365 days (with extended retention enabled). The trade-offs: you pay for the Kinesis stream separately, and you are responsible for managing shard count as the table grows. The record format is also slightly different from native Streams records — your consumer code needs to account for this.

bash

# Enable Kinesis Data Streams export for a DynamoDB table
aws dynamodb enable-kinesis-streaming-destination \
  --table-name Orders \
  --stream-arn arn:aws:kinesis:us-east-1:123456789012:stream/dynamodb-orders-stream \
  --enable-kinesis-streaming-configuration ApproximateCreationDateTimePrecision=MILLISECOND

Error Handling: Bisect-on-Error, DLQ, and Idempotency

Because Lambda processes stream records in order within a shard, a single poison-pill record can halt an entire shard's progress indefinitely. Lambda retries the batch until it succeeds or the records expire, and by default the function blocks on that shard until the issue is resolved. Three settings dramatically improve resilience:

BisectBatchOnFunctionError: true — When a batch fails, Lambda splits it in half and retries each half separately. This binary search continues until the failing batch contains a single record, isolating the poison pill with minimal impact on surrounding records.

MaximumRetryAttempts — Caps the number of retry attempts before Lambda gives up and moves on. Set this in combination with a DLQ so failed records are not silently dropped.

DestinationConfig.OnFailure (DLQ) — Routes exhausted records to an SQS queue or SNS topic for out-of-band investigation and reprocessing. Without this, failed records that exceed the retry limit or age out are permanently lost.

Tip

Set MaximumRecordAgeInSeconds to a value less than 86400 (24 hours) so that very old records — typically from a prolonged consumer outage — are routed to the DLQ rather than processed stale. Processing 23-hour-old order events as if they just occurred is usually worse than discarding them.

Idempotency is the final line of defense. Lambda's at-least-once delivery guarantee means the same stream record can be delivered multiple times — especially during shard re-sharding or after a bisected retry. The standard pattern is a conditional write to a separate DynamoDB table keyed by the record's SequenceNumber, using a condition expression that rejects duplicate writes. AWS Lambda Powertools provides a higher-level idempotency decorator if you prefer a library approach.

python

from aws_lambda_powertools.utilities.idempotency import (
    idempotent_function,
    DynamoDBPersistenceLayer,
    IdempotencyConfig,
)

persistence_layer = DynamoDBPersistenceLayer(table_name="IdempotencyTable")
config = IdempotencyConfig(event_key_jmespath="dynamodb.SequenceNumber")

@idempotent_function(data_keyword_argument="record", config=config, persistence_store=persistence_layer)
def process_record(record: dict) -> dict:
    # Business logic here — guaranteed to run at most once per SequenceNumber
    return {"status": "processed"}

Key Takeaways

DynamoDB Streams provides a 24-hour, ordered, shard-based log of item changes. Choose your StreamViewType carefully — NEW_AND_OLD_IMAGES is the most flexible but largest payload.
Lambda event source mappings handle all shard polling. Tune BatchSize, MaximumBatchingWindowInSeconds, and ParallelizationFactor to balance throughput against latency and concurrency costs.
INSERT, MODIFY, and REMOVE event types let consumers react precisely to what changed. MODIFY records with OLD_IMAGE enable state-transition logic without additional table reads.
Cross-region replication via Streams is the underlying mechanism of DynamoDB Global Tables. Building it manually gives you filtering and transformation capabilities that Global Tables do not provide.
Switch to Kinesis Data Streams for DynamoDB when you need retention beyond 24 hours — Kinesis supports up to 365 days.
Enable BisectBatchOnFunctionError, configure a DLQ, and implement idempotency checks (keyed on SequenceNumber) to build a production-grade, poison-pill-resistant consumer.

Manage Your DynamoDB Streams Infrastructure with JusDB

Configuring DynamoDB Streams, wiring up Lambda event source mappings, tuning parallelization factors, and standing up DLQ pipelines involves a lot of moving parts spread across multiple AWS services. JusDB gives you a unified control plane for your DynamoDB tables — including Streams configuration, consumer health monitoring, and error rate alerting — so you can build event-driven patterns without losing track of the operational details. Talk to the JusDB team to see how we can accelerate your next DynamoDB Streams deployment.

DynamoDB Streams and Lambda: Event-Driven Database Patterns

What Are DynamoDB Streams?

Stream Record View Types

Lambda Integration via Event Source Mapping

Stream Event Types: INSERT, MODIFY, REMOVE

Use Cases: Cross-Region Replication and Event Sourcing

Cross-Region Replication

Event Sourcing and CQRS

Kinesis Data Streams for DynamoDB: Longer Retention

Error Handling: Bisect-on-Error, DLQ, and Idempotency

Manage Your DynamoDB Streams Infrastructure with JusDB

Share this article

Need Expert Help?

Aerospike Services

Aerospike Performance Tuning

Aerospike Support

Cloud Database Cost Optimization

Related Articles

High Performance with MongoDB: A Top-Down Tuning Guide

MongoDB Explained (2026): Replica Sets, Sharding, Atlas & Production Patterns

ScyllaDB vs Apache Cassandra: Performance and Operational Differences