From 100MB to 50GB: Migrating DynamoDB Tables Across AWS Accounts

You’d think moving data from one AWS account to another would be straightforward. Export it, import it, done. That’s what the documentation implies, and honestly, that’s what I assumed going in.

It wasn’t.

What started as a data migration quickly turned into a system reconstruction problem. In DynamoDB, data alone is only half the story.

This is the story of migrating 35+ DynamoDB tables, ranging from 100MB to over 50GB, across two AWS accounts, with a bucket in the destination account, streams to reconnect, Lambda triggers to recreate, and a PITR requirement nobody warned about until it failed mid-run.

If you’re planning something similar, this post will save you some pain.

The Setup

Here’s what we were working with:

  • 35+ DynamoDB tables in a source AWS account
  • A brand new destination AWS account being built from scratch
  • Tables with wildly varying sizes, some barely 100MB, others pushing 50GB+
  • Streams and Lambda triggers attached to several tables
  • GSIs that needed to be preserved exactly
  • A requirement to repeat this across multiple environments, not a one-time run

Why Cross-Account?

The source account was our existing environment serving live traffic. The destination was a new account being built from scratch, which gave us full isolation, easy rollback, and parallel validation before any traffic switched.

Why Not Just Use the AWS Console?

The honest answer: you can use the UI for a single table. But the moment you have more than two or three tables, the console becomes a liability:

  • No repeatability – every step is manual, so every rerun is a fresh opportunity to miss something
  • No visibility – there’s no good way to track the status of 10+ concurrent export jobs from the UI
  • No schema injection – passing a full --table-creation-parameters file with GSIs isn’t something the UI supports cleanly

So the path was: understand the CLI flags first, validate with a single table, then wrap everything in scripts.

Starting with the CLI also forces clarity: you have to understand exactly what you’re passing and why.

How the Migration Was Structured

With 35+ tables, a single script wasn’t going to cut it. We split the work into two phases with a deliberate review step in between.

Phase 1 — Discovery (Source Account)

Before export, collect everything:

  • Table names, sizes, item counts
  • Full schema: keys, attribute definitions, billing mode
  • GSI configuration
  • Stream settings
  • Lambda event source mappings

Output: a set of CSV files and JSON schema definitions that a human can actually read and edit.

This is where decisions happen. Which tables move? Which Lambda names need to be remapped because they don’t exist in the destination yet?

Phase 2 — Execution (Cross-Account)

After review:

  • Enable PITR + trigger exports to S3 (source account)
  • Monitor export completion
  • Create tables and import data (destination account)
  • Monitor import completion
  • Recreate streams and Lambda triggers
  • Restore table settings (PITR, deletion protection)
  • Validate & compare item counts between source and destination
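The final validation step can be as simple as a dictionary comparison. Here is a minimal sketch, assuming the item counts have already been collected into two dicts keyed by table name. The function name and data shapes are illustrative, not the original scripts, and how the counts are gathered is left out.

```python
def compare_item_counts(source_counts, dest_counts):
    """Return (table, source_count, dest_count) for every table that differs.

    A table missing from the destination is reported with dest_count=None.
    """
    mismatches = []
    for table, src in sorted(source_counts.items()):
        dst = dest_counts.get(table)
        if dst != src:
            mismatches.append((table, src, dst))
    return mismatches


if __name__ == "__main__":
    # Made-up numbers, purely for illustration.
    source = {"Orders": 1200, "Users": 340, "Events": 98000}
    dest = {"Orders": 1200, "Users": 339}
    for table, src, dst in compare_item_counts(source, dest):
        print(f"MISMATCH {table}: source={src} destination={dst}")
```

One caveat if the counts come from describe-table: its ItemCount value is only refreshed periodically (roughly every six hours), so a freshly imported table can report a stale number for a while.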

We needed a way to run this safely more than once, especially when things failed halfway, so we wrapped everything in scripts.

In practice, this came down to three steps:

  1. Run discovery (source account)
  2. Trigger exports (source account)
  3. Run import and setup (destination account)

The Architecture: Bucket in the Destination Account

The data lifecycle:

  • Source DynamoDB exports → S3
  • Destination DynamoDB imports ← S3

If the bucket is in the destination account, the import is a local read. Fewer cross-account operations, simpler IAM, less surface area for things to go wrong.

You need source-account DynamoDB to write into a bucket it doesn’t own. That requires two specific things.

Flag 1: --s3-bucket-owner

shell

aws dynamodb export-table-to-point-in-time \
  --table-arn arn:aws:dynamodb:ap-southeast-2:SOURCE_ACCOUNT:table/MyTable \
  --s3-bucket my-destination-export-bucket \
  --s3-prefix exports/my-table \
  --export-format DYNAMODB_JSON \
  --s3-sse-algorithm AES256 \
  --s3-bucket-owner "$TARGET_BUCKET_ACCOUNT_ID"

The --s3-bucket-owner flag is a safety check. DynamoDB will refuse the export if the bucket isn’t actually owned by the account ID you specify. It also tells S3 to assign ownership of the exported objects to the destination account.

--s3-sse-algorithm AES256 ensures everything lands encrypted.

Flag 2: The Bucket Policy

The bucket needs to trust two things independently:

  • The DynamoDB service (dynamodb.amazonaws.com) — so it can write the actual export files, scoped to your source account ID
  • Your IAM role – so the CLI call itself doesn’t fail before DynamoDB even gets involved

Miss either one and you get an AccessDenied that doesn’t tell you which layer broke.
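To make the two layers concrete, here is a rough sketch of such a bucket policy, built as a Python dict so it can be generated per environment. The S3 actions are the ones listed for the source role later in this post. The statement IDs, the aws:SourceAccount condition, the role ARN and the account ID are assumptions for illustration, so check them against your own setup before applying anything.

```python
import json


def build_export_bucket_policy(bucket, source_account_id, source_role_arn):
    """Build a bucket policy dict with one statement per trusted party."""
    object_arn = f"arn:aws:s3:::{bucket}/*"
    write_actions = ["s3:PutObject", "s3:AbortMultipartUpload"]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Layer 1: the DynamoDB service, scoped to the source account.
                "Sid": "AllowDynamoDBServiceExport",
                "Effect": "Allow",
                "Principal": {"Service": "dynamodb.amazonaws.com"},
                "Action": write_actions,
                "Resource": object_arn,
                "Condition": {
                    "StringEquals": {"aws:SourceAccount": source_account_id}
                },
            },
            {
                # Layer 2: the IAM role that issues the export CLI call.
                "Sid": "AllowSourceRoleExport",
                "Effect": "Allow",
                "Principal": {"AWS": source_role_arn},
                "Action": write_actions,
                "Resource": object_arn,
            },
        ],
    }


if __name__ == "__main__":
    policy = build_export_bucket_policy(
        "my-destination-export-bucket",
        "111111111111",  # placeholder source account ID
        "arn:aws:iam::111111111111:role/migration-export-role",  # hypothetical
    )
    print(json.dumps(policy, indent=2))
```

Keeping the two statements separate makes it easier to tell which layer is missing when an AccessDenied shows up.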

 

The Thing That Stopped Everything: PITR

DynamoDB’s export feature is built on Point-in-Time Recovery. It takes a consistent snapshot of your table at a specific moment without locking the table or interrupting live traffic. That’s the feature that makes it genuinely useful.

But PITR has to be already enabled before you try to export.

We hit this mid-run. The error is clear enough once you see it, but a migration window is not a good time to discover it.

We updated the export script to check and enable PITR automatically, with a short buffer before triggering the export.
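A check like that might look roughly like the sketch below. It shells out to the AWS CLI (describe-continuous-backups and update-continuous-backups) and is not the original script. The helper names, the injectable run and sleep parameters, and the 30-second buffer are assumptions made for illustration.

```python
import json
import subprocess
import time


def run_aws(args):
    """Run an AWS CLI command and return its parsed JSON output."""
    out = subprocess.run(
        ["aws", *args, "--output", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out) if out.strip() else {}


def ensure_pitr(table_name, run=run_aws, buffer_seconds=30, sleep=time.sleep):
    """Enable PITR if it is off. Returns True if this call enabled it."""
    desc = run(["dynamodb", "describe-continuous-backups",
                "--table-name", table_name])
    status = (desc.get("ContinuousBackupsDescription", {})
                  .get("PointInTimeRecoveryDescription", {})
                  .get("PointInTimeRecoveryStatus", "DISABLED"))
    if status == "ENABLED":
        return False
    run(["dynamodb", "update-continuous-backups",
         "--table-name", table_name,
         "--point-in-time-recovery-specification",
         "PointInTimeRecoveryEnabled=true"])
    # Short pause before the export is triggered; the length is a guess.
    sleep(buffer_seconds)
    return True
```

Passing run and sleep in as parameters keeps the function easy to dry-run without touching a real table.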

The Part People Miss: GSIs Don’t Come With the Export

DynamoDB exports capture item data only. The table’s structural configuration (billing mode, stream settings, and, critically, Global Secondary Indexes) is not included.

This is the kind of issue you don’t notice immediately. The table looks fine… until a query suddenly doesn’t return what you expect.

How We Fixed It

Discovery had already saved each table’s schema as a JSON definition. At import time, that file gets passed directly:

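python

import_cmd = [
    "aws", "dynamodb", "import-table",
    # The CLI shorthand keys for this flag are S3Bucket and S3KeyPrefix.
    "--s3-bucket-source", f"S3Bucket={bucket},S3KeyPrefix={prefix}",
    "--input-format", "DYNAMODB_JSON",
    "--input-compression-type", "GZIP",
    # temp_def is the path to this table's JSON schema file.
    "--table-creation-parameters", f"file://{temp_def}",
]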

The --table-creation-parameters file contains the full key schema, attribute definitions, GSI projections, and billing configuration. The destination table is created with everything intact in one step.

A few things worth noting about these flags:

  • DYNAMODB_JSON must match the export format. If your export used Ion format, the import needs to match. Mixing these produces unclear errors.
  • GZIP is necessary because DynamoDB compresses its export files by default. Without this flag, the import job will try to parse compressed data as raw JSON and fail.
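The schema file passed to --table-creation-parameters can be derived from describe-table output captured during discovery. The sketch below shows one way to do that, not the original script. It keeps only the fields a table definition needs and drops read-only ones such as ItemCount and IndexStatus. The function names are made up for this example, and it assumes describe-table output has already been loaded as a dict.

```python
import json


def _throughput(src):
    """Keep only the two capacity fields that a table definition accepts."""
    return {
        "ReadCapacityUnits": src["ReadCapacityUnits"],
        "WriteCapacityUnits": src["WriteCapacityUnits"],
    }


def to_creation_parameters(describe_table_output, table_name=None):
    """Convert describe-table output into a table-creation-parameters dict."""
    table = describe_table_output["Table"]
    # BillingModeSummary can be absent; treat that as provisioned capacity.
    billing = table.get("BillingModeSummary", {}).get("BillingMode", "PROVISIONED")
    params = {
        "TableName": table_name or table["TableName"],
        "AttributeDefinitions": table["AttributeDefinitions"],
        "KeySchema": table["KeySchema"],
        "BillingMode": billing,
    }
    if billing == "PROVISIONED":
        params["ProvisionedThroughput"] = _throughput(table["ProvisionedThroughput"])
    gsis = []
    for gsi in table.get("GlobalSecondaryIndexes", []):
        entry = {
            "IndexName": gsi["IndexName"],
            "KeySchema": gsi["KeySchema"],
            "Projection": gsi["Projection"],
        }
        if billing == "PROVISIONED":
            entry["ProvisionedThroughput"] = _throughput(gsi["ProvisionedThroughput"])
        gsis.append(entry)
    if gsis:
        params["GlobalSecondaryIndexes"] = gsis
    return params
```

Writing the result out with json.dump gives a file that a human can review and edit before it is handed to the import command.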

Table Size Changes Everything

Not all 35+ tables behaved the same, and this became obvious quickly.

Table Size    Export Time    Import Time
~100 MB       Minutes        Minutes
~5–10 GB      20–40 min      30–60 min
50+ GB        2–4 hours      2–4 hours

Both operations are fully asynchronous. You fire the job and poll for completion. That’s why the monitoring scripts ended up being more valuable than the migration scripts themselves: fire the jobs, let the poller run, come back when it’s done.
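The fire-and-poll pattern is small enough to sketch. The version below is a generic poller, not the original monitoring script. It assumes a get_status callable that you supply, for example one wrapping aws dynamodb describe-export (reading ExportDescription.ExportStatus) or describe-import (reading ImportTableDescription.ImportStatus). The function name, the 60-second interval and the max_polls cap are illustrative choices.

```python
import time

# Statuses after which a job will not change again.
TERMINAL = {"COMPLETED", "FAILED", "CANCELLED"}


def wait_for_jobs(job_ids, get_status, interval=60, sleep=time.sleep, max_polls=None):
    """Poll until every job reaches a terminal status; return {job_id: status}."""
    pending = list(job_ids)
    final = {}
    polls = 0
    while pending:
        still_running = []
        for job_id in pending:
            status = get_status(job_id)
            if status in TERMINAL:
                final[job_id] = status
                print(f"{job_id}: {status}")
            else:
                still_running.append(job_id)
        pending = still_running
        polls += 1
        if pending:
            if max_polls is not None and polls >= max_polls:
                raise TimeoutError(f"still running after {polls} polls: {pending}")
            sleep(interval)
    return final
```

Because the same loop works for export ARNs and import ARNs, one poller can cover both halves of the migration.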

IAM: More Layers Than It Looks Like

Two roles. Clean separation.

Source account role — used during discovery and export:

  • dynamodb:ListTables, DescribeTable, ExportTableToPointInTime, DescribeExport
  • lambda:ListEventSourceMappings
  • s3:PutObject, s3:AbortMultipartUpload on the destination bucket

Destination account role — used during import and setup:

  • dynamodb:CreateTable, ImportTable, DescribeImport, UpdateTable, Scan
  • lambda:CreateEventSourceMapping, GetFunction, GetEventSourceMapping
  • s3:GetObject, s3:ListBucket on the export bucket

Streams and Lambda Triggers: The Last Mile

After a successful import, the tables exist and the data is there. But streams are disabled, and there are no Lambda triggers. The tables are effectively silent.

Recreating these was the step that required the most care:

  1. Discovery captured all stream configurations and event source mappings from the source.
  2. A Lambda mapping file translated source function names to their destination equivalents because Lambda names rarely match across environments.
  3. Post-import, streams were re-enabled on each table and event source mappings were recreated using the destination function names.

The mapping file is a simple CSV that gets reviewed manually during the discovery phase. It’s a small thing, but it’s also the step where you’re most likely to introduce a silent bug: a trigger pointing at the wrong function version, or a batch size that doesn’t match the original.
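Here is a rough sketch of how the mapping file can drive the rebuild. The CSV column names (source_function, destination_function) and the keys of the mapping dict are hypothetical, not the original file layout. The CLI commands it builds, update-table with --stream-specification and create-event-source-mapping, are the standard ones. The stream ARN is assumed to come from the destination table once its stream has been enabled.

```python
import csv
import io


def load_function_map(csv_text):
    """Parse the reviewed mapping CSV into {source_function: destination_function}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["source_function"]: row["destination_function"] for row in reader}


def stream_command(table_name, view_type):
    """CLI arguments that re-enable the stream on an imported table."""
    return ["aws", "dynamodb", "update-table",
            "--table-name", table_name,
            "--stream-specification",
            f"StreamEnabled=true,StreamViewType={view_type}"]


def trigger_command(mapping, function_map, stream_arn):
    """CLI arguments that recreate one event source mapping in the destination."""
    source_fn = mapping["function_name"]
    if source_fn not in function_map:
        # Fail loudly rather than guess a destination function name.
        raise KeyError(f"no destination function mapped for {source_fn}")
    return ["aws", "lambda", "create-event-source-mapping",
            "--function-name", function_map[source_fn],
            "--event-source-arn", stream_arn,
            "--starting-position", mapping["starting_position"],
            "--batch-size", str(mapping["batch_size"])]
```

Carrying the batch size and starting position over from the discovered mapping, rather than relying on defaults, addresses the mismatch risk described above.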

If I Had to Do It Again

There are a few things I’d stick with:

  • Two-phase architecture – discovery before execution, no exceptions. The number of times something in the discovery output needed a correction before we touched any data was higher than expected.
  • Human review gate – not optional. Lambda names don’t match across environments, some tables shouldn’t move at all, and blind automation here is how you get silent bugs.
  • Async monitoring over console watching – for the large tables, the monitoring scripts were more useful than the migration scripts themselves. Build the poller, don’t stare at the console.
  • PITR on by default – not as a migration prerequisite, just as standard practice. Small ongoing cost, enormous value the day you actually need it.

Wrapping Up

DynamoDB cross-account migration via S3 works. The export is online and non-disruptive. The import handles full schema definitions, including GSIs. For the scale we were working at, it held up well.

But the complexity hides in the details: PITR is silently required, bucket trust is needed at the service level and not just the role level, GSIs need explicit schema files, and streams have to be rebuilt from scratch.

None of those are hard problems once you know about them. The issue is, you usually find out mid-migration.

This kind of migration isn’t hard because of the tools. It’s hard because of the details you don’t see until you’re in the middle of it.
