{"id":79347,"date":"2026-05-24T15:14:18","date_gmt":"2026-05-24T09:44:18","guid":{"rendered":"https:\/\/www.tothenew.com\/blog\/?p=79347"},"modified":"2026-06-08T18:40:32","modified_gmt":"2026-06-08T13:10:32","slug":"from-100mb-to-50gb-migrating-dynamodb-tables-across-aws-accounts","status":"publish","type":"post","link":"https:\/\/www.tothenew.com\/blog\/from-100mb-to-50gb-migrating-dynamodb-tables-across-aws-accounts\/","title":{"rendered":"From 100MB to 50GB: Migrating DynamoDB Tables Across AWS Accounts"},"content":{"rendered":"<p>You&#8217;d think moving data from one AWS account to another would be straightforward. Export it, import it, done. That&#8217;s what the <a href=\"https:\/\/docs.aws.amazon.com\/amazondynamodb\/latest\/developerguide\/bp-migrating-table-between-accounts-s3.html\">documentation<\/a> implies, and honestly, that&#8217;s what I assumed going in.<\/p>\n<p>It wasn&#8217;t.<\/p>\n<p>What started as a data migration quickly turned into a system reconstruction problem. 
In DynamoDB, data alone is only half the story.<\/p>\n<p>This is the story of migrating 35+ DynamoDB tables, ranging from 100MB to over 50GB, across two AWS accounts, with a bucket in the destination account, streams to reconnect, Lambda triggers to recreate, and a PITR requirement nobody warned us about until an export failed mid-run.<\/p>\n<p>If you&#8217;re planning something similar, this post will save you some pain.<\/p>\n<h2><strong>The Setup<\/strong><\/h2>\n<p>Here&#8217;s what we were working with:<\/p>\n<ul>\n<li>35+ DynamoDB tables in a source AWS account<\/li>\n<li>A brand new destination AWS account being built from scratch<\/li>\n<li>Tables with wildly varying sizes, some barely 100MB, others pushing 50GB+<\/li>\n<li>Streams and Lambda triggers attached to several tables<\/li>\n<li>GSIs that needed to be preserved exactly<\/li>\n<li>A requirement to repeat this across multiple environments, not a one-time run<\/li>\n<\/ul>\n<h2><strong>Why Cross-Account?<\/strong><\/h2>\n<p>The source account was our existing environment serving live traffic. The destination was a new account being built from scratch. Keeping the two separate gave us full isolation, easy rollback, and parallel validation before any traffic switched.<\/p>\n<h2><strong>Why Not Just Use the AWS Console?<\/strong><\/h2>\n<p>The honest answer: you can use the UI for a single table. 
But the moment you have more than two or three tables, the console becomes a liability:<\/p>\n<ul>\n<li><strong>No repeatability<\/strong> &#8211; every step is manual, so every rerun is a fresh opportunity to miss something<\/li>\n<li><strong>No visibility<\/strong> &#8211; there&#8217;s no good way to track the status of 10+ concurrent export jobs from the UI<\/li>\n<li><strong>No schema injection<\/strong> &#8211; passing a full <span style=\"color: #99cc00;\">--table-creation-parameters<\/span> file with GSIs isn&#8217;t something the UI supports cleanly<\/li>\n<\/ul>\n<p>So the path was: understand the CLI flags first, validate with a single table, then wrap everything in scripts.<\/p>\n<p>Starting with the CLI also forces clarity: you have to understand exactly what you&#8217;re passing and why.<\/p>\n<h2><strong>How the Migration Was Structured<\/strong><\/h2>\n<p>With 35+ tables, a single script wasn&#8217;t going to cut it. We split the work into two phases with a deliberate review step in between.<\/p>\n<h3>Phase 1 \u2014 Discovery (Source Account)<\/h3>\n<p>Before export, collect everything:<\/p>\n<ul>\n<li>Table names, sizes, item counts<\/li>\n<li>Full schema: keys, attribute definitions, billing mode<\/li>\n<li>GSI configuration<\/li>\n<li>Stream settings<\/li>\n<li>Lambda event source mappings<\/li>\n<\/ul>\n<p><strong>Output:<\/strong>\u00a0a set of CSV files and JSON schema definitions that a human can actually read and edit.<\/p>\n<p>This is where decisions happen. Which tables move? 
Which Lambda names need to be remapped because they don&#8217;t exist in the destination yet?<\/p>\n<h3>Phase 2 \u2014 Execution (Cross-Account)<\/h3>\n<p>After review:<\/p>\n<ul>\n<li>Enable PITR + trigger exports to S3 (source account)<\/li>\n<li>Monitor export completion<\/li>\n<li>Create tables and import data (destination account)<\/li>\n<li>Monitor import completion<\/li>\n<li>Recreate streams and Lambda triggers<\/li>\n<li>Restore table settings (PITR, deletion protection)<\/li>\n<li>Validate &amp; compare item counts between source and destination<\/li>\n<\/ul>\n<p>We needed a way to run this safely more than once, especially when things failed halfway, so we wrapped everything in scripts.<\/p>\n<p>In practice, this came down to three steps:<\/p>\n<ol>\n<li>Run discovery (source account)<\/li>\n<li>Trigger exports (source account)<\/li>\n<li>Run import and setup (destination account)<\/li>\n<\/ol>\n<h2>The Architecture: Bucket in the Destination Account<\/h2>\n<p>The data lifecycle:<\/p>\n<ul>\n<li>Source DynamoDB exports \u2192 S3<\/li>\n<li>Destination DynamoDB imports \u2190 S3<\/li>\n<\/ul>\n<p>If the bucket is in the destination account, the import is a local read. That means fewer cross-account operations, simpler IAM, and less surface area for things to go wrong.<\/p>\n<p>The catch is that source-account DynamoDB has to write into a bucket it doesn&#8217;t own. 
That requires two specific things.<\/p>\n<h3>Flag 1: --s3-bucket-owner<\/h3>\n<p><strong>shell<\/strong><\/p>\n<blockquote><p><span style=\"color: #008000;\">aws dynamodb export-table-to-point-in-time \\<\/span><\/p>\n<p><span style=\"color: #008000;\">--table-arn arn:aws:dynamodb:ap-southeast-2:SOURCE_ACCOUNT:table\/MyTable \\<\/span><\/p>\n<p><span style=\"color: #008000;\">--s3-bucket my-destination-export-bucket \\<\/span><\/p>\n<p><span style=\"color: #008000;\">--s3-prefix exports\/my-table \\<\/span><\/p>\n<p><span style=\"color: #008000;\">--export-format DYNAMODB_JSON \\<\/span><\/p>\n<p><span style=\"color: #008000;\">--s3-sse-algorithm AES256 \\<\/span><\/p>\n<p><span style=\"color: #008000;\">--s3-bucket-owner $TARGET_BUCKET_ACCOUNT_ID<\/span><\/p><\/blockquote>\n<p>The <span style=\"color: #008000;\">--s3-bucket-owner<\/span> flag is a safety check. DynamoDB will refuse the export if the bucket isn&#8217;t actually owned by the account ID you specify. It is also required whenever the bucket belongs to a different account than the table; ownership of the exported objects themselves follows the bucket&#8217;s S3 Object Ownership setting.<\/p>\n<p><span style=\"color: #008000;\">--s3-sse-algorithm AES256<\/span> ensures everything lands encrypted.<\/p>\n<h3>Flag 2: The Bucket Policy<\/h3>\n<p>The bucket needs to trust two things independently:<\/p>\n<ul>\n<li>The DynamoDB service (<span style=\"color: #008000;\">dynamodb.amazonaws.com<\/span>) &#8211; so it can write the actual export files, scoped to your source account ID<\/li>\n<li>Your IAM role &#8211; so the CLI call itself doesn&#8217;t fail before DynamoDB even gets involved<\/li>\n<\/ul>\n<p>Miss either one and you get an <span style=\"color: #008000;\">AccessDenied<\/span> that doesn&#8217;t tell you which layer broke.<\/p>\n<h2><strong>The Thing That Stopped Everything: PITR<\/strong><\/h2>\n<p>DynamoDB&#8217;s export feature is built on Point-in-Time Recovery. 
It takes a consistent snapshot of your table at a specific moment without locking the table or interrupting live traffic. That&#8217;s the feature that makes it genuinely useful.<\/p>\n<p>But PITR has to be <strong>already enabled<\/strong> before you try to export.<\/p>\n<p>We hit this mid-run. The error is clear enough once you see it, but a migration window is not a good time to discover it.<\/p>\n<p>We updated the export script to check for PITR and enable it automatically, with a short wait before triggering the export so the setting has time to take effect.<\/p>\n<h2><strong>The Part People Miss: GSIs Don&#8217;t Come With the Export<\/strong><\/h2>\n<p>DynamoDB exports capture item data only. The table&#8217;s structural configuration (billing mode, stream settings and, critically, <strong>Global Secondary Indexes<\/strong>) is not included.<\/p>\n<p>This is the kind of issue you don\u2019t notice immediately. The table looks fine\u2026 until a query suddenly doesn\u2019t return what you expect.<\/p>\n<h3>How We Fixed It<\/h3>\n<p>Discovery had already saved each table&#8217;s schema as a JSON definition. At import time, that file gets passed directly:<\/p>\n<p><strong>python<\/strong><\/p>\n<blockquote><p><span style=\"color: #008000;\">import_cmd = [<\/span><\/p>\n<p><span style=\"color: #008000;\">&quot;aws&quot;, &quot;dynamodb&quot;, &quot;import-table&quot;,<\/span><\/p>\n<p><span style=\"color: #008000;\">&quot;--s3-bucket-source&quot;, f&quot;Bucket={bucket},KeyPrefix={prefix}&quot;,<\/span><\/p>\n<p><span style=\"color: #008000;\">&quot;--input-format&quot;, &quot;DYNAMODB_JSON&quot;,<\/span><\/p>\n<p><span style=\"color: #008000;\">&quot;--input-compression-type&quot;, &quot;GZIP&quot;,<\/span><\/p>\n<p><span style=\"color: #008000;\">&quot;--table-creation-parameters&quot;, f&quot;file:\/\/{temp_def}&quot;,<\/span><\/p>\n<p><span style=\"color: #008000;\">]<\/span><\/p><\/blockquote>\n<p>The <span style=\"color: #008000;\">--table-creation-parameters<\/span> file contains the full key schema, attribute 
definitions, GSI projections, and billing configuration. The destination table is created with everything intact in one step.<\/p>\n<p>A few things worth noting about these flags:<\/p>\n<ul>\n<li><span style=\"color: #008000;\">DYNAMODB_JSON<\/span> must match the export format. If your export used Ion format, the import needs to match. Mixing these produces unclear errors.<\/li>\n<li><span style=\"color: #008000;\">GZIP<\/span> is necessary because DynamoDB compresses its export files by default. Without this flag, the import job will try to parse compressed data as raw JSON and fail.<\/li>\n<\/ul>\n<h2>Table Size Changes Everything<\/h2>\n<p>Not all 35+ tables behaved the same, and this became obvious quickly.<\/p>\n<table style=\"border-collapse: collapse; width: 100%; height: 96px;\">\n<tbody>\n<tr style=\"height: 24px;\">\n<td style=\"width: 33.3333%; height: 24px;\">\n<h4><strong> Table Size<\/strong><\/h4>\n<\/td>\n<td style=\"width: 33.3333%; height: 24px;\">\n<h4><strong> Export Time<\/strong><\/h4>\n<\/td>\n<td style=\"width: 33.3333%; height: 24px;\">\n<h4><strong> Import Time<\/strong><\/h4>\n<\/td>\n<\/tr>\n<tr style=\"height: 24px;\">\n<td style=\"width: 33.3333%; height: 24px;\"><span style=\"color: #000000;\"> ~100 MB<\/span><\/td>\n<td style=\"width: 33.3333%; height: 24px;\"><span style=\"color: #000000;\">Minutes<\/span><\/td>\n<td style=\"width: 33.3333%; height: 24px;\"><span style=\"color: #000000;\">Minutes<\/span><\/td>\n<\/tr>\n<tr style=\"height: 24px;\">\n<td style=\"width: 33.3333%; height: 24px;\"><span style=\"color: #000000;\"> ~5\u201310 GB<\/span><\/td>\n<td style=\"width: 33.3333%; height: 24px;\"><span style=\"color: #000000;\"> 20\u201340 min<\/span><\/td>\n<td style=\"width: 33.3333%; height: 24px;\"><span style=\"color: #000000;\"> 30\u201360 min<\/span><\/td>\n<\/tr>\n<tr style=\"height: 24px;\">\n<td style=\"width: 33.3333%; height: 24px;\"><span style=\"color: #000000;\"> 50+ GB<\/span><\/td>\n<td style=\"width: 33.3333%; 
height: 24px;\"><span style=\"color: #000000;\">2\u20134 hours<\/span><\/td>\n<td style=\"width: 33.3333%; height: 24px;\"><span style=\"color: #000000;\">2\u20134 hours<\/span><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Both operations are fully asynchronous. You fire the job and poll for completion. That&#8217;s why the monitoring scripts ended up being the most valuable part: fire the jobs, let the poller run, and come back when it&#8217;s done.<\/p>\n<h2><strong>IAM: More Layers Than It Looks Like<\/strong><\/h2>\n<p>Two roles. Clean separation.<\/p>\n<p><strong>Source account role<\/strong> &#8211; used during discovery and export:<\/p>\n<ul>\n<li><span style=\"color: #008000;\">dynamodb:ListTables, DescribeTable, ExportTableToPointInTime, DescribeExport<\/span><\/li>\n<li><span style=\"color: #008000;\">lambda:ListEventSourceMappings<\/span><\/li>\n<li><span style=\"color: #008000;\">s3:PutObject, s3:AbortMultipartUpload<\/span> on the destination bucket<\/li>\n<\/ul>\n<p><strong>Destination account role<\/strong> &#8211; used during import and setup:<\/p>\n<ul>\n<li><span style=\"color: #008000;\">dynamodb:CreateTable, ImportTable, DescribeImport, UpdateTable, Scan<\/span><\/li>\n<li><span style=\"color: #008000;\">lambda:CreateEventSourceMapping, GetFunction, GetEventSourceMapping<\/span><\/li>\n<li><span style=\"color: #008000;\">s3:GetObject, s3:ListBucket<\/span> on the export bucket<\/li>\n<\/ul>\n<p>Because the scripts also check and enable PITR (before export in the source account, and when restoring table settings in the destination), both roles additionally need <span style=\"color: #008000;\">dynamodb:DescribeContinuousBackups<\/span> and <span style=\"color: #008000;\">dynamodb:UpdateContinuousBackups<\/span>.<\/p>\n<h2>Streams and Lambda Triggers: The Last Mile<\/h2>\n<p>After a successful import, the tables exist and the data is there. But streams are disabled, and there are no Lambda triggers. 
The tables are effectively silent.<\/p>\n<p>Recreating these was the step that required the most care:<\/p>\n<ol>\n<li>Discovery captured all stream configurations and event source mappings from the source.<\/li>\n<li>A Lambda mapping file translated source function names to their destination equivalents, because Lambda names rarely match across environments.<\/li>\n<li>Post-import, streams were enabled on each table and event source mappings were recreated using the destination function names.<\/li>\n<\/ol>\n<p>The mapping file is a simple CSV that gets reviewed manually during the discovery phase. It&#8217;s a small thing, but it&#8217;s also the step where you&#8217;re most likely to introduce a silent bug: a trigger pointing at the wrong function version, or a batch size that doesn&#8217;t match the original.<\/p>\n<h2>If I Had to Do It Again<\/h2>\n<p>Here is what I&#8217;d stick with:<\/p>\n<ul>\n<li><strong>Two-phase architecture<\/strong> &#8211; discovery before execution, no exceptions. More of the discovery output needed correcting before we touched any data than we expected.<\/li>\n<li><strong>Human review gate<\/strong> &#8211; not optional. Lambda names don&#8217;t match across environments, some tables shouldn&#8217;t move at all, and blind automation here is how you get silent bugs.<\/li>\n<li><strong>Async monitoring over console watching<\/strong> &#8211; for the large tables, the monitoring scripts were more useful than the migration scripts themselves. Build the poller; don&#8217;t stare at the console.<\/li>\n<li><strong>PITR on by default<\/strong> &#8211; not as a migration prerequisite, just as standard practice. Small ongoing cost, enormous value the day you actually need it.<\/li>\n<\/ul>\n<h2>Wrapping Up<\/h2>\n<p>DynamoDB cross-account migration via S3 works. The export is online and non-disruptive. The import handles full schema definitions, including GSIs. 
For the scale we were working at, it held up well.<\/p>\n<p>But the complexity hides in the details: PITR silently required, bucket trust at the service level and not just the role level, GSIs needing explicit schema files, and streams rebuilt from scratch.<\/p>\n<p>None of those are hard problems once you know about them. The issue is that you usually find out mid-migration.<\/p>\n<p>This kind of migration isn\u2019t hard because of the tools. It\u2019s hard because of the details you don\u2019t see until you\u2019re in the middle of it.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>You&#8217;d think moving data from one AWS account to another would be straightforward. Export it, import it, done. That&#8217;s what the documentation implies, and honestly, that&#8217;s what I assumed going in. It wasn&#8217;t. What started as a data migration quickly turned into a system reconstruction problem. In DynamoDB, data alone is only half the story. [&hellip;]<\/p>\n","protected":false},"author":1615,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"iawp_total_views":0},"categories":[2348],"tags":[248,8565,7788],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts\/79347"}],"collection":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/users\/1615"}],"replies":[{"embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/comments?post=79347"}],"version-history":[{"count":3,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts\/79347\/revisions"}],"predecessor-version":[{"id":80023,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts\/79347\/revisions\/80023"}],"wp:attachment":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v
2\/media?parent=79347"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/categories?post=79347"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/tags?post=79347"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}