{"id":79382,"date":"2026-04-27T11:18:33","date_gmt":"2026-04-27T05:48:33","guid":{"rendered":"https:\/\/www.tothenew.com\/blog\/?p=79382"},"modified":"2026-05-12T10:16:14","modified_gmt":"2026-05-12T04:46:14","slug":"aws-datasync-gcp-to-aws-data-transfer-for-google-analytics","status":"publish","type":"post","link":"https:\/\/www.tothenew.com\/blog\/aws-datasync-gcp-to-aws-data-transfer-for-google-analytics\/","title":{"rendered":"AWS DataSync: GCP to AWS Data Transfer for Google Analytics"},"content":{"rendered":"<p><strong>1. Introduction<\/strong><\/p>\n<p>If your team uses Google Analytics (GA) to track user behavior but needs to run custom SQL queries \u2014 like Daily Active Users (DAU) or session funnels \u2014 there&#8217;s a fundamental problem: GA data lives in Google Cloud Storage (GCS), and Amazon Athena (your SQL engine) only reads from Amazon S3.<\/p>\n<p>Manually downloading and re-uploading files every day isn&#8217;t sustainable. The answer is AWS DataSync \u2014 a fully managed service that copies data from GCS to S3 automatically on a daily schedule, with checksum verification and zero infrastructure to manage.<\/p>\n<p><span style=\"color: #808080;\">Pipeline: GA exports to GCS\u00a0 \u2192\u00a0 DataSync copies to S3 daily\u00a0 \u2192\u00a0 Athena runs SQL queries\u00a0 \u2192\u00a0 Analytics team gets answers.<\/span><\/p>\n<p><strong>2. 
How It Works \u2014 Architecture Overview<\/strong><\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"wp-image-79383 size-large alignnone\" src=\"https:\/\/www.tothenew.com\/blog\/wp-ttn-blog\/uploads\/2026\/04\/ChatGPT-Image-Apr-1-2026-12_49_54-PM-1024x683.png\" alt=\"rr\" width=\"625\" height=\"417\" srcset=\"\/blog\/wp-ttn-blog\/uploads\/2026\/04\/ChatGPT-Image-Apr-1-2026-12_49_54-PM-1024x683.png 1024w, \/blog\/wp-ttn-blog\/uploads\/2026\/04\/ChatGPT-Image-Apr-1-2026-12_49_54-PM-300x200.png 300w, \/blog\/wp-ttn-blog\/uploads\/2026\/04\/ChatGPT-Image-Apr-1-2026-12_49_54-PM-768x512.png 768w, \/blog\/wp-ttn-blog\/uploads\/2026\/04\/ChatGPT-Image-Apr-1-2026-12_49_54-PM-624x416.png 624w, \/blog\/wp-ttn-blog\/uploads\/2026\/04\/ChatGPT-Image-Apr-1-2026-12_49_54-PM.png 1536w\" sizes=\"(max-width: 625px) 100vw, 625px\" \/><\/p>\n<ul>\n<li>Google Analytics 4 exports raw event data daily to a GCS bucket (e.g. gs:\/\/afs-prd-ga-reports\/events\/).<\/li>\n<li>AWS DataSync connects to GCS using an Object Storage location \u2014 no agent or VM required. It authenticates using GCS Interoperability keys.<\/li>\n<li>A scheduled DataSync Task runs at 04:10 UTC every day, transferring only new or changed files (incremental) to the S3 destination.<\/li>\n<li>Amazon S3 stores the files in a structured path. 
AWS Glue Crawler auto-detects the schema, and Amazon Athena runs SQL queries on top.<\/li>\n<\/ul>\n<p><strong>Services Involved<\/strong><\/p>\n<table style=\"border-collapse: collapse; width: 94.44%; height: 404px;\">\n<tbody>\n<tr style=\"height: 10px;\">\n<td style=\"width: 33.3333%; height: 10px;\"><strong>Service<\/strong><\/td>\n<td style=\"width: 33.3333%; height: 10px;\"><strong>Role in Pipeline<\/strong><\/td>\n<td style=\"width: 33.3333%; height: 10px;\"><strong>Key Capability<\/strong><\/td>\n<\/tr>\n<tr style=\"height: 72px;\">\n<td style=\"width: 33.3333%; height: 72px;\">Google Analytics<\/td>\n<td style=\"width: 33.3333%; height: 72px;\">Source of user behavior data<\/td>\n<td style=\"width: 33.3333%; height: 72px;\">Tracks DAU, sessions, engagement<\/td>\n<\/tr>\n<tr style=\"height: 48px;\">\n<td style=\"width: 33.3333%; height: 48px;\">AWS DataSync<\/td>\n<td style=\"width: 33.3333%; height: 48px;\">Orchestrates scheduled transfers<\/td>\n<td style=\"width: 33.3333%; height: 48px;\">Reliable, encrypted data movement<\/td>\n<\/tr>\n<tr style=\"height: 48px;\">\n<td style=\"width: 33.3333%; height: 48px;\">Amazon S3<\/td>\n<td style=\"width: 33.3333%; height: 48px;\">Transfer destination and central data lake<\/td>\n<td style=\"width: 33.3333%; height: 48px;\">Scalable, durable object storage<\/td>\n<\/tr>\n<tr style=\"height: 67px;\">\n<td style=\"width: 33.3333%; height: 67px;\">Amazon Athena<\/td>\n<td style=\"width: 33.3333%; height: 67px;\">Analytics query engine<\/td>\n<td style=\"width: 33.3333%; height: 67px;\">Serverless SQL on S3 data<\/td>\n<\/tr>\n<tr style=\"height: 10px;\">\n<td style=\"width: 33.3333%; height: 10px;\">Google Cloud Storage<\/td>\n<td style=\"width: 33.3333%; height: 10px;\">Staging area for GA exports<\/td>\n<td style=\"width: 33.3333%; height: 10px;\">Stores CSV\/Parquet exports<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><span style=\"color: #993366;\"><strong>Google 
Analytics<\/strong><\/span><br \/>\nGoogle Analytics tracks website and app traffic and user behavior. GA4 allows exporting raw event data (sessions, events, page views, demographics) to BigQuery or GCS for advanced analytics.<\/p>\n<p><span style=\"color: #993366;\"><strong>Google Cloud Storage (GCS)<\/strong><\/span><br \/>\nGCS is Google\u2019s object storage service, where the GA exports land as structured files. Access is controlled through IAM and service accounts, which lets external tools such as DataSync read the data.<\/p>\n<p><span style=\"color: #993366;\"><strong>AWS DataSync<\/strong><\/span><br \/>\nA fully managed service for transferring data between storage systems, including from GCP to AWS.<br \/>\nKey features:<\/p>\n<ul>\n<li>Secure transfer (TLS encryption)<\/li>\n<li>Data validation (checksums)<\/li>\n<li>Scheduling support<\/li>\n<li>Bandwidth control<\/li>\n<li>CloudWatch logging<\/li>\n<li>No infrastructure to manage<\/li>\n<\/ul>\n<p><span style=\"color: #993366;\"><strong>Amazon S3<\/strong><\/span><br \/>\nThe scalable AWS object storage service, used here as the central data lake. It integrates with analytics services such as Athena and Glue.<\/p>\n<p><span style=\"color: #993366;\"><strong>Amazon Athena<\/strong><\/span><br \/>\nA serverless query service that runs SQL directly on data in S3. It uses the AWS Glue Data Catalog for schemas and charges per query, with no infrastructure to provision.<\/p>\n<p><strong>3. Implementation (4 Steps)<\/strong><\/p>\n<p>No agents, no VMs, no custom code. Everything is configured inside the AWS Console.<\/p>\n<p>1. Create the Source Location \u2014 GCS<\/p>\n<p>In the AWS DataSync console, go to Locations \u2192 Create location and select Object storage. Set Server: storage.googleapis.com, Bucket: afs-prd-ga-reports, Folder: \/events\/, Protocol: HTTPS, Port: 443. Under Authentication, enter the GCS Interoperability Access Key and Secret Key (generated in the GCP Console under Cloud Storage \u2192 Settings \u2192 Interoperability). Click Create location.<\/p>\n<p>2. 
Create the Destination Location \u2014 Amazon S3<\/p>\n<p>AWS DataSync \u2192 Locations \u2192 Create location \u2192 select Amazon S3. Set S3 bucket: afs1-prd-ga-events-s3-442426851011-ap-south-1-an, Storage class: Standard, IAM role: DataSyncS3AccessRole (needs s3:PutObject, s3:GetBucketLocation, s3:ListBucket). Click Create location.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"wp-image-79465 size-full\" src=\"https:\/\/www.tothenew.com\/blog\/wp-ttn-blog\/uploads\/2026\/04\/image-25.png\" alt=\"tde\" width=\"1116\" height=\"731\" srcset=\"\/blog\/wp-ttn-blog\/uploads\/2026\/04\/image-25.png 1116w, \/blog\/wp-ttn-blog\/uploads\/2026\/04\/image-25-300x197.png 300w, \/blog\/wp-ttn-blog\/uploads\/2026\/04\/image-25-1024x671.png 1024w, \/blog\/wp-ttn-blog\/uploads\/2026\/04\/image-25-768x503.png 768w, \/blog\/wp-ttn-blog\/uploads\/2026\/04\/image-25-624x409.png 624w\" sizes=\"(max-width: 1116px) 100vw, 1116px\" \/><\/p>\n<p>3. Create the DataSync Task<\/p>\n<p>AWS DataSync \u2192 Tasks \u2192 Create task. Select the GCS location as source and S3 location as destination. Task name: GCS-AWSS3-SYNC. Set Task mode: Enhanced, Transfer mode: Transfer only data that has changed, Verification: Enabled, Logging: CloudWatch log group \/aws\/datasync. Click Next.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"wp-image-79703 size-full\" src=\"https:\/\/www.tothenew.com\/blog\/wp-ttn-blog\/uploads\/2026\/04\/image-33.png\" alt=\"4\" width=\"1325\" height=\"559\" srcset=\"\/blog\/wp-ttn-blog\/uploads\/2026\/04\/image-33.png 1325w, \/blog\/wp-ttn-blog\/uploads\/2026\/04\/image-33-300x127.png 300w, \/blog\/wp-ttn-blog\/uploads\/2026\/04\/image-33-1024x432.png 1024w, \/blog\/wp-ttn-blog\/uploads\/2026\/04\/image-33-768x324.png 768w, \/blog\/wp-ttn-blog\/uploads\/2026\/04\/image-33-624x263.png 624w\" sizes=\"(max-width: 1325px) 100vw, 1325px\" \/><\/p>\n<p>4. 
Set the Daily Schedule &amp; Run<\/p>\n<p>In the Schedule section, enable scheduling and enter the cron expression below. Click Create task. The first execution will appear in the History tab \u2014 Status should show Success within a few minutes of the scheduled time.<\/p>\n<div id=\"attachment_79624\" style=\"width: 906px\" class=\"wp-caption alignnone\"><img aria-describedby=\"caption-attachment-79624\" decoding=\"async\" loading=\"lazy\" class=\"wp-image-79624 \" src=\"https:\/\/www.tothenew.com\/blog\/wp-ttn-blog\/uploads\/2026\/04\/image-31-1.png\" alt=\"t\" width=\"896\" height=\"113\" srcset=\"\/blog\/wp-ttn-blog\/uploads\/2026\/04\/image-31-1.png 1609w, \/blog\/wp-ttn-blog\/uploads\/2026\/04\/image-31-1-300x38.png 300w, \/blog\/wp-ttn-blog\/uploads\/2026\/04\/image-31-1-1024x129.png 1024w, \/blog\/wp-ttn-blog\/uploads\/2026\/04\/image-31-1-768x97.png 768w, \/blog\/wp-ttn-blog\/uploads\/2026\/04\/image-31-1-1536x194.png 1536w, \/blog\/wp-ttn-blog\/uploads\/2026\/04\/image-31-1-624x79.png 624w\" sizes=\"(max-width: 896px) 100vw, 896px\" \/><p id=\"caption-attachment-79624\" class=\"wp-caption-text\">t<\/p><\/div>\n<p><strong>Verify: Check DataSync \u2192 Tasks \u2192 GCS-AWSS3-SYNC \u2192 History (Status: Success). Confirm files appear at s3:\/\/afs1-prd-ga-events-s3-&#8230;\/events\/ in the S3 console.<\/strong><\/p>\n<p><strong>4. Querying the Data in Athena<\/strong><\/p>\n<p>Once data is in S3, use AWS Glue Crawler to auto-detect the schema (especially important since GA exports nested JSON), then query in Athena.<\/p>\n<p><span style=\"color: #993366;\"><strong>Step A \u2014 Run AWS Glue Crawler<\/strong><\/span><\/p>\n<ul>\n<li>AWS Glue \u2192 Crawlers \u2192 Create crawler \u2192 set data source to the S3 path above.<\/li>\n<li>Assign an IAM role with S3 read + Glue write permissions. Set output database: ga_database.<\/li>\n<li>Run the crawler. It creates a table (e.g. 
events) in the Glue Data Catalog automatically.<\/li>\n<\/ul>\n<p>Why not manually write CREATE TABLE? GA exports often have nested JSON like { &#8220;user&#8221;: { &#8220;id&#8221;: &#8220;123&#8221; } }. A flat schema silently drops nested fields. The Glue Crawler handles this correctly.<\/p>\n<p><span style=\"color: #993366;\"><strong>Step B \u2014 Run SQL in Athena<\/strong><\/span><\/p>\n<p>Daily Active Users (DAU):<\/p>\n<p><em>SELECT event_date,<\/em><br \/>\n<em>COUNT(DISTINCT user_id) AS daily_active_users<\/em><br \/>\n<em>FROM ga_database.events<\/em><br \/>\n<em>WHERE event_date = DATE_FORMAT(CURRENT_DATE &#8211; INTERVAL &#8216;1&#8217; DAY, &#8216;%Y%m%d&#8217;)<\/em><br \/>\n<em>GROUP BY event_date;<\/em><\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"wp-image-79463 size-full\" src=\"https:\/\/www.tothenew.com\/blog\/wp-ttn-blog\/uploads\/2026\/04\/image-29.png\" alt=\"t\" width=\"633\" height=\"101\" srcset=\"\/blog\/wp-ttn-blog\/uploads\/2026\/04\/image-29.png 633w, \/blog\/wp-ttn-blog\/uploads\/2026\/04\/image-29-300x48.png 300w, \/blog\/wp-ttn-blog\/uploads\/2026\/04\/image-29-624x100.png 624w\" sizes=\"(max-width: 633px) 100vw, 633px\" \/><\/p>\n<p>Top pages by sessions:<\/p>\n<p><em>SELECT page_location,<\/em><br \/>\n<em>COUNT(DISTINCT session_id) AS sessions<\/em><br \/>\n<em>FROM ga_database.events<\/em><br \/>\n<em>WHERE event_name = &#8216;page_view&#8217;<\/em><br \/>\n<em>GROUP BY page_location ORDER BY sessions DESC LIMIT 20;<\/em><\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"wp-image-79460 size-full\" src=\"https:\/\/www.tothenew.com\/blog\/wp-ttn-blog\/uploads\/2026\/04\/image-28.png\" alt=\"r\" width=\"627\" height=\"99\" srcset=\"\/blog\/wp-ttn-blog\/uploads\/2026\/04\/image-28.png 627w, \/blog\/wp-ttn-blog\/uploads\/2026\/04\/image-28-300x47.png 300w, \/blog\/wp-ttn-blog\/uploads\/2026\/04\/image-28-624x99.png 624w\" sizes=\"(max-width: 627px) 100vw, 627px\" \/><\/p>\n<p><strong>5. 
Monitoring &amp; Alerts<\/strong><\/p>\n<p>Set up CloudWatch Alarms on these three metrics and route to an SNS topic so failures are caught before the analytics team notices missing data.<\/p>\n<table style=\"border-collapse: collapse; width: 100%; height: 96px;\">\n<tbody>\n<tr style=\"height: 24px;\">\n<td style=\"width: 33.3333%; height: 24px;\"><strong> Metric<\/strong><\/td>\n<td style=\"width: 33.3333%; height: 24px;\"><strong> What It Detects<\/strong><\/td>\n<td style=\"width: 33.3333%; height: 24px;\"><strong> Alert When<\/strong><\/td>\n<\/tr>\n<tr style=\"height: 24px;\">\n<td style=\"width: 33.3333%; height: 24px;\">BytesWritten<\/td>\n<td style=\"width: 33.3333%; height: 24px;\">Data successfully landed in S3<\/td>\n<td style=\"width: 33.3333%; height: 24px;\">Value = 0 after scheduled run<\/td>\n<\/tr>\n<tr style=\"height: 24px;\">\n<td style=\"width: 33.3333%; height: 24px;\">FilesTransferred<\/td>\n<td style=\"width: 33.3333%; height: 24px;\">Files moved from GCS to S3<\/td>\n<td style=\"width: 33.3333%; height: 24px;\">Zero files transferred<\/td>\n<\/tr>\n<tr style=\"height: 24px;\">\n<td style=\"width: 33.3333%; height: 24px;\">TaskExecutionsFailed<\/td>\n<td style=\"width: 33.3333%; height: 24px;\">Failed DataSync executions<\/td>\n<td style=\"width: 33.3333%; height: 24px;\">Count &gt; 0 (page on-call team)<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><em>Route all alarms \u2192 SNS topic (datasync-alerts) \u2192 email. Set the evaluation period to 1 day to match the daily schedule.<\/em><\/p>\n<p><strong>6. Best Practices<\/strong><\/p>\n<ul>\n<li>Use Parquet + partitioning \u2192 faster queries &amp; lower cost<\/li>\n<li>Secure &amp; manage data \u2192 Secrets Manager + Lifecycle (Glacier)<\/li>\n<li>Maintain reliability \u2192 Glue Crawler (daily) + S3 Versioning<\/li>\n<\/ul>\n<p><strong>Conclusion<\/strong><\/p>\n<p>AWS DataSync removes the complexity of cross-cloud data movement. 
With four straightforward steps (two locations, one task, one schedule), your Google Analytics data flows from GCS to S3 every day without any manual work.<\/p>\n<p>Pair it with AWS Glue and Athena and you have a fully serverless analytics pipeline, from raw GA events to SQL-ready insights, that scales automatically and leaves no servers to maintain. You pay only for the data you transfer, store, and query.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>1. Introduction If your team uses Google Analytics (GA) to track user behavior but needs to run custom SQL queries \u2014 like Daily Active Users (DAU) or session funnels \u2014 there&#8217;s a fundamental problem: GA data lives in Google Cloud Storage (GCS), and Amazon Athena (your SQL engine) only reads from Amazon S3. Manually downloading [&hellip;]<\/p>\n","protected":false},"author":2265,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"iawp_total_views":1},"categories":[5877],"tags":[8233,8568,4948,1703,670],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts\/79382"}],"collection":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/users\/2265"}],"replies":[{"embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/comments?post=79382"}],"version-history":[{"count":44,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts\/79382\/revisions"}],"predecessor-version":[{"id":79768,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts\/79382\/revisions\/79768"}],"wp:attachment":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/media?parent=79382"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/categories?post=79382"}
,{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/tags?post=79382"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}