Snapshot Migration to S3 and Extract Data from Snapshot using Athena

11 / Nov / 2023 by Patel Akshaykumar 0 comments

Introduction

Sometimes We have to keep our database backup for longer retention for Security and DR Compliances in RDS. But as we know, if we have a longer backup period in RDS, it will also create more burden on the pocket, as RDS charges for storage in Normal S3 bucket format. So we can export our backup snapshots from RDS to an S3 bucket with Apache Parquet Format, which can save some charges, and we can use those snapshot data for future reports or compliance checks as well using Athena and Glue.

  1. We can export all types of backup from RDS to S3
  2. Steps to export to s3:
    • Create an S3 bucket with Proper IAM Permission
    • Create a KMS key for server-side encryption (SSE)
    • We can use CLI or GUI to perform this action
  3. No Impact on Performance as export runs in the background
  4.  Data will be exported into S3 in Apache Parquet format. It is 2x faster and stores 1/6 of the storage compared to the normal test format
  5. We can use that exported data to create reports using Athena and EMR

 

Architecture of Migration Process

Used Tools and AWS Components –

  • AWS RDS
  • AWS S3
  • AWS KMS
  • AWS IAM
  • AWS Glue
  • AWS Crawler
  • AWS Athena

Steps Performed for this Activity

  1. Create an S3 bucket in AWS 
  2. Create a KMS Key or use the Existing Key for Encryption
  3. Take RDS Database Snapshot
  4. Export Snapshot to S3 Bucket with Parquet Format
  5. Validate the Snapshot Migration (Check Size and Cost)
  6. Create a Glue Crawler and Run it to fetch data
  7. Use Athena to Execute Query and generate Report  – Data Will be stored in CSV format in the S3 bucket location
  8. You can also use those output files to import data into another database.

 

Execution Steps – 

  • This process has been performed on the Beta Database of PostgreSQL Snapshot.

Step 1: Export of RDS DB Snapshot to S3 Bucket

  • Take a Snapshot of the Beta Database from RDS
  • Export Database Snapshot to S3 bucket from Snapshot Menu
  • Created KMS key as “glue-kms” in Key Management Service with default settings and copied the arn of the key created to export db snapshot configuration, and Crawler will use same to extract data by AWS Glue.
  • Select S3 Bucket to upload and provide a proper IAM Role with a Newly created KMS Key to encrypt the database snapshot.
  • Snapshot has been exported in the S3 bucket and checked directly from the S3 Console.
  • After completion of the export task to the S3 bucket, I am able to see the databases and tables in the bucket named test-beta.

Step 2: Create and Run Crawler in AWS Glue to export S3 data in Glue Data Catalog

  • In AWS Glue Console, Go to the crawler option and click on the add crawler button. Then, give the crawler name as beta-demo and click next
  • Click on Next and select a data source 
  • Select the S3 folder and provide the path that you want to extract and get data from a particular table or full database.
  • Here, we have selected a specific table customer_master to check.
  • Select the IAM role which AWS Glue and Crawler will use to access Snapshot from S3 Bucket

Create Database

1. Review the Created Crawler

2. Select Data Source for Crawler

3. Create a schedule for Crawler

4. Run The Crawler and see the process

5. Check the Database in AWS Glue, and you will find the table that the crawler extracted from the S3 Bucket

Step 3: In Athena, run of queries and store of queries output in S3 bucket

1. Check Query Editor

2. In Query Editor, You will find the Data Source as AWS Data Catalog, and demodb will be there as a database because it is the only database available right now.

3. Go to Settings and specify the location to store output, which you will get after the execution of queries in Athena , we have provided the Same S3 bucket with a New Folder in it.

4. Execute the query and see the result.

5. Check the Output in the Query editor and the S3 bucket, where it will be saved in CSV format.

You can now download the CSV file open it, and check the result to verify.

Notes:- 

  • A crawler will be used if the requirement comes to get a report or data from any historical snapshot
  • Costing of the crawler will be based on the minutes of usage
  • Costing of Athena will be based on the data explored from snapshot

Costing Summary

S3 Storage – $0.025 per GB per month
KMS Key – $1 per key per month
KMS encrypt pricing – $0.03 for encrypting 10,000 objects
RDS Export to S3 Cost – $0.010 per GB of snapshot size
Glue Crawler Pricing – $0.44 per DPU-Hour, billed per second, with a 10-minute minimum per crawler run
Athena Pricing – $5 per TB of data scanned

 

Cost Estimation for the Complete Operation

Total Size of Snapshots (TB) 65            
               
Activity Quantity Rate (per quantity’s dimension) Cost Frequency   Trial Performed on a snapshot
KMS Key 1 $1.00 $1.00 Monthly   Size of Snapshot (TB) 2.5
S3 Storage (GB) 6,500 $0.025 $162.50 Monthly   Size after Compression (TB) 0.239
S3 API (number of PUT objects) 3209205 $0.0000153 $48.95 One-time   Objects created from snapshot migration 118000
KMS Encrypt 3209205 $0.000003 $9.63 One-time   S3 API Cost 1.8
RDS Export (per GB) 65000 $0.01 $650.00 One-time      
Crawler Running Minutes 240 $0.0073 $1.76 Monthly   Assumptions Derived
Athena Data Scanned (TB) 3 $5.00 $15.00 Monthly   Compression Ratio 10:1
    Total $888.84     Number of Objects per GB after Compression 494
            API Cost per GB after compression $0.0075
            API Cost per object $0.000015
NOTE: Additional KMS API charges will be applicable when Crawler and Athena will decrypt the objects      
            Additional Assumptions
            Crawler running hours per month 4
            Data Scanned by Athena per month (TB) 3
            Number of KMS Keys 1

 

Comparison Of Costing

Operation Old Costing Changes New Estimated Cost
Storage 1500 USD per Month Snapshot migrated to S3 and size will be reduced up to 8 times 190 USD per Month

 

FOUND THIS USEFUL? SHARE IT

Tag -

AWS RDS

Leave a Reply

Your email address will not be published. Required fields are marked *