{"id":55009,"date":"2022-05-19T09:43:46","date_gmt":"2022-05-19T04:13:46","guid":{"rendered":"https:\/\/www.tothenew.com\/blog\/?p=55009"},"modified":"2022-05-19T09:43:46","modified_gmt":"2022-05-19T04:13:46","slug":"migration-of-hbase-running-on-emr","status":"publish","type":"post","link":"https:\/\/www.tothenew.com\/blog\/migration-of-hbase-running-on-emr\/","title":{"rendered":"Migration of HBase Running on EMR"},"content":{"rendered":"<p><b>Introduction<\/b><\/p>\n<p><span style=\"font-weight: 400;\">Amazon Elastic MapReduce (EMR) is a managed platform for running big data frameworks such as Apache Hadoop and Apache Spark on AWS to process and analyze large volumes of data.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">With this framework we can process huge amounts of data for analytics and business intelligence workloads. EMR also lets us transform and move large amounts of data into and out of other AWS data stores and databases, such as Amazon S3 and Amazon DynamoDB.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">HBase is a distributed <\/span><b>database<\/b><span style=\"font-weight: 400;\"> that runs on top of the Hadoop Distributed File System and provides non-relational database capabilities for the Hadoop ecosystem.<\/span><\/p>\n<h3><b>Problem Statement<\/b><\/h3>\n<p><span style=\"font-weight: 400;\">Migrate HBase running on an EMR cluster from one region to another.<\/span><\/p>\n<h3><b>Solution Approach<\/b><\/h3>\n<p><span style=\"font-weight: 400;\">We export the EMR HBase tables to S3 and then import them into the newly created EMR cluster.<\/span><\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-55007\" src=\"\/blog\/wp-ttn-blog\/uploads\/2022\/05\/resized-image-Promo-1.jpeg\" alt=\"\" width=\"850\" height=\"360\" srcset=\"\/blog\/wp-ttn-blog\/uploads\/2022\/05\/resized-image-Promo-1.jpeg 
850w, \/blog\/wp-ttn-blog\/uploads\/2022\/05\/resized-image-Promo-1-300x127.jpeg 300w, \/blog\/wp-ttn-blog\/uploads\/2022\/05\/resized-image-Promo-1-768x325.jpeg 768w, \/blog\/wp-ttn-blog\/uploads\/2022\/05\/resized-image-Promo-1-624x264.jpeg 624w\" sizes=\"(max-width: 850px) 100vw, 850px\" \/><\/p>\n<h3><b>Step-by-Step Procedure<\/b><\/h3>\n<p><b>These are the steps to migrate an EMR cluster to another region:<\/b><\/p>\n<p><span style=\"font-weight: 400;\">\u00a0 \u00a0 1. Create snapshots of the HBase tables on the current cluster.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">\u00a0 \u00a0 \u00a0 \u00a0 a. Log in to the EMR master node via SSH.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">\u00a0 \u00a0 \u00a0 \u00a0 b. Log in as root and switch to the hbase user.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">\u00a0 \u00a0 \u00a0 \u00a0 Command:<\/span><b> su - hbase -s \/bin\/bash<\/b><\/p>\n<p><span style=\"font-weight: 400;\">\u00a0 \u00a0 \u00a0 \u00a0 c. Log in to the HBase shell.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">\u00a0 \u00a0 \u00a0 \u00a0 Command:<\/span><b> hbase shell<\/b><\/p>\n<p><span style=\"font-weight: 400;\">\u00a0 \u00a0 \u00a0 \u00a0 d. Create snapshots of the tables from the HBase shell.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">\u00a0 \u00a0 \u00a0 \u00a0 Command:<\/span><b> snapshot 'tablename', 'snapshot-name'<\/b><\/p>\n<p><span style=\"font-weight: 400;\">\u00a0 \u00a0 2. Export the snapshots to an S3 bucket.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">\u00a0 \u00a0 \u00a0 \u00a0 Command: <\/span><b>hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot $snapshotname -copy-to s3:\/\/bucketname\/folder\/<\/b><\/p>\n<p><span style=\"font-weight: 400;\">\u00a0 \u00a0 3. Verify the snapshot in the S3 bucket.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">\u00a0 \u00a0 4. Now create an EMR cluster in the N. Virginia region with the same configuration as the Oregon cluster.<\/span><\/p>\n<p><b>Note<\/b><span style=\"font-weight: 400;\">: a. Use the advanced configuration options for further customization.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 b. Verify the EBS volume sizes of the old cluster nodes and map them accordingly.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">\u00a0 \u00a0 5. After the cluster is ready, log in to the master node as the hbase user and copy the table snapshots from the S3 bucket.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">\u00a0 \u00a0 \u00a0 \u00a0 Command: <\/span><b>hbase snapshot export -D hbase.rootdir=s3:\/\/bucket-name\/folder\/ -snapshot snapshotname -copy-to hdfs:\/\/110.x.x.x:8020\/user\/hbase -chuser hbase -chgroup hbase -chmod 700<\/b><\/p>\n<p><span style=\"font-weight: 400;\">\u00a0 \u00a0 6. Verify the snapshot in HDFS.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">\u00a0 \u00a0 \u00a0 \u00a0 a. Switch to the hdfs user.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">\u00a0 \u00a0 \u00a0 \u00a0 Command: <\/span><b>su - hdfs<\/b><\/p>\n<p><span style=\"font-weight: 400;\">\u00a0 \u00a0 \u00a0 \u00a0 b. 
Check the snapshot copy in HDFS.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">\u00a0 \u00a0 \u00a0 \u00a0 Command: <\/span><b>hdfs dfs -ls \/user\/hbase\/.hbase-snapshot\/<\/b><\/p>\n<p><span style=\"font-weight: 400;\">\u00a0 \u00a0 \u00a0 \u00a0 c. Verify the HBase snapshot size.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">\u00a0 \u00a0 \u00a0 \u00a0 Command: <\/span><b>hdfs dfs -du -h \/user\/hbase\/data\/default\/<\/b><\/p>\n<p><span style=\"font-weight: 400;\">\u00a0 \u00a0 7. Log in to the HBase shell.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">\u00a0 \u00a0 \u00a0 \u00a0 Command<\/span><b>: hbase shell<\/b><\/p>\n<p><span style=\"font-weight: 400;\">\u00a0 \u00a0 8. Restore the tables using the snapshots.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">\u00a0 \u00a0 \u00a0 \u00a0 a. Disable the table if one with the same name already exists.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">\u00a0 \u00a0 \u00a0 \u00a0 Command<\/span><b>: disable 'table-name'<\/b><\/p>\n<p><span style=\"font-weight: 400;\">\u00a0 \u00a0 \u00a0 \u00a0 b. Restore the table.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">\u00a0 \u00a0 \u00a0 \u00a0 Command<\/span><b>: restore_snapshot 'snapshot-name'<\/b><\/p>\n<p><span style=\"font-weight: 400;\">\u00a0 \u00a0 \u00a0 \u00a0 c. Enable the table.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">\u00a0 \u00a0 \u00a0 \u00a0 Command<\/span><b>: enable 'table-name'<\/b><\/p>\n<p><span style=\"font-weight: 400;\">\u00a0 \u00a0 9. Verify the number of rows in the new HBase tables and compare them with the old tables.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">\u00a0 \u00a0 \u00a0 \u00a0 Command<\/span><b>: count 'table-name'<\/b><\/p>\n<h3><b>Debugging<\/b><\/h3>\n<p><span style=\"font-weight: 400;\">We faced an issue while importing large HBase table snapshots (size &gt; 30 GB) from S3.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">When we started copying a snapshot, it would import a few GB of data, then at some point the command would fail and the snapshot would also get deleted from the new EMR cluster.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">After debugging we found two solutions for this problem:<\/span><\/p>\n<ol>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Use the -mappers option in the command that imports the snapshot from S3.<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400;\">Check the volumes attached to the EMR nodes and increase the number of nodes if required.<\/span><\/li>\n<\/ol>\n<h3><b>Conclusion<\/b><\/h3>\n<p><span style=\"font-weight: 400;\">This use case requires some downtime: the exact set of objects in the HBase tables must be exported to S3 and the same table snapshots restored into the new HBase tables, so the tables should not receive writes during the migration. Scheduling some downtime avoids a mismatch in row count and size between the old and new tables.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">It took about an hour to migrate our HBase data from EMR to S3.<\/span><\/p>\n<div class=\"ap-custom-wrapper\"><\/div><!--ap-custom-wrapper-->","protected":false},"excerpt":{"rendered":"<p>\u00a0Introduction Amazon Elastic Map Reduce is a managed platform. 
We can run big data frameworks like Apache Hadoop and Apache Spark on AWS to process and analyze large volumes of data. \u00a0We can process huge amounts of data for analytics purposes and business intelligence workloads with help of this framework. Amazon Elastic Map Reduce also [&hellip;]<\/p>\n","protected":false},"author":1454,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"iawp_total_views":57},"categories":[1174,1395,2348,1],"tags":[4967],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts\/55009"}],"collection":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/users\/1454"}],"replies":[{"embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/comments?post=55009"}],"version-history":[{"count":3,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts\/55009\/revisions"}],"predecessor-version":[{"id":55012,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts\/55009\/revisions\/55012"}],"wp:attachment":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/media?parent=55009"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/categories?post=55009"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/tags?post=55009"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}