Elasticsearch Migration: Found to AWS EC2

Our DevOps team was using Found for one of our projects in the production environment. We kept running into a problem with Found: its memory pressure would frequently climb and not come back down easily, and for as long as it stayed high, Found could not serve requests. We therefore decided to move to a self-hosted Elasticsearch cluster.

Scenario: Migration of Elasticsearch cluster from Found to AWS EC2 server

Below is the step-by-step procedure to migrate an Elasticsearch cluster from Found to AWS EC2:

Stop all indexing requests to Found.
Take a snapshot of the data from Found to AWS S3 using the “_snapshot” API provided by Elasticsearch.
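[js]#!/bin/bash
# <username>, <password> and <found-host> are placeholders for your Found credentials and endpoint.
# Register the S3 repository "backup" on Found.
curl -u "<username>:<password>" -XPUT "<found-host>:9200/_snapshot/backup" -d '
{
  "type": "s3",
  "settings": {
    "bucket": "bucket-name",
    "region": "ap-southeast-1",
    "access_key": "<aws-access-key>",
    "secret_key": "<aws-secret-key>",
    "compress": true,
    "base_path": "directory_name_inside_the_bucket"
  }
}';
# Take the snapshot "snapshot_prod" and wait until it completes.
curl -u "<username>:<password>" -XPUT "<found-host>:9200/_snapshot/backup/snapshot_prod?wait_for_completion=true";[/js]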

The first curl request registers the repository “backup” on Found and tells Found that this repository stores its backups on S3. The request must include the S3 details:
bucket :- Name of the S3 bucket where the backup is stored.
region :- Region of the S3 bucket.
access_key :- Access key of the AWS account that owns the bucket.
secret_key :- Secret key of the AWS account that owns the bucket.
compress :- Set to true to compress the snapshot metadata files (index mappings and settings); data files are not compressed.
base_path :- Name of the folder inside the S3 bucket where the backup is stored.

The second curl request starts the backup to AWS S3. Make sure the repository name is the same as in the first request (in our case, “backup”). “snapshot_prod” is the name of the snapshot stored on S3. “wait_for_completion=true” keeps the command from returning to the terminal until the backup is complete.
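If the snapshot request is run without “wait_for_completion=true”, or the terminal session is lost, the state of the snapshot can be read back from the same “_snapshot” API. Below is a minimal sketch of that check. The function names and the ES_URL and ES_AUTH variables are hypothetical and not part of our original setup; the parsing assumes the response contains a "state" field such as SUCCESS or IN_PROGRESS.

```shell
#!/bin/bash
# Sketch: read the state of a snapshot from the JSON returned by
# GET /_snapshot/<repository>/<snapshot>.

# Print the first "state" value (e.g. SUCCESS, IN_PROGRESS, FAILED) found in JSON on stdin.
snapshot_state() {
  grep -o '"state" *: *"[A-Z_]*"' | head -n 1 | sed 's/.*"\([A-Z_]*\)"$/\1/'
}

# Hypothetical wrapper: ES_URL and ES_AUTH are assumed to be set by the caller,
# e.g. ES_URL="<found-host>:9200" and ES_AUTH="<username>:<password>".
check_snapshot() {
  local repo="$1" snap="$2"
  curl -s -u "$ES_AUTH" "$ES_URL/_snapshot/$repo/$snap" | snapshot_state
}
```

With those variables set, `check_snapshot backup snapshot_prod` would print the current state of the snapshot taken above.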
Once the snapshot is complete, restore the data on the Elasticsearch cluster running on AWS EC2 using the “_restore” API provided by Elasticsearch.
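[js]# <ec2-host> is a placeholder for the address of an Elasticsearch node on EC2.
# Register the same S3 bucket as the repository "myrepo".
curl -XPUT "http://<ec2-host>:9200/_snapshot/myrepo" -d '{
  "type": "s3",
  "settings": {
    "bucket": "bucket-name",
    "region": "ap-southeast-1",
    "access_key": "<aws-access-key>",
    "secret_key": "<aws-secret-key>",
    "compress": true,
    "base_path": "directory_name_inside_the_bucket"
  }
}';
# Close all existing indexes (optional on a fresh cluster).
curl -XPOST "http://<ec2-host>:9200/*/_close";
# Restore the snapshot "snapshot_prod" from S3.
curl -XPOST "http://<ec2-host>:9200/_snapshot/myrepo/snapshot_prod/_restore?wait_for_completion=true";[/js]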

The first curl request registers the repository on the Elasticsearch cluster running on AWS EC2. The S3 bucket settings must match the ones used when taking the snapshot on Found, because the backup lives in that S3 bucket.
The second curl request closes all existing indexes on the Elasticsearch cluster running on AWS EC2. Since we had set up a new Elasticsearch cluster on AWS EC2, this step could optionally be skipped in our case.
The third curl request is a “POST” request that restores the Elasticsearch data from S3 to the Elasticsearch cluster running on AWS EC2.

Note:
1. Make sure the repository name is the same in the register and restore requests (“myrepo” in our case).
2. Use the same snapshot name, “snapshot_prod”, that was used when taking the snapshot on Found.
Once the restore request completes, the migration from Found to Elasticsearch hosted on AWS EC2 is complete.
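One quick sanity check after the restore is to compare the document counts on both clusters. The sketch below assumes the “_count” API returns JSON with a numeric "count" field, and that the EC2 cluster holds only the restored indexes (otherwise the totals will differ for unrelated reasons). The function names and the FOUND_URL, FOUND_AUTH and EC2_URL variables are hypothetical.

```shell
#!/bin/bash
# Sketch: compare document counts between the old and the new cluster.

# Print the integer "count" from a GET /_count response on stdin.
doc_count() {
  grep -o '"count" *: *[0-9][0-9]*' | head -n 1 | grep -o '[0-9][0-9]*$'
}

# Succeed only when both counts are non-empty and equal.
counts_match() {
  [ -n "$1" ] && [ "$1" = "$2" ]
}

# Hypothetical usage (FOUND_URL, FOUND_AUTH and EC2_URL are placeholders):
# src=$(curl -s -u "$FOUND_AUTH" "$FOUND_URL/_count" | doc_count)
# dst=$(curl -s "$EC2_URL/_count" | doc_count)
# counts_match "$src" "$dst" && echo "document counts match"
```

Since indexing on Found was stopped before the snapshot, the two counts should be identical if the restore succeeded.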

This migration requires downtime, the length of which depends on the network and the size of the Elasticsearch data. I was able to migrate from Found to Elasticsearch on AWS EC2 in forty minutes with a data size of 5 GB.

Before migrating to an Elasticsearch cluster on AWS EC2, it is recommended to:

Ensure at least two nodes are up and running in the Elasticsearch cluster.
Use the EC2 discovery provided by the “cloud-aws” plugin. The plugin should be installed on all the nodes.
Disable swap space on the EC2 instances if it is enabled.
Set the heap size to half of the RAM available on the EC2 instance.
Use an “i” series EC2 instance, as AWS recommends it for NoSQL databases.
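The heap-size recommendation above can be turned into a small calculation. This is only a sketch: the function names are hypothetical, and the 31 GB cap is an assumption added here, based on the common guidance to keep the Elasticsearch heap below roughly 32 GB on large machines.

```shell
#!/bin/bash
# Sketch: derive a heap size (in MB) as half of the machine's RAM.

# heap_mb <total_ram_kb> [cap_mb]
# Prints half of the RAM in MB, limited to cap_mb (assumed default: 31 GB).
heap_mb() {
  local total_kb="$1"
  local cap_mb="${2:-31744}"
  local half_mb=$(( total_kb / 1024 / 2 ))
  if [ "$half_mb" -gt "$cap_mb" ]; then
    echo "$cap_mb"
  else
    echo "$half_mb"
  fi
}

# Total RAM in kB as reported by the Linux kernel.
total_ram_kb() {
  awk '/^MemTotal:/ {print $2}' /proc/meminfo
}

# Hypothetical usage on an EC2 node:
# sudo swapoff -a                                       # disable swap until the next reboot
# export ES_HEAP_SIZE="$(heap_mb "$(total_ram_kb)")m"   # variable read by older Elasticsearch releases
```

For an instance with 8 GB of RAM, `heap_mb 8388608` prints 4096. How the heap size is passed to Elasticsearch depends on the version in use, so check what your release expects before relying on the ES_HEAP_SIZE line.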

A few things need to be taken care of after the migration is complete:

Ensure regular (hourly or daily) snapshots are configured. Elasticsearch snapshots are incremental, so only the changes since the previous snapshot are backed up. The snapshot curl request shown earlier can be reused for this with a small modification.
Kopf is one plugin that can take and restore snapshots.
Set up alerts on the rejected counts of the various thread pool queues.
Use shard allocation filtering if any non-customer-facing application fetches data from specific indexes.
For monitoring, Marvel can be used, but be careful because it creates one index every day. Ensure these indexes are properly managed, for example by deleting old ones.
Implement HTTP authentication on the Elasticsearch cluster if the “head” plugin is being used.
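For the regular snapshots mentioned in the list above, each run needs its own snapshot name, because a snapshot name can be used only once within a repository. A minimal sketch of how a cron-driven script might build that name and the request URL is shown below. The function names, the name format, and the ES_URL and ES_AUTH variables are hypothetical.

```shell
#!/bin/bash
# Sketch: build a unique snapshot name and URL for scheduled snapshots.

# Print a lowercase, timestamped (UTC) snapshot name: <prefix>_YYYYMMDD_HHMM.
snapshot_name() {
  local prefix="${1:-snapshot_prod}"
  echo "${prefix}_$(date -u +%Y%m%d_%H%M)"
}

# snapshot_url <base_url> <repository> <snapshot>
snapshot_url() {
  echo "$1/_snapshot/$2/$3?wait_for_completion=true"
}

# Hypothetical usage from cron (ES_URL and ES_AUTH are placeholders):
# curl -s -u "$ES_AUTH" -XPUT "$(snapshot_url "$ES_URL" backup "$(snapshot_name)")"
```

Scheduling the script hourly or daily with cron would then add one incremental snapshot per run to the same repository.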

I hope this blog helps you migrate your Elasticsearch cluster from Found to AWS EC2. I will share more such use cases with you.
