{"id":41968,"date":"2016-11-11T14:47:49","date_gmt":"2016-11-11T09:17:49","guid":{"rendered":"http:\/\/www.tothenew.com\/blog\/?p=41968"},"modified":"2017-06-06T16:56:44","modified_gmt":"2017-06-06T11:26:44","slug":"elasticsearch-cluster-with-aws-spot-instances","status":"publish","type":"post","link":"https:\/\/www.tothenew.com\/blog\/elasticsearch-cluster-with-aws-spot-instances\/","title":{"rendered":"Elasticsearch Cluster with AWS Spot Instances"},"content":{"rendered":"<p><img decoding=\"async\" loading=\"lazy\" class=\" wp-image-42135 aligncenter\" src=\"\/blog\/wp-ttn-blog\/uploads\/2016\/11\/logo_elasticsearch.png\" alt=\"logo_elasticsearch\" width=\"386\" height=\"245\" srcset=\"\/blog\/wp-ttn-blog\/uploads\/2016\/11\/logo_elasticsearch.png 340w, \/blog\/wp-ttn-blog\/uploads\/2016\/11\/logo_elasticsearch-300x190.png 300w\" sizes=\"(max-width: 386px) 100vw, 386px\" \/><\/p>\n<p style=\"text-align: justify\">One of the most challenging tasks in any microservices ecosystem is centralized log management, and there are many open-source and paid solutions available in the market. In our ecosystem, we use the ELK stack, as it provides a scalable, multitenant-capable full-text search engine that integrates easily with Logstash and Kibana for centralized logging and visualization.<\/p>\n<p style=\"text-align: justify\">In the initial days, we set up the <a title=\"DevOps on AWS\" href=\"http:\/\/www.tothenew.com\/devops-aws\" target=\"_blank\">elasticsearch<\/a> cluster with 2 nodes, both configured as master-eligible data nodes to mitigate node failure. Both nodes used the default configuration for shards and replicas, i.e., shards: 5 and replicas: 1. This setup was perfect for handling a small data set, i.e., 30-40 GB of logs per day, but in our case the log volume grew from 30-40 GB per day to 200 GB per day in a matter of 2-3 months. 
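For reference, those defaults correspond to the following index settings in elasticsearch.yml (the Elasticsearch 2.x built-in defaults, shown here only for illustration; they do not need to be set explicitly):

```yaml
# Elasticsearch 2.x default index settings (illustrative sketch)
index.number_of_shards: 5
index.number_of_replicas: 1
```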
As a result, we started facing operational issues with the elasticsearch cluster, such as low disk space, high load average, and degraded performance. This experience made us realize that we were not leveraging elasticsearch's scalability to the fullest. In this blog, I will share how we leverage spot instances in an elasticsearch cluster with default settings. It is highly recommended to tweak elasticsearch settings for your use case, and that should be done only after thorough testing.<\/p>\n<p style=\"text-align: justify\"><strong>Prerequisite:<\/strong>\u00a0To implement a highly scalable elasticsearch cluster for logging, you should have a basic understanding of how the ELK stack works.<\/p>\n<p style=\"text-align: justify\">In order to scale elasticsearch efficiently, it is recommended to have separate master, data, and client nodes. The main purpose of each node type is mentioned below:<\/p>\n<ul style=\"text-align: justify\">\n<li><strong>Master node:<\/strong> It is responsible for cluster management, i.e., creating\/deleting indices, tracking the nodes in the cluster, shard allocation, and the routing table.<\/li>\n<li><strong>Data node:<\/strong> It holds the actual data in the cluster and handles operations like CRUD, search, and aggregations.<\/li>\n<li><strong>Client node:<\/strong> It is responsible for request routing, handles the search reduce phase, and distributes bulk indexing across data nodes.<\/li>\n<\/ul>\n<p style=\"text-align: justify\">For more details, click <a href=\"https:\/\/www.elastic.co\/guide\/en\/elasticsearch\/reference\/current\/modules-node.html\">here<\/a>.<\/p>\n<p style=\"text-align: justify\">We then went through the blog &#8220;Designing for Scale&#8221; on elastic.co. 
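The separation of roles above boils down to two boolean settings in elasticsearch.yml; a minimal sketch for Elasticsearch 2.x (values shown per node type, for illustration only):

```yaml
# Master-eligible node: manages the cluster, holds no data
node.master: true
node.data: false

# Data node:   node.master: false, node.data: true
# Client node: node.master: false, node.data: false
```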
In our case, we had to refactor our Elasticsearch cluster setup as shown below:<\/p>\n<p style=\"text-align: justify\"><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter size-full wp-image-42162\" src=\"\/blog\/wp-ttn-blog\/uploads\/2016\/11\/Scalable_Elasticsearch-5.png\" alt=\"Scalable_Elasticsearch (5)\" width=\"813\" height=\"602\" srcset=\"\/blog\/wp-ttn-blog\/uploads\/2016\/11\/Scalable_Elasticsearch-5.png 813w, \/blog\/wp-ttn-blog\/uploads\/2016\/11\/Scalable_Elasticsearch-5-300x222.png 300w, \/blog\/wp-ttn-blog\/uploads\/2016\/11\/Scalable_Elasticsearch-5-624x462.png 624w\" sizes=\"(max-width: 813px) 100vw, 813px\" \/><\/p>\n<p style=\"text-align: justify\"><span style=\"font-weight: 400\">In this setup, we use private DNS endpoints for the master and client nodes. This helps us scale data nodes in\/out without any changes to the logstash configuration file, as the config references the clients' private DNS endpoints. <\/span>Running all the above instance types in the on-demand pricing model (master nodes: t2.medium, client nodes: m3.medium, and data nodes: m3.large, m4.large, c3.xlarge, and c4.xlarge) would incur a sizeable monthly AWS bill, so we started experimenting with spot instances. The only concern before starting this activity was to ensure that one copy of the data is always available on an on-demand instance. 
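One way to guarantee an on-demand copy is elasticsearch's shard allocation awareness; a hedged elasticsearch.yml sketch (the rack_id attribute name and its values are our own convention, not built-in):

```yaml
# On the on-demand data node (spot data nodes would use rack_id: spot)
node.rack_id: ondemand

# On every node: spread primary and replica shards across rack_id values
cluster.routing.allocation.awareness.attributes: rack_id
```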
To achieve this, we used cluster.routing.allocation.awareness.attributes, which helped route and allocate shards across different rack IDs.<\/p>\n<p style=\"text-align: justify\"><span style=\"font-weight: 400\">If Elasticsearch is <\/span><i><span style=\"font-weight: 400\">aware<\/span><\/i><span style=\"font-weight: 400\"> of the physical configuration of your hardware, it can ensure that the primary shard and its replica shards are spread across different physical servers, racks, or zones, to minimize the risk of losing all shard copies at the same time.<\/span><\/p>\n<p>\u00a0<span style=\"font-weight: 400\">You can find the master, data, and client node configurations below. We used the ep utility (<\/span><a href=\"https:\/\/github.com\/kreuzwerker\/envplate\"><span style=\"font-weight: 400\">envplate<\/span><\/a><span style=\"font-weight: 400\">) to replace environment variables in the configuration file at instance startup.<\/span><\/p>\n<ol style=\"text-align: justify\">\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\"><strong>Install Elasticsearch:<\/strong> You can either create a base image that already has elasticsearch installed or use the elasticsearch-userdata.sh script, which can be passed in as Ubuntu userdata. 
The script will do the following:<\/span>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Install Java<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Download and install elasticsearch 2.3.5 from the official repo<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Install the Kopf plugin<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Install ep (the envplate utility discussed above)<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Create the elasticsearch.yml file with hostname and IP address fields as variables<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Update the environment variables in the elasticsearch.yml file<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Start the elasticsearch service<\/span><\/li>\n<\/ul>\n<\/li>\n<\/ol>\n<p style=\"text-align: justify\"><span style=\"font-weight: 400\">2. 
<strong>Set up Master, Data, and Client Nodes:<\/strong><\/span><\/p>\n<ul style=\"text-align: justify\">\n<li><span style=\"font-weight: 400\">Master Nodes:<\/span>\n<ol>\n<li><span style=\"font-weight: 400\">Create three master nodes with the help of elasticsearch-master-userdata.sh (<\/span><a href=\"https:\/\/github.com\/neerjaj2\/elasticsearch-cluster-spot\/blob\/master\/elasticsearch-master-userdata.sh\"><span style=\"font-weight: 400\">link<\/span><\/a><span style=\"font-weight: 400\">)<\/span><\/li>\n<li><span style=\"font-weight: 400\">Update all three private DNS records with the private IPs.<\/span><\/li>\n<li>Restart elasticsearch on all three master nodes<\/li>\n<li><span style=\"font-weight: 400\">Check using curl master1.ttn.com:9200\/_cat\/health; the cluster should be in a green state.<\/span><\/li>\n<\/ol>\n<\/li>\n<\/ul>\n<ol style=\"text-align: justify\">\n<ul>\n<li style=\"font-weight: 400\">Data Node &#8211; On-demand:\n<ol>\n<li>Create a launch configuration with userdata as mentioned in elasticsearch-datanode-ondemand-userdata.sh (<a href=\"https:\/\/github.com\/neerjaj2\/elasticsearch-cluster-spot\/blob\/master\/elasticsearch-datanode-ondemand-userdata.sh\">link<\/a>)<\/li>\n<li>Create an Auto Scaling Group with min: 1, desired: 1, and max: 1<\/li>\n<\/ol>\n<\/li>\n<\/ul>\n<\/ol>\n<ol style=\"text-align: justify\">\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Data Node &#8211; Spot: I would highly recommend using a spot fleet to launch spot instances. 
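For concreteness, a diversified spot fleet request config (passed to aws ec2 request-spot-fleet) might look roughly like the sketch below; the role ARN, AMI ID, capacity, and price are placeholders, not values from our setup:

```json
{
  "IamFleetRole": "arn:aws:iam::123456789012:role/spot-fleet-role",
  "AllocationStrategy": "diversified",
  "TargetCapacity": 4,
  "SpotPrice": "0.25",
  "LaunchSpecifications": [
    { "ImageId": "ami-xxxxxxxx", "InstanceType": "c4.xlarge",
      "UserData": "<base64-encoded elasticsearch-datanode-spot-userdata.sh>" },
    { "ImageId": "ami-xxxxxxxx", "InstanceType": "m4.large",
      "UserData": "<base64-encoded elasticsearch-datanode-spot-userdata.sh>" }
  ]
}
```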
It takes care of most of the heavy lifting.<\/span>\n<ol>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Create a request-and-maintain <\/span><a href=\"http:\/\/docs.aws.amazon.com\/AWSEC2\/latest\/UserGuide\/spot-fleet.html\"><span style=\"font-weight: 400\">spot fleet request<\/span><\/a><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Choose multiple instance types, e.g., c4.xlarge, c3.xlarge, m4.large, and m3.large; the more you select, the more availability you get.<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Choose the Ubuntu 14.04 AMI<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Allocation Strategy: diversified (launches spot instances across multiple launch specifications)<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Bidding Strategy: automated (max bid price = on-demand price)<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">In the userdata section, use elasticsearch-datanode-spot-userdata.sh (<\/span><a href=\"https:\/\/github.com\/neerjaj2\/elasticsearch-cluster-spot\/blob\/master\/elasticsearch-datanode-spot-userdata.sh\"><span style=\"font-weight: 400\">link<\/span><\/a><span style=\"font-weight: 400\">)<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Launch the spot fleet<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Configure auto scaling based on high CPU\/memory usage<\/span><\/li>\n<\/ol>\n<\/li>\n<\/ul>\n<\/ol>\n<p style=\"text-align: justify\"><strong>Note: Running production workloads on spot instances is not recommended, as they can terminate at any time the AWS market price rises above your bid price.<\/strong><\/p>\n<ol style=\"text-align: justify\">\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Client Nodes:<\/span>\n<ol>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Create a launch 
configuration with userdata as mentioned in elasticsearch-clientnode-userdata.sh (<\/span><a href=\"https:\/\/github.com\/neerjaj2\/elasticsearch-cluster-spot\/blob\/master\/elasticsearch-clientnode-userdata.sh\"><span style=\"font-weight: 400\">link<\/span><\/a><span style=\"font-weight: 400\">)<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Create an Auto Scaling Group with min: 2, desired: 2, and max: 2<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Check that all is well: Once you have added all the nodes to the cluster, you can check its status using the Kopf plugin. (URL:\u00a0<\/span>Master_Public_IP:9200\/_plugin\/kopf)<\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Update the logstash configuration file: Now that we have the entire cluster ready and scalable, we can update our logstash configuration file with the client nodes' private DNS endpoints\/IPs.<\/span><\/li>\n<\/ol>\n<\/li>\n<\/ul>\n<\/ol>\n<p style=\"text-align: justify\"><span style=\"font-weight: 400\">With this highly scalable elasticsearch cluster, we are able to handle approximately 200-220 GB of logs per day, and it can scale further by upgrading instance types or increasing the number of nodes.<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>One of the most challenging tasks in any microservices ecosystem is centralized log management, and there are many open-source and paid solutions available in the market. 
In our ecosystem, we use the ELK stack, as it provides a scalable, multitenant-capable full-text search engine that integrates easily with Logstash and Kibana for centralized [&hellip;]<\/p>\n","protected":false},"author":216,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"iawp_total_views":32},"categories":[1174,2348,1],"tags":[248,2366,4173,1524,3872],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts\/41968"}],"collection":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/users\/216"}],"replies":[{"embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/comments?post=41968"}],"version-history":[{"count":0,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts\/41968\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/media?parent=41968"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/categories?post=41968"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/tags?post=41968"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}