Elasticsearch: Shard Filtering

25 May 2016, by Navjot Singh

Our cloud DevOps engineers have been running Elasticsearch in production for an e-commerce website for quite a while. The website has one admin server for tasks such as adding new products, managing discounts on various items and fetching reports. Because a single Elasticsearch cluster serves both the customer-facing application and the admin server, we needed report downloads from the admin server to avoid putting extra load on the nodes that serve customers.

Currently, we are using two Elasticsearch nodes in a cluster. Downloading a report fetches data from an Elasticsearch index named “report”. To move the load of report downloads off the two existing nodes, we decided to add a third, lower-spec node that would host only the “report” index.

Scenario: implement shard filtering on an Elasticsearch cluster without downtime.

We can tell Elasticsearch to place the shards of a specific index only on nodes that carry a particular attribute, and to keep other indexes off those nodes. This is called shard allocation filtering, or simply shard filtering.
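To make the include/exclude logic concrete, here is a deliberately simplified Python model of how an index's tag filters decide which nodes may hold its shards. It is an illustration only, not Elasticsearch's allocation code: it models a single “tag” attribute (the one configured in the steps below), ignores every other allocation rule such as disk watermarks or replica placement, and the node names are hypothetical.

```python
from typing import Dict, List, Optional, Set


def _parse(values: Optional[str]) -> Set[str]:
    """Split a comma-separated filter value such as "admin,reports" into a set."""
    if not values:
        return set()
    return {v.strip() for v in values.split(",") if v.strip()}


def eligible_nodes(
    nodes: Dict[str, Optional[str]],
    include: Optional[str] = None,
    exclude: Optional[str] = None,
) -> List[str]:
    """Return the nodes that may hold shards of an index (simplified model).

    nodes   maps node name -> value of its "tag" attribute (None if untagged).
    include models index.routing.allocation.include.tag
    exclude models index.routing.allocation.exclude.tag
    """
    wanted = _parse(include)
    banned = _parse(exclude)
    allowed = []
    for name, tag in nodes.items():
        # An include filter admits only nodes whose tag is one of the listed
        # values, so untagged nodes never match it.
        if wanted and tag not in wanted:
            continue
        # An exclude filter rejects nodes whose tag is one of the listed values.
        if tag in banned:
            continue
        allowed.append(name)
    return allowed


if __name__ == "__main__":
    # Hypothetical three-node cluster: only node-3 carries node.tag: admin.
    cluster = {"node-1": None, "node-2": None, "node-3": "admin"}
    print(eligible_nodes(cluster, exclude="admin"))                   # ['node-1', 'node-2']
    print(eligible_nodes(cluster, include="admin"))                   # ['node-3']
    print(eligible_nodes(cluster, include="admin", exclude="admin"))  # []
```

The last call shows why an index should never end up with both an include and an exclude filter for the same tag: no node qualifies, so its shards cannot be allocated anywhere.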

We followed the steps below to implement shard filtering:

  1. Configure a third Elasticsearch node. It can have a lower configuration, since it will only be used to serve report downloads.
  2. Add the following attribute to the node’s ‘elasticsearch.yml’ file, but do not start the Elasticsearch process yet:
    node.tag: admin
    

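    The indexes refer to this same tag: an index whose include filter names “admin” can place its shards only on nodes tagged “admin”.
    (“node.tag” is the syntax of the Elasticsearch 1.x/2.x releases; from Elasticsearch 5.0 onward, custom node attributes are declared as “node.attr.tag: admin”.)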

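  3. We want the report index to be hosted only on the third node, so the requests below tell the report index to allocate its shards only to the node tagged “admin”.
    We also need to tell all other indexes not to allocate shards to the node tagged “admin”.
    The exclude setting is applied to every index at once, so it would also land on “report”, and an index that both includes and excludes “admin” cannot be allocated to any node. Therefore, before adding the exclude filter, back up the “report” index if required and delete it: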

    DELETE report

    Add the “admin” exclude filter to all existing indexes:

    PUT _settings
    {
      "index.routing.allocation.exclude.tag": "admin"
    }

    Now restore (or re-create) the report index and add the “admin” include filter to it:

    PUT report/_settings
    {
      "index.routing.allocation.include.tag": "admin"
    }
  4. Now start the third node and perform rolling restarts on the other nodes. The “head” plugin then shows that the shards of the report index are allocated only to node 3, and we are done.
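The REST calls in step 3 can also be scripted. The sketch below uses only Python's standard library: it lists the calls in order as (method, path, body) tuples and includes a small helper to send one. The cluster address in ES_URL is an assumption, and restoring or re-creating the report index between the second and third calls is deliberately left out, so the calls should not simply be fired back to back.

```python
import json
import urllib.request
from typing import List, Optional, Tuple

# Assumption: address of any node in the cluster; adjust for your setup.
ES_URL = "http://localhost:9200"

Step = Tuple[str, str, Optional[dict]]


def shard_filtering_steps(index: str = "report", tag: str = "admin") -> List[Step]:
    """Return (method, path, body) for the REST calls of step 3, in order."""
    return [
        # 1. Drop the index (back it up first if needed) so the cluster-wide
        #    exclude below does not also land on it.
        ("DELETE", f"/{index}", None),
        # 2. Keep every existing index off nodes tagged `tag`.
        ("PUT", "/_settings", {"index.routing.allocation.exclude.tag": tag}),
        # (Restore or re-create `index` here; not shown.)
        # 3. Allow the index only on nodes tagged `tag`.
        ("PUT", f"/{index}/_settings", {"index.routing.allocation.include.tag": tag}),
    ]


def send(method: str, path: str, body: Optional[dict]) -> dict:
    """Send one request to Elasticsearch and return the decoded JSON response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        ES_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Dry run: print the calls instead of sending them.
    for method, path, body in shard_filtering_steps():
        print(method, path, json.dumps(body) if body else "")
```

Keeping the calls as plain data makes it easy to review them (or log them) before calling send for each one at the right moment.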

We then verified the setup by downloading a report from the admin server: CPU utilization on node 3 rose while the other nodes experienced no impact.
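Besides watching CPU, the placement of shards can be checked with the _cat/shards API. The sketch below assumes an Elasticsearch version whose _cat APIs accept format=json (each row is then an object with keys such as index, prirep, state and node), an assumed address in ES_URL, and hypothetical node names. It flags report shards that are not on an admin node, and any other index's shard that landed on one.

```python
import json
import urllib.request
from typing import Dict, List, Optional, Set

# Assumption: address of any node in the cluster.
ES_URL = "http://localhost:9200"

Row = Dict[str, Optional[str]]


def fetch_shards() -> List[Row]:
    """Return one dict per shard copy from GET _cat/shards?format=json."""
    with urllib.request.urlopen(f"{ES_URL}/_cat/shards?format=json") as resp:
        return json.loads(resp.read().decode("utf-8"))


def allocation_problems(rows: List[Row], report_index: str, admin_nodes: Set[str]) -> List[Row]:
    """Return shard copies that break the intended layout.

    A report shard is a problem if it is not on an admin node (this includes
    unassigned copies, whose node is null); a shard of any other index is a
    problem if it sits on an admin node.
    """
    problems = []
    for row in rows:
        on_admin = row.get("node") in admin_nodes
        if row.get("index") == report_index:
            if not on_admin:
                problems.append(row)
        elif on_admin:
            problems.append(row)
    return problems
```

For example, allocation_problems(fetch_shards(), "report", {"node-3"}) should come back empty once filtering works. With only one node tagged “admin”, though, expect the report index's replica shards to show up as unassigned: Elasticsearch never places a replica on the same node as its primary, so either set number_of_replicas to 0 for that index or tag a second node.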

Shard filtering is a valuable Elasticsearch feature for segregating nodes according to application requirements. Multiple nodes can be configured to host a specific set of indexes, depending on the application architecture; in a microservices architecture, for example, a single Elasticsearch cluster could serve several applications, each with its indexes on its own nodes.

 
