ZooKeeper Leader-Election simplified

28 / Oct / 2015 by Salil

Background

Recently, in a project, we needed to elect a master node from a pool of similar nodes. If the master node fails, another node should take over the responsibility so that the service remains available.
So the use case was this: only a single node should behave as the master, and it coordinates with all the worker nodes to process the required tasks.

Clearly, this is the leader-election recipe. The Curator/ZooKeeper API supports it, but we found it a little complex: a thread had to block for a long duration to claim the leadership, and otherwise it did not function well.

Solution

So, we figured out a very simple way to determine the master node. Here are the steps:

  • All the nodes register themselves under a specific ZK path. Let’s say it’s “/my/project/coordinators/${machine-ip-address}”
  • Example: If Node1 (running on IP 127.0.0.1) joins the cluster, its ZK node path will be “/my/project/coordinators/127.0.0.1”. We picked the IP address, but any naming convention works as long as every node gets a unique name.
  • Write an algorithm that determines the leader/master node
    • Every coordinator node reads the names of all the children nodes (i.e. the list of IPs).
    • Sort the node names and pick the very first one. Declare it the master node. Because every node applies the same sort to the same list, they all agree on the result.
    • Below is the code snippet that determines whether a node is the master node
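      // Imports needed by the snippet
      import java.util.List;
      import java.util.TreeSet;
      import org.apache.curator.RetryPolicy;
      import org.apache.curator.framework.CuratorFramework;
      import org.apache.curator.framework.CuratorFrameworkFactory;
      import org.apache.curator.retry.ExponentialBackoffRetry;

      // Shared client, created lazily by getCuratorClient()
      private static CuratorFramework _client;

      // Returns true if nodeId is the first registered coordinator in sorted order
      public static boolean isMasterCoordinatorNode(String nodeId) throws Exception {
              // ZooKeeper paths must be absolute, so the leading "/" is required
              List<String> coordinatorNodes = getCuratorClient().getChildren().forPath("/my/project/coordinators");
              if (coordinatorNodes != null && !coordinatorNodes.isEmpty())
              {
                  TreeSet<String> set = new TreeSet<String>(coordinatorNodes);
                  String firstNodeId = set.first();
                  return firstNodeId.equals(nodeId);
              }
              return false;
          }

      public synchronized static CuratorFramework getCuratorClient(){
              if(_client == null){
                  String zookeeperStr = "127.0.0.1:2181"; // zookeeper address
                  RetryPolicy retryPolicy = new ExponentialBackoffRetry(1000, 3); // base sleep 1000 ms, at most 3 retries
                  _client = CuratorFrameworkFactory.newClient(zookeeperStr, retryPolicy);
                  _client.start();
              }
              return _client;
          }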
      
  • The other coordinator nodes (those not picked as master) don’t do anything.
  • Ensure that the registered ZooKeeper nodes are all ephemeral nodes. When the master node goes down, its ephemeral node is removed once its session ends, so the next available node in sorted order becomes the master.
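
To make the selection rule and the failover behaviour concrete, here is a minimal sketch in plain Java that needs no ZooKeeper connection. The class LeaderElectionSketch and its methods are hypothetical names made up for illustration, and an in-memory sorted set stands in for the ephemeral children under the coordinators path. It only models the rule described above; it is not the code we ran in the project.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Optional;
import java.util.TreeSet;

// Hypothetical helper, not part of Curator or ZooKeeper:
// it models the election rule on plain strings.
class LeaderElectionSketch {

    // Base path taken from the example in the steps above.
    static final String BASE_PATH = "/my/project/coordinators";

    // Builds the znode path under which a coordinator would register itself.
    static String registrationPath(String nodeId) {
        return BASE_PATH + "/" + nodeId;
    }

    // The master is the smallest child name in natural String order.
    static Optional<String> pickMaster(Collection<String> children) {
        if (children == null || children.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(new TreeSet<>(children).first());
    }

    // True only if nodeId is the currently elected master.
    static boolean isMaster(String nodeId, Collection<String> children) {
        return pickMaster(children).map(master -> master.equals(nodeId)).orElse(false);
    }

    public static void main(String[] args) {
        // Stand-in for the children list read from ZooKeeper.
        TreeSet<String> live = new TreeSet<>(Arrays.asList("10.0.0.3", "10.0.0.1", "10.0.0.2"));
        System.out.println(pickMaster(live).orElse("none")); // prints 10.0.0.1

        // Simulates the master's ephemeral node disappearing.
        live.remove("10.0.0.1");
        System.out.println(pickMaster(live).orElse("none")); // prints 10.0.0.2
    }
}
```

Note that the order is plain string order, not numeric: “10.0.0.10” sorts before “10.0.0.2”. That is fine for the election, because all that matters is that every node applies the same rule to the same list.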

In our case this worked very well. :-)

Here’s my previous blog post, an introduction to the Curator framework:
Curator Framework for Apache ZooKeeper
