Lifecycle hooks are a great feature of Auto Scaling: they help to control the launch and termination states of instances within an auto-scaling group. I got to know about this hidden feature when I was looking for a way to perform some automated tasks on an instance before adding it to or removing it from an auto-scaling group. I was struggling with my auto-scaling setup because a newly launched instance needed 5 minutes to show the “InService” status while the auto-scaling health check value was 3 minutes, so an instance launched by auto-scaling was terminated before its automated task could complete. This led to an infinite launch-and-termination loop. Increasing the health check interval of the auto-scaling group may get rid of this issue, but then an unhealthy running instance also takes longer to be detected, and application performance is compromised in the meantime. Before diving deep into hooks, let us first understand how auto-scaling works.
1. When a scale-up event is triggered, the auto-scaling group initiates a request for a new instance. It takes some time to launch an instance with the predefined launch configuration. This state is known as “Pending”.
2. Once the instance is available, the lifecycle hook is invoked and the instance state changes to “Pending:Wait”. In this state, we can perform pre-production tasks. The instance is not passed to the ELB until the lifecycle hook times out or the lifecycle action is completed.
3. After the predefined timeout of the lifecycle hook, the instance is attached to the ELB and is ready to serve production traffic, provided the hook's default result is CONTINUE (with a default result of ABANDON, the instance is terminated instead). This state is known as “InService”.
4. When a scale-down event is triggered, an instance is chosen for removal from the auto-scaling group. The lifecycle hook is invoked and the instance remains in the “Terminating:Wait” state until the hook times out.
5. The instance is terminated once the timeout interval completes. This state is known as “Terminating:Proceed”.
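The sequence of states described above can be traced with a small shell sketch. It is only an illustration and calls no AWS command: `next_state` is a hypothetical helper, and it assumes a hook on both launch and termination, each ending with CONTINUE. The intermediate “Pending:Proceed”, “Terminating” and “Terminated” names are not in the list above; they are the state names used in the Auto Scaling documentation.

```shell
# Illustrative only: next_state is a hypothetical helper, not an AWS command.
# It returns the state that follows the given one when a lifecycle hook
# exists for both launch and termination and each hook ends with CONTINUE.
next_state() {
  case "$1" in
    Pending)             echo "Pending:Wait" ;;
    Pending:Wait)        echo "Pending:Proceed" ;;
    Pending:Proceed)     echo "InService" ;;
    InService)           echo "Terminating" ;;
    Terminating)         echo "Terminating:Wait" ;;
    Terminating:Wait)    echo "Terminating:Proceed" ;;
    Terminating:Proceed) echo "Terminated" ;;
    *)                   return 1 ;;   # no state follows "Terminated"
  esac
}

# Walk the whole path, starting from a freshly requested instance.
state="Pending"
path="$state"
while next=$(next_state "$state"); do
  state="$next"
  path="$path -> $state"
done
echo "$path"
```

The two “Wait” states are the only points where the hook holds the instance back; everything else moves on without our involvement.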
I previously called this a hidden feature because, at the time of writing, there is no option in the console to set hooks in the auto-scaling configuration; hooks can be added only from the CLI or API. Here, I am assuming that the AWS CLI is configured on your system with full permissions on auto-scaling resources.
1. Create SNS Topic: We need an SNS topic to receive all event notifications triggered by the hook. When a newly launched instance enters the “Pending:Wait” state, a notification is sent to the subscribed email ID.
2. Create IAM Role: The auto-scaling resources need permission to send notification alerts via SNS. We have to create a service role with the “AutoScaling Notification Access” policy.
3. Create a lifecycle hook: Here, we need the ARNs of the SNS topic and the IAM role that we created in the previous steps. If the above steps were executed as described, we are good to go. To create a lifecycle hook in the auto-scaling group, run the command given below. It holds each newly launched instance in the wait state for 300 seconds.
aws autoscaling put-lifecycle-hook --lifecycle-hook-name asg_hook --auto-scaling-group-name test_asg --lifecycle-transition autoscaling:EC2_INSTANCE_LAUNCHING --notification-target-arn <SNS Topic ARN> --role-arn <IAM Role ARN> --heartbeat-timeout 300
If the above command reports no error, the lifecycle hook has been added to the auto-scaling group. To be on the safe side, verify the details of the lifecycle hook with the following command:
aws autoscaling describe-lifecycle-hooks --auto-scaling-group-name test_asg --lifecycle-hook-names asg_hook --output json
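Steps 1 and 2 above were described in words only. As a rough sketch, the SNS topic and the IAM role could also be created from the CLI. The helper names (`create_hook_topic`, `hook_trust_policy`, `create_hook_role`) are hypothetical, and I am assuming that the “AutoScaling Notification Access” policy corresponds to the AWS managed policy `AutoScalingNotificationAccessRole`; please check the policy ARN in your own account before relying on it.

```shell
# Sketch only: helper names and arguments are hypothetical.

# Creates the SNS topic, subscribes an email address, prints the topic ARN.
create_hook_topic() {
  local topic_name="$1" email="$2" topic_arn
  topic_arn=$(aws sns create-topic --name "$topic_name" \
    --query TopicArn --output text) || return 1
  aws sns subscribe --topic-arn "$topic_arn" --protocol email \
    --notification-endpoint "$email" >/dev/null || return 1
  echo "$topic_arn"
}

# Prints a trust policy that lets the Auto Scaling service assume the role.
hook_trust_policy() {
  cat <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {"Service": "autoscaling.amazonaws.com"},
    "Action": "sts:AssumeRole"
  }]
}
EOF
}

# Creates the role, attaches the notification policy, prints the role ARN.
# Assumption: the managed policy ARN below matches the policy named in step 2.
create_hook_role() {
  local role_name="$1" role_arn
  role_arn=$(aws iam create-role --role-name "$role_name" \
    --assume-role-policy-document "$(hook_trust_policy)" \
    --query Role.Arn --output text) || return 1
  aws iam attach-role-policy --role-name "$role_name" \
    --policy-arn arn:aws:iam::aws:policy/service-role/AutoScalingNotificationAccessRole \
    || return 1
  echo "$role_arn"
}
```

For example, `TOPIC_ARN=$(create_hook_topic asg_hook_topic you@example.com)` and `ROLE_ARN=$(create_hook_role asg_hook_role)` would give the two ARNs that put-lifecycle-hook expects; the topic name, role name and email address here are made up. The email subscription has to be confirmed from the inbox before any notification arrives.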
During a scale-up action, when an instance enters the “Pending:Wait” state, a notification alert is pushed to the SNS topic. As shown in the snapshot below, this notification message contains detailed information about the instance, including the “LifecycleActionToken”. This token is used to modify the hook attributes of that instance.
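If the notification is handled by a script rather than read from an email, the token has to be pulled out of the message body, which is JSON. The sketch below uses a made-up sample message (the instance ID and token are invented, and only a few of the real fields are shown) and a hypothetical `json_field` helper built on sed. It is deliberately naive and only handles simple string values; a real JSON parser such as jq is the safer choice where it is available.

```shell
# Sketch only: json_field is a hypothetical, naive helper.
# Prints the string value of key $1 from the JSON read on stdin.
json_field() {
  sed -n "s/.*\"$1\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p" | head -n 1
}

# Made-up sample of the notification body; all values are invented.
message='{"LifecycleHookName":"asg_hook","AutoScalingGroupName":"test_asg","LifecycleTransition":"autoscaling:EC2_INSTANCE_LAUNCHING","EC2InstanceId":"i-0123456789abcdef0","LifecycleActionToken":"11111111-2222-3333-4444-555555555555"}'

token=$(printf '%s\n' "$message" | json_field LifecycleActionToken)
instance_id=$(printf '%s\n' "$message" | json_field EC2InstanceId)
echo "instance $instance_id is waiting with token $token"
```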
If we want to extend the default timeout, we can do so with the command below, which forces the instance to stay in the wait state for the next 300 seconds. Please make sure the token ID is the same as the one received in the previous step.
aws autoscaling record-lifecycle-action-heartbeat --lifecycle-action-token <Action Token ID> --auto-scaling-group-name test_asg --lifecycle-hook-name asg_hook
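When the automated task consists of several steps, one way to use the heartbeat is to record it after every step, so that the wait timer restarts each time. The following is a minimal sketch under that assumption: `run_steps_with_heartbeat` is a hypothetical helper, the steps are passed as names of shell functions or commands, and each single step is expected to finish within the heartbeat timeout.

```shell
# Sketch only: run_steps_with_heartbeat is a hypothetical helper.
# Runs each named step in order and records a heartbeat after it, so the
# hook timeout restarts between steps. Stops at the first failure.
run_steps_with_heartbeat() {
  local token="$1" step
  shift
  for step in "$@"; do
    "$step" || return 1
    aws autoscaling record-lifecycle-action-heartbeat \
      --lifecycle-action-token "$token" \
      --auto-scaling-group-name test_asg \
      --lifecycle-hook-name asg_hook >/dev/null || return 1
  done
}
```

A call could look like `run_steps_with_heartbeat "<Action Token ID>" install_packages deploy_app`, where `install_packages` and `deploy_app` stand for your own functions.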
If the automated job finishes before the default timeout, we can also force the instance to leave its waiting state with the following command:
aws autoscaling complete-lifecycle-action --lifecycle-action-token <Action Token ID> --lifecycle-hook-name asg_hook --auto-scaling-group-name test_asg --lifecycle-action-result CONTINUE
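The same command can be tied to the outcome of the automated job. In the sketch below, the hypothetical `finish_lifecycle_action` helper runs the job and reports CONTINUE when it succeeds and ABANDON when it fails; for a launch hook, ABANDON tells Auto Scaling to terminate the instance instead of putting it into service. The hook and group names are the ones used throughout this post.

```shell
# Sketch only: finish_lifecycle_action is a hypothetical helper.
# Usage: finish_lifecycle_action <token> <command> [args...]
# Runs the command, then completes the lifecycle action with CONTINUE on
# success or ABANDON on failure, and prints the result that was sent.
finish_lifecycle_action() {
  local token="$1"
  shift
  local result="CONTINUE"
  "$@" || result="ABANDON"
  aws autoscaling complete-lifecycle-action \
    --lifecycle-action-token "$token" \
    --lifecycle-hook-name asg_hook \
    --auto-scaling-group-name test_asg \
    --lifecycle-action-result "$result" >/dev/null || return 1
  echo "$result"
}
```

For instance, `finish_lifecycle_action "<Action Token ID>" ./bootstrap.sh` would release the instance as soon as the script exits, where `bootstrap.sh` is a placeholder for your own job.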
I hope this blog helps you to set up and run instances with lifecycle hooks. If you come across any error during the setup, I would suggest cross-checking your ARNs and LifecycleActionToken.