{"id":33850,"date":"2016-04-30T22:04:03","date_gmt":"2016-04-30T16:34:03","guid":{"rendered":"http:\/\/www.tothenew.com\/blog\/?p=33850"},"modified":"2016-12-19T15:02:53","modified_gmt":"2016-12-19T09:32:53","slug":"monitor-aws-ecs-agent-automatically-restart-agent-on-failure","status":"publish","type":"post","link":"https:\/\/www.tothenew.com\/blog\/monitor-aws-ecs-agent-automatically-restart-agent-on-failure\/","title":{"rendered":"Monitor AWS ECS Agent &amp; Automatically Restart Agent on Failure"},"content":{"rendered":"<p style=\"text-align: justify;\"><strong>Amazon EC2 Container Service<\/strong> is a container management service that makes it easy to <a title=\"docker continuous integration\" href=\"http:\/\/www.tothenew.com\/devops-chef-puppet-docker\">manage Docker<\/a> containers on EC2 instances. With AWS ECS, you can create task definitions that define container configuration such as memory, CPU, environment variables, and mount points, and services to scale Docker containers.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"wp-image-33889 aligncenter\" src=\"\/blog\/wp-ttn-blog\/uploads\/2016\/04\/integrations_awsecs@4x-500x500.png\" alt=\"AWS ECS Agent Monitoring\" width=\"355\" height=\"355\" \/><\/p>\n<p style=\"text-align: justify;\"><strong>Use Case:<\/strong> In one of our projects, we set up a complete QA environment on AWS ECS, and after a few days we observed that the ECS agent was frequently getting disconnected from the AWS ECS service. As a result, the ECS service could not communicate with the agent, so it could neither schedule new containers nor report the status of the existing ones.<\/p>\n<p><span style=\"color: #ff0000;\">Note: We are using the AWS ECS-optimized AMI, i.e. the Amazon Linux AMI. If you are using an AMI with a different OS, a few steps may change, e.g. 
installing the AWS CLI and getting instance metadata.<\/span><\/p>\n<p><span style=\"color: #ff9900;\"><strong style=\"line-height: 1.71429; font-size: 1rem;\">Steps to set up the monitoring script on ECS nodes:<\/strong><\/span><\/p>\n<p><strong style=\"line-height: 1.71429; font-size: 1rem;\">1. Set up an SNS topic for receiving notifications<\/strong><\/p>\n<p style=\"text-align: justify;\">In the AWS console, create an SNS topic, add the notification email address as a subscriber, and confirm the subscription email you receive from the SNS service.<\/p>\n<p><strong style=\"line-height: 1.71429; font-size: 1rem;\">2. Install AWS CLI<\/strong><\/p>\n<p>Our script uses the AWS CLI to query ECS for the container instance ARNs and the agent status, using the aws ecs subcommands.<\/p>\n<p>[js]yum install -y aws-cli[\/js]<\/p>\n<p><strong style=\"line-height: 1.71429; font-size: 1rem;\">3. Set up IAM policies for SNS and ECS<\/strong><\/p>\n<p style=\"text-align: justify;\">a. <strong>AWS SNS IAM Policy:<\/strong> The policy below allows the IAM instance role to publish messages to the SNS topic we created earlier. 
This will help us in getting notifications for agent failure.<\/p>\n<p>[js]<br \/>\n{<br \/>\n\t    &quot;Version&quot;: &quot;2012-10-17&quot;,<br \/>\n\t    &quot;Statement&quot;: [<br \/>\n\t        {<br \/>\n\t            &quot;Sid&quot;: &quot;Stmt1460976768000&quot;,<br \/>\n\t            &quot;Effect&quot;: &quot;Allow&quot;,<br \/>\n\t            &quot;Action&quot;: [<br \/>\n\t                &quot;sns:GetEndpointAttributes&quot;,<br \/>\n\t                &quot;sns:GetPlatformApplicationAttributes&quot;,<br \/>\n\t                &quot;sns:GetSubscriptionAttributes&quot;,<br \/>\n\t                &quot;sns:GetTopicAttributes&quot;,<br \/>\n\t                &quot;sns:ListEndpointsByPlatformApplication&quot;,<br \/>\n\t                &quot;sns:ListPlatformApplications&quot;,<br \/>\n\t                &quot;sns:ListSubscriptions&quot;,<br \/>\n\t                &quot;sns:ListSubscriptionsByTopic&quot;,<br \/>\n\t                &quot;sns:ListTopics&quot;,<br \/>\n\t                &quot;sns:Publish&quot;<br \/>\n\t            ],<br \/>\n\t            &quot;Resource&quot;: [<br \/>\n\t                &quot;arn:aws:sns:ap-southeast-1:&lt;aws-account-id&gt;:&lt;topic-name&gt;&quot;<br \/>\n\t            ]<br \/>\n\t        }<br \/>\n\t    ]<br \/>\n\t}<br \/>\n[\/js]<\/p>\n<p style=\"text-align: justify;\">b. 
<strong>AWS ECS IAM Policy:<\/strong> The IAM policy below allows the IAM instance role to query the AWS ECS API to list container instances and check the agent connectivity status.<\/p>\n<p>[js]<br \/>\n{<br \/>\n    &quot;Version&quot;: &quot;2012-10-17&quot;,<br \/>\n    &quot;Statement&quot;: [<br \/>\n        {<br \/>\n            &quot;Sid&quot;: &quot;Stmt1460960788000&quot;,<br \/>\n            &quot;Effect&quot;: &quot;Allow&quot;,<br \/>\n            &quot;Action&quot;: [<br \/>\n                &quot;ecs:DescribeClusters&quot;,<br \/>\n                &quot;ecs:DescribeContainerInstances&quot;,<br \/>\n                &quot;ecs:DescribeServices&quot;,<br \/>\n                &quot;ecs:DescribeTaskDefinition&quot;,<br \/>\n                &quot;ecs:DescribeTasks&quot;,<br \/>\n                &quot;ecs:DiscoverPollEndpoint&quot;,<br \/>\n                &quot;ecs:ListClusters&quot;,<br \/>\n                &quot;ecs:ListContainerInstances&quot;,<br \/>\n                &quot;ecs:ListServices&quot;,<br \/>\n                &quot;ecs:ListTaskDefinitionFamilies&quot;,<br \/>\n                &quot;ecs:ListTaskDefinitions&quot;,<br \/>\n                &quot;ecs:ListTasks&quot;,<br \/>\n                &quot;ecs:Poll&quot;<br \/>\n            ],<br \/>\n            &quot;Resource&quot;: [<br \/>\n                &quot;arn:aws:ecs:ap-southeast-1:&lt;aws-account-id&gt;:cluster\/&lt;cluster-name&gt;&quot;<br \/>\n            ]<br \/>\n        }<br \/>\n    ]<br \/>\n}<br \/>\n[\/js]<\/p>\n<p><strong style=\"line-height: 1.71429; font-size: 1rem;\">4. Monitoring Script<\/strong><\/p>\n<p style=\"text-align: justify;\">The script below checks the connectivity of the ECS agent with the ECS service. It first fetches the ARNs of all container instances in the cluster and the ID of the current instance (using instance metadata). Then, for each container instance ARN, it fetches the agent status and checks whether that container instance is the current instance. 
If the ECS agent on the current instance is disconnected, it triggers a notification and restarts the ECS service on the instance.<\/p>\n<p>[js]<br \/>\n#!\/bin\/bash<br \/>\n# Source ecs.config to get the cluster name (ECS_CLUSTER)<br \/>\nsource \/etc\/ecs\/ecs.config<br \/>\n# ARNs of all container instances registered in the cluster<br \/>\nCONTAINERS_ID=$(aws ecs list-container-instances --cluster $ECS_CLUSTER --output text --query 'containerInstanceArns')<br \/>\n# NOTE: the original command was truncated after &quot;curl&quot;; the standard EC2 instance metadata URL is assumed here<br \/>\nINSTANCE_ID=$(curl -s http:\/\/169.254.169.254\/latest\/meta-data\/instance-id)<br \/>\nDATE=$(date +%Y-%m-%d-%H:%M)<br \/>\nTOPIC=&quot;arn:aws:sns:ap-southeast-1:&lt;aws-account-id&gt;:&lt;topic-name&gt;&quot;<br \/>\nfor container in $CONTAINERS_ID<br \/>\ndo<br \/>\nSTATUS=$(aws ecs describe-container-instances --container-instances $container --cluster $ECS_CLUSTER --output json --query 'containerInstances[0].agentConnected')<br \/>\nCHECK_INSTANCE_ID=$(aws ecs describe-container-instances --container-instances $container --cluster $ECS_CLUSTER --output text --query 'containerInstances[0].ec2InstanceId')<br \/>\n# Only act on the container instance that runs on this EC2 instance<br \/>\nif [ &quot;$INSTANCE_ID&quot; == &quot;$CHECK_INSTANCE_ID&quot; ]<br \/>\nthen<br \/>\nif [ &quot;$STATUS&quot; == &quot;false&quot; ]<br \/>\nthen<br \/>\necho &quot;Agent Disconnected&quot; $DATE &gt;&gt; \/var\/log\/script.log<br \/>\naws sns publish --message &quot;AWS ECS Agent Failed $INSTANCE_ID $DATE&quot; --topic-arn $TOPIC<br \/>\n# Restart the ECS agent<br \/>\nsudo stop ecs<br \/>\nsudo start ecs<br \/>\nelse<br \/>\necho &quot;Agent Connected&quot; $DATE &gt;&gt; \/var\/log\/script.log<br \/>\nfi<br \/>\nfi<br \/>\ndone[\/js]<\/p>\n<p><strong style=\"line-height: 1.71429; font-size: 1rem;\">5. 
Set up cron to run every 5 minutes<\/strong><\/p>\n<p style=\"text-align: justify;\">After monitoring the QA environment for more than a week, we found that the ECS agent gets disconnected almost twice a day, so I chose to set up a cron job that runs every 5 minutes and writes its output and errors to \/var\/log\/monitor-agent-logs.txt<\/p>\n<p>[js]*\/5 * * * * bash \/home\/ec2-user\/monitor_agent.sh &gt;&gt; \/var\/log\/monitor-agent-logs.txt 2&gt;&amp;1[\/js]<\/p>\n<p><strong style=\"line-height: 1.71429; font-size: 1rem;\">6. Create AMI and Update ECS Auto Scaling Groups Launch Configuration<\/strong><\/p>\n<p style=\"text-align: justify;\">Once you have created an AMI of the running instance, copy the existing launch configuration (i.e. the one created by the AWS ECS CloudFormation stack), update the AMI, and create a new launch configuration. Then update the Auto Scaling group to use the newly created launch configuration.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Amazon EC2 Container Service is a container management service that makes it easy to manage Docker containers on EC2 instances. With AWS ECS, you can create task definitions that define container configuration such as memory, CPU, environment variables, and mount points, and services to scale Docker containers. 
Use Case: In one of our projects, we set up a complete QA [&hellip;]<\/p>\n","protected":false},"author":216,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"iawp_total_views":52},"categories":[1174,2348],"tags":[3273,2594,3272],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts\/33850"}],"collection":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/users\/216"}],"replies":[{"embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/comments?post=33850"}],"version-history":[{"count":0,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts\/33850\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/media?parent=33850"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/categories?post=33850"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/tags?post=33850"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}