{"id":69941,"date":"2025-03-12T13:32:35","date_gmt":"2025-03-12T08:02:35","guid":{"rendered":"https:\/\/www.tothenew.com\/blog\/?p=69941"},"modified":"2025-03-13T22:48:10","modified_gmt":"2025-03-13T17:18:10","slug":"13-must-know-automation-scripts-for-devops-monitoring-logging","status":"publish","type":"post","link":"https:\/\/www.tothenew.com\/blog\/13-must-know-automation-scripts-for-devops-monitoring-logging\/","title":{"rendered":"13 Must-Know Automation Scripts for DevOps Monitoring &amp; Logging"},"content":{"rendered":"<h3>Introduction<\/h3>\n<p>Monitoring and logging are crucial for maintaining a reliable system. Whether you\u2019re managing <a href=\"https:\/\/www.tothenew.com\/cloud-devops\">cloud infrastructure<\/a>, microservices, servers, or CI\/CD pipelines, automation scripts provide a strong foundation and play a key role in preventing late-night production issues.<\/p>\n<p>In this article, we will look at 13 essential automation scripts that every DevOps engineer should know. These scripts help you track system performance, identify bottlenecks, and streamline log management, making your workflow more efficient.<\/p>\n<p>This guide is written with beginners in mind, but experienced DevOps professionals will also find valuable insights. Let\u2019s dive in and see how these scripts can enhance your operations with real-world applications!<\/p>\n<h3>Objective<\/h3>\n<p>This blog provides DevOps engineers with essential automation scripts to enhance system reliability, performance monitoring, and log management. By automating key tasks, DevOps teams can prevent critical issues, reduce manual effort, and improve operational efficiency.<\/p>\n<h3>Prerequisites<\/h3>\n<ul>\n<li>A Linux server<\/li>\n<li>Familiarity with Bash and Python scripting<\/li>\n<\/ul>\n<h4>1. 
Monitor Disk Space<\/h4>\n<p>Running out of disk space can cause database crashes, application or server crashes, failed build pipelines, or even data loss.<\/p>\n<p><strong>Solution<\/strong>: A Bash script checks disk usage on your server and reports every filesystem whose usage exceeds the defined threshold (80% in this example).<\/p>\n<p>Sample Script:<\/p>\n<pre>#!\/bin\/bash\r\nTHRESHOLD=80  # alert threshold, in percent\r\n# Skip the header row and print filesystems whose Use% is above the threshold\r\ndf -hP | awk -v limit=\"$THRESHOLD\" 'NR &gt; 1 &amp;&amp; $5+0 &gt; limit'<\/pre>\n<h4>2. Monitor CPU Usage<\/h4>\n<p>High CPU usage can indicate that the application isn\u2019t scaling properly or that a runaway process is hogging resources.<br \/>\n<strong>Solution<\/strong>: This Python script uses psutil to sample CPU usage and prints an alert when it exceeds a threshold (80% in this example).<\/p>\n<p>Sample Script:<\/p>\n<pre>import psutil\r\n\r\n# Sample CPU utilisation over one second and compare it with the threshold\r\nif psutil.cpu_percent(interval=1) &gt; 80:\r\n    print(\"High CPU Usage Alert!!\")<\/pre>\n<h4>3. Automate Log Rotation<\/h4>\n<p>Large logs can quickly fill storage and become unmanageable. Automating log rotation keeps them under control.<br \/>\n<strong>Solution<\/strong>: A cron job paired with this Bash script compresses the current logs into a dated archive and removes the originals only if the archive was created successfully.<br \/>\nSample Script:<\/p>\n<pre>#!\/bin\/bash\r\nLOG_DIR=\"\/var\/log\/app\"\r\n# Archive the logs; delete the originals only if tar succeeds\r\ntar -czf \"$LOG_DIR\/archive_$(date +%F).tar.gz\" \"$LOG_DIR\"\/*.log &amp;&amp; rm \"$LOG_DIR\"\/*.log<\/pre>\n<h4>4. 
Service Uptime Monitoring<\/h4>\n<p>Immediate detection of service downtime is crucial for maintaining application reliability.<br \/>\n<strong>Solution<\/strong>: A Python script sends an HTTP request to a service endpoint and prints an alert if the endpoint returns a non-200 status or is unreachable.<br \/>\nSample Script:<\/p>\n<pre>import requests\r\n\r\ntry:\r\n    # The timeout keeps the check from hanging on an unresponsive host\r\n    response = requests.get(\"http:\/\/my-service-url.com\", timeout=5)\r\n    if response.status_code != 200:\r\n        print(\"Service Down Alert!\")\r\nexcept requests.RequestException:\r\n    print(\"Service Unreachable!\")<\/pre>\n<h4>5. Monitor Memory Usage<\/h4>\n<p>Low server memory can lead to application crashes and disruption of services.<br \/>\n<strong>Solution<\/strong>: This script checks memory usage and prints an alert when available memory drops below a specified threshold (600 MB in this example).<br \/>\nSample Script:<\/p>\n<pre>#!\/bin\/bash\r\nTHRESHOLD=600 # in MB\r\n# Column 7 of the Mem row in free -m is the available memory\r\nAVAILABLE=$(free -m | awk '\/^Mem:\/ {print $7}')\r\nif [ \"$AVAILABLE\" -lt \"$THRESHOLD\" ]; then\r\n    echo \"Low Memory Alert!!\"\r\nfi<\/pre>\n<h4>6. Docker Container Health Check<\/h4>\n<p>Container health is crucial for ensuring the smooth operation of microservices.<br \/>\n<strong>Solution<\/strong>: This script reports the health status of each running Docker container.<br \/>\nSample Script:<\/p>\n<pre>#!\/bin\/bash\r\nfor container in $(docker ps --format \"{{.Names}}\"); do\r\n    # Containers without a HEALTHCHECK have no .State.Health, so fall back to a label\r\n    STATUS=$(docker inspect --format '{{if .State.Health}}{{.State.Health.Status}}{{else}}no healthcheck{{end}}' \"$container\")\r\n    echo \"Container: $container - Status: $STATUS\"\r\ndone<\/pre>\n<h4>7. Kubernetes Pod Monitoring<\/h4>\n<p>Pods are the smallest deployable units in Kubernetes. 
Monitoring their status helps confirm that applications are running smoothly.<br \/>\n<strong>Solution<\/strong>: This script uses kubectl to list every pod that is not in the Running state.<br \/>\nSample Script:<\/p>\n<pre>#!\/bin\/bash\r\n# List pods in all namespaces, dropping rows whose status is Running\r\nkubectl get pods --all-namespaces | grep -v 'Running'<\/pre>\n<h4>8. Application Log Parsing Script<\/h4>\n<p>Log parsing helps detect hidden issues, but filtering logs manually is time-consuming.<br \/>\n<strong>Solution<\/strong>: A Bash script filters the application logs for ERROR messages or specific patterns.<br \/>\nSample Script:<\/p>\n<pre>#!\/bin\/bash\r\nLOG_FILE=\"\/var\/log\/app.log\"\r\n# Case-insensitive match, so ERROR, Error, and error are all reported\r\ngrep -i \"error\" \"$LOG_FILE\"<\/pre>\n<h4>9. Website Availability Check Script<\/h4>\n<p>If your website goes offline, it\u2019s critical to detect the outage immediately.<br \/>\n<strong>Solution<\/strong>: A Python script checks if your website is up and alerts you if it\u2019s down.<br \/>\nSample Script:<\/p>\n<pre>import requests\r\n\r\ntry:\r\n    response = requests.get(\"http:\/\/myexample.com\", timeout=5)\r\n    if response.status_code == 200:\r\n        print(\"Website is up!\")\r\n    else:\r\n        print(\"Website is down!\")\r\nexcept requests.RequestException:\r\n    print(\"Website Unreachable!\")<\/pre>\n<h4>10. Script for Cleaning Temp Files<\/h4>\n<p>Temporary files can clutter the filesystem, leading to disk space issues on the server. 
Cleaning up old temp files regularly prevents this. This Python script removes files and directories in \/tmp that have not been modified for seven days.<br \/>\n<strong>Sample Script:<\/strong><\/p>\n<pre>import os\r\nimport shutil\r\nimport time\r\n\r\nTEMP_DIR = '\/tmp'\r\nAGE_THRESHOLD = 7 * 24 * 60 * 60  # 7 days in seconds\r\n\r\ndef clean_temp_files():\r\n    now = time.time()\r\n    for filename in os.listdir(TEMP_DIR):\r\n        file_path = os.path.join(TEMP_DIR, filename)\r\n        # lstat does not follow symlinks, so dangling links do not raise an error\r\n        if os.lstat(file_path).st_mtime &lt; now - AGE_THRESHOLD:\r\n            if os.path.isfile(file_path) or os.path.islink(file_path):\r\n                os.remove(file_path)\r\n                print(f'Removed file: {file_path}')\r\n            elif os.path.isdir(file_path):\r\n                shutil.rmtree(file_path)\r\n                print(f'Removed directory: {file_path}')\r\n\r\nif __name__ == \"__main__\":\r\n    clean_temp_files()<\/pre>\n<h4>11. Track DNS Resolution Time<\/h4>\n<p>Monitoring DNS resolution time helps troubleshoot slow application performance.<br \/>\n<strong>Sample Script:<\/strong><\/p>\n<pre>#!\/bin\/bash\r\nDNS_SERVER=\"8.8.8.8\"\r\nDOMAIN=\"example.com\"\r\n# Query the chosen resolver and print only the \"Query time\" line\r\ndig @\"$DNS_SERVER\" \"$DOMAIN\" +stats | grep \"Query time\"<\/pre>\n<h4>12. Automated Backup Verification Script<\/h4>\n<p>Backups must be verified to ensure they\u2019re usable in the event of a system failure.<br \/>\n<strong>Sample Script:<\/strong><\/p>\n<pre>#!\/bin\/bash\r\nBACKUP_DIR=\"\/backups\"\r\n# tar -t lists the archive contents; a non-zero exit means the archive is unreadable\r\nfor file in \"$BACKUP_DIR\"\/*.tar.gz; do\r\n    if ! 
tar -tzf \"$file\" &gt; \/dev\/null; then\r\n        echo \"Corrupted backup file: $file\"\r\n    fi\r\ndone<\/pre>\n<h4>13. AWS Cloud Cost Monitoring Script<\/h4>\n<p>Monitoring cloud costs supports efficient resource allocation and helps catch unexpected billing spikes early. This script queries AWS Cost Explorer for today\u2019s cost, logs it, and posts a Slack alert when the cost exceeds the threshold.<br \/>\n<strong>Sample Script:<\/strong><\/p>\n<pre>#!\/bin\/bash\r\nTHRESHOLD=10  # Cost alert threshold in USD\r\nLOG_FILE=\"\/var\/log\/aws_cost.log\"\r\nSLACK_WEBHOOK_URL=\"https:\/\/hooks.slack.com\/services\/XXXXXXX\"\r\n\r\n# Fetch today's AWS cost (End is exclusive, so use tomorrow's date; date -d needs GNU date)\r\nCOST=$(aws ce get-cost-and-usage \\\r\n  --time-period Start=$(date +%Y-%m-%d),End=$(date -d tomorrow +%Y-%m-%d) \\\r\n  --granularity DAILY \\\r\n  --metrics \"BlendedCost\" \\\r\n  --query 'ResultsByTime[0].Total.BlendedCost.Amount' \\\r\n  --output text 2&gt;\/dev\/null)\r\n\r\n# Ensure COST is not empty\/null; if empty, set to zero\r\nCOST=${COST:-0}\r\n\r\n# Validate COST as a numeric value\r\nif ! [[ \"$COST\" =~ ^[0-9]+(\\.[0-9]+)?$ ]]; then\r\n    echo \"$(date): ERROR - AWS cost data is invalid: '$COST'\" &gt;&gt; \"$LOG_FILE\"\r\n    exit 2\r\nfi\r\n\r\n# Log the cost\r\necho \"$(date): AWS Cost Today: $COST USD\" &gt;&gt; \"$LOG_FILE\"\r\n\r\n# Function to send Slack notification\r\nsend_slack_alert() {\r\n    MESSAGE=\"\ud83d\udea8 *AWS Cost Alert!* \ud83d\udea8\\n\\nToday's AWS cost: *\\$${COST} USD* \ud83d\udcb0\\nThreshold: *\\$${THRESHOLD} USD*\\n\\n\ud83d\udd0d Check AWS Cost Explorer for more details.\"\r\n\r\n    PAYLOAD=$(jq -n --arg text \"$MESSAGE\" '{text: $text}')\r\n\r\n    curl -X POST -H 'Content-type: application\/json' \\\r\n        --data \"$PAYLOAD\" \"$SLACK_WEBHOOK_URL\"\r\n}\r\n\r\n# Check if cost exceeds threshold using awk\r\nif awk \"BEGIN {exit !($COST &gt; $THRESHOLD)}\"; then\r\n    send_slack_alert\r\nfi<\/pre>\n<p>&nbsp;<\/p>\n<h3>Conclusion<\/h3>\n<p>These scripts 
will give you a solid foundation for improving logging and monitoring in your infrastructure. Each one is a practical starting point that you can modify as needed.<\/p>\n<p>For more DevOps and automation content, subscribe to our blog.<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Introduction Monitoring and logging are crucial for maintaining a reliable system. Whether you\u2019re managing cloud infrastructure, microservices, servers, or CI\/CD pipelines, automation scripts provide a strong foundation &amp; play a key role in preventing late-night production issues In this article, we will look at 13 essential automation scripts that every DevOps engineer should know. These [&hellip;]<\/p>\n","protected":false},"author":1741,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"iawp_total_views":114},"categories":[2348],"tags":[1853,1892,260,6882],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts\/69941"}],"collection":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/users\/1741"}],"replies":[{"embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/comments?post=69941"}],"version-history":[{"count":12,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts\/69941\/revisions"}],"predecessor-version":[{"id":70462,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts\/69941\/revisions\/70462"}],"wp:attachment":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/media?parent=69941"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/categories?post=69941"},{"taxonomy":"post_tag","embeddable":true,
"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/tags?post=69941"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}