{"id":77906,"date":"2026-03-15T08:40:28","date_gmt":"2026-03-15T03:10:28","guid":{"rendered":"https:\/\/www.tothenew.com\/blog\/?p=77906"},"modified":"2026-03-23T21:57:47","modified_gmt":"2026-03-23T16:27:47","slug":"rolling-node-replacement-the-safest-way-to-upgrade-kubernetes","status":"publish","type":"post","link":"https:\/\/www.tothenew.com\/blog\/rolling-node-replacement-the-safest-way-to-upgrade-kubernetes\/","title":{"rendered":"Rolling Node Replacement: The Safest Way to Upgrade Kubernetes"},"content":{"rendered":"<h2>Introduction<\/h2>\n<p><strong>What if upgrading your Kubernetes cluster required no downtime at all?<\/strong><\/p>\n<p>Imagine if you could upgrade your Kubernetes cluster and keep everything running smoothly, with zero downtime. Sounds pretty great, right? A lot of teams worry that upgrading will mean their apps go offline, but with solid planning, it&#8217;s actually possible to have safe and totally disruption-free upgrades.<\/p>\n<p>A <strong>Kubernetes upgrade<\/strong> moves your control plane or worker nodes to a newer version to pick up security patches, better performance, and support for newer APIs. Staying up to date matters: outdated nodes open the door to vulnerabilities, missing features, and flaky workloads.<\/p>\n<p>Here&#8217;s what you&#8217;ll get from this blog:<\/p>\n<ul>\n<li>What a Kubernetes node upgrade actually is<\/li>\n<li>Why upgrades matter in production<\/li>\n<li>How pros handle upgrades, step-by-step<\/li>\n<li>Upgrading clusters <strong>without node groups or Karpenter<\/strong><\/li>\n<li>Tips for true zero-downtime upgrades<\/li>\n<\/ul>\n<h2>What Is a Kubernetes Node Upgrade?<\/h2>\n<p>So, what is a Kubernetes node upgrade? It&#8217;s about replacing old worker nodes with ones running the latest OS image, Kubernetes version, or security fixes. Instead of poking at nodes in place, production setups use a <strong>rolling replacement:<\/strong> add new nodes, shift workloads, and remove old nodes. 
This keeps your apps up and running through the whole upgrade.<\/p>\n<p><strong>Why upgrade?<\/strong><\/p>\n<ul>\n<li>Patch security holes<\/li>\n<li>Avoid breakage from deprecated or removed APIs<\/li>\n<li>Boost performance and reliability<\/li>\n<li>Stay compatible with tools and add-ons<\/li>\n<li>Keep your vendor support intact<\/li>\n<\/ul>\n<p>Skipping upgrades? That just sets you up for headaches down the line.<\/p>\n<h2>Upgrade Architecture Flow<\/h2>\n<p>Let&#8217;s look at the upgrade flow you&#8217;d follow in a real production environment, shown here for Amazon EKS:<\/p>\n<div id=\"attachment_78948\" style=\"width: 667px\" class=\"wp-caption aligncenter\"><img aria-describedby=\"caption-attachment-78948\" decoding=\"async\" loading=\"lazy\" class=\" wp-image-78948\" src=\"https:\/\/www.tothenew.com\/blog\/wp-ttn-blog\/uploads\/2026\/03\/svgviewer-png-output-3.png\" alt=\"eks_upgrade_flow\" width=\"657\" height=\"503\" srcset=\"\/blog\/wp-ttn-blog\/uploads\/2026\/03\/svgviewer-png-output-3.png 1360w, \/blog\/wp-ttn-blog\/uploads\/2026\/03\/svgviewer-png-output-3-300x229.png 300w, \/blog\/wp-ttn-blog\/uploads\/2026\/03\/svgviewer-png-output-3-1024x783.png 1024w, \/blog\/wp-ttn-blog\/uploads\/2026\/03\/svgviewer-png-output-3-768x587.png 768w, \/blog\/wp-ttn-blog\/uploads\/2026\/03\/svgviewer-png-output-3-624x477.png 624w\" sizes=\"(max-width: 657px) 100vw, 657px\" \/><p id=\"caption-attachment-78948\" class=\"wp-caption-text\">EKS upgrade flow<\/p><\/div>\n<h2>Pre-Upgrade Checklist<\/h2>\n<h3>1. Verify Cluster Health<\/h3>\n<table style=\"border-collapse: collapse; width: 100%;\">\n<tbody>\n<tr>\n<td style=\"width: 100%;\">kubectl get nodes<br \/>\nkubectl get pods -A<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Every node must report <strong>Ready<\/strong>, and no pods should be stuck in Pending or CrashLoopBackOff.<\/p>\n<h3>2. 
Spot Deprecated APIs (tools like Pluto help)<\/h3>\n<table style=\"border-collapse: collapse; width: 100%;\">\n<tbody>\n<tr>\n<td style=\"width: 100%;\">pluto detect -A<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>If Pluto reports deprecated APIs, fix them before upgrading.<\/p>\n<h3>3. Write Down Your Cluster&#8217;s Details<\/h3>\n<table style=\"border-collapse: collapse; width: 100%;\">\n<tbody>\n<tr>\n<td style=\"width: 100%;\">aws eks describe-cluster --name &lt;cluster&gt;<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Record:<\/p>\n<ul>\n<li>API server endpoint<\/li>\n<li>Certificate authority data<\/li>\n<li>CIDR<\/li>\n<li>Cluster name<\/li>\n<\/ul>\n<div id=\"attachment_78949\" style=\"width: 617px\" class=\"wp-caption aligncenter\"><img aria-describedby=\"caption-attachment-78949\" decoding=\"async\" loading=\"lazy\" class=\" wp-image-78949\" src=\"https:\/\/www.tothenew.com\/blog\/wp-ttn-blog\/uploads\/2026\/03\/svgviewer-png-output-4.png\" alt=\"pre_upgrade_checklist\" width=\"607\" height=\"250\" srcset=\"\/blog\/wp-ttn-blog\/uploads\/2026\/03\/svgviewer-png-output-4.png 1360w, \/blog\/wp-ttn-blog\/uploads\/2026\/03\/svgviewer-png-output-4-300x124.png 300w, \/blog\/wp-ttn-blog\/uploads\/2026\/03\/svgviewer-png-output-4-1024x422.png 1024w, \/blog\/wp-ttn-blog\/uploads\/2026\/03\/svgviewer-png-output-4-768x316.png 768w, \/blog\/wp-ttn-blog\/uploads\/2026\/03\/svgviewer-png-output-4-624x257.png 624w\" sizes=\"(max-width: 607px) 100vw, 607px\" \/><p id=\"caption-attachment-78949\" class=\"wp-caption-text\">Pre-upgrade checklist<\/p><\/div>\n<h2>Universal Upgrade Method (Works Everywhere)<\/h2>\n<p>This method works for:<\/p>\n<ul>\n<li>Managed node groups<\/li>\n<li>Self-managed nodes<\/li>\n<li>Bare-metal clusters<\/li>\n<li>Clusters without autoscalers<\/li>\n<\/ul>\n<h3>Step 1 \u2013 Add New Nodes<\/h3>\n<p>Create new nodes using the updated image\/template.<\/p>\n<table style=\"border-collapse: collapse; width: 100%;\">\n<tbody>\n<tr>\n<td style=\"width: 100%;\">kubectl get 
nodes<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Wait until they show <strong>Ready<\/strong>.<\/p>\n<h3>Step 2 \u2013 Stop Scheduling on the Old Node<\/h3>\n<table style=\"border-collapse: collapse; width: 100%;\">\n<tbody>\n<tr>\n<td style=\"width: 100%;\">kubectl cordon &lt;node&gt;<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h3>Step 3 \u2013 Validate New Nodes<\/h3>\n<p>Restart one deployment. With the old node cordoned, the replacement pods can only be scheduled onto nodes that are still schedulable, including the new ones:<\/p>\n<table style=\"border-collapse: collapse; width: 100%;\">\n<tbody>\n<tr>\n<td style=\"width: 100%;\">kubectl rollout restart deployment &lt;app&gt;<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>If the new pods start and become Ready, continue.<\/p>\n<h3>Step 4 \u2013 Validate Workloads Before Draining (Critical Step)<\/h3>\n<p>Check where pods are running:<\/p>\n<table style=\"border-collapse: collapse; width: 100%;\">\n<tbody>\n<tr>\n<td style=\"width: 100%;\">kubectl get pods -A -o wide<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Ensure:<\/p>\n<ul>\n<li>Pods are running on new nodes<\/li>\n<li>All replicas are healthy<\/li>\n<li>No pods are pending<\/li>\n<li>Applications are accessible<\/li>\n<\/ul>\n<p><strong>Never drain until workloads are confirmed healthy on new nodes.<\/strong><\/p>\n<h3>Step 5 \u2013 Drain the Old Node<\/h3>\n<p>Now safely evict pods:<\/p>\n<table style=\"border-collapse: collapse; width: 100%;\">\n<tbody>\n<tr>\n<td style=\"width: 100%;\">kubectl drain &lt;node&gt; --ignore-daemonsets --delete-emptydir-data<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>This command:<\/p>\n<ul>\n<li>Evicts running pods<\/li>\n<li>Lets their controllers recreate them on the remaining schedulable nodes<\/li>\n<li>Skips DaemonSets (CNI, kube-proxy, etc.)<\/li>\n<\/ul>\n<p>The drain respects PodDisruptionBudgets, so it may pause until enough replicas are healthy elsewhere. The --delete-emptydir-data flag allows pods that use emptyDir volumes to be evicted, and any data in those volumes is lost. 
Wait for the drain command to finish.<\/p>\n<h3>Step 6 \u2013 Validate After Drain<\/h3>\n<table style=\"border-collapse: collapse; width: 100%;\">\n<tbody>\n<tr>\n<td style=\"width: 100%;\">kubectl get pods -A<br \/>\nkubectl get events -A --sort-by=.metadata.creationTimestamp<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Confirm:<\/p>\n<ul>\n<li>No CrashLoopBackOff pods<\/li>\n<li>No scheduling failures<\/li>\n<li>No Pending workloads<\/li>\n<\/ul>\n<h3>Step 7 \u2013 Remove Old Node<\/h3>\n<table style=\"border-collapse: collapse; width: 100%;\">\n<tbody>\n<tr>\n<td style=\"width: 100%;\">kubectl delete node &lt;node&gt;<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Deleting the node object does not shut down the machine, so terminate the underlying VM or instance if your platform does not do it for you.<\/p>\n<h3>Step 8 \u2013 Repeat<\/h3>\n<p>Repeat for the remaining nodes until all are upgraded.<\/p>\n<h2>Golden Rule for Production Upgrades<\/h2>\n<p><strong>Add capacity \u2192 Cordon \u2192 Validate workloads \u2192 Drain \u2192 Validate again \u2192 Delete node<\/strong><\/p>\n<p>Skipping validation is a common cause of upgrade-related downtime.<\/p>\n<h2>Zero Downtime Requirements<\/h2>\n<p>To avoid downtime during the upgrade:<\/p>\n<ul>\n<li>At least 2 replicas per deployment<\/li>\n<li>Readiness probes configured<\/li>\n<li>PodDisruptionBudgets defined<\/li>\n<li>Extra cluster capacity available<\/li>\n<\/ul>\n<h2>Common Mistakes to Avoid<\/h2>\n<ul>\n<li>Upgrading nodes before the control plane<\/li>\n<li>Draining all nodes together<\/li>\n<li>Ignoring deprecated APIs<\/li>\n<li>No spare capacity<\/li>\n<li>No rollback plan<\/li>\n<\/ul>\n<h2>Rollback Strategy<\/h2>\n<p>If something breaks:<\/p>\n<ul>\n<li>Create nodes with the previous image<\/li>\n<li>Cordon new nodes<\/li>\n<li>Drain new nodes<\/li>\n<li>Delete new nodes<\/li>\n<\/ul>\n<p>This moves workloads back onto nodes running the previous image.<\/p>\n<div id=\"attachment_78950\" style=\"width: 520px\" class=\"wp-caption aligncenter\"><img aria-describedby=\"caption-attachment-78950\" decoding=\"async\" loading=\"lazy\" class=\" 
wp-image-78950\" src=\"https:\/\/www.tothenew.com\/blog\/wp-ttn-blog\/uploads\/2026\/03\/svgviewer-png-output-5.png\" alt=\"zero_downtime_requirements_vs_mistakes\" width=\"510\" height=\"240\" srcset=\"\/blog\/wp-ttn-blog\/uploads\/2026\/03\/svgviewer-png-output-5.png 1360w, \/blog\/wp-ttn-blog\/uploads\/2026\/03\/svgviewer-png-output-5-300x141.png 300w, \/blog\/wp-ttn-blog\/uploads\/2026\/03\/svgviewer-png-output-5-1024x482.png 1024w, \/blog\/wp-ttn-blog\/uploads\/2026\/03\/svgviewer-png-output-5-768x361.png 768w, \/blog\/wp-ttn-blog\/uploads\/2026\/03\/svgviewer-png-output-5-624x294.png 624w\" sizes=\"(max-width: 510px) 100vw, 510px\" \/><p id=\"caption-attachment-78950\" class=\"wp-caption-text\">zero_downtime_requirements_vs_mistakes<\/p><\/div>\n<h2>Conclusion<\/h2>\n<p>In the end, Kubernetes upgrades shouldn&#8217;t keep you up at night. With rolling replacements, you can upgrade with confidence and no downtime, whether you rely on node groups, autoscalers, or manage infrastructure the old-fashioned way.<\/p>\n<p><strong>Key takeaway:<\/strong> Always add new nodes, migrate workloads, and then delete the old ones. Never try upgrading everything all at once.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Introduction What if upgrading your Kubernetes cluster required no downtime at all? Imagine if you could upgrade your Kubernetes cluster and keep everything running smoothly, with zero downtime. Sounds pretty great, right? 
A lot of teams worry that upgrading will mean their apps go offline, but with solid planning, it&#8217;s actually possible to have safe [&hellip;]<\/p>\n","protected":false},"author":1638,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"iawp_total_views":43},"categories":[2348],"tags":[1892,3965,7723,6078],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts\/77906"}],"collection":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/users\/1638"}],"replies":[{"embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/comments?post=77906"}],"version-history":[{"count":10,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts\/77906\/revisions"}],"predecessor-version":[{"id":79036,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts\/77906\/revisions\/79036"}],"wp:attachment":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/media?parent=77906"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/categories?post=77906"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/tags?post=77906"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}