DevOps

Real-World AWS Cost Optimization Strategies for High-Traffic Platforms

Introduction I’ll be honest: running a high-traffic production environment on AWS is fun... until you see the cloud bill. At first, you overprovision a bit of memory “just to be safe.” Containers stay up a little longer than needed. Logs? Oh, we log everything because, you know, one day you might need […]

March 15, 2026

DevOps

Step-by-Step Guide to Building Observability into an LLM Application

Introduction Large Language Models (LLMs) are transforming the way that users interact with applications, and they introduce observability challenges that require new approaches. Unlike deterministic APIs that return predictable results, LLMs have variable performance, unpredictable outputs, and complex failure modes. Observing these systems effectively means collecting data that captures not just the performance of LLM […]

March 15, 2026

DevOps

Securely Access Private GKE Clusters Using Tinyproxy and Identity-Aware Proxy (IAP)

Introduction Private clusters in Google Kubernetes Engine improve security by preventing public access to the Kubernetes control plane, but this also makes remote management more difficult. This step-by-step guide walks you through how to configure Tinyproxy on a private bastion host and how to use Identity-Aware Proxy (IAP) to safely access a private GKE cluster […]

March 15, 2026

DevOps

From Logstash to Fluent Bit: How We Streamlined Logging for an Ad Tech Client

Introduction In ad-tech, logs are not “nice to have.” They are the product’s heartbeat. Every impression, every click, every bid request: everything generates logs. Multiply that by millions of requests per minute, and you’re suddenly dealing with millions of events and terabytes of logs per day. That’s exactly where one of our platforms was. […]

March 15, 2026

DevOps

Why Right-Sizing Is Not a One-Time Activity

Introduction If you’ve worked in production long enough, you’ve probably heard this: “Let’s right-size the services and reduce the AWS bill.” So we do it. We check CPU and memory metrics for a week. We reduce task sizes. Costs drop. Everyone’s happy. And then, six months later, the bill increases again. Nothing “dramatic” changed. No […]

March 15, 2026

DevOps

ECS Fargate at Scale: Lessons from Running Multiple Microservices in Production

Introduction When we started with Amazon ECS on AWS Fargate, it felt simple. No EC2 to manage. No AMIs. No cluster scaling headaches. Then the number of services grew. Working with the ad-tech client for the last 5 years and running their workloads on ECS Fargate has taught us many things. Different traffic patterns. Different scaling […]

March 15, 2026

DevOps

How to Centralize AWS Monitoring: A Guide to CloudWatch Cross-Account Metrics

It is painfully inefficient to check metrics across a large collection of AWS accounts (development, staging, UAT, production, etc.). This is a major time waster, not just a small irritation. In addition to wasting valuable engineering time, you run a much higher risk of missing an alert that could result in a full-blown outage every […]

March 12, 2026

DevOps

How to Monitor Your Java App with JMX Exporter & Prometheus

Introduction If you have a Java application running in Kubernetes, sooner or later you will want to know what’s really going on inside the JVM. Is heap memory close to exhaustion? Is the garbage collector busy? Are we slowly moving towards an OOM error? Without visibility, you’re essentially flying blind. In this guide, […]

March 10, 2026

DevOps

DNS Migration Done Right: Lessons from Moving to Route 53

Introduction DNS migrations don’t usually get much attention. They’re invisible when done right and very loud when done wrong. At TO THE NEW, we recently migrated DNS for an ad tech client from NS1 (an IBM product) to AWS Route 53 as part of their larger move to AWS and a push for cost savings. On paper, this […]

February 15, 2026