Introduction I’ll be honest when I say running a high-traffic production environment on AWS is fun…. until you see the cloud bill. At first, you overprovision a bit of memory “just to be safe.” Containers stay up a little longer than needed. Logs? Oh, we log everything because, you know, one day you might need it. And cross-AZ...
Introduction In ad-tech, logs are not “nice to have.” They are the product’s heartbeat. Every impression, every click, every bid request — everything generates logs. Multiply that by millions of requests per minute, and you’re suddenly dealing with millions of events and TB’s of logs per day. That’s exactly where one of our...
Introduction If you’ve worked in production long enough, you’ve probably heard this: “Let’s right-size the services and reduce the AWS bill.” So we do it. We check CPU and memory metrics for a week. We reduce task sizes. Costs drop. Everyone’s happy. And then…. six months later, the bill increases again....
It is painfully inefficient to check metrics across a large collection of AWS accounts (development, staging, uat, production, etc.). This is a major time waster, not just a small irritation. In addition to wasting valuable engineering time, you run a much higher risk of missing an alert that could result in a full-blown outage every time...
Introduction Modern applications have distributed systems consisting of multiple services, containers, and infrastructure components. While it improves scalability, security and reliability, it also increases the chances of unexpected failures and downtime. Application testing methods majorly focus on application functionality, but...
Introduction Cloud monitoring has evolved over the years and we have moved from manual static monitoring of thresholds to dynamic anomaly monitoring, AI and ML-based operational tasks.Here AWS DevOps Guru comes into the picture as an Mananged machine learning service in Cloud Operations. AWS DevOps Guru is an AIOps solution that can...
Introduction When you work with AWS infrastructure for some time, you realise that not all problems announce themselves with alerts or outages. Some problems stay quiet, blend into the background, and only reveal themselves later-usually when someone asks a question you can’t answer clearly. This is one such experience from my early...
Real-time multiplayer games are unforgiving. Players don’t care for your flashy server hardware and state of the art network infrastructure, they care about quick response matches, fair competition, and smooth gameplay. When that breaks, they don’t blame latency graphs; they blame the game. At TO THE NEW, we’ve seen this...
Introduction As more organizations move their workloads to the cloud, traditional security models are no longer enough. Modern cloud environments need security that can keep up with changing infrastructure and workloads. The Palo Alto Networks VM-Series Next-Generation Firewall (NGFW) stands out by bringing enterprise-level security...
Introduction When teams start on their DevOps journey, the excitement is real. CI/CD pipelines, faster deployments, cloud-native tools, automation everywhere - it feels like everything is finally going to be smooth. But in reality, the first year of DevOps is rarely smooth. It’s messy, experimental, and full of learning. [caption...
Introduction Reducing cloud costs is always the top priority and biggest headache for Devops Engineers, especially when using managed AWS services like ECS Fargate. For one of our Ad-Tech clients at TO THE NEW, we were already utilising Fargate Spot to reduce the ECS bill significantly. But we found that we could save even more money if...
Introduction If you’ve ever worked with Kafka, you know the problem: data grows fast. Every click, impression, or event adds up, and before you know it, your Kafka broker's disks are full. Disk is not very cheap on AWS, and storing everything on expensive broker storage is costly, and scaling up to handle growth feels like throwing...