Introduction Apache Kafka has become the backbone of modern event-driven architectures, powering everything from microservices communication to real-time analytics and stream processing. As Kafka deployments grow, developers and operators frequently need answers to questions such as: Are messages reaching Kafka successfully? Which partition received a message? Are consumers keeping up with producers? How much […]
Introduction Managing IAM (Identity and Access Management) compliance manually is one of those tasks that sounds simple but quietly consumes hours every week. Someone has to read the daily report, identify non-compliant users, send individual emails, track who responded, follow up again, and eventually rotate keys manually for users who never got around to it. […]
If you’ve ever had to jump between six different AWS accounts just to figure out why one Lambda function is behaving oddly – you already know the pain. Multi-account AWS environments are great for security and governance, but they can turn basic monitoring into a logistical nightmare. The good news? AWS gives you everything you […]
Introduction Running containers on Kubernetes feels like putting your application inside a black box. Dozens of pods start and stop. Services talk to each other across namespaces. Traffic shifts, nodes drain, and somewhere in that complexity a latency spike quietly breaks your SLO. Without proper observability, you are flying blind — reacting to symptoms instead […]
Introduction In ad-tech, logs are not “nice to have.” They are the product’s heartbeat. Every impression, every click, every bid request: everything generates logs. Multiply that by millions of requests per minute, and you’re suddenly dealing with millions of events and terabytes of logs per day. That’s exactly where one of our platforms was. […]
When we run Elasticsearch in production, one of the common issues is shard imbalance. One node in the cluster may be out of disk space while a few other nodes have no shards on them at all. For example, here is a node with all the shards: Node Shards Disk Used Disk % Free […]
It is painfully inefficient to check metrics across a large collection of AWS accounts (development, staging, UAT, production, etc.). This is a major time waster, not just a small irritation. In addition to wasting valuable engineering time, you run a much higher risk of missing an alert that could result in a full-blown outage every […]
Introduction If you have a Java application running in Kubernetes, sooner or later you will want to know what’s really going on inside the JVM. Is heap memory close to exhaustion? Is the garbage collector busy? Are we slowly moving towards an OOM error? Without that visibility, you’re essentially flying blind. In this guide, […]
What is AWS CloudWatch Synthetics? CloudWatch Synthetics is a feature of Amazon CloudWatch that allows you to create and manage canaries. It is a real-time monitoring tool that helps you detect problems by mimicking real user behaviour. What are canaries? A canary, in the context of CloudWatch, is a small script that […]