MSP

Running GitLab CI at Scale: Setting Up Kubernetes-Based Runners

Introduction: As a project grows, traditional CI runners on static virtual machines, or shared runners, often struggle to keep up with increasing workloads. Jobs queue up, builds take longer than expected, and developers are left waiting. What’s the solution? Instead of using traditional executors, we can leverage the GitLab Runner […]

MSP

Chaos Engineering: Simulating Network Latency using AWS FIS

Introduction: Modern applications are distributed systems consisting of multiple services, containers, and infrastructure components. While this improves scalability, security, and reliability, it also increases the chances of unexpected failures and downtime. Application testing methods mostly focus on application functionality, but they rarely test how systems behave under real-world failures such as instance crashes, network latency, […]

Rauf Khan

MSP

Cross-Account RDS Migration Using AWS DMS (Snapshot + CDC Strategy)

Migrating databases between AWS accounts is often necessary for environment isolation, organizational changes, or setting up security boundaries. Basic migrations can be performed using dumps or replication. However, production environments need minimal downtime, consistency guarantees, and controlled cutover procedures. A strong method for migrating databases between accounts is to combine snapshot-based data transfer with Change […]

MSP

AWS DevOps Guru: Intelligent AIOps for Modern Cloud Observability

Introduction: Cloud monitoring has evolved over the years, moving from manual monitoring of static thresholds to dynamic anomaly detection and AI/ML-based operations. This is where AWS DevOps Guru comes into the picture as a managed machine learning service for cloud operations. AWS DevOps Guru is an AIOps solution that can detect operational anomalies […]

MSP

Agentic AI in SRE: Rethinking Reliability in the Age of Autonomous Systems

Introduction: For years, Site Reliability Engineering (SRE) has been built around a simple mission: keep systems reliable at scale. We measure SLOs, manage error budgets, write runbooks, respond to incidents, and automate toil wherever possible. But even with automation, most SRE work remains fundamentally reactive: Alerts wake us up. We investigate dashboards. We correlate logs […]

Aasim Zaidi

MSP

Containers Lie | A Deep Dive into Docker-Shim and a Real On-Call Fix

In this blog I will share an incident that taught me how containers really work under the hood. Production Down – I once received a production website-down alert for one of my customers. When I checked, the website was returning a 502. Initial Checks – I immediately logged in to the production host to investigate. The […]

Rayan Ahmed

MSP

Implementing Istio Service Mesh in Kubernetes

Introduction: As a Kubernetes cluster grows, managing communication between microservices becomes difficult and complex: a large number of services interact in real time, and identifying issues such as failed connections, packet loss, and unstable connections becomes challenging. Istio Service Mesh addresses these challenges by creating an infrastructure layer that handles […]

MSP

Agentic AI: When Artificial Intelligence Stops Responding and Starts Acting

Introduction: What is Agentic AI? Agentic AI refers to AI systems that can act as autonomous agents: they don’t just respond to prompts, they decide what to do next, plan, use tools, observe results, and adapt to achieve a goal. Think of it as the shift from: “Answer my question” → “Handle this task end-to-end.” Core […]

Aasim Zaidi