Introduction: In modern data platforms, ETL pipelines are rarely independent. They are deeply interconnected: one pipeline’s output becomes another pipeline’s input. In one of our production projects, we faced a classic orchestration challenge. Pipelines were scheduled based on time rather than on actual completion, that is, a real-time problem that arises when schedules assume predictable pipeline execution times. This article will walk […]
Like many data engineers, I’ve spent a good chunk of my time dealing with a problem that sounds simple on paper but is messy in reality: reliably moving data from source systems into an analytics platform. In one of my recent projects, I worked on setting up data integration using Airbyte, and this post is […]
Introduction: Data engineering, once dominated by manual coding, SQL development, and repetitive operational tasks, is entering a new era. With Generative AI (GenAI), data teams are automating ingestion workflows, accelerating data modeling, writing code faster, improving quality checks, and generating documentation instantly. GenAI isn’t just an add-on; it is fundamentally transforming how modern data platforms are […]
In the modern data ecosystem, speed and efficiency are paramount. Whether you’re building real-time analytics pipelines or scaling distributed systems, the bottleneck often lies in data serialization and transport. Enter Apache Arrow Flight—a high-performance RPC framework designed to move large datasets efficiently using the Arrow memory format. What is Apache Arrow Flight? Apache Arrow […]
Let me tell you about the moment I realized I’d been overcomplicating things for years. I was working on a pipeline in Snowflake. You know the type — a multi-stage transformation process where a few base tables feed into intermediate tables, some reconciliation happens, and eventually it all lands in a final reporting layer. I’d […]
Initial Thoughts: Having spent over a decade working in software development and data engineering, I wondered: where is AI right now? Is it capable of replacing the developer, or is there still some time? So I challenged myself to build an Android app, an unfamiliar area for me. While it intrigued me, there […]
The State of Code Reviews in Today’s Development Landscape: In today’s fast-moving world of software development, AI has made remarkable progress. It can write code, debug errors, and even help design architectures. But let’s be honest, we’re not quite at a point where AI can take over the entire development process. Human developers are still […]
Fun fact! Around 80%-90% of the world’s data is unstructured. I was shocked when I read this. Unstructured data includes images, emails, PDF files, social media posts, and other formats. Even though it is so widespread, 70% of data is not being used to drive insights and analytics. As a Data Engineer, […]
Introduction: In today’s era, businesses face the challenge of adapting to advanced technologies to stay ahead of their peers. Digital Engineering has emerged as a game changer, helping to integrate services like automation, data analytics, cloud computing, AI/ML, and IoT into engineering and business processes. Traditionally, it is associated with product development, but gradually […]