DevOps

From Hot Brokers to S3: Optimizing Kafka Storage with Tiered Storage

Introduction If you’ve ever worked with Kafka, you know the problem: data grows fast. Every click, impression, or event adds up, and before you know it, your Kafka broker’s disks are full. Disk is not cheap on AWS, storing everything on broker storage is costly, and scaling up to handle growth feels […]

Data Engineering

How GenAI Is Transforming Data Engineering

Introduction Data engineering, once dominated by manual coding, SQL development, and repetitive operational tasks, is entering a new era. With Generative AI (GenAI), data teams are automating ingestion workflows, accelerating data modeling, writing code faster, improving quality checks, and generating documentation instantly. GenAI isn’t just an add-on—it is fundamentally transforming how modern data platforms are […]

Java/JVM

Migrating MySQL Data to Elasticsearch: A Practical Guide

Introduction Elasticsearch has become a go-to choice for building fast, intelligent search experiences. But what if your source of truth is a relational database like MySQL? In this blog, we’ll walk through how we migrated structured relational data into Elasticsearch using Python — with company_registry as a working example. Why Migrate from MySQL to Elasticsearch? […]

Rajdeep Dabral

DevOps

How Amazon MSK Helped Us Stop Babysitting Kafka

Introduction For years, we ran Kafka in the data centre; then we moved to AWS and started running it on EC2. However, the headaches increased along with our usage. We began to feel as though we were spending more time managing Kafka than creating anything of value due to broker upgrades, ZooKeeper problems, […]

Data Engineering

Mastering Data Modeling

As you progress in your journey from business intelligence (BI) development toward data engineering or analytics engineering, one of the core skills you need to focus on is data modeling. Data modeling is the foundation for any data architecture—whether you are building databases, designing ETL pipelines, or creating data warehouses. Without a solid understanding of […]

Data Engineering

Unlocking the Secrets to the Perfect Database Choice

Introduction In today’s data-driven world, the choice of a database can significantly impact the performance, scalability, and maintainability of your application. With so many types of databases available, selecting the right one can be a daunting task. This guide will help you understand the key factors to consider when choosing a database and provide a […]

Sindhura

Data Engineering

Configuring AWS Lambda as a Kafka Producer with SASL_SSL and Kerberos/GSSAPI for Secure Communication

Kafka is a distributed streaming platform designed for real-time data pipelines, stream processing, and data integration. AWS Lambda, on the other hand, is a serverless compute service that executes your code in response to events, managing the underlying compute resources for you. In organizations where Kafka plays a central role in streaming and data integration, […]

Avinash Upreti

Data Engineering

Building Efficient Data ETL Pipelines: Key Best Practices [Part-2]

In the first part of this series on ETL data pipelines, we explored the importance of ETL processes and their core components, and discussed the different types of ETL pipelines. Now, in this second part, we will dive deeper into some of the key challenges faced when implementing data ETL pipelines, outline best practices to optimize these processes […]

Yogesh Kargeti

Data Engineering

Building Efficient Data ETL Pipelines: Anatomy of an ETL [Part-1]

In today’s data-driven world, businesses rely on timely, accurate information to make critical decisions. Data pipelines play a vital role in this process, seamlessly fetching, processing, and transferring data to centralized locations like data warehouses. These pipelines ensure the right data is available when needed, allowing organizations to analyze trends, forecast outcomes, and optimize their […]

Porush Goyal