Keeping databases running smoothly is an ongoing challenge for anyone who works with data. PostgreSQL, a powerful and widely used open-source database system, is a go-to choice for many applications. But even with PostgreSQL, getting the best performance isn’t always straightforward. In this journey, we will explore […]
Introduction Data is a key asset in today’s business environment, holding great potential for informed decision-making and a lasting competitive edge. However, the road to efficient data management is often difficult and time-consuming, especially when dealing with large and varied datasets. In this first blog post of the dbt series, we will introduce dbt, […]
Big Data, Data & Analytics, DevOps
In today’s world, handling and automating complex tasks is crucial, and Apache Airflow is a powerful tool for the job. It acts like a conductor for tasks, orchestrating everything so it runs smoothly. Running Airflow with Docker makes it even better, adding flexibility and portability. In this blog, we’ll explain […]
In today’s fast-paced digital landscape, businesses thrive or falter based on their ability to harness and make sense of data in real time. Apache Kafka, an open-source distributed event streaming platform, has emerged as a pivotal tool for organizations aiming to excel at data-driven decision-making. In this blog post, we’ll be implementing Apache […]
Big Data, Data & Analytics, Software development
Setup: Download the appropriate version of Astro for your Windows system from the link. Rename the downloaded file to “astro.exe” and save it. Add the file’s path to your environment variables. To check whether Astro has been configured correctly, run the “astro” command in cmd. After successful configuration of the Astro CLI, you should get a response like […]
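The steps above can be sketched as Windows command-prompt commands. Note that the downloaded filename and the install folder below are hypothetical examples, not values taken from the post:

```shell
:: Rename the downloaded binary (example filename) to astro.exe
ren astro_windows_amd64.exe astro.exe

:: Add the folder containing astro.exe (example path) to the user PATH
setx PATH "%PATH%;C:\tools\astro"

:: Open a NEW cmd window (setx does not affect the current one), then verify
astro version
```

These are environment-configuration commands; `setx` changes only future shells, which is why verification must happen in a freshly opened window.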
In the rapidly evolving landscape of data management and analytics, Snowflake has emerged as a powerful cloud-based data platform. Its architecture and features make it a preferred choice for businesses looking to optimize data processing, storage, and analytics. In this blog post, we will go through various aspects of Snowflake, covering its architecture, features, security, […]
Big Data, Data & Analytics, Testing
PySpark is an open-source, distributed computing framework that provides a Python interface to Apache Spark, enabling the processing of large-scale datasets across clusters of computers. PySpark is often used to process and learn from voluminous event data. Apache Spark exposes DataFrame and Dataset APIs that enable writing very concise […]
Introduction In today’s data-centric world, making informed decisions is vital for businesses. To support this, Amazon Web Services (AWS) offers a robust data warehousing solution known as Amazon Redshift. Redshift is designed to help organizations efficiently manage and analyze their data, providing valuable insights for strategic decisions. In this blog post, we will delve into […]
Data migration is a crucial process for modern organizations looking to harness the power of cloud-based storage and processing. This blog will examine the procedure for transferring data from MongoDB, a well-known NoSQL database, to Amazon S3, a scalable cloud storage service, using PySpark. Moreover, we will focus on handling migrations based on timestamps to […]