Blog posts around Big Data | TO THE NEW Blog

Amazon Redshift: A Comprehensive Overview

In today’s data-centric world, making informed decisions is vital for businesses. To support this, Amazon Web Services (AWS) offers a robust data warehousing solution known as Amazon Redshift. Redshift is designed to help organizations efficiently manage and analyze their data, providing valuable insights for strategic decisions. In this blog post, we will delve into […]

Shubham Thakur September 19, 2023

Big Data Data & Analytics

Efficient Data Migration from MongoDB to S3 using PySpark

Data migration is a crucial process for modern organizations looking to harness the power of cloud-based storage and processing. This blog examines how to transfer data from MongoDB, a well-known NoSQL database, to Amazon S3, an elastic cloud storage service, using PySpark. We will also focus on handling migrations based on timestamps to […]

Bishal Kumar Singh September 18, 2023

Big Data Cloud

Containerized Schema Evolution: Confluent to AWS ECS Migration

In today’s data-driven world, the efficient management of data schemas is critical. The Confluent Platform Schema Registry has long been a trusted solution for ensuring schema compatibility in Apache Kafka environments. However, as cloud services gain prominence, migrating your Confluent Schema Registry to AWS ECS (Elastic Container Service) offers numerous advantages in terms of scalability, […]

Sukhpreet Singh September 18, 2023

Big Data

Our Experience at the Snowflake Data Cloud World Tour: The World of Data, Apps and AI 2023 in Delhi

I had the opportunity to experience the Data Cloud World Tour, and it was all about collaborating with data in unimaginable ways. I joined the event with Suprakash Maity, Prashant Singhal, Sushant, and Vikramjeet along with leaders, to learn about the latest capabilities of the Data Cloud and to hear directly from our customers about […]

Dheeraj Gupta September 11, 2023

Big Data

Data Quality with PyDeequ: A Comprehensive Guide

Inadequate data quality can adversely affect both machine learning models and the decision-making process within a business. Unaddressed data errors can result in lasting repercussions, manifesting as blemishes and jolts. It is imperative in today’s landscape to implement automated tools for monitoring data quality, enabling the timely identification and resolution of issues. This proactive approach […]

Prashant Singhal September 11, 2023

Big Data Data & Analytics Digital Engineering

Spark Structured Streaming

In this blog, I will discuss how Spark Structured Streaming works and how we can process data as a continuous stream. Before we discuss this in detail, let’s try to understand stream processing. In layman’s terms, stream processing is the processing of data in motion or computing data directly as it is produced […]

Ravindra Jain August 31, 2023

Big Data Cloud Cloud Managed Services

From Pandas to PySpark

I recently converted a Python script that relied on Pandas DataFrames to use PySpark DataFrames instead. The main goal was to move data manipulation from the local, single-machine context of Pandas to the distributed processing capabilities offered by PySpark. This shift to PySpark DataFrames enables us to enhance scalability and efficiency by harnessing the power of distributed […]

Isha Vason August 30, 2023

AWS Big Data Cloud

Mastering Schema Management: Transitioning from Confluent to AWS Glue Schema Registry

In the dynamic realm of data integration, schema registries are crucial, ensuring data coherence, harmony, and structure. Amidst notable contenders, Confluent Schema Registry and AWS Glue Schema Registry shine as prime choices for efficient schema management. With businesses aiming to enhance operations within the extensive AWS ecosystem, the migration from Confluent to AWS Glue […]

Sukhpreet Singh August 25, 2023

Big Data

No Code Data Ingestion Framework Using Apache Flink

The conveyance of data from many sources to a storage medium where it may be accessed, utilized, and analyzed by an organization is known as data ingestion. Typically, the destination is a data warehouse, data mart, database, or document storage. Sources can include RDBMS such as MySQL, Oracle, and Postgres. The data ingestion layer serves […]

Vikas Duvedi June 27, 2023
