Data & Analytics, DevOps

Enabling Client-Side Encryption for MongoDB

In today’s data-driven world, the security of your data is paramount. MongoDB, a popular NoSQL database, offers robust security features to protect your sensitive information. One of the most powerful security mechanisms MongoDB provides is client-side encryption. This approach allows you to encrypt data on the client side, ensuring that even if unauthorized users gain […]

Data & Analytics

AWS Custom Training Job

In this blog post, we will demonstrate how to perform custom training on a dataset using the AWS Training Job service with XGBoost. We will create our training job in four straightforward steps that cover the entire process. By the end of this blog, you will be equipped to apply this technique, including […]

Tushar Verma

Big Data, Data & Analytics

Efficient Data Migration from MongoDB to S3 using PySpark

Data migration is a crucial process for modern organizations looking to harness the power of cloud-based storage and processing. This blog walks through transferring data from MongoDB, a well-known NoSQL database, to Amazon S3, an elastic cloud storage service, using PySpark. Moreover, we will focus on handling migrations based on timestamps to […]

Big Data, Data & Analytics, Digital Engineering

Spark Structured Streaming

In this blog, I will discuss how Spark Structured Streaming works and how we can process data as a continuous stream. Before we discuss this in detail, let’s try to understand stream processing. In layman’s terms, stream processing is the processing of data in motion or computing data directly as it is produced […]

Adobe, Analytics, Data & Analytics

Pass digital data layer to Adobe Analytics using JavaScript

Requirement: In general, data is passed to Adobe Analytics on page load. Sometimes page content must be loaded without reloading or refreshing the page. For example, a web application may navigate through a table of contents, where a page refresh is not required but the content should still be updated. Solution: Without loading a […]

Vijay Kumar

Big Data, Cloud, Cloud Managed Services

From Pandas to PySpark

I recently converted a Python script that relied on Pandas DataFrames to use PySpark DataFrames instead. The main goal was to move data manipulation from the single-machine context of Pandas to the distributed processing capabilities offered by PySpark. This shift to PySpark DataFrames enables us to enhance scalability and efficiency by harnessing the power of distributed […]

Isha Vason

Data & Analytics, Software development

Text Extraction from PDF using OCR (Optical Character Recognition) in Python

Reading text from PDF using the OCR technique (Python). Why OCR (Optical Character Recognition)? We can also use the PyPDF2 Python library to get text from a PDF, but this library has a major problem: – It will not give you a good result if the data in the PDF is not structured. – You […]

AWS, Cloud, Data & Analytics

Future Trends and Developments in AWS Graviton: What to Expect

Introduction: AWS Graviton has revolutionized cloud computing with its cost-effective and high-performance ARM-based architecture. Designed to provide an alternative to traditional x86 instances, Graviton instances offer significant benefits in terms of cost savings and improved efficiency. Before diving into future trends and developments, let us have a look at AWS Graviton and its features first. […]

Analytics, Big Data, Cloud

Improving query performance in Snowflake and its related costs

In the previous blog, we looked at how to optimally use Snowflake warehouses and tables. Now we continue the series by diving into Snowflake performance tuning, focusing on how to enhance query performance and manage the associated costs in Snowflake cloud services. So let’s continue the blog series, where we will now focus on improving the performance […]
