In today’s data-driven world, the security of your data is paramount. MongoDB, a popular NoSQL database, offers robust security features to protect your sensitive information. One of the most powerful security mechanisms MongoDB provides is client-side encryption. This approach allows you to encrypt data on the client side, ensuring that even if unauthorized users gain […]
In this blog post, we will demonstrate how to perform custom training on a dataset using the AWS Training Job service with XGBoost. We will create our training job in four straightforward steps that cover the entire process. By the end of this blog, you will be equipped to apply this technique, including […]
Data migration is a crucial process for modern organizations looking to harness the power of cloud-based storage and processing. This blog examines how to transfer data from MongoDB, a well-known NoSQL database, to Amazon S3, an elastic cloud storage solution, using PySpark. We will also focus on handling migrations based on timestamps to […]
Big Data, Data & Analytics, Digital Engineering
In this blog, I will discuss how Spark Structured Streaming works and how we can process data as a continuous stream. Before we discuss this in detail, let’s try to understand stream processing. In layman’s terms, stream processing is the processing of data in motion, or computing on data directly as it is produced […]
Requirement: In general, data is passed to Adobe Analytics on page load. Sometimes page content must be loaded without reloading or refreshing the page. For example, a web application may have page navigation based on a table of contents, where a page refresh is not required but the content should still be updated. Solution: Without loading a […]
Big Data, Cloud, Cloud Managed Services
We recently converted a Python script that relied on Pandas DataFrames to use PySpark DataFrames instead. The main goal was to move data manipulation from the single-machine context of Pandas to the distributed processing capabilities offered by PySpark. This shift to PySpark DataFrames enables us to enhance scalability and efficiency by harnessing the power of distributed […]
Data & Analytics, Software Development
Reading Text from PDF Using OCR Technique (Python). Why OCR (Optical Character Recognition)? We can also use the PyPDF2 Python library to get text from a PDF, but there is a major problem with this library. – It will not give you a good result if the data in the PDF is not structured. – You […]
Introduction: AWS Graviton has revolutionized cloud computing with its cost-effective and high-performance ARM-based architecture. Designed to provide an alternative to traditional x86 instances, Graviton instances offer significant benefits in terms of cost savings and improved efficiency. Before diving into future trends and developments, let us have a look at AWS Graviton and its features first. […]
In the previous blog, we covered how to optimally use Snowflake warehouses and tables. Now, we continue the series by diving into Snowflake performance tuning, focusing on how to enhance query performance and manage associated costs in Snowflake cloud services. So let’s continue the blog series, where we will now focus on improving the performance […]