Big Data, Technology

DataSafe – A Data Archival Tool

#fame is India's first (and now the biggest) live-streaming app on the iOS and Android platforms. The app allows people to create their own beam and go live immediately, or book a slot for the future. As time passed, the operational databases of #fame kept growing rapidly. As a result, the disk space utilization of database server...

by Rohan Kalra
Tag: Apache Spark
16-Aug-2016

Big Data, Technology

Yarn Cluster Optimization for Spark Jobs

Overview: This post covers YARN cluster optimizations for running Spark jobs. I will discuss the YARN optimizations that make efficient use of the available resources when executing Spark jobs on a YARN cluster. These optimization configurations could be done either in the...

by Rohit Verma
Tag: Apache Spark
10-Nov-2015

Big Data

Spark 1O3 – Understanding Spark Internals

In this post, I will present a technical “deep-dive” into Spark internals, including RDDs and shared variables. If you want to know more about Spark and setting up Spark on a single node, please refer to the previous posts of the Spark series, Spark 1O1 and Spark 1O2. Resilient Distributed Datasets (RDD) - An RDD is the primary abstraction...

by Surendra Pratap Singh
Tag: Apache Spark
13-Feb-2015

Big Data

Spark 1o2 – “Hello World”

This is the second blog of the Spark series. This post includes the setup of a Spark environment followed by a small word count program. The idea behind the blog is to get hands-on with Spark setup and with running a simple program on Spark. If you want to know more about Spark's history and its comparison with Hadoop, please refer to Spark 1o1.  ...

by Surendra Pratap Singh
Tag: Apache Spark
21-Jan-2015

Big Data

Spark 1O1 – Revamping Hadoop

Big Data has witnessed tremendous movement and growth over the last couple of years. As per the top research agencies, Big Data has recently emerged as the most successful “launch pad”, giving rise to the maximum number of start-ups ever. As the space evolves further, more and more organizations of varied sizes and...

by Moonesh Kachroo
Tag: Apache Spark
20-Jan-2015

Big Data

Usage of GroupBy and Join in Apache Spark

Using groupBy and join is often very challenging. Recently, in one of the POCs of the MEAN project, I used groupBy and join in Apache Spark. I had two datasets in HDFS, one for sales and the other for products. Sales dataset columns: Sales Id, Version, Brand Name, Product Id, No of Item Purchased, Purchased Date. Product...

by Mohit Garg
Tag: Apache Spark
15-Sep-2014