At IntelliGrape, we divide Big Data into four major areas, which we commonly refer to as the 4Cs of Big Data.
The 4Cs are:
- Capture (Data Ingestion)
- Contain (Data Persistence, NoSQL)
- Compute (Data Processing)
- Comprehend (Data Analytics and Visualization)
In this blog, I'll focus on the last point, the Comprehend part of Big Data, and specifically on Visualization.
The image below shows the 4Cs with the technology stack that we use across projects.
Visualization can be considered the face of your Big Data, and no one can underestimate the importance of face value. The more usable, intuitive and customized the interface to your numbers, the more value it can potentially deliver for any business use case, irrespective of the nature of the data.
We have diverse experience in Visualization with industry-leading technologies like Tableau. In this blog, however, I'll talk about a small open source charting API that we came across recently while working on a project.
We had to produce a view of a time series across multiple factors. The view also had to support changing the axis representation dynamically, based on the data type in use. In addition, multiple chart variants had to be available on the same view. Various tools can achieve this, but here I would like to mention gvis (the gvis functions of the googleVis package for R), which is a great charting API when you want to show something varying over time as an animated view. Animations are generally not considered a good choice for data representation, but that's a subjective discussion.
Scenario: We had to talk to HDFS through the R-HDFS connector and display time series data in a highly dynamic view. This problem is tackled using the gvisMotionChart API, which I'll explain below.
We wanted to draw a motion graph on a web interface using R with the help of Shiny, so here I have used gvisMotionChart to draw the motion graph in R.
gvisMotionChart offers several options: changing the axes dynamically, applying different types of filters, and switching the graph from motion to line.
Here is the sales dataset, SalesData.
Our dataset contains sales, state, profit, quantity and time. For our use case, we wanted to visualize all these parameters dynamically over time, and also present the output in different graph variants within the same component.
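To make the shape of this dataset concrete, here is a minimal sketch that loads a few made-up rows of tab-separated text into an R data frame. The column names match the ones used in the R code later in this post; the values themselves are invented for illustration and are not taken from the real SalesData.

```r
# Made-up sample rows in the same tab-separated layout (values are invented).
sales_text <- paste(
  "State\tYear\tSale\tProfit\tOrderQuantity",
  "California\t2013\t120.5\t12.1\t30",
  "California\t2014\t150.0\t18.4\t35",
  "Texas\t2013\t90.2\t7.3\t20",
  "Texas\t2014\t110.0\t9.8\t28",
  sep = "\n"
)

# header = TRUE: the first row holds the column names.
# stringsAsFactors = FALSE keeps State as plain text rather than a factor.
sales <- read.table(text = sales_text, sep = "\t", header = TRUE,
                    stringsAsFactors = FALSE)

# One row per state and year, with the measures as numeric columns.
str(sales)
```

Each row is one state in one year, which is the layout a motion chart needs: an identifier, a time column, and the numeric measures to plot.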
The technologies used for this use case are:
- R: Statistical Analysis Tool
- Shiny: Used as a web interface for R
- R-HDFS: Connector for R to HDFS
- HDFS: Where our dataset resides
- gvisMotionChart: Motion chart function from the googleVis package for R
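As a rough sketch of how Shiny and gvisMotionChart fit together, the example below wires a motion chart into a small Shiny app using renderGvis from googleVis. The HDFS read is replaced by a made-up data frame, and the names sales, ui, server and the output id "motion" are hypothetical, chosen only for this illustration.

```r
library(shiny)
library(googleVis)

# Made-up stand-in for the data that would normally be read from HDFS.
sales <- data.frame(
  Id = rep(c("California", "Texas"), each = 2),
  Year = rep(c(2013L, 2014L), times = 2),
  Sale = c(120.5, 150.0, 90.2, 110.0),
  OrderQuantity = c(30, 35, 20, 28),
  Profit = c(12.1, 18.4, 7.3, 9.8),
  stringsAsFactors = FALSE
)

# The page contains a title and a placeholder for the chart's HTML.
ui <- fluidPage(
  titlePanel("Sales over time"),
  htmlOutput("motion")
)

# renderGvis turns the googleVis object into output that htmlOutput can show.
server <- function(input, output) {
  output$motion <- renderGvis({
    gvisMotionChart(sales, idvar = "Id", timevar = "Year",
                    xvar = "Sale", yvar = "OrderQuantity")
  })
}

app <- shinyApp(ui = ui, server = server)
# To launch it in a browser: runApp(app)
```

Keeping the chart inside renderGvis means the same server function could later rebuild it from user inputs, such as a selected state or year range.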
Here is the step-by-step method for creating a chart using the gvisMotionChart API.
In our use case, we first need to push the CSV file to HDFS.
Here the location is /user/data/state_aggregated_data/state.csv.
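One way to do the upload from R is to call the standard Hadoop command line client, hdfs dfs -put, through system2. This is only a sketch under assumptions: the local file name state.csv and the helper hdfs_put_args are hypothetical, and the upload is attempted only when an hdfs client is found on the PATH.

```r
# Build the argument vector for: hdfs dfs -put <local> <remote>
# (hdfs_put_args is a hypothetical helper for this sketch.)
hdfs_put_args <- function(local_path, hdfs_path) {
  c("dfs", "-put", shQuote(local_path), shQuote(hdfs_path))
}

local_csv <- "state.csv"  # assumed local file name
hdfs_target <- "/user/data/state_aggregated_data/state.csv"
put_args <- hdfs_put_args(local_csv, hdfs_target)

# Only run the upload when a Hadoop client is installed and the file exists;
# otherwise just show the command that would be executed.
if (nzchar(Sys.which("hdfs")) && file.exists(local_csv)) {
  status <- system2("hdfs", put_args)
  if (status != 0) stop("hdfs dfs -put failed with exit status ", status)
} else {
  message("Would run: hdfs ", paste(put_args, collapse = " "))
}
```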
For gvisMotionChart, we use the following snippet in R to show the motion graph.
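    library(rhdfs)      # R-HDFS connector: hdfs.init, hdfs.file, hdfs.read
    library(googleVis)  # provides gvisMotionChart

    # Connect to HDFS and read the file contents as raw bytes
    hdfs.init()
    brandGeoTrancs <- hdfs.file("/user/data/state_aggregated_data/state.csv", "r", buffersize = 104857600)
    brandGeoRead <- hdfs.read(brandGeoTrancs)
    brandGeoChar <- rawToChar(brandGeoRead)

    # Parse the tab-separated text. header = TRUE is an assumption: the first row
    # is taken to hold the column names (State, Year, Sale, ...) used below.
    stateAggregatedData <- read.table(textConnection(brandGeoChar), sep = "\t", header = TRUE, stringsAsFactors = FALSE)

    # Add the id column and make sure the measures have numeric types
    stateAggregatedData$Id <- stateAggregatedData$State
    stateAggregatedData$Profit <- as.numeric(stateAggregatedData$Profit)
    stateAggregatedData$Sale <- as.numeric(stateAggregatedData$Sale)
    stateAggregatedData$Year <- as.integer(stateAggregatedData$Year)
    stateAggregatedData$OrderQuantity <- as.numeric(stateAggregatedData$OrderQuantity)

    # Build the motion chart: one bubble per Id, animated over Year
    gvisMotionChart(stateAggregatedData, idvar = "Id", timevar = "Year", yvar = "OrderQuantity", xvar = "Sale", options = list(height = 750, width = 1500))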
In the code above, we connect to HDFS, pull the data from it, convert the number columns to numeric types and show the motion graph.
The gvisMotionChart call is the main step that renders the chart.
Here the combination of idvar and timevar must be unique across the dataset.
timevar - the column shown in the time slider (here, Year)
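Since the id and time columns together must identify each row, it can help to check the data before building the chart. The sketch below uses made-up rows with one deliberate repeat, flags it with duplicated, and then collapses repeats by summing the measures. Summing is an assumption made for this example; whether a sum, a mean or dropping rows is right depends on what the data means.

```r
# Made-up rows; the last two share the same Id and Year on purpose.
sales <- data.frame(
  Id = c("California", "California", "Texas", "Texas"),
  Year = c(2013L, 2014L, 2013L, 2013L),
  Sale = c(120.5, 150.0, 40.0, 50.2),
  OrderQuantity = c(30, 35, 8, 12),
  stringsAsFactors = FALSE
)

# Flag rows whose (Id, Year) pair has already appeared earlier in the data.
is_repeat <- duplicated(sales[, c("Id", "Year")])
print(sales[is_repeat, ])

# One possible fix (an assumption): sum the measures within each pair.
unique_sales <- aggregate(cbind(Sale, OrderQuantity) ~ Id + Year,
                          data = sales, FUN = sum)

# After aggregation, every (Id, Year) pair appears exactly once.
stopifnot(!any(duplicated(unique_sales[, c("Id", "Year")])))
```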
For the whole source code: GitHub Location
Hope this blog helps you!