Thursday 9 June 2016

Integrating Kafka, Spark Streaming and HBase to power a real-time dashboard



Here is a simple proof-of-concept (POC) project that integrates Kafka, Spark Streaming, HBase, Node.js and D3.js.
I got the idea from here, where they explain how to build a real-time streaming application with a UI dashboard using Spark Streaming.

Below is the block diagram from Sigmoid.



I built a sample prototype called Voting machine that integrates the same components shown above: Kafka, Spark Streaming, HBase, Node.js and D3.js. The application listens to a Kafka topic, aggregates the votes received for the different parties and renders a real-time bar chart dashboard using D3.js.

This is just a simple prototype meant to show how multiple components fit together. The use case itself, a simple vote-count aggregator, could easily be implemented in plain Java with JMS; I just wanted to build a demo. A real use case would be running analytics on real-time data feeds from Twitter, Facebook and other social media, or detecting fraudulent credit transactions in a real-time transaction feed.

Below are the components of the application.

Kafka Topic:

A Kafka broker has to be started and a topic created to carry the real-time votes. Here, I used Kafka's built-in console producer to feed the votes manually from the terminal.





Spark Streaming:

This is the Spark Streaming component, which listens to the Kafka topic. A DStream is created with KafkaUtils.createStream() on a streaming context whose batch interval is 2 seconds. For each batch, the Spark program aggregates the vote counts and pushes the aggregated counts to HBase, a NoSQL database.
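
The per-batch step is essentially a word count. The sketch below is plain Python rather than Spark, so it runs without a cluster, and it assumes (hypothetically) that each Kafka message value is just a party name typed into the console producer. In Spark the same logic would typically be a map of each message to a (party, 1) pair followed by reduceByKey.

```python
from collections import Counter
from typing import Dict, Iterable


def aggregate_batch(messages: Iterable[str]) -> Dict[str, int]:
    """Count votes per party within one micro-batch.

    Assumption (hypothetical message format): each message is a party name,
    possibly padded with whitespace. Blank messages are ignored.
    Equivalent to map(party -> (party, 1)) followed by reduceByKey(+).
    """
    counts = Counter()
    for message in messages:
        party = message.strip()
        if party:  # skip empty lines sent from the terminal
            counts[party] += 1
    return dict(counts)


if __name__ == "__main__":
    # One 2-second batch worth of messages from the console producer.
    batch = ["partyA", "partyB", "partyA", "  partyC ", ""]
    print(aggregate_batch(batch))  # {'partyA': 2, 'partyB': 1, 'partyC': 1}
```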




HBase:

HBase is a distributed, column-oriented database built on top of the Hadoop Distributed File System (HDFS). It is part of the Hadoop ecosystem and provides random, real-time read/write access to data stored in HDFS. Here I used the HBase client APIs to fetch and update the vote counts.

Global aggregation:
Besides counting the votes in the current batch, the program adds each batch count to the running total already stored in HBase, so the table always holds the overall vote count.
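
The global aggregation is a read-modify-write: read the stored total for each party, add the batch count, write the new total back. Below is a minimal sketch in which an in-memory dict stands in for the HBase table; the VoteStore class and the one-row-per-party layout are hypothetical, not the project's actual schema. With the real HBase client, Table.incrementColumnValue performs the same addition atomically on the server, which avoids lost updates if several writers touch the same row.

```python
from typing import Dict


class VoteStore:
    """In-memory stand-in for the HBase vote table.

    Hypothetical layout: row key = party name, one cell holding the running total.
    """

    def __init__(self) -> None:
        self._rows: Dict[str, int] = {}

    def get(self, party: str) -> int:
        # A missing row means the party has no votes yet.
        return self._rows.get(party, 0)

    def put(self, party: str, total: int) -> None:
        self._rows[party] = total

    def snapshot(self) -> Dict[str, int]:
        return dict(self._rows)


def apply_batch(store: VoteStore, batch_counts: Dict[str, int]) -> None:
    """Add one batch's counts to the stored global totals (read-modify-write)."""
    for party, count in batch_counts.items():
        previous = store.get(party)         # fetch the previous total
        store.put(party, previous + count)  # write back the new total


if __name__ == "__main__":
    store = VoteStore()
    apply_batch(store, {"partyA": 2, "partyB": 1})  # first batch
    apply_batch(store, {"partyA": 1, "partyC": 3})  # second batch
    print(store.snapshot())  # {'partyA': 3, 'partyB': 1, 'partyC': 3}
```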



Node.js:
Node.js is a server-side JavaScript runtime that makes it easy to set up a server.
A simple Node.js REST API server is configured to return the current vote counts in JSON format. The express package is used to set up the route for the /fetch URL pattern.
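
The /fetch contract is small: a GET returns the current counts as JSON. As a language-neutral illustration, the sketch below uses Python's standard-library http.server in place of Node.js and express. The get_vote_counts() function is a hypothetical stand-in for the HBase lookup, and the JSON shape (a list of party/votes objects) is an assumption, not the project's actual format.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Dict, List
from urllib.parse import urlparse


def get_vote_counts() -> Dict[str, int]:
    # Hypothetical stand-in for the HBase lookup (done over Thrift in the project).
    return {"partyA": 3, "partyB": 1}


def build_payload(counts: Dict[str, int]) -> List[Dict[str, object]]:
    """Shape the counts as a JSON-friendly list sorted by party name.
    The field names are an assumption made for this sketch."""
    return [{"party": p, "votes": v} for p, v in sorted(counts.items())]


class FetchHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        # Only the /fetch route is served; query strings are ignored.
        if urlparse(self.path).path != "/fetch":
            self.send_error(404)
            return
        body = json.dumps(build_payload(get_vote_counts())).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    # Same port as the endpoint in the post: http://localhost:8081/fetch
    HTTPServer(("localhost", 8081), FetchHandler).serve_forever()
```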

Here I used the HBase Thrift API to fetch the vote counts from HBase. The HBase REST API could be used instead to connect to HBase and fetch the data. The main difference is that the Thrift API carries the payload as a binary byte stream, so it is smaller than the equivalent REST API payload.
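
To see where the size difference comes from: the HBase REST JSON representation base64-encodes the row key, the column name and the value, and wraps them in JSON syntax, whereas a binary protocol sends the bytes as they are. The sketch below is a rough illustration, not a measurement; the row key and the votes:count column are hypothetical, and raw_size ignores Thrift's own field headers and framing, so it is only a lower bound for the Thrift payload.

```python
import base64
import json
import struct


def b64(raw: bytes) -> str:
    """Base64-encode bytes, as the HBase REST JSON representation does."""
    return base64.b64encode(raw).decode("ascii")


def rest_json_cell(row: bytes, column: bytes, value: bytes, timestamp: int) -> str:
    """Serialize one cell in the HBase REST JSON layout (Row -> key / Cell)."""
    doc = {"Row": [{"key": b64(row),
                    "Cell": [{"column": b64(column),
                              "timestamp": timestamp,
                              "$": b64(value)}]}]}
    return json.dumps(doc, separators=(",", ":"))


def raw_size(row: bytes, column: bytes, value: bytes) -> int:
    """Bytes of data a binary protocol carries unencoded.
    Ignores Thrift's field headers and framing, so it is a lower bound."""
    return len(row) + len(column) + len(value)


if __name__ == "__main__":
    row, column = b"partyA", b"votes:count"  # hypothetical row key and column
    value = struct.pack(">q", 42)            # 8-byte big-endian long, like Bytes.toBytes(long)
    print("REST JSON bytes:", len(rest_json_cell(row, column, value, 1465430400000)))
    print("raw data bytes: ", raw_size(row, column, value))
```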

In order to connect to the HBase Thrift API, Hbase.js and the thrift module have to be added to the Node.js project.

REST API: http://localhost:8081/fetch

Sample JSON data:



D3.js:

A vote-count bar chart is built with the D3.js library. The JSON vote data is fetched from the Node.js server and the bar chart is rendered from the current counts.
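
Rendering the chart mostly comes down to a linear scale from vote counts to bar heights. The sketch below does that arithmetic in plain Python so it can be checked without a browser; in D3 the scale would be d3.scale.linear() (v3) or d3.scaleLinear() (v4 and later) with a domain of [0, max count]. The layout_bars helper and the chart dimensions are hypothetical.

```python
from typing import Dict, List, NamedTuple


class Bar(NamedTuple):
    party: str
    x: float
    y: float       # SVG y of the bar's top edge (SVG y grows downward)
    width: float
    height: float


def layout_bars(counts: Dict[str, int], chart_width: float = 400.0,
                chart_height: float = 200.0, gap: float = 4.0) -> List[Bar]:
    """Compute SVG rectangles for a vertical bar chart, one bar per party.

    Heights use a linear scale from [0, max count] to [0, chart_height],
    the same mapping a D3 linear scale with that domain and range gives.
    """
    if not counts:
        return []
    max_votes = max(counts.values()) or 1  # avoid dividing by zero when all counts are 0
    slot = chart_width / len(counts)       # horizontal space per party
    bars = []
    for i, (party, votes) in enumerate(sorted(counts.items())):
        height = votes / max_votes * chart_height
        bars.append(Bar(party, x=i * slot + gap / 2, y=chart_height - height,
                        width=slot - gap, height=height))
    return bars


if __name__ == "__main__":
    for bar in layout_bars({"partyA": 2, "partyB": 4}):
        print(bar)
```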




 

The code is available here on GitHub.

