Real-Time Analytics with Spark and Cassandra

Written by Thomas Mann

Tuesday, 04 August 2015 00:00

Presented by WidasConcepts at the 6th Open-Source Business Intelligence Workshop at Hochschule Offenburg
Abstract
MapReduce is history. With Spark, batch processing can now run up to 100 times faster than with the old MapReduce approach.

The basis for this is the so-called “Resilient Distributed Dataset” (RDD) and the consistent use of in-memory processing. But what good is faster processing if persisting and reloading the data still relies on the comparatively slow HDFS?
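
To make the RDD idea more concrete, here is a minimal sketch in plain Python. It is a toy, single-process model and not the Spark API: the class ToyRDD, the function slow_load, and all other names are hypothetical and exist only for this illustration. It assumes the two properties usually associated with RDDs: transformations are recorded lazily as a lineage of steps, and an intermediate result can be kept in memory so that later actions do not go back to slow storage.

```python
from typing import Callable, Iterable, List, Optional


class ToyRDD:
    """Toy, single-process stand-in for an RDD (hypothetical, NOT the Spark API)."""

    def __init__(self, compute: Callable[[], Iterable]):
        self._compute = compute              # recipe (lineage step) that yields the data
        self._cached: Optional[List] = None  # in-memory copy, filled on first use
        self._cache_enabled = False

    def map(self, fn: Callable) -> "ToyRDD":
        # Lazy: only records the step; nothing is read or computed here.
        return ToyRDD(lambda: (fn(x) for x in self._iterate()))

    def filter(self, pred: Callable) -> "ToyRDD":
        return ToyRDD(lambda: (x for x in self._iterate() if pred(x)))

    def cache(self) -> "ToyRDD":
        # Ask for the result to be kept in memory once it has been computed.
        self._cache_enabled = True
        return self

    def drop_cache(self) -> None:
        # Simulates losing the in-memory copy (for example, a failed worker).
        self._cached = None

    def _iterate(self) -> Iterable:
        if self._cached is not None:
            return iter(self._cached)        # served from memory
        data = self._compute()               # recomputed from the lineage
        if self._cache_enabled:
            self._cached = list(data)
            return iter(self._cached)
        return data

    def collect(self) -> List:
        # An "action": this is what actually triggers the computation.
        return list(self._iterate())


loads = []  # records every access to the (simulated) slow storage


def slow_load() -> Iterable[int]:
    loads.append(1)  # stands in for a slow read from disk
    return range(1, 11)


source = ToyRDD(slow_load)
evens = source.filter(lambda x: x % 2 == 0).cache()
squares = evens.map(lambda x: x * x)

total = sum(squares.collect())  # first action: reads the source once
count = len(evens.collect())    # second action: answered from memory
```

In this sketch, the two actions together read the simulated storage only once, because the filtered data stays in memory after the first action. If drop_cache() is called, the next action rebuilds the data from the recorded steps, which is the sense in which such a dataset is "resilient".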

One database that is real-time oriented, distributed, scalable, in-memory capable, suited to analytics, and compatible with Spark is Cassandra. The talk presents the basic concepts of Spark and Cassandra as well as the integration between the two technologies, followed by a look at Spark Streaming, a stream-based approach that makes real-time analytics possible.

Speaker: Thomas Mann, team leader for Big Data Science at WidasConcepts.
My focus is the design and implementation of customer-specific big data solutions. I was pleased with the high number of participants and the stimulating discussion about the potential of real-time analytics across a wide range of industries.