
Speaker "Slim Bouguerra" Details

Topic

Interactive Analytics with Apache Hive and Druid

Abstract

Traditional, rigid OLAP solutions can no longer handle Big Data Business Intelligence (BI) because of exponential growth in data size, data types, and hierarchical complexity. To overcome this and bring the benefits of the Hadoop platform to the business, we propose a hybrid solution that combines Druid, a fast columnar store, with Apache Hive, a traditional big data SQL database. Druid is well suited to sub-second analytics because it combines the best qualities of a column store, inverted indexing, and bitmap indexes, which minimizes I/O costs and enables fast filter pruning for analytical queries. However, Druid has serious limitations, the most important being joins, SQL support, and the transactional consistency model of traditional databases. Its design is nonetheless a major architectural breakthrough compared with traditional systems like Impala or SparkSQL, which rely on columnar storage to provide high-throughput aggregation but do not deal well with finding the “needles in the haystack.” Integrating Druid with Hive allowed us to overcome big data BI challenges and achieve speedups ranging from 10x to 100x by offloading some of the high-volume drill-down and slice-and-dice workload to Druid. In this talk we will present and analyze the performance of the proposed architecture side by side with existing solutions through the lens of concrete use cases and the TPC-H star schema benchmark.

Profile

Slim is a Sr. Software Engineer and Druid committer who recently joined the Hortonworks Druid team after spending a couple of years at Yahoo Inc. as part of the open source Druid team. He holds a PhD in computer science from Grenoble University in France.