Speaker "Mainak Ghosh" Details

 

Topic

Druid at Twitter: How Big Data Goes Real Time

Abstract

An important characteristic of Twitter is its real-time nature. Consequently, many of Twitter’s projects need real-time analytics as a platform service. In recent years, a number of teams have adopted Druid as their real-time analytics engine. In this talk, we will discuss Druid at Twitter, which provides sub-second query performance, real-time ingestion, and a simple slice-and-dice UI. We start with Twitter’s big data architecture, followed by a detailed introduction to Apache Druid. We will then focus on a number of Druid features that Twitter will use (or is considering contributing to), including Native Indexing support for Hadoop data, LDAP authentication and authorization, data scrubbing, and the Presto Druid Connector. The connector provides complex SQL functionality on Druid data and enables joining data across Hadoop, Druid, Cassandra, Elasticsearch, and any other data storage solution, without copying data.
 

Profile

I am currently working on the Druid team at Twitter. In the past, I have worked on other interactive analytics engines such as Presto and Hive. Before Twitter, I completed my PhD in the DPRG group with Prof. Indranil Gupta. My thesis proposes efficient techniques for sharding, compaction, replication, and prefetching. Before starting my PhD, I worked on the InDesign team at Adobe for 3 years.