Speaker Details: Zhenxiao Luo

Topic

Druid at Twitter: How Big Data Goes Real Time

Abstract

An important characteristic of Twitter is its real-time nature. Consequently, many of Twitter’s projects need real-time analytics as a platform service. In recent years, a number of teams have adopted Druid as their real-time analytics engine.
In this talk, we discuss Druid at Twitter, which provides sub-second query performance, real-time ingestion, and ease of use. We start with Twitter’s big data architecture, followed by a detailed introduction to Apache Druid. We then focus on a number of Druid features developed by Twitter, including Native Indexing support for Hadoop data, LDAP authentication and authorization, data scrubbing, and the Presto Druid Connector, which provides complex SQL functionality on Druid data and enables joining data across Hadoop, Druid, Cassandra, Elasticsearch, and other data storage solutions without copying data. We will also share production experiences.

Profile

Zhenxiao Luo is a Sr. Staff Engineer leading the Interactive Query Engines team at Twitter, where he focuses on Druid, Presto, Spark, and Hive. Before joining Twitter, Zhenxiao ran the Interactive Analytics team at Uber. He has big data experience at Netflix, Facebook, Cloudera, and Vertica. Zhenxiao is a PrestoDB committer. He holds a master’s degree from the University of Wisconsin-Madison and a bachelor’s degree from Fudan University.