Speaker "Reza Shiftehfar" Details

 

Topic

Creating an Effective Data Platform: A case study on Uber’s data journey

Abstract

Uber’s mission is to ignite opportunities by setting the world in motion. To fulfill this mission, Uber relies heavily on data-driven decisions in every product area, so we need to store and process an ever-increasing amount of data while providing faster, more reliable, and more performant access. This talk will reflect on the challenges of scaling Uber’s Big Data Platform to ingest, store, and serve 100+ PB of data with minute-level latency while efficiently utilizing our hardware. We started our data platform with a focus on data reliability, solved scalability and ease-of-use challenges along the way, and are currently focusing on faster data as well as improved efficiency. In this talk, we’ll look into which technologies we were able to use from the open-source community (e.g. Hadoop, Spark, Hive, Presto, Kafka, Avro, and Vertica) and which solutions we had to build in-house (and open-source) to make this happen. You'll leave the talk with greater insight into how things work at Uber and will be inspired to re-envision your own data platform to make it more generic and flexible for future requirements.
Who is this presentation for?
Executives, managers, infrastructure and data architects, data engineers, and software developers at companies with expanding datasets who need to build an infrastructure that will keep serving them as data sizes grow extremely large and as real-time latency and ML applications become requirements.
Prerequisite knowledge:
Basic familiarity with data platforms and Big Data concepts.
What will you learn?
- The audience will learn how to build a modern Big Data platform that scales beyond 100 petabytes of data while providing real-time access.
- The audience will understand the internal design and architectural limitations of many popular open-source Big Data solutions (e.g. Hadoop, Spark, Hive, Presto, Kafka, Avro, Parquet) and how to overcome them to scale their data platform.
- The audience will learn about the internals of some of the technologies open-sourced by Uber (e.g. Hudi and Marmaray) and how they fit into the existing open-source Big Data ecosystem to help push the boundaries on the speed and scale of traditional Big Data platforms.

Profile

Reza Shiftehfar currently leads Uber’s Hadoop Platform teams. His teams help build and grow Uber’s reliable and scalable Big Data platform, which serves petabytes of data using technologies such as Apache Hadoop, Apache Hive, Apache Kafka, Apache Spark, and Presto. Reza is one of the founding engineers of Uber’s Data team and helped scale Uber's data platform from a few terabytes to over 100 petabytes while reducing data latency from 24+ hours to minutes. Reza holds a Ph.D. in Computer Science from the University of Illinois at Urbana-Champaign, with a focus on building Mobile Hybrid Cloud applications.