Speaker "Wangda Tan" Details

 

Topic

Hadoop ecosystem boosts TensorFlow and machine learning technologies

Abstract

TensorFlow™ is a popular open source software library for machine intelligence. While TF lets people express the latest machine learning and deep learning algorithms, it is also important that TF fits well into the Hadoop ecosystem. In this session, we will talk about how Hadoop ecosystem components boost TF and other machine learning technologies, including:
- Using Hadoop YARN to manage large-scale TF services running on a GPU-equipped cluster, while sharing that cluster with other tenants and applications.
- Using Spark/Hive for large-scale data preprocessing.
- Using Zeppelin as an interactive interface to orchestrate and visualize the learning workflow.
Finally, we will use a classic machine learning challenge, online advertising click-through rate (CTR) prediction, as an example to show how TF works with YARN, Spark, and Zeppelin to train a better model efficiently.

Profile

Wangda Tan is a Project Management Committee (PMC) member of Apache Hadoop and a Staff Software Engineer at Hortonworks. His main focus is the Hadoop YARN resource scheduler, where he has contributed to features such as node labeling, resource preemption, and container resizing. Before joining Hortonworks, he worked at Pivotal on integrating OpenMPI/GraphLab with Hadoop YARN. Before that, he worked at Alibaba, where he helped build a large-scale machine learning, matrix, and statistics computation platform using MapReduce and MPI.