Speaker "Maximo Gurmendez" Details

 

Topic

Productizing Machine Learning over Big Data with AWS tools. 

Abstract

In this presentation we will focus on the problem of serving machine learning models that are trained over very large data sets (terabytes and above). In particular, we will show how some AWS tools, including Apache Spark on EMR and SageMaker, can aid this process. We will use notebooks to illustrate the ideas with real business use cases, and we will share some success stories behind the development of smart data products, along with the lessons learned. Throughout the presentation we will try to address some of these questions:
 
·  What do we do when our training takes too long, or is too expensive?
 
·  Are "deployable notebooks" a good idea?
 
·  How can we integrate big data tools such as EMR/Spark with ML services such as SageMaker?
 
·  Why are model serving endpoints not enough?
 

Profile

Maximo holds a Master’s degree in Computer Science / Artificial Intelligence from Northeastern University, where he attended as a Fulbright Scholar. Since 2009 he has been working with DataXu as a lead engineer, tackling the challenge of machine learning over large data sets. He’s also the founder of MDATALABS (a data science & engineering consultancy) and a Big Data Science professor at the School of Engineering, University of Montevideo.