Speaker: Aditya Kalro

Topic

Machine Learning at Scale

Abstract

Efficient use of large-scale data for Machine Learning (ML) research is a challenge. Training and distributing hundreds of models, monitoring their performance, and sharing algorithms in a production environment require tools that simplify the daily tasks of ML engineers. Facebook has developed a family of tools to manage the entire process of training, testing, and deploying ML models, including FBLearner Flow and Predictor. The former is a pipeline management system that facilitates experimentation, training, and comparison of models; the latter is an inference framework that uses those models to provide real-time predictions in production. FBLearner Flow is used by more than a thousand engineers per month. Each month, FBLearner is used to train more than 600,000 models, ingesting 2.3 billion data entries per model. These models are then used in production, serving more than six million predictions per second and touching all of Facebook's major functionality, including ranking News Feed stories and matching users to ads.

Profile

Engineering Manager on the CoreML Infra team at Facebook, building FB's internal cloud for Machine Learning training. FBLearner is used by hundreds of engineers and ML practitioners to build ML training pipelines. Currently we train tens of thousands of models every week for more than fifty teams at Facebook, including Feed, Ads, Instagram, and Search.