Speaker: Aditya Kalro



Machine Learning at Scale


Efficient use of large-scale data for Machine Learning (ML) research is a challenge. Training and distributing hundreds of models, monitoring their performance, and sharing algorithms in a production environment require tools that simplify the daily tasks of ML engineers. Facebook has developed a family of tools to manage the entire process of training, testing, and deploying ML models. These include FBLearner Flow and Predictor. The former is a pipeline management system that facilitates experimentation, training, and comparison of models; the latter is an inference framework that uses the models to provide real-time inferences in production. FBLearner Flow is used by more than a thousand engineers per month. Each month, FBLearner is used to train more than 600,000 models, ingesting 2.3 billion data entries per model. These models are then used in production, serving more than six million predictions per second and touching all major Facebook functionality, including ranking News Feed stories and matching users to ads.


Engineering Manager on the CoreML Infra team at Facebook, building FB's internal cloud for Machine Learning training. FBLearner is used by hundreds of engineers and ML practitioners to build ML training pipelines. Currently we train tens of thousands of models every week for more than fifty teams at Facebook, including Feed, Ads, Instagram, and Search.