
Speaker "Kai Liu" Details

Name
Kai Liu
Company
Microsoft
Designation
Sr. Program Manager
Topic
FrameworkLauncher: An Orchestrator for Large-Scale Distributed Deep Learning in Bing and Microsoft
Abstract
Many teams across Bing product groups and Microsoft Research are doing advanced, large-scale distributed deep learning. We have encountered various challenges along the way, such as a fragmented experience across training and serving, orchestration of complex workflows, and effective use of GPU capacity. None of the existing solutions could address these challenges at our scale, so we initiated the FrameworkLauncher project. It provides a unified workflow authoring experience across training and serving, supports a rich set of orchestration rules, and offers sophisticated GPU capacity scheduling. It is now used in the Bing production environment for deep-learning-powered scenarios. It is also open source and publicly available through the OpenPAI project at https://github.com/Microsoft/pai.