Speaker "Kai Liu" Details

 

Topic

FrameworkLauncher: An Orchestrator for Large-Scale Distributed Deep Learning in Bing and Microsoft

Abstract

Across Bing product teams and Microsoft Research, many teams are doing advanced, large-scale distributed deep learning. We have met various challenges along the way, such as a fragmented experience across training and serving, orchestration of complex workflows, and effective use of GPU capacity. None of the existing solutions could address these challenges at our scale, so we started the FrameworkLauncher project. It provides a unified workflow authoring experience across training and serving, supports a rich set of orchestration rules, and offers sophisticated GPU capacity scheduling. It is now used in the Bing production environment for deep-learning-powered scenarios. It is also open source and available to the public through the OpenPAI project at https://github.com/Microsoft/pai.
 

Profile

Kai Liu is a Senior Program Manager in the AI and Research group at Microsoft. He has 8 years of experience in data-driven engineering, big data platforms, and AI infrastructure for the Office and Bing product families. He led his team to create a service health portal for SharePoint Online, introduce a distributed log collection and storage system for Exchange Online, publish curated data sets and key business metrics, and enable sub-hour experimentation in Office 365. He is currently working on the next generation of the Big Data and Deep Learning platform for Bing, based on open source technologies.