
Speaker "John Canny" Details

- Name: John Canny
- Company: UC Berkeley and Yahoo Labs
- Designation: Data Scientist
- Topic: Large-Scale Machine Learning
- Abstract:
Machine Learning at the Limit

How fast can machine learning (ML) and graph algorithms be? In "roofline" design, every kernel is driven toward the limits imposed by CPU, memory, network, etc. "Codesign" pairs efficient algorithms with complementary hardware. These methods can lead to dramatic improvements in single-node performance: BIDMach is a toolkit for machine learning that uses rooflined design and GPUs to achieve one- to three-orders-of-magnitude improvements over other toolkits on single machines. These speedups are typically larger than those reported for *cluster* systems running on hundreds of nodes for common ML tasks.

An open challenge is to exploit rooflined single nodes in clusters. The optimal communication rates sit right at the network limits, and communication design is itself a rooflined design problem. We describe two solutions that are optimal for small and large models, respectively. "Butterfly mixing" is an efficient, simple, and fault-tolerant approach to distributed ML with small models that are replicated on each node. "Kylix" is an optimal approach for large, sparse, and possibly distributed models. We show that Kylix approaches the roofline limits for sparse Allreduce and empirically holds the record for distributed PageRank.
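The "roofline" idea the abstract leans on can be summarized in one formula: a kernel's attainable throughput is capped by either peak compute or memory bandwidth times its arithmetic intensity. A minimal sketch, with hypothetical hardware numbers (not figures from the talk):

```python
# Roofline model sketch: attainable throughput is the minimum of the
# compute "roof" and the bandwidth-scaled throughput at a kernel's
# arithmetic intensity. All numbers below are illustrative assumptions.

def roofline_gflops(peak_gflops, bandwidth_gbs, intensity_flops_per_byte):
    """Attainable GFLOP/s for a kernel with the given arithmetic intensity."""
    return min(peak_gflops, bandwidth_gbs * intensity_flops_per_byte)

# Hypothetical GPU: 4000 GFLOP/s peak compute, 300 GB/s memory bandwidth.
# A sparse kernel at 0.5 flops/byte is bandwidth-bound;
# a dense kernel at 50 flops/byte hits the compute roof.
print(roofline_gflops(4000, 300, 0.5))  # 150.0 (bandwidth-bound)
print(roofline_gflops(4000, 300, 50))   # 4000 (compute-bound)
```

Rooflined design then means reshaping each kernel (batching, data layout, precision) until its measured throughput sits near this minimum of the two limits.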
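The communication pattern behind "butterfly mixing" can be illustrated with a toy allreduce simulation (this is an illustrative sketch, not the BIDMach implementation): on 2^k nodes, round i pairs each node with the peer whose id differs in bit i, and partners exchange and combine their values, so after k rounds every node holds the global aggregate.

```python
# Butterfly allreduce sketch: simulate 2**k nodes summing their values.
# In round i each node exchanges with the peer whose id differs in bit i
# (id XOR step) and adds the peer's value to its own.

def butterfly_allreduce(values):
    """Simulate a butterfly allreduce; returns the per-node results."""
    n = len(values)
    assert n & (n - 1) == 0 and n > 0, "node count must be a power of two"
    vals = list(values)
    step = 1
    while step < n:
        # Every node i swaps with partner i ^ step and sums the pair.
        vals = [vals[i] + vals[i ^ step] for i in range(n)]
        step <<= 1
    return vals

print(butterfly_allreduce([1, 2, 3, 4]))  # [10, 10, 10, 10]
```

Each node sends and receives one message per round, log2(n) rounds total, which is what lets the pattern run close to the network's roofline rather than funneling traffic through a single parameter server.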