Speaker: Ilya Ganelin



Deconstructing Recommendations on Spark


This talk focuses on the practical details of building a recommendation engine on top of Spark’s MLlib ALS collaborative-filtering algorithm, one that can reliably generate predictions for 25 million users across a space of 5 million products. The work is distinctive in two ways. First, we generate scores for every combination of user and product (125 trillion possible values) on a small 6-node cluster. Second, targeted optimization yields an improvement of several orders of magnitude over MLlib’s predictive step, with performance scaling linearly as more cores are added to the system. The primary goal is to present the optimizations and parameter tuning needed to achieve these gains, together with a discussion of the Spark internals that come into play. The talk is tailored to the intermediate Spark developer who wants to understand the trickier aspects of Spark and how they affect both stability and performance.
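The abstract does not spell out the optimizations themselves, so the following is only a generic illustration of the scoring problem, not the speaker's implementation. In an ALS model, each user and each product is represented by a factor vector, and a predicted score is their dot product. A minimal pure-Python sketch, assuming hypothetical names (`top_k_blocked`, `user_factors`, `product_factors`) and toy data, shows how scores for every user-product pair can be computed one product block at a time while keeping only the best `k` per user, so the full score matrix is never held in memory:

```python
import heapq

# Size of the full score space quoted in the abstract:
# 25 million users x 5 million products.
TOTAL_PAIRS = 25_000_000 * 5_000_000  # 125 trillion


def dot(u, v):
    """Dot product of two equal-length factor vectors."""
    return sum(a * b for a, b in zip(u, v))


def top_k_blocked(user_factors, product_factors, k, block_size):
    """Score every (user, product) pair block by block, keeping the top k per user.

    user_factors / product_factors map an id to its factor vector.
    These names and this layout are illustrative assumptions, not MLlib's API.
    """
    # One min-heap of (score, product_id) per user; the heap root is the
    # weakest score currently kept for that user.
    best = {user_id: [] for user_id in user_factors}
    product_ids = list(product_factors)

    for start in range(0, len(product_ids), block_size):
        block = product_ids[start:start + block_size]
        for user_id, u in user_factors.items():
            heap = best[user_id]
            for pid in block:
                score = dot(u, product_factors[pid])
                if len(heap) < k:
                    heapq.heappush(heap, (score, pid))
                elif score > heap[0][0]:
                    heapq.heapreplace(heap, (score, pid))

    # Return each user's kept scores, highest first.
    return {uid: sorted(heap, reverse=True) for uid, heap in best.items()}


# Toy data (made up for illustration).
users = {"u1": [1.0, 0.0], "u2": [0.0, 1.0]}
products = {"p1": [2.0, 0.0], "p2": [0.0, 3.0], "p3": [1.0, 1.0]}
result = top_k_blocked(users, products, k=2, block_size=2)
```

On a cluster, the same idea would typically be expressed by distributing blocks of users and products across workers; how the talk actually does this is not stated in the abstract.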


Ilya Ganelin is an electrical engineer turned data scientist. Robotics research brought him to the University of Michigan, where he pursued fascinating research into making robots that can learn and develop the way human children do. This venture into developmental robotics got him interested in machine learning, and after a few years of DSP work on cellular phones and software-defined radios he jumped into the wide world of big data. His present focus is on distributed computing and the architectures that support it, with an eye toward changing the world of banking.