Speaker "Jon Morra" Details
Name
Jon Morra
Company
ZEFR
Designation
VP Data Science
Topic
Cluster YouTube: A Top Down and Bottom Up Approach
Abstract
At ZEFR, we know that when an advertisement on YouTube is relevant to the content a user is watching, it is a better experience for both the user and the advertiser. To facilitate this experience, we discover billions of videos on YouTube and cluster them into concepts that advertisers and brands want to buy to align with their particular creatives. To serve our clients, we use two different clustering strategies: a top-down supervised learning approach and a bottom-up unsupervised learning approach. The top-down approach uses human-annotated data and a very fast and robust machine learning model deployment system that solves problems with model drift. Our clients are also interested in discovering topics on YouTube. To serve this need, we use unsupervised clustering of videos to surface relevant clusters, which allows ZEFR to highlight what users are currently interested in. We show how Latent Dirichlet Allocation can help solve this problem and, along the way, some of the tricks that produce an accurate unsupervised learning system. This talk will touch on some common machine learning engines, including Keras, TensorFlow, and Vowpal Wabbit. We will also introduce Aloha, our open-source Scala DSL for model representation. We show how Aloha solves a key problem in a typical data scientist's workflow: ensuring that feature functions make it from the data scientist's machine to production with zero changes.