Speaker "Pavel Dmitriev" Details

 

Topic

A/B Testing AI

Abstract

I often interview candidates with machine learning backgrounds. After they describe how their model increased AUC, reduced loss, improved accuracy, or moved some other such measure, I ask a question that confuses many of them: “How do you know it’s better for users?” Surprisingly few recognize that the standard practice of evaluating ML models offline on training and test sets has severe limitations and, in fact, cannot answer this question. In this talk, I will discuss the challenges of offline evaluation of ML models, with examples. I will then introduce A/B testing and show how it addresses these challenges. For a summary, see https://www.linkedin.com/pulse/ab-testing-ai-pavel-dmitriev/. I will also discuss the common technical challenges of running A/B tests on ML algorithms, such as infrastructure requirements, connecting online and offline metrics, and handling ramp-up periods for online learning algorithms. Overall, the goal of this talk is to motivate ML practitioners to use A/B testing when evaluating their algorithms and to give them high-level guidelines on how to do it.


Who is this presentation for?
Data Scientists, AI/ML Executives


Prerequisite knowledge:
Basic understanding of statistics


What you'll learn?
Why offline evaluation of AI/ML models is insufficient, how A/B testing can help, and tips on how to implement A/B testing for evaluating AI.

Profile

Pavel Dmitriev is a Vice President of Data Science at Outreach, where he works on enabling data-driven decision making in sales through experimentation and machine learning. He was previously a Principal Data Scientist with Microsoft's Analysis and Experimentation team, where he worked on scaling experimentation in Bing, Skype, and Windows OS. Pavel has co-authored numerous papers at top-tier data mining and machine learning conferences, such as WWW, ICSE, and KDD, and has given keynotes and tutorials at WWW, SIGIR, SEAA, and KDD.