Speaker "Andrew Musselman" Details

Topic

Data Engineering and Machine Learning with Apache Solr and Spark

Abstract

The modern search engine has evolved significantly from its keyword-matching days into its current form, which leverages a wide variety of data inputs and user feedback loops to help users find what’s most important in their data. Combining a search engine (in our case Apache Solr) with a fast distributed compute engine like Apache Spark can yield significant efficiencies in data engineering and faster results from machine learning applications. In this demo-oriented session, we’ll cover some common use cases, give a bit of theoretical and practical background on how they work, and mix in a good dose of examples for attendees to try out at home.

Profile

Andrew Musselman is Senior Director of Data Science for Lucidworks, a member of the Apache Mahout Project Management Committee, and host of the Adversarial Learning podcast. He loves distributed matrix math and lives in Seattle with his wife and kids.