
Speaker "Carolyn Duby" Details

Name
Carolyn Duby
Company
Hortonworks
Designation
Big Data Architect
Topic
Data Science at Scale with Apache Spark 2.0 and Zeppelin Notebook
Abstract
Has your data science problem outgrown your desktop? More data, and more varied data, often help you understand your problem better and produce more accurate prediction algorithms. Testing more algorithms can often help you find the best model or ensemble. However, more data and more algorithms will quickly overwhelm the processing power of even the most powerful desktop. This workshop demonstrates how to clean, analyze, explore, and predict with large data sets on a horizontally scalable cluster of computers using Apache Spark. Record the Spark code for the entire pipeline with the web-based Apache Zeppelin notebook. Finally, learn how to share your analysis and results with others using the cloud.