Speaker "Carolyn Duby" Details

Topic

Data Science at Scale with Apache Spark 2.0 and Zeppelin Notebook

Abstract

Has your data science problem outgrown your desktop? More data, and more varied data, often help you understand your problem better and produce more accurate prediction algorithms. Testing more algorithms can often help you find the best model or ensemble. However, more data and more algorithms will quickly overwhelm the processing power of even the most powerful desktop. This workshop demonstrates how to clean, analyze, explore, and predict with large data sets on a horizontally scalable cluster of computers using Apache Spark. Record the Spark code for the entire pipeline with the web-based Apache Zeppelin notebook. Finally, learn how to share your analysis and results with others using the cloud.

Profile

Carolyn Duby is a Solutions Engineer at Hortonworks, where she helps customers harness the power of their data with Apache open source platforms. Prior to joining Hortonworks, she was the architect for cyber security event correlation at SecureWorks. Ms. Duby earned an ScB, magna cum laude, and an ScM in Computer Science from Brown University. She recently completed the Johns Hopkins University Data Science Specialization on Coursera. With diverse experience working for small companies, startups, large companies, and herself, she has a passion for challenging data-intensive systems. For fun, she enjoys cooking, singing, horseback riding, and fitness.