Speaker "Paul Hargis" Details

Topic

Workshop: Real-time Streaming and Machine Learning on Hadoop: Storm and Spark

Abstract

Among the greatest advancements in the Hadoop platform are real-time streaming architectures and machine learning capabilities. Storm provides a highly programmable distributed compute platform that responds to events in real time and ingests huge data volumes into traditional Hadoop datastores like Hive and HBase. Spark provides a new in-memory data engine for processing large data volumes and uses statistical modeling techniques like linear regression to perform machine learning at scale. We will run a real-time streaming demo, plus a hands-on machine learning example that shows how to develop your first model.

Profile

Develops solutions for Big Data clients around the Hortonworks Data Platform (HDP). Solution components include Hadoop, MapReduce, Hive/Tez, Pig, Storm, Kafka, and Spark. Assists customers with selecting a use case and building pilot projects around these HDP components.