Global Big Data Conference

Interview with Speaker Paul Hargis, Solutions Engineer, Hortonworks at Dallas Big Data Bootcamp May 13-15 2016

Posted on: Apr 26, 2016

We feature speakers at Big Data Bootcamp Dallas May 13-15 2016 to catch up and find out what they are working on now and what's coming next. This week we're talking to Paul Hargis, Solutions Engineer, Hortonworks.

Interview with Paul Hargis

Tell us about yourself and your background.

Paul Answer: My background starts with software development. I have always loved building tools and products with software. Over time, I graduated to architectural roles, such as leading development of a file sync engine at QuickOffice, and followed that at Google by writing code for the Google Drive client. In the last 2 years, I switched to Hadoop and Big Data, working at Hortonworks as a Sr. Solutions Engineer. I am a Subject Matter Expert for Spark, one of the great new engines being used on Hadoop to deliver value using Machine Learning.

What have you been working on recently?

Paul Answer: Recently, I have been focused on using Spark for Machine Learning, a huge new area of productivity. The Spark interface is really versatile, letting you use Scala, Python, and SQL commands all in the same engine. Plus, Spark gives you lots of ways to work with your data, including DataFrames, Datasets, and raw RDDs. Data Science notebooks are also very useful, especially for development and testing; I use Zeppelin, Jupyter, and Databricks Cloud.

What has your experience been with Spark & Hadoop?

Paul Answer: I have been working with the entire Hadoop stack for almost 2 years. I started out focused on real-time streaming engines like Storm, and message bus services like Kafka. I come from a real-time programming background, having written code for military flight simulators used for man-in-the-loop training of fighter pilots and radar operators. So it was natural to gravitate towards the real-time engines -- both Storm and Spark have Streaming capabilities. I picked up Spark next and really became interested in the math and statistics behind the machine learning algorithms.

Why do you think Architects, DBAs, Admins, Managers, Executives & Data Scientists should attend Big Data Bootcamp Dallas May 13-15 2016?

Paul Answer: This is one of the best ways to accelerate your learning of Big Data technologies in a practical, hands-on environment. It allows participants to move beyond the blog posts and achieve a meaningful understanding of the technologies.

What are some of the best takeaways that the attendees can have from your Machine Learning with Spark workshop?

Paul Answer: Attendees will gain a hands-on ability to use Spark in an interactive manner to solve real-world problems. We will walk through a supervised learning example, delving into the details of each step.

What are the top 5 big data implementation mistakes to avoid?

Paul Answer:

Not getting started! (this is really the only mistake you cannot recover from)
Trying to learn everything at once -- pick one area like real-time streaming, or machine learning
Getting side-tracked by hype -- there is always something new on the horizon
Getting bogged down by the platform -- choose a toolset with a sandbox environment that enables you to run the data engine against real (small-scale) data
Doubting your own capabilities -- start small, iterate, and gain confidence as you go; read blogs to discover best practices, and look for online examples

What trends do you see in the upcoming 6 months?

Paul Answer: Look to firms like Gartner and Forrester for industry trends. Realize that nobody has a crystal ball, and much of what is “trending” on technical blogs is simply marketing in disguise. People will always have agendas, which is fine, but bloggers should not mislead people. Machine Learning, Deep Learning, Neural Networks -- these are the hot topics right now.

There is a huge shortage of Big Data Architects in the market. Who should take up the Big Data Architect role? How does one become a rockstar Big Data Architect?

Paul Answer: Yes, there is a shortage of Big Data Engineers and Data Scientists, but also a shortage of leadership. Without the firm support of company executives, the current big data initiatives will never graduate from “science project” status. As for who, that’s easy -- anyone with the passion to become a data scientist, who puts in the time to learn the skills, can become one. I have heard many people start out by saying “I am not a data scientist, but…”. The industry has hyped this role so much that people are afraid to put on the mantle, which is counter-productive for everyone. Leverage your local meetup groups that focus on Data Science, like “DFW Data Science” (I’m a co-organizer and event host). We can all learn from each other. Data Science is really a team effort, so learn to work with other experts in the field, and share knowledge when you can.

Any closing remarks?

Paul Answer: Big Data has moved beyond hype to the reality stage (the so-called “late majority” of adoption). Companies of all sizes are gaining value from the ever-increasing data available at their fingertips. It is our job as architects to bring those tools into the market, train people how to use them, and develop the methodologies that can transform businesses. Not only are companies on a journey, we as individuals are on our own journey, whereby we no longer look at the world in the same way. It is an exciting time to be an Information Architect with all the new tools that help us gain insights as never before. Let’s get started!!!
