Speaker: Claudiu Barbura



Overkill Analytics on High Dimensional Feature Spaces


In our quest for data science automation we have learned many lessons that I am going to share in this session.

Expect fewer slides and more demos featuring real-world use cases, such as predicting port destinations for oil ships and the Outbrain Kaggle competition, all performed from DSL Workbench, the notebook we built for exploratory data analysis. DSL is the fluent and expressive API we created to expose data and services from our data science platform.

I will compare multiple approaches to feature engineering and feature reduction, as well as full-feature-space training employing OKA (OverKill Analytics) techniques. Where these approaches could not perform on high-dimensional sparse feature spaces, we employed Spark to distribute scikit-learn, VW (Vowpal Wabbit), TensorFlow and R packages, and produced ensemble models and prediction tables that still yield highly accurate predictions.


I will cover and show concrete examples of geo-spatial, composite and progressive modeling, deep learning, and high-dimensional and sparse feature engineering, along with the primitives we built for handling sparse data beyond the support available in Spark or scipy.

While I’ll focus on data science at scale, I will also touch on infrastructure aspects, with tips and tricks we learned about the underlying technology stack: Scala, Python, Spark, HDFS, Cassandra, Elasticsearch, ZooKeeper, VW, TensorFlow, etc.


I learned my trade in Romania, moved to the US in 2001, and have been adopted by this country with open arms ... I'm happy to live and work here. I have always had a huge passion for bleeding-edge technology, for applying best patterns and practices and for leading architecture efforts, but Big Data Analytics has changed my world. To this day it gives me the fuel that keeps me motivated and energetic to keep pushing the envelope in leveraging complex distributed systems and advanced analytics to create impact across so many domains.

I love sharing the lessons learned with emerging technologies as a speaker at meetups and conferences, and I am deeply grateful to the open source community for its fantastic contribution ... I'm always ready to give back. After years of building big data platforms on the infrastructure and real-time services side, I was ready to tap into Machine Learning and Data Science at scale, and that's what I found at Ubix.

When I'm not busy executing on a groundbreaking product that will change the Advanced Analytics industry, I am chasing the wind and attempting (in vain so far!) to control my kiteboarding addiction and my thirst for breaking altitude records all the time! I also ride wake/snow boards when the wind dies, and I run my socks off on a soccer field twice a week. My wonderful wife is my best coach; she makes sure I stay healthy and happy.