Speaker "Brad Pflum" Details

 

Topic

Evolution of data science and data engineering in PaaS

Abstract

With an estimated two-thirds of American online shoppers having interacted with Narvar’s post-purchase ecommerce platform, Narvar is assembling one of the largest and richest datasets in the world. Narvar's Data Platform combines order information in dozens of formats from hundreds of retailers with tracking information from hundreds of carriers and the usage behavior of tens of millions of consumers across our own application suite. We present our approach to modern ETL and data warehousing across multiple formats, using ingestion templating, serialized Kafka streams, cloud storage, and big data query systems. Using a metadata layer, data on S3 with arbitrary schema can be used in RDBMS tables for analysis, or converted to HDFS or ORC files and processed with Hive or Presto for analytics and data science. All of this speed and flexibility tends to run at cross-purposes to control, so we'll also discuss how we balance agility with data integrity requirements using an expansion of pub/sub and API patterns.

Profile

With an academic background in Operations Research and applied Bayesian modeling, Brad has served in a number of data science, engineering, and leadership roles. He currently leads an integrated data engineering and data science team, the "Ministry of Data", at Narvar, building a big data processing platform and multiple machine learning systems to power revenue products. Previously, he spent 3 years building Data Science at Slice Intelligence and 9 years in the national security industry.