Speaker "Brad Pflum" Details
-
Name
Brad Pflum
-
Company
Narvar
-
Designation
Manager
Topic
Evolution of data science and data engineering in PaaS
Abstract
With an estimated two-thirds of American online shoppers having interacted with Narvar's post-purchase ecommerce platform, Narvar is assembling one of the largest and richest datasets in the world. Narvar's Data Platform combines order information in dozens of formats from hundreds of retailers, tracking information from hundreds of carriers, and usage behavior of tens of millions of consumers across our own application suite. We present our approach to modern ETL and data warehousing across multiple formats, using ingestion templating, serialized Kafka streams, cloud storage, and big data query systems. Through a metadata layer, data on S3 with arbitrary schemas can be exposed as RDBMS tables for analysis, or moved to HDFS or converted to ORC files and processed with Hive or Presto for analytics and data science. All of this speed and flexibility tends to run at cross-purposes to control, so we'll also discuss how we balance agility with data integrity requirements using an expansion of pub/sub and API patterns.
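As a rough illustration of the metadata-layer idea in the abstract, the sketch below shows one way a metadata entry could be turned into a Hive external table over files in cloud storage. It is not Narvar's implementation: the `DatasetMetadata` class, the `hive_external_table_ddl` helper, the `orders` table, its columns, and the `s3://example-bucket/orders/` location are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class DatasetMetadata:
    """Hypothetical metadata-layer entry describing one dataset in cloud storage."""
    table: str
    location: str                    # storage prefix; illustrative value only
    columns: List[Tuple[str, str]]   # (column name, Hive type) pairs
    file_format: str = "ORC"


def hive_external_table_ddl(meta: DatasetMetadata) -> str:
    """Render a Hive CREATE EXTERNAL TABLE statement from a metadata entry."""
    cols = ",\n  ".join(f"`{name}` {hive_type}" for name, hive_type in meta.columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {meta.table} (\n  {cols}\n)\n"
        f"STORED AS {meta.file_format}\n"
        f"LOCATION '{meta.location}'"
    )


# Assumed example dataset; the names and types are made up for illustration.
orders = DatasetMetadata(
    table="orders",
    location="s3://example-bucket/orders/",
    columns=[
        ("order_id", "STRING"),
        ("retailer", "STRING"),
        ("placed_at", "TIMESTAMP"),
    ],
)

ddl = hive_external_table_ddl(orders)
print(ddl)
```

Keeping the schema in a metadata record rather than in hand-written DDL means the same entry could, in principle, drive table definitions for more than one query engine, which is the kind of flexibility the abstract describes.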