
Speaker "Sameer Wadkar" Details Back


Topic

Data Streaming and Machine Learning Ops for Big Data and AI

Abstract

Streaming solutions for Feature Engineering and Machine Learning are incredibly complex because of the diversity of technologies used for data engineering and ML engineering. Large windows (time-based or custom) need complex Java-based windowing mechanisms, while ML and Feature Engineering typically happen in Python. Translating code from one language to the other is error prone and leads to long delays in model deployment.

We present an open solution that supports end-to-end streaming machine learning, covering every step from model inception to model deployment, while serving as a system of record for large-scale, high-velocity, high-volume ML training and inference. The solution scales horizontally to millions of data points and supports distributed training and streaming inference that is resilient to failures, as a streaming solution should be. It is inclusive rather than prescriptive, supporting best-of-breed technologies for the problem at hand. It also provides end-to-end provenance from iterative training to inference as your model is developed, deployed, and measured and the cycle is repeated with no human intervention unless an anomaly triggers an alert. This allows the solution to support continual refinement and versioned streaming models that produce repeatable, auditable inferences for any model version.

We recognize that in a world where we have grown to expect near-real-time results, we need batched training, online training, and near-real-time inference to make a differentiated business impact. Our solution is unified for batch and streaming data: you code only once, whether you are operating on "Data at Rest" or "Data in Motion".
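
As a minimal sketch of the "code once" idea (an illustration on our part, not the platform presented in this talk), a unified batch/streaming API such as the Apache Beam Python SDK lets the same windowed feature computation run over bounded ("Data at Rest") and unbounded ("Data in Motion") collections:

    # Illustrative only: assumes the Apache Beam Python SDK rather than the
    # talk's platform. The same windowed feature logic applies whether the
    # input PCollection is bounded (batch) or unbounded (streaming).
    import apache_beam as beam
    from apache_beam.transforms.window import FixedWindows

    def rolling_mean_features(events):
        """events: a PCollection of (entity_id, value) pairs, bounded or unbounded."""
        return (
            events
            | "Window" >> beam.WindowInto(FixedWindows(60))  # 60-second event-time windows
            | "MeanPerKey" >> beam.combiners.Mean.PerKey()   # per-entity windowed mean feature
        )
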
Who is this presentation for?
Data Architects, MLOps/DevOps Engineers and Data Scientists
Prerequisite knowledge:
Prior experience/knowledge of building/deploying Feature Engineering Pipelines and Models.
What you'll learn?
The challenges of training, deploying, and managing Models and Feature Engineering Pipelines that consume high-velocity data streams.

Profile

Sameer Wadkar is a Staff Field Engineer at Domino Data Lab, where he helps customers through all phases of ML development and deployment on the Domino Data Lab platform. Previously, he built a Streaming MLOps platform that operationalizes Machine Learning models, enabling rapid turnaround from model development to model deployment. The platform supports data ingestion from data lakes, streaming data transformations, and model deployment in hybrid environments spanning on-premises, cloud, and edge devices. He has also developed Big Data systems capable of handling billions of out-of-order financial transactions per day for market reconstruction, enabling surveillance of trading activity across multiple markets. He has implemented Natural Language Processing (NLP) and Computer Vision systems and is the author of the book "Pro Apache Hadoop".