Speaker "Einat Orr" Details

Topic

Level Up Your Data Lake - to ML, AI and Beyond

Abstract

A data lake is primarily two things: an object store and the objects being stored. Even with the most basic setup, data lakes can support BI, machine learning, and operational analytics use cases. This versatility speaks to the strength of object stores, particularly how readily they integrate with a diverse set of data processing engines. As data lake adoption exploded, a number of improvements were made to the first architectures. The first and most obvious improvement was to file formats, which led to the development of analytics-optimized formats like Parquet, and eventually to modern table formats. An even newer improvement has been the emergence of data source control tools that bring new levels of manageability across an entire lake! In this talk, we'll cover how to incorporate these technologies into your data lake, and how they simplify workflows critical to ML experimentation, deployment of datasets, and more!
Who is this presentation for?
Data Engineers, ML / AI
Prerequisite knowledge:
Data science tools and concepts
What you'll learn:
How open source tooling can simplify many data engineering pains

Profile

Einat Orr has 20+ years of experience building R&D organizations and leading the technology vision at multiple companies, most recently Similarweb, which went public on the NYSE last May. She currently serves as co-founder and CEO of Treeverse, the company behind lakeFS, an open source platform that delivers a git-like experience to object-storage-based data lakes. She received her PhD in Mathematics from Tel Aviv University, in the field of optimization in graph theory.