Speaker "Rajasekhar Konda" Details

 

Topic

LakeHouse: Smart Iceberg Table Optimizer

Abstract

The new LakeHouse architectural design pattern provides several technical benefits, including i) ACID support, ii) time travel for machine learning, and iii) better query performance. Apache Iceberg implements this pattern and leaves room for further enhancement based on real-world needs. Using the Apache Iceberg table format requires regular table maintenance, such as i) snapshot expiration and orphan file removal for data governance, and ii) metadata and data compaction for fast, efficient access to data. In this talk, we will discuss how to handle these table operations at very large scale while keeping cost in mind and without compromising data engineering, ML, analytical, and BI use cases. Automating these operations lets engineers use the platform without worrying about how Iceberg internals work. We will share the lessons we learned from optimizing streaming and batch data sets in a cost-effective and efficient way.
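
As a rough illustration of the maintenance operations named in the abstract, the sketch below builds the Spark SQL `CALL` statements for Iceberg's built-in procedures (`expire_snapshots`, `remove_orphan_files`, `rewrite_data_files`, `rewrite_manifests`). It is not the optimizer presented in the talk. The helper name `maintenance_statements`, the catalog and table names, and the 7-day retention window are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone


def maintenance_statements(catalog, table, retain_days=7, now=None):
    """Build Spark SQL CALL statements for routine Iceberg table maintenance.

    Hypothetical helper: the name, defaults, and ordering are illustrative only.
    """
    now = now or datetime.now(timezone.utc)
    # Snapshots and unreferenced files older than this cutoff become candidates.
    cutoff = (now - timedelta(days=retain_days)).strftime("%Y-%m-%d %H:%M:%S")
    return [
        # Expire old snapshots so the data files they alone reference can be deleted.
        f"CALL {catalog}.system.expire_snapshots("
        f"table => '{table}', older_than => TIMESTAMP '{cutoff}')",
        # Remove files under the table location that no metadata references.
        f"CALL {catalog}.system.remove_orphan_files("
        f"table => '{table}', older_than => TIMESTAMP '{cutoff}')",
        # Compact small data files into larger ones.
        f"CALL {catalog}.system.rewrite_data_files(table => '{table}')",
        # Rewrite manifests to speed up scan planning.
        f"CALL {catalog}.system.rewrite_manifests(table => '{table}')",
    ]


if __name__ == "__main__":
    # "my_catalog" and "db.events" are placeholder names.
    for stmt in maintenance_statements("my_catalog", "db.events"):
        print(stmt)
```

In a Spark session with the Iceberg SQL extensions enabled, each statement could be passed to `spark.sql(stmt)`. A scheduler would typically run such a batch per table, which is the kind of automation the abstract describes.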

Who is this presentation for?
Software engineers, engineering leaders, data engineers, Data Lake and LakeHouse engineers, AI/ML infra engineers

Prerequisite knowledge:
Distributed systems: Apache Iceberg, Spark, Parquet

 

Profile

Raj works for Apple as an Engineering Manager for Data Platform and ML Infra, focused on providing LakeHouse solutions for machine learning and data engineering use cases. He has over 18 years of experience in the data space and is one of the founding engineering members responsible for building large-scale, multi-cloud, secure data/ML infrastructure at Apple. The platform runs at PB scale and ingests over a trillion events daily. Raj and his team are adopters of open source technologies and are responsible for designing, implementing, and automating the managed services deployed in various accounts.