Speaker "Hongyue Zhang" Details

 

Topic

LakeHouse: Smart Iceberg Table Optimizer

Abstract

The new LakeHouse architectural design pattern provides many technical benefits, such as (i) ACID support, (ii) time travel for machine learning, and (iii) better query performance. Apache Iceberg implements this pattern and provides the flexibility to extend it based on real-world needs. Using the Apache Iceberg table format requires dedicated vacuuming operations, such as (i) snapshot expiration and orphan file removal for data governance, and (ii) metadata and data compaction for fast, efficient data access. In this talk, we will discuss how to handle these table operations at very large scale while keeping cost in mind, without compromising data engineering, ML, analytical, and BI use cases. Automating these operations makes life easier for engineers, who can use the platform without worrying about how Iceberg internals work. We will share our lessons learned from optimizing streaming and batch data sets in a cost-effective and efficient way.
Who is this presentation for?
Software engineers, engineering leaders, data engineers, Data Lake and LakeHouse engineers, AI/ML infra engineers
Prerequisite knowledge:
Apache Iceberg
What you'll learn?

Profile

Hongyue started his career developing microservices in AWS to help schedule serverless containers at scale. His journey with data started in 2021, and he immediately became a fan of the Apache Iceberg project. At Apple, Hongyue is building tools and systems around Apache Iceberg to help make data-driven decisions.