Speaker "Yi Pan" Details

 

Topic

Unified processing with the Samza High-level API

Abstract

A growing number of applications need to process both batch and streaming data sets through a unified programming API with a flexible deployment model. The newly released version (0.13) of Apache Samza improves the simplicity and portability of Samza applications. The new high-level API supports common operations on streams, such as windowing, map, and join. Developers can now express application logic concisely in a few lines of code and accomplish what previously required several jobs. The other exciting feature in Samza 0.13.0 is flexible deployment, which lets developers deploy and scale Samza applications as a simple embedded library, a much more flexible option than the original YARN deployment model. This talk will cover the new high-level API, flexible deployment, and batch processing, both in terms of what is available now and what is coming in the future.

Profile

Yi Pan has worked on distributed platforms for Internet applications for 9 years. He started at Yahoo! on the NoSQL database project, leading the development of multiple features, such as real-time notification of database updates, secondary indexes, and live migration from legacy systems to the NoSQL database. He later joined and led the distributed Cloud Messaging System project, which is used heavily as a pub-sub system and transaction log for distributed databases at Yahoo!. In 2014 he joined LinkedIn, where he quickly became the lead of the Samza team as well as a Committer and PMC Chair of Apache Samza.