Speaker: Adem Efe Gencer



Cruise Control: Effortless management of Kafka clusters


With widespread adoption by the community, Kafka has become the de facto data streaming platform in the industry. This growing adoption brings a critical task: addressing the challenges of managing the system at scale. In particular, the growing scale made it infeasible to manually identify, track, and mitigate issues with unhealthy cluster components and logical entities. Moreover, imbalanced resource utilization across brokers led to unpredictable client performance, because clients observed high variation in throughput and latency. Finally, expanding, shrinking, or upgrading clusters also incurred significant management overhead. Hence, it became clear that a principled approach to managing Kafka clusters is integral to the sustainability of our infrastructure.

This talk will describe how LinkedIn alleviates the management overhead of large-scale Kafka clusters using Cruise Control. First, we will discuss the reactive and proactive techniques that Cruise Control uses to support admin operations for cluster maintenance, enable anomaly detection with self-healing, and provide real-time monitoring for Kafka clusters. Next, we will examine how Cruise Control performs in live production deployments. Finally, we will conclude with questions and further discussion.
Who is this presentation for?
Kafka users, distributed systems developers, reliability engineers, and researchers interested in scalability, performance, reliability, and fault tolerance issues
Prerequisite knowledge:
A basic understanding of distributed system concepts (partitioning, replication, rack awareness, etc.)
What you'll learn
Learn how Cruise Control automates the management of large-scale Kafka clusters, providing reactive and proactive mitigation through anomaly detection with self-healing, dynamic load balancing on heterogeneous clusters, and admin operations for cluster maintenance
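To make the idea of load balancing on heterogeneous clusters more concrete, the minimal Python sketch below flags brokers whose utilization, normalized by each broker's own capacity, drifts too far from the cluster-wide average. It is an illustration under stated assumptions, not Cruise Control's actual goal logic: the `Broker` type, the single load metric, and the multiplicative `threshold` band are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


# Hypothetical broker model; not Cruise Control's implementation.
@dataclass
class Broker:
    broker_id: int
    load: float      # current usage of one resource (e.g., disk), any unit
    capacity: float  # capacity of that resource, in the same unit


def find_imbalanced_brokers(
    brokers: List[Broker], threshold: float = 1.10
) -> Tuple[List[int], List[int]]:
    """Return (overloaded, underloaded) broker ids.

    Utilization is load / capacity, so brokers with different hardware
    are compared fairly. A broker is flagged if its utilization is above
    avg * threshold or below avg / threshold (a hypothetical band).
    """
    if threshold < 1.0:
        raise ValueError("threshold must be >= 1.0")
    if not brokers or any(b.capacity <= 0 for b in brokers):
        raise ValueError("need at least one broker with positive capacity")

    # Cluster-wide average utilization, weighted by capacity.
    avg_util = sum(b.load for b in brokers) / sum(b.capacity for b in brokers)
    upper = avg_util * threshold
    lower = avg_util / threshold

    overloaded = [b.broker_id for b in brokers if b.load / b.capacity > upper]
    underloaded = [b.broker_id for b in brokers if b.load / b.capacity < lower]
    return overloaded, underloaded
```

For example, with brokers `Broker(0, 80, 100)`, `Broker(1, 40, 100)`, and `Broker(2, 120, 200)`, the average utilization is 0.6, so broker 0 (0.8) is flagged as overloaded and broker 1 (0.4) as underloaded. Broker 2 carries the most absolute load, yet it is balanced relative to its larger capacity; normalizing by capacity is what keeps larger brokers from being penalized on a heterogeneous cluster.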


Adem Efe Gencer develops Apache Kafka and the ecosystem around it, and supports their operation at LinkedIn. In particular, he works on the design, development, and maintenance of Cruise Control, a system for alleviating the management overhead of large-scale Kafka clusters. He holds a PhD in computer science from Cornell University, where his research focused on improving the scalability of blockchain technologies.