Speaker "Gene Pang" Details

Topic

The Architecture Of Decoupling Compute And Storage With Open Source Alluxio

Abstract

As Spark, MapReduce, and many other frameworks are widely deployed in enterprise production, an efficient and flexible compute and storage architecture has become a hot topic of debate among both IT and LOB practitioners. Although there are good reasons to run compute in a traditional hyper-converged environment as part of a data lake implementation, decoupling storage from computation is becoming more and more popular, as O’Reilly points out in its recent 2017 trend post. For example, teams from Alluxio, IBM, Huawei, EMC, and Red Hat have joined together to examine real-world application examples and provide joint solutions. In this presentation, we will share the decision factors and considerations, such as application workload patterns, data locality, cost of infrastructure, network bandwidth, and cloud deployment. We will also share production best practices and solutions for making the best use of CPUs, memory, and the different tiers of disaggregated compute and storage systems to build a multi-tenant, high-performance platform that addresses real-world business demands.

Profile

Gene Pang is a PMC member and maintainer of the Alluxio open source project and a founding member of Alluxio, Inc. He earned his Ph.D. at the AMPLab at UC Berkeley, where he worked on distributed database systems. Before starting at Berkeley, he worked at Google. He holds an M.S. from Stanford University and a B.S. from Cornell University.