Speaker "Thomas Phelan" Details

Topic

Virtualized HDFS

Abstract

What does it mean to virtualize the Hadoop Distributed File System? This session will delve into the different meanings of "virtualized HDFS." It will investigate abstracting the HDFS protocol so that any storage device can deliver data to a Hadoop application in a performance-critical environment. It will include a discussion and assessment of the work done in this area by projects such as Tachyon and MemHDFS.

There are at least two meanings of the phrase “virtualized HDFS.” One is the creation of an HDFS file system within a cluster of virtual machines; the other is the abstraction of the HDFS protocol in order to implement a “virtual” HDFS file system and permit any storage device to provide data to Hadoop applications. This session will investigate both meanings. It will draw on experience with multiple projects (including Tachyon, MemHDFS, the Ceph object store, and others) to describe existing implementations of the first and propose a high-speed implementation of the second.

Profile

Tom Phelan earned his computer science degree from UC Berkeley and then began a long career focused on storage and systems virtualization. After cutting his teeth on UNIX internals at Altos Computer Systems, he went on to develop highly fault-tolerant storage subsystems at Stratus. At Silicon Graphics he was a member of the team that designed and developed the XFS file system. He was also the architect and primary developer of GRIO (Guaranteed Rate I/O), a subsystem of XFS, which he later used to optimize storage performance for the delivery of streaming data. Tom was an early employee at VMware and, as senior staff engineer, was a key member of the ESX storage architecture team. During his 10-year stint at VMware he designed and developed the ESX storage I/O load-balancing subsystem and the modular “pluggable storage architecture.” He went on to lead teams working on key storage initiatives such as the cloud storage gateway and vFlash. He resigned from VMware in 2012 to join Kumar Sreekanti as co-founder and chief architect of BlueData.