
Speaker "Thomas Phelan" Details

Name: Thomas Phelan
Company: Bluedata Inc.
Designation: Chief Technology Officer
Topic: Virtualized HDFS
Abstract
What does it mean to virtualize the Hadoop Distributed File System (HDFS)? This session will examine the several meanings of "virtualized HDFS." It will investigate abstracting the HDFS protocol so that any storage device can deliver data to a Hadoop application in a performance-critical environment, and it will assess related work in this area by projects such as Tachyon and MemHDFS.
There are at least two meanings of the phrase “virtualized HDFS.” The first is the creation of an HDFS file system within a cluster of virtual machines; the second is the abstraction of the HDFS protocol to implement a “virtual” HDFS file system that lets any storage device provide data to Hadoop applications. This session will investigate both meanings. Drawing on experience with multiple projects (including Tachyon, MemHDFS, the Ceph object store, and others), it will describe existing implementations of the first and propose a high-speed implementation of the second.
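The second meaning, a "virtual" HDFS that lets any storage device serve Hadoop applications, depends on putting a single file-system interface in front of interchangeable storage backends. In Hadoop itself that role is played by the `org.apache.hadoop.fs.FileSystem` abstract class. The Python sketch below only illustrates the general idea: the names `StorageBackend`, `InMemoryBackend`, `LocalDirBackend`, `VirtualFileSystem`, `mount`, and `read` are hypothetical, and the code does not implement the HDFS wire protocol or any Hadoop API. It shows how one path namespace can route each read to whichever store is mounted at the longest matching prefix.

```python
import os
from abc import ABC, abstractmethod


class StorageBackend(ABC):
    """Hypothetical interface: anything that can return a byte range of a named object."""

    @abstractmethod
    def read(self, key: str, offset: int, length: int) -> bytes:
        ...


class InMemoryBackend(StorageBackend):
    """Serves data held in a dict, standing in for a RAM-based store."""

    def __init__(self, objects: dict):
        self._objects = objects

    def read(self, key, offset, length):
        data = self._objects[key]  # raises KeyError if the object is missing
        return data[offset:offset + length]


class LocalDirBackend(StorageBackend):
    """Serves files from a local directory, standing in for a disk or NAS device."""

    def __init__(self, root: str):
        self._root = root

    def read(self, key, offset, length):
        with open(os.path.join(self._root, key), "rb") as f:
            f.seek(offset)
            return f.read(length)


class VirtualFileSystem:
    """Presents one path namespace to the application and routes each read
    to the backend mounted at the longest matching path prefix."""

    def __init__(self):
        self._mounts = {}

    def mount(self, prefix: str, backend: StorageBackend) -> None:
        self._mounts[prefix.rstrip("/")] = backend

    def _resolve(self, path: str):
        # Collect every mount point that is a directory prefix of the path.
        matches = [p for p in self._mounts if path.startswith(p + "/")]
        if not matches:
            raise FileNotFoundError(path)
        prefix = max(matches, key=len)  # the most specific mount wins
        return self._mounts[prefix], path[len(prefix) + 1:]

    def read(self, path: str, offset: int, length: int) -> bytes:
        backend, key = self._resolve(path)
        return backend.read(key, offset, length)


# Usage: the application sees one namespace; data comes from different stores.
vfs = VirtualFileSystem()
vfs.mount("/hot", InMemoryBackend({"events/part-0": b"click,view,click"}))
vfs.mount("/hot/archive", InMemoryBackend({"2014/part-0": b"old-data"}))
print(vfs.read("/hot/events/part-0", 0, 5))        # b'click'
print(vfs.read("/hot/archive/2014/part-0", 0, 3))  # b'old'
```

Because the application only ever calls `read` on the virtual namespace, adding a new kind of storage means writing one more backend class rather than changing the application.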