Speaker: Sushanth Sowmyan



Enabling cross-datastore replication with Hive


Replication is an important feature of most mature database systems. It is used for disaster recovery, for cluster load balancing, and sometimes for cluster isolation for security reasons. We go over the mechanism being developed: rather than baking replication into Hive itself, we have chosen to add an event-based replication capability to Hive, so that other tools such as Falcon can plug in to implement replication. This gives admin- and user-facing tools like Falcon fine control over what they replicate and how, as defined by their users, while leaving delta, data, and metadata management to Hive itself. The result is "loose, but powerful" replication of data and metadata. In theory, this approach could also allow third-party data management and movement solutions to integrate their own warehouse or replication systems with Hive through this mechanism.
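To make the division of labor concrete, here is a minimal sketch of event-based replication in general. It is not Hive's or Falcon's actual API: the names `Event`, `SourceWarehouse`, `Replicator`, `events_since`, and `wants` are hypothetical, chosen only for illustration. The source side records what changed, and an external tool decides which events to replay on its target.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass(frozen=True)
class Event:
    """One recorded change on the source (hypothetical shape)."""
    event_id: int
    kind: str   # "CREATE_TABLE" or "DROP_TABLE" in this sketch
    table: str


@dataclass
class SourceWarehouse:
    """Stands in for the warehouse: it only records and serves events."""
    events: List[Event] = field(default_factory=list)

    def record(self, kind: str, table: str) -> Event:
        event = Event(len(self.events) + 1, kind, table)
        self.events.append(event)
        return event

    def events_since(self, last_id: int) -> List[Event]:
        # Events are stored in increasing id order, so a filter suffices.
        return [e for e in self.events if e.event_id > last_id]


@dataclass
class Replicator:
    """Stands in for an external tool: it owns the replication policy."""
    wants: Callable[[Event], bool]
    last_seen_id: int = 0
    target_tables: Set[str] = field(default_factory=set)

    def sync(self, source: SourceWarehouse) -> int:
        """Replay new events that match the policy; return how many applied."""
        applied = 0
        for event in source.events_since(self.last_seen_id):
            if self.wants(event):
                if event.kind == "CREATE_TABLE":
                    self.target_tables.add(event.table)
                elif event.kind == "DROP_TABLE":
                    self.target_tables.discard(event.table)
                applied += 1
            # Advance the cursor even for skipped events so they are not re-read.
            self.last_seen_id = event.event_id
        return applied


source = SourceWarehouse()
source.record("CREATE_TABLE", "sales")
source.record("CREATE_TABLE", "tmp_scratch")

# Policy chosen by the tool's user: skip temporary tables.
replicator = Replicator(wants=lambda e: not e.table.startswith("tmp_"))
replicator.sync(source)
```

In this sketch the source never needs to know who is replicating or why; the tool keeps its own cursor (`last_seen_id`) and its own filter, which is the "loose" coupling the abstract describes.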


Sushanth Sowmyan is an Apache Hive committer and PMC member, and a longtime warehouse and metadata systems developer who spends most of his time oscillating between worrying about backward compatibility and being worked up about changing things to do it "the right way". He currently works at Hortonworks on their data query team.