Big Data Governance – Metadata Is the Key

Posted on: Nov 30, 2018

A new approach to data governance is needed in the age of big data, when data is scattered throughout the enterprise in many formats and arrives from many sources.

As the volume, variety and velocity of available data all continue to grow at astonishing rates, businesses face two urgent challenges: how to uncover actionable insights within this data, and how to protect it. Both of these challenges depend directly on a high level of data governance.

The Hadoop ecosystem can provide that level of governance using a metadata approach, ideally on a single data platform.

A new approach to governance is needed for several reasons. In the age of big data, data is scattered throughout the enterprise in structured, semi-structured, unstructured and various other formats. Furthermore, the sources of that data are not under the control of the teams that need to manage it.

In this environment, data governance includes three important goals:

Maintaining the quality of the data

Implementing access control and other data security measures

Capturing the metadata of datasets to support security efforts and facilitate end-user data consumption
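To make the three goals concrete, here is a small sketch with one check per goal. Everything in it is hypothetical: the `Dataset` type, the three-level sensitivity scale and the completeness rule are illustrative assumptions, not part of any Hadoop component.

```python
from dataclasses import dataclass, field
from typing import Any

# Hypothetical sensitivity scale, ordered from least to most restricted.
LEVELS = {"public": 0, "internal": 1, "confidential": 2}


@dataclass
class Dataset:
    name: str
    rows: list[dict[str, Any]]
    sensitivity: str = "internal"
    metadata: dict[str, Any] = field(default_factory=dict)


def completeness(ds: Dataset, column: str) -> float:
    """Goal 1, quality: share of rows where `column` is present and not None."""
    if not ds.rows:
        return 0.0
    filled = sum(1 for row in ds.rows if row.get(column) is not None)
    return filled / len(ds.rows)


def can_read(clearance: str, ds: Dataset) -> bool:
    """Goal 2, access control: allow reads only at or below the user's clearance."""
    return LEVELS[clearance] >= LEVELS[ds.sensitivity]


def capture_metadata(ds: Dataset) -> dict[str, Any]:
    """Goal 3, metadata capture: record facts that security tools and consumers can query."""
    columns = sorted({key for row in ds.rows for key in row})
    ds.metadata.update(
        {"row_count": len(ds.rows), "columns": columns, "sensitivity": ds.sensitivity}
    )
    return ds.metadata


orders = Dataset(
    "orders",
    [{"id": 1, "email": "a@example.com"}, {"id": 2, "email": None}],
    sensitivity="confidential",
)
print(completeness(orders, "email"))  # 0.5
print(can_read("internal", orders))   # False
print(capture_metadata(orders))
# {'row_count': 2, 'columns': ['email', 'id'], 'sensitivity': 'confidential'}
```

The captured metadata in goal 3 is what the access check in goal 2 and a consumer browsing for data would both rely on, which is why the three goals are closely linked.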

Solutions within the Hadoop Ecosystem

One way to approach big data governance in a Hadoop environment is through data tagging. In this approach, the metadata that will govern the data's use is embedded with that data as it passes through various enterprise systems. Furthermore, this metadata is enhanced to include information beyond common attributes such as file size, permissions and modification dates. For example, it might include business metadata that would help a data scientist evaluate the data's usefulness in a particular predictive model.
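A minimal sketch of data tagging, assuming a plain in-process Python representation rather than any particular Hadoop API. Each payload carries technical metadata (size, permissions, modification time) plus hypothetical business tags, and a processing step copies the tags forward and records lineage, so the metadata travels with the data instead of being rebuilt at each stop.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable


@dataclass(frozen=True)
class TechnicalMetadata:
    # The common attributes: file size, permissions, modification time.
    size_bytes: int
    permissions: str
    modified: datetime


@dataclass(frozen=True)
class TaggedData:
    payload: bytes
    technical: TechnicalMetadata
    # Hypothetical business metadata, e.g. owner and intended analytical use.
    business: dict[str, str] = field(default_factory=dict)
    # Names of the processing steps this data has passed through.
    lineage: tuple[str, ...] = ()


def transform(item: TaggedData, step: str, fn: Callable[[bytes], bytes]) -> TaggedData:
    """Apply `fn` to the payload and carry the tags forward with the result."""
    new_payload = fn(item.payload)
    technical = TechnicalMetadata(
        size_bytes=len(new_payload),             # recomputed for the new payload
        permissions=item.technical.permissions,  # inherited access settings
        modified=datetime.now(timezone.utc),
    )
    return TaggedData(new_payload, technical, dict(item.business), item.lineage + (step,))


payload = b"id,amount\n1,10\n"
raw = TaggedData(
    payload=payload,
    technical=TechnicalMetadata(len(payload), "rw-r-----", datetime(2018, 11, 30, tzinfo=timezone.utc)),
    business={"owner": "finance", "candidate_use": "demand forecasting model"},
)
clean = transform(raw, "uppercase", lambda b: b.upper())
print(clean.business, clean.lineage)
```

The business tags are copied rather than shared, so a later step can add tags to the derived data without silently changing the metadata of its source.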

Finally, unlike enterprise data itself, metadata can be centralized on a single platform.
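Centralizing metadata while the data itself stays distributed can be pictured as a catalog that maps each dataset location to its tags and indexes those tags for search. The `MetadataCatalog` class and the example locations below are hypothetical; a production catalog would also persist its entries and enforce access rules.

```python
from collections import defaultdict


class MetadataCatalog:
    """Hypothetical central catalog: data stays where it lives; only its metadata is registered here."""

    def __init__(self) -> None:
        self._entries: dict[str, dict[str, str]] = {}
        # Index from a (key, value) tag to the set of locations carrying it.
        self._by_tag: dict[tuple[str, str], set[str]] = defaultdict(set)

    def register(self, location: str, tags: dict[str, str]) -> None:
        """Add or replace the metadata for one dataset location."""
        self.unregister(location)
        self._entries[location] = dict(tags)
        for key, value in tags.items():
            self._by_tag[(key, value)].add(location)

    def unregister(self, location: str) -> None:
        """Drop a location and remove it from the tag index."""
        for key, value in self._entries.pop(location, {}).items():
            self._by_tag[(key, value)].discard(location)

    def find(self, key: str, value: str) -> list[str]:
        """Return every registered location carrying the tag key=value, sorted."""
        return sorted(self._by_tag.get((key, value), set()))

    def describe(self, location: str) -> dict[str, str]:
        """Return a copy of the tags recorded for one location."""
        return dict(self._entries.get(location, {}))


catalog = MetadataCatalog()
catalog.register("hdfs:///warehouse/orders", {"owner": "finance", "sensitivity": "confidential"})
catalog.register("s3://example-bucket/clickstream/", {"owner": "marketing", "sensitivity": "internal"})
catalog.register("jdbc:postgresql://crm-db/customers", {"owner": "sales", "sensitivity": "confidential"})
print(catalog.find("sensitivity", "confidential"))
# ['hdfs:///warehouse/orders', 'jdbc:postgresql://crm-db/customers']
```

One query answers "which confidential datasets exist?" across three storage systems, which is the practical payoff of keeping metadata in one place even when the data cannot be.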

The standard Hadoop Distributed File System (HDFS) has an extended attributes capability that supports enriched metadata, but it isn't always adequate for big data.
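HDFS extended attributes are set from the shell with `hdfs dfs -setfattr -n <name> -v <value> <path>`, where the name carries a namespace prefix such as `user.`. The sketch below only builds those command lines so it runs without a cluster. The NameNode caps how many attributes a file may hold and how large each may be; the limits used here (32 attributes, 16,384 bytes) are assumptions to check against the cluster's `hdfs-site.xml`, and the size check is a conservative approximation.

```python
import shlex

# Namespace prefixes HDFS recognizes in xattr names; ordinary clients normally use "user.".
NAMESPACES = ("user.", "trusted.", "security.", "system.", "raw.")

# Assumed limits: the real values come from the NameNode settings
# dfs.namenode.fs-limits.max-xattrs-per-inode and dfs.namenode.fs-limits.max-xattr-size.
MAX_XATTRS_PER_FILE = 32
MAX_XATTR_SIZE = 16384  # bytes; approximated here as prefixed name + value


def setfattr_commands(path: str, tags: dict[str, str]) -> list[list[str]]:
    """Build one `hdfs dfs -setfattr` argument list per tag after checking the assumed limits.

    Note: attributes already stored on the file are not counted toward the limit.
    """
    if len(tags) > MAX_XATTRS_PER_FILE:
        raise ValueError(f"{len(tags)} tags exceed the assumed limit of {MAX_XATTRS_PER_FILE}")
    commands = []
    for name, value in tags.items():
        if not name.startswith(NAMESPACES):
            name = "user." + name  # default unprefixed names to the user namespace
        size = len(name.encode("utf-8")) + len(value.encode("utf-8"))
        if size > MAX_XATTR_SIZE:
            raise ValueError(f"xattr {name!r} is {size} bytes; assumed limit is {MAX_XATTR_SIZE}")
        commands.append(["hdfs", "dfs", "-setfattr", "-n", name, "-v", value, path])
    return commands


cmds = setfattr_commands("/data/raw/orders.csv", {"owner": "finance", "user.sensitivity": "confidential"})
for cmd in cmds:
    print(shlex.join(cmd))
# hdfs dfs -setfattr -n user.owner -v finance /data/raw/orders.csv
# hdfs dfs -setfattr -n user.sensitivity -v confidential /data/raw/orders.csv
```

On a host with a configured Hadoop client, each list could be passed to `subprocess.run(cmd, check=True)`, and `hdfs dfs -getfattr -d /data/raw/orders.csv` would dump the stored attributes. Per-file caps like these are one reason richer, searchable business metadata usually lives in a separate store rather than on the files themselves.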