Speaker "Benoy Antony" Details
Name
Benoy Antony
Company
Dataapps
Designation
Founder
Topic
Detect Sensitive Data in Hadoop Clusters
Abstract
Organizations store massive amounts of data in Hadoop clusters, and that data may contain sensitive information without sufficient protection. The sensitive information could be Personally Identifiable Information (PII), such as Social Security Numbers, or financial information, such as credit card numbers. To meet security and compliance requirements, organizations need to continuously monitor their Hadoop clusters for the presence of sensitive information. Detecting sensitive information in a Hadoop cluster is challenging because of the sheer volume of data and the variety of storage formats. In this presentation, we will explore methods to detect sensitive data in a Hadoop cluster. We will see how to use YARN applications to scan large amounts of data for sensitive information, and we will identify best practices for scanning data stored in different file formats. Once sensitive data is identified, it must be protected, so we will also review the options available in Hadoop for protecting sensitive information.
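A minimal sketch of pattern-based detection for the two examples the abstract names (Social Security Numbers and credit card numbers), assuming records have already been decoded into plain text lines (columnar formats such as ORC or Parquet would have to be read first). The regular expressions, the Luhn checksum filter, and the names `luhn_valid` and `find_sensitive` are illustrative assumptions, not the method presented in the talk. In a cluster, a per-record check like this would typically run inside the tasks of a YARN application (for example, a MapReduce or Spark job), so that each task scans its own input split in parallel.

```python
import re

# Illustrative patterns (assumptions, not from the talk); production scanners
# need tuning to control false positives.
# US SSN in NNN-NN-NNNN form, excluding area numbers 000, 666 and 9xx,
# and all-zero group or serial numbers.
SSN_PATTERN = re.compile(r"\b(?!000|666|9\d\d)\d{3}-(?!00)\d{2}-(?!0000)\d{4}\b")
# Card candidates: 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_sensitive(line: str) -> list[tuple[str, str]]:
    """Hypothetical helper: return (label, match) pairs found in one text record."""
    findings = []
    for m in SSN_PATTERN.finditer(line):
        findings.append(("SSN", m.group()))
    for m in CARD_CANDIDATE.finditer(line):
        # Strip separators, then keep only candidates that pass the Luhn check.
        digits = re.sub(r"[ -]", "", m.group())
        if luhn_valid(digits):
            findings.append(("CREDIT_CARD", m.group()))
    return findings


if __name__ == "__main__":
    record = "name=alice ssn=123-45-6789 card=4111 1111 1111 1111"
    print(find_sensitive(record))
    # prints [('SSN', '123-45-6789'), ('CREDIT_CARD', '4111 1111 1111 1111')]
```

Requiring a Luhn match keeps arbitrary long numeric identifiers from being reported as card numbers, at the cost of missing malformed card data. Pattern matching alone still yields some false positives, so findings from a sketch like this would need review before any protection step is applied.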