
Bigeye Spawns Automated Data Quality Monitoring from Uber Roots

Posted on: Apr 15, 2021

High-quality data is essential to high-quality analytics and accurate machine learning algorithms. But as data volumes grow, it becomes increasingly difficult to keep tabs on data quality using traditional rules-based approaches. That’s why Bigeye, a startup founded by former Uber engineers, is relying on ML and statistics to automatically monitor data quality.

When he was in charge of Uber’s metadata team, Kyle Kirwan oversaw the development of a rules-based approach to testing the quality of the company’s data.

“Someone would come into that tool, define a set of rules about data, and then we’d run the rules on a schedule. And they’d get an alert if the rules are violated,” says Kirwan, who is Bigeye’s co-founder and CEO. “Think of unit tests or integration tests for software.”
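The workflow Kirwan describes (define rules about the data, run them, alert on violations) can be sketched in a few lines of Python. This is only an illustration and not Uber's tool: the `Rule` class, the `run_rules` function, the rule names, and the sample trip rows are all hypothetical, and scheduling is left out (a cron job or workflow scheduler would call `run_rules` periodically).

```python
from dataclasses import dataclass
from typing import Any, Callable

Row = dict[str, Any]  # hypothetical row format: column name -> value


@dataclass(frozen=True)
class Rule:
    """A named data quality rule; check returns True when the data passes."""
    name: str
    check: Callable[[list[Row]], bool]


def run_rules(rows: list[Row], rules: list[Rule]) -> list[str]:
    """Return the names of all rules that the given rows violate."""
    return [rule.name for rule in rules if not rule.check(rows)]


# Hypothetical rules a user might define for a table of trips.
RULES = [
    Rule("table_not_empty", lambda rows: len(rows) > 0),
    Rule("fare_not_null",
         lambda rows: all(r.get("fare") is not None for r in rows)),
    Rule("fare_non_negative",
         lambda rows: all(r["fare"] >= 0 for r in rows
                          if r.get("fare") is not None)),
]

# Made-up sample data containing one null fare and one negative fare.
sample_rows = [
    {"trip_id": 1, "fare": 12.5},
    {"trip_id": 2, "fare": None},
    {"trip_id": 3, "fare": -4.0},
]

violations = run_rules(sample_rows, RULES)
for name in violations:
    # A real system would page or email the rule's owner instead of printing.
    print(f"ALERT: rule violated: {name}")
```

As with unit tests, each rule has to be written and maintained by hand, which is the cost the article goes on to describe.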

The folks at Uber liked the data quality test harness, mainly because it gave them a proactive way to handle a problem they previously could only react to, Kirwan says. However, because the rules were written manually, it took a lot of human intervention to keep Uber’s massive data warehouse filled with fresh, clean data.

That difficulty gave rise to a new approach at Uber that added automation to the mix. By using anomaly detection techniques, Uber’s Data Quality Monitor (DQM) tool can automatically flag problems as they pop up in its multi-petabyte warehouse.
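The article does not say which anomaly detection techniques DQM uses. As a rough illustration of the general idea, the sketch below flags a metric (here, a table's daily row count) when it falls far outside its recent history, using a simple z-score test. The function name `is_anomalous`, the threshold of 3 standard deviations, and the sample counts are all assumptions made for this example.

```python
from statistics import mean, stdev


def is_anomalous(history: list[float], latest: float,
                 threshold: float = 3.0) -> bool:
    """Flag latest if it lies more than threshold sample standard
    deviations away from the mean of history (a basic z-score test)."""
    if len(history) < 2:
        return False  # too little history to estimate spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # History is perfectly constant, so any change counts as anomalous.
        return latest != mu
    return abs(latest - mu) / sigma > threshold


# Made-up daily row counts for one table over the past week.
daily_row_counts = [1000, 1020, 980, 1010, 990, 1005, 995]

print(is_anomalous(daily_row_counts, 1003))  # within the normal range
print(is_anomalous(daily_row_counts, 400))   # sharp drop in volume
```

Unlike a hand-written rule, nobody has to choose the expected range in advance; it is estimated from the data itself, which is what makes this style of check easier to apply across many tables.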