The Risks of AutoML and How to Avoid Them

Posted on: Feb 21, 2020

When Google Flu Trends was launched in 2009, Google’s chief economist, Hal Varian, explained that search trends could be used to “predict the present.” At the time, the notion that useful patterns and insights could be extracted from large-scale search query data made perfect sense. After all, many users’ digital journeys begin with a search query; this includes 8 out of 10 people seeking health-related information. So what could possibly go wrong? The answer is infamous in the business and data science communities. Google Flu Trends was shut down in 2015 after its forecasts overestimated flu levels by nearly 100% relative to data provided by the Centers for Disease Control. Critics were quick to point to the project as the poster child for big data hubris: the fallacy that inductive reasoning fueled by copious amounts of data can supplant traditional, deductive analysis guided by human hypotheses.

More recently, organizations have shifted towards amplifying predictive power by coupling big data with complex, automated machine learning (autoML). AutoML, which uses machine learning to generate better machine learning, is advertised as affording opportunities to “democratize machine learning” by allowing firms with limited data science expertise to develop analytical pipelines capable of solving sophisticated business problems. In a Kaggle prediction competition held just a few months back, an autoML engine pitted against some of the best data scientists in the world finished second after leading most of the way. However, these advancements have raised concerns about AI hubris. By commoditizing machine learning for process improvement, autoML once again raises questions about what the interplay between data, models, and human experts should look like. What does all this mean for managing in an AI-enabled world?
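To make the idea concrete, here is a minimal sketch, written in plain scikit-learn rather than any particular autoML product, of the core loop such an engine automates: trying candidate model families and hyperparameter settings and keeping the best cross-validated performer. The dataset and candidate grids are invented for illustration.

```python
# A minimal sketch of the core loop an autoML engine automates:
# searching over candidate model families and hyperparameters
# instead of hand-tuning a single model.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic data standing in for a real prediction task.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Candidate model families, each with a small hyperparameter grid.
candidates = [
    (LogisticRegression(max_iter=1000), {"C": [0.1, 1.0, 10.0]}),
    (RandomForestClassifier(random_state=0),
     {"n_estimators": [100, 300], "max_depth": [None, 10]}),
]

best_score, best_model = -1.0, None
for estimator, grid in candidates:
    search = GridSearchCV(estimator, grid, cv=5)  # cross-validated tuning
    search.fit(X, y)
    if search.best_score_ > best_score:
        best_score, best_model = search.best_score_, search.best_estimator_

print(f"Selected {best_model!r} with CV accuracy {best_score:.3f}")
```

Commercial autoML engines layer far more on top of this loop, such as automated feature engineering and ensembling, which is precisely what makes their outputs harder for non-experts to scrutinize.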

In our federally funded project (with Rick Netemeyer and Donald Adjeroh), we are examining the efficacy of detecting adverse events from large quantities of digital user-generated content. It is critical for companies in many settings to monitor for adverse events related to their products or services: for instance, unknown drug side effects, children’s toy hazards, or issues leading to automobile recalls. The project’s goal is somewhat analogous to Google Flu Trends’ original objective: use machine learning to generate accurate and timely signals for enhanced awareness of these potential adverse events.
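As a rough illustration of what such a signal-generation pipeline might look like, the sketch below frames adverse-event detection as text classification over user-generated posts. Everything in it, from the example posts to the choice of a TF-IDF plus logistic regression pipeline, is a hypothetical stand-in, not the project’s actual method.

```python
# A hypothetical sketch of adverse-event monitoring framed as text
# classification over user-generated posts. The posts, labels, and
# pipeline below are invented for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

posts = [
    "This medication gave me severe headaches",    # adverse event
    "The toy's battery cover came loose easily",   # adverse event
    "Great product, fast shipping",                # no event
    "Works exactly as described, very happy",      # no event
]
labels = [1, 1, 0, 0]  # 1 = mentions an adverse event

# Bag-of-words features feeding a linear classifier.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(posts, labels)

# Score an incoming post; high-probability posts would be routed
# to human reviewers rather than acted on automatically.
new_post = ["I felt dizzy after using this product"]
prob = model.predict_proba(new_post)[0, 1]
print(f"Adverse-event probability: {prob:.2f}")
```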