Global Big Data Conference


Making ML Explainable Again

Posted on: Feb 02 - 2019

Machine learning may seem like a mysterious creation to the average consumer, but the truth is we’re surrounded by it every day. ML algorithms power search results, monitor medical data, and impact our admission to schools, jobs, and even jail. Despite our proximity to machine learning algorithms, explaining how they work can be a difficult task, even for the experts who designed them.

In the early days of machine learning, algorithms were relatively straightforward, if not always as accurate as we’d like them to be. As research into machine learning progressed over the decades, the accuracy increased, and so did the complexity. Since the techniques were largely confined to academic research and some areas of industrial automation, they didn’t affect the average Joe very much.

But in the past decade, machine learning has exploded into the real world. Armed with huge amounts of consumer-generated data from the Web and mobile devices, organizations are flush with information describing where we go, what we do, and how we do it — both in the physical and digital worlds.

At the same time, the advent of deep learning is giving us unparalleled accuracy for some types of inference problems, such as identifying objects in images and understanding linguistic connections. But deep learning has also brought higher levels of complexity, and that – combined with growing concerns about the privacy and security of consumer data – is giving some practitioners pause as they roll out or expand the use of machine learning technology.

Organizations must carefully weigh the advantages and disadvantages of using “black box” deep learning approaches to make predictions on data, says Ryohei Fujimaki, Ph.D., who worked on more than 100 data science projects as a research fellow for Japanese computer giant NEC before NEC spun off Fujimaki and his team last year as an independent software vendor called dotData.

“It’s really up to the customer,” says Fujimaki, who is the CEO and founder of dotData, which is based in Cupertino, California. “There are still some areas that 1% or 2% make a huge difference in return. On the other hand, there are areas — and in particular I believe this is a majority of areas of enterprise data science – where transparency is more important, because at the end of the day, this has to be consumed by a business user and business user always required to understand what is happening behind the scenes.”

White Boxes

dotData develops a “white box” data science platform that it claims can automate a good chunk of the machine learning pipeline, from collecting the data and training models, to model selection and putting the model into production. The software, which runs in Hadoop and uses Spark, leverages supervised and unsupervised machine learning algorithms.
