Global Big Data Conference


Cloudera Gives Data Scientists More Options for ML

Posted on: May 23, 2018

Cloudera today released a collection of new software aimed at accelerating the development and deployment of machine learning programs. In addition to a new release of its Data Science Workbench that lets data scientists deploy ML models as APIs with the push of a button, the company released Cloudera Enterprise 6.0, a new iteration of its enterprise software suite built around Apache Hadoop, which offers first-class support for GPUs, among other new features.

Machine learning has always been one of the use cases that Cloudera supports with its open source software, Cloudera Distribution of Hadoop (CDH), as well as its flagship offering, Cloudera Enterprise, which now includes Apache Spark. The capability to detect patterns and anomalies in large data sets and to build business processes that operationalize them is the defining feature of “big data.”

But as much as machine learning has always been a “thing” in the Hadoop world, something changed in Cloudera’s customer base recently that has resulted in a sudden surge of interest in machine learning. That’s according to Matt Brandwein, a product manager with the Palo Alto, California-based company.

“We’ve seen a dramatic uptick in data science interest over the last 18 months,” Brandwein says. “There’s been a latent demand that we’ve, I think, finally tapped into.”

About a year ago, Cloudera launched Data Science Workbench, which gave data scientists a Web-based data science notebook for building Python and R-based machine learning models that could utilize the data and processing resources of CDH or Cloudera Enterprise clusters. The product was an instant hit, and has become Cloudera’s top-selling piece of software with hundreds of paying customers, Brandwein says.

With version 1.6, the company is adding two new features to Data Science Workbench, dubbed experiments and models, that take the product to the next level.

The experiments feature enables data scientists to try many different combinations of variables, including data sets, data features, model libraries, algorithms, hyperparameter settings, and processor type. Keeping track of all of these changes would be difficult, especially if the data scientist is working within a larger team. Data Science Workbench makes this task easier by automatically logging all the changes and maintaining them as a knowledge base, which helps with model lineage, auditability, and collaboration.
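To make the idea of experiment tracking concrete, here is a minimal Python sketch of logging every combination of settings alongside its result. It is an illustration only and does not use the Data Science Workbench API: `ExperimentLog`, `run_grid`, and `toy_score` are hypothetical names, and the scoring function is a stand-in for actually training a model.

```python
import itertools
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class ExperimentLog:
    """Hypothetical in-memory log; not the Data Science Workbench API."""
    runs: List[Dict[str, Any]] = field(default_factory=list)

    def record(self, params: Dict[str, Any], metric: float) -> int:
        # Store a copy of the settings so later changes cannot alter the record.
        run_id = len(self.runs) + 1
        self.runs.append({"run_id": run_id, "params": dict(params), "metric": metric})
        return run_id

    def best(self) -> Dict[str, Any]:
        # Return the logged run with the highest metric.
        return max(self.runs, key=lambda run: run["metric"])


def run_grid(grid: Dict[str, List[Any]],
             evaluate: Callable[[Dict[str, Any]], float],
             log: ExperimentLog) -> None:
    # Try every combination of the listed values and log each one.
    names = sorted(grid)
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        log.record(params, evaluate(params))


def toy_score(params: Dict[str, Any]) -> float:
    # Stand-in for training and scoring a model (assumption for illustration).
    bonus = 0.1 if params["algorithm"] == "gbt" else 0.0
    return 1.0 / (1.0 + abs(params["learning_rate"] - 0.1)) + bonus


log = ExperimentLog()
run_grid({"algorithm": ["gbt", "logistic"],
          "learning_rate": [0.01, 0.1, 1.0]}, toy_score, log)
print(len(log.runs), log.best()["params"])
```

Because every run is recorded with its exact settings, a teammate can later see which combination produced a given result, which is the lineage and auditability benefit described above.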

The new model feature, meanwhile, allows a data scientist to easily deploy her experiments as APIs, without involving any data engineering resources to make that happen.
