Salesforce open-sources TransmogrifAI, the machine learning library that powers Einstein

Posted on: Aug 16, 2018

Machine learning models, artificial intelligence (AI) systems that learn relationships among hundreds, thousands, or even millions of data points, are rarely easy to architect. Data scientists can spend weeks or months not only preprocessing the data the models will be trained on, but also extracting useful features (i.e., the individual attributes or variables a model learns from), narrowing down candidate algorithms, and ultimately building (or attempting to build) a system that performs well not just in the lab but in the real world.

Salesforce’s new toolkit aims to ease that burden somewhat. On GitHub today, the San Francisco-based cloud computing company published TransmogrifAI, an automated machine learning library for structured data (the kind of searchable, neatly categorized data found in spreadsheets and databases). It performs feature engineering, feature selection, and model training in just three lines of code.

It’s written in Scala and built on top of Apache Spark (some of the same technologies that power Einstein, Salesforce’s AI platform), and it was designed from the ground up for scalability: it can process datasets ranging from dozens to millions of rows and can run either on a Spark cluster or on an off-the-shelf laptop.

Mayukh Bhaowal, director of product management for Salesforce Einstein, told VentureBeat in a phone interview that TransmogrifAI essentially transforms raw datasets into custom models. It’s the evolution of Salesforce’s in-house machine learning library, which allowed the Einstein team to deploy custom models for enterprise clients in just hours.

“It’s informed by what our data scientists learned while building Einstein,” Bhaowal explained. Chief among those lessons: Custom-built models beat global, pretrained models. “If you’re using the same model to make predictions for a Fortune 500 company and a mom and pop shop, you’ll have a hard time finding the right pattern.”
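To make the intuition behind Bhaowal’s point concrete, here is a toy Scala sketch. It is not TransmogrifAI or Einstein code; the data and the `GlobalVsCustom` object are hypothetical. Two customers respond to the same input in opposite directions. A single least-squares line fitted to their pooled data averages the two patterns away, while a separate line fitted per customer captures each one.

```scala
// Toy illustration with hypothetical data: one pooled ("global") model
// versus one model per customer ("custom"). Not TransmogrifAI code.
object GlobalVsCustom {
  // Ordinary least-squares fit of y = a + b * x; returns (a, b).
  def fit(xs: Seq[Double], ys: Seq[Double]): (Double, Double) = {
    val n = xs.size.toDouble
    val mx = xs.sum / n
    val my = ys.sum / n
    val cov = xs.zip(ys).map { case (x, y) => (x - mx) * (y - my) }.sum
    val varX = xs.map(x => (x - mx) * (x - mx)).sum
    val b = cov / varX
    (my - b * mx, b)
  }

  // Mean squared error of a fitted line (a, b) on a dataset.
  def mse(model: (Double, Double), xs: Seq[Double], ys: Seq[Double]): Double = {
    val (a, b) = model
    xs.zip(ys).map { case (x, y) => math.pow(y - (a + b * x), 2) }.sum / xs.size
  }

  val xs: Seq[Double] = (1 to 10).map(_.toDouble)
  val enterprise: Seq[Double] = xs.map(x => 2.0 * x)        // outcome rises with x
  val smallShop: Seq[Double]  = xs.map(x => 22.0 - 2.0 * x) // outcome falls with x

  // One model trained on both customers' data pooled together.
  val global: (Double, Double) = fit(xs ++ xs, enterprise ++ smallShop)
  // One model trained separately for each customer.
  val customEnterprise: (Double, Double) = fit(xs, enterprise)
  val customShop: (Double, Double) = fit(xs, smallShop)
}
```

With this data the pooled model’s slope comes out at zero (it predicts a flat 11 for everyone), giving a mean squared error of 33 on the enterprise customer, while each per-customer model fits its own data exactly. Real customer data is far messier, so this only illustrates the intuition.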

Machine learning made easy

TransmogrifAI offers a three-step workflow.

The first step is feature inference and automated feature selection. Feature selection is a crucial part of model training, because choosing the wrong features can produce a model that is overly optimistic, inaccurate, or biased.

Using TransmogrifAI, users specify a schema for their data, which the library uses to extract features automatically (from fields such as phone numbers and zip codes). It also runs statistical tests, automatically identifying text fields with low cardinality (i.e., only a few distinct values) and discarding features that have little to no predictive power or that are likely to introduce hindsight bias (here, features containing information that would only be available after the outcome is known, which makes a model look more predictive than it really is) or other unwanted signals.
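The kinds of checks described above can be approximated with simple rules. Below is a minimal Scala sketch built on two hypothetical heuristics, not TransmogrifAI’s actual implementation or API: a text column counts as low-cardinality if it has at most `MaxCategories` distinct values, and a numeric feature is kept only if its absolute Pearson correlation with the label falls between `minCorr` and `maxCorr`. A near-perfect correlation is treated as a warning sign of the outcome leaking into the feature, the situation hindsight bias describes. All names, thresholds, and data are illustrative assumptions.

```scala
// Minimal sketch of two automated feature checks. All names, thresholds,
// and data are hypothetical; this is not TransmogrifAI's API.
object FeatureChecks {
  // Hypothetical cutoff: text columns with at most this many distinct
  // values are treated as categorical rather than free text.
  val MaxCategories = 3

  sealed trait TextKind
  case object Categorical extends TextKind
  case object FreeText extends TextKind

  // Low-cardinality check: count the distinct values in a text column.
  def classifyText(values: Seq[String]): TextKind =
    if (values.distinct.size <= MaxCategories) Categorical else FreeText

  // Pearson correlation between a numeric feature and a numeric label;
  // returns 0.0 when either column is constant.
  def pearson(xs: Seq[Double], ys: Seq[Double]): Double = {
    val n = xs.size.toDouble
    val mx = xs.sum / n
    val my = ys.sum / n
    val cov = xs.zip(ys).map { case (x, y) => (x - mx) * (y - my) }.sum
    val sx = math.sqrt(xs.map(x => (x - mx) * (x - mx)).sum)
    val sy = math.sqrt(ys.map(y => (y - my) * (y - my)).sum)
    if (sx == 0.0 || sy == 0.0) 0.0 else cov / (sx * sy)
  }

  // Keep a feature only if |r| lies between the two hypothetical thresholds:
  // below minCorr it has little predictive power; above maxCorr it tracks the
  // label so closely that it may be leaking the outcome (hindsight bias).
  def keep(feature: Seq[Double], label: Seq[Double],
           minCorr: Double = 0.1, maxCorr: Double = 0.95): Boolean = {
    val r = math.abs(pearson(feature, label))
    r >= minCorr && r <= maxCorr
  }

  // Hypothetical example columns for a binary label.
  val label: Seq[Double]    = Seq(0, 0, 1, 1, 0, 1, 1, 0).map(_.toDouble)
  val useful: Seq[Double]   = Seq(1, 2, 3, 4, 2, 3, 5, 4).map(_.toDouble)
  val leaky: Seq[Double]    = label // e.g. a field filled in only after the outcome is known
  val constant: Seq[Double] = Seq.fill(8)(7.0)
}
```

In this example `classifyText(Seq("CA", "NY", "CA", "TX", "NY"))` returns `Categorical`, the `leaky` column (a copy of the label) is dropped for being too good to be true, the `constant` column is dropped for carrying no signal, and `useful` (correlation of about 0.61) is kept. A production library would rely on more robust statistics than this single correlation rule.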