Why your machine-learning team needs better feature-engineering skills

Posted on: Jul 21, 2018

The skill of feature engineering, crafting data features optimized for machine learning, is as old as data science itself. But it’s a skill I’ve noticed is becoming more and more neglected. The high demand for machine learning has produced a large pool of data scientists who have developed expertise in tools and algorithms but lack the experience and industry-specific domain knowledge that feature engineering requires. They are trying to compensate for that gap with better tools and algorithms. However, algorithms are now a commodity and don’t generate corporate IP.

Generic data is becoming commoditized, and cloud-based machine-learning-as-a-service (MLaaS) offerings like Amazon ML and Google AutoML now make it possible for even less experienced team members to run data models and get predictions within minutes. As a result, power is shifting to companies that develop an organizational competency in collecting or manufacturing proprietary data, a competency that feature engineering enables. Simple data acquisition and model building are no longer enough.

Corporate teams can learn a lot from the winners of modeling competitions such as the KDD Cup and the Heritage Provider Network Health Prize, who have credited feature engineering as a key element in their successes.

Feature engineering techniques

To power feature engineering, data scientists have developed a range of techniques. They can be broadly viewed as:

Contextual transformation. One set of methods involves transforming the individual features from the original set into more contextually meaningful information for each specific model.

For example, when dealing with a categorical feature, ‘unknown’ might communicate special information in the context of a specific situation. However, inside the model it looks like just another category value. In this case a team might want to introduce a new binary feature of ‘has_value’ to separate ‘unknown’ from all other options. For a ‘color’ feature, this would mean adding a binary ‘has_color’ feature that is false for anything of unknown color and true otherwise.
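The ‘has_color’ idea can be sketched in a few lines of Python with pandas. This is a minimal illustration and not a method the article prescribes: the helper name `add_has_value_flag`, the `unknown_label` parameter, and the sample data are all hypothetical, and treating missing values the same as ‘unknown’ is an assumption a team would need to check against its own data.

```python
import pandas as pd


def add_has_value_flag(df, column, unknown_label="unknown"):
    """Return a copy of df with a binary has_<column> feature added.

    Hypothetical helper for illustration only.
    """
    out = df.copy()
    # Assumption: both the explicit 'unknown' label and missing values
    # count as "no value" for the purposes of the flag.
    is_known = out[column].notna() & (out[column] != unknown_label)
    out[f"has_{column}"] = is_known.astype(int)
    return out


# Made-up sample data: two known colors, one 'unknown', one missing.
items = pd.DataFrame({"color": ["red", "unknown", "blue", None]})
flagged = add_has_value_flag(items, "color")
print(flagged)
```

The original ‘color’ column is left in place, so a model can still use the individual color values while the new flag carries the separate signal of whether a color was recorded at all.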