Global Big Data Conference

The Biggest Mistakes Made by Data Scientists

Posted on: Nov 22, 2019

While the tools may change, the mistakes stay the same. Here are four common issues that IT leaders should be aware of when managing data science teams.

In 2019, companies looking to gain an edge on competitors and insight into customers and trends have come to rely more heavily on data scientists to inform their business decisions. A good data scientist is invaluable to a company with any online presence. They will assess and interpret complex information and build out machine learning algorithms.

Data volume keeps growing, and the amount of skill and effort needed to create data-driven initiatives is certainly keeping pace with that growth. Mistakes can produce huge consequences and, while the tools may change, the mistakes stay the same. Over the course of my career I’ve seen every permutation of these common mistakes, and my hope here is to help you identify and avoid them within your own teams.

Mistake #1: Lack of coding skills

This one may seem obvious, but you would be amazed at the number of people who feel data science is a career completely removed from the practice of coding. The central tenet of data science is, and really has always been, building a model with a long script. The quality of that script (or lack thereof) has far-reaching consequences, from scalability to the robustness of the model once it goes into production.

An excellent data scientist must also be a good programmer. My rule is: a senior data scientist must possess a mid-level software engineer’s coding skill and a mid-level data scientist should be on par with a junior software engineer.

Mistake #2: Lack of defensive mindset

The adage goes “the best offense is a good defense” and, while sports rarely overlap with code, in this case the saying is apt. Teams need to emphasize the mindset: “How wrong can the model be on a bad day?”

A single mistake can carry financial and legal consequences for the company. If you don’t test and retest your code with a defensive mindset, it will almost certainly contain errors.
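One way to picture the defensive mindset is a thin guard around the model call. The sketch below is only an illustration: `safe_predict`, `toy_price_model`, the bounds, and the fallback value are all hypothetical names and numbers chosen for the example, and it assumes numeric features.

```python
import math


def safe_predict(model_fn, features, lower, upper, fallback):
    """Call model_fn defensively: validate inputs and bound the output."""
    # Reject missing or non-finite inputs before they reach the model.
    for value in features.values():
        if value is None or not math.isfinite(value):
            return fallback
    prediction = model_fn(features)
    # Treat a NaN or infinite prediction as a failure.
    if not math.isfinite(prediction):
        return fallback
    # Clamp to the range the business can tolerate on a bad day.
    return min(max(prediction, lower), upper)


def toy_price_model(features):
    # Hypothetical stand-in for a trained model.
    return 100.0 * features["area"] - 5.0 * features["age"]


normal = safe_predict(toy_price_model, {"area": 10.0, "age": 2.0}, 0.0, 5000.0, -1.0)
bad_input = safe_predict(toy_price_model, {"area": float("nan"), "age": 2.0}, 0.0, 5000.0, -1.0)
runaway = safe_predict(toy_price_model, {"area": 1000.0, "age": 0.0}, 0.0, 5000.0, -1.0)
```

Here a normal input passes through unchanged, a corrupt input yields the fallback, and a runaway prediction is capped at the upper bound. What the right bounds and fallback are depends entirely on the business case.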

In machine learning, people use performance metrics like precision, RMSE, and MAE. Those are aggregate figures computed over a whole dataset, so a handful of severe errors can hide behind a good score. They do not act as a replacement for defensive testing.
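A small, made-up example shows how an average can mask a bad day. The data and the `error_summary` helper below are invented for illustration. The helper reports MAE and RMSE next to the single worst absolute error.

```python
import math


def error_summary(y_true, y_pred):
    """Return MAE, RMSE, and the worst single absolute error."""
    errors = [abs(t - p) for t, p in zip(y_true, y_pred)]
    n = len(errors)
    mae = sum(errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return {"mae": mae, "rmse": rmse, "max_abs_error": max(errors)}


# Synthetic data: 99 perfect predictions and one that is off by 100.
y_true = [10.0] * 100
y_pred = [10.0] * 99 + [110.0]

summary = error_summary(y_true, y_pred)
print(summary)
```

On this synthetic data the MAE is 1.0 and the RMSE is 10.0, while the worst single prediction misses by 100.0. Tracking a worst-case figure alongside the averages is one simple way to ask "how wrong can the model be on a bad day?"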
