Global Big Data Conference

COVID-19 made your data set worthless. Now what?

Posted on: Sep 12, 2020

The COVID-19 pandemic has perplexed data scientists and creators of machine learning tools, as the sudden and major change in consumer behavior has made predictions based on historical data nearly useless. There is also very little point in trying to train new prediction models during the crisis, as one simply cannot predict chaos. While these challenges could shake our perception of what artificial intelligence really is (and is not), they might also foster the development of tools that can adjust automatically to new conditions.

When it comes to predicting demand or consumer behavior, there is nothing in the historical data that resembles what we see now. Thus, a model based purely on historical data will try to reproduce “what is normal” and is likely to give inaccurate predictions.

Let me give you a simple analogy of the problem that data scientists and machine learning professionals are now experiencing. If you want to predict how long it is going to take to drive from A to B in London next Thursday at 18:00, you can ask a model that looks at historical driving times, possibly at several time scales. For instance, the model might look at the average speed on any day at around 18:00. It might also look at the average speed on a Thursday versus other days in the week, and in the month of April versus other months. The same reasoning can be extended to other time scales, such as one year, ten years, or whatever is relevant for the quantity you are trying to predict. This will help predict the expected driving time under “normal” conditions. However, if there is a major disruption on that particular day, like a football game or a big concert, your travelling time might be significantly affected. That is how we see the current crisis in comparison with normal times.
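
To make the analogy concrete, here is a minimal sketch of such a "normal conditions" baseline. Everything in it is an assumption made for illustration: the function name `seasonal_baseline`, the data layout (a list of timestamp and travel-minutes pairs, rather than speeds), and the equal-weight blend of the three time scales. It is not a description of any production model.

```python
from datetime import datetime
from statistics import mean


def seasonal_baseline(history, when):
    """Estimate travel time under "normal" conditions.

    history: list of (datetime, minutes) pairs (hypothetical layout).
    when:    the datetime we want a prediction for.
    Returns the equal-weight average of three historical means (same hour,
    same weekday, same month), or None if no history matches at all.
    """
    def avg(match):
        values = [minutes for ts, minutes in history if match(ts)]
        return mean(values) if values else None

    estimates = [
        avg(lambda ts: ts.hour == when.hour),            # around 18:00 on any day
        avg(lambda ts: ts.weekday() == when.weekday()),  # Thursdays vs other days
        avg(lambda ts: ts.month == when.month),          # April vs other months
    ]
    usable = [e for e in estimates if e is not None]
    return mean(usable) if usable else None
```

A baseline like this only ever reproduces past patterns. A football game, a concert, or a pandemic is invisible to it unless something similar already sits in `history`.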

Perhaps unsurprisingly, many AI and machine learning tools deployed across various businesses – from transport to retail, professional services and the like – are currently struggling to cope with massive changes in the behavior of both users and the environment. Clearly, one can try making prediction algorithms focus on smaller parts of the data. However, it is also pretty obvious that one cannot expect “normal” outcomes and the same quality of predictions as before.

What to do?

There is some good news for data scientists and the like, though. Generally, data science solutions are built on historical data, but current, “extraordinary” data should still be used to continually assess the performance of those existing solutions. If performance starts to drop off consistently, that can be an indication that the rules have changed.
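
One simple way to check whether performance is dropping off "consistently" is to compare recent daily errors with a reference period. The sketch below is only an illustration under stated assumptions: the function name `performance_dropped`, the window lengths, and the tolerance factor are all hypothetical choices, and a real monitoring setup would tune or replace them.

```python
from statistics import mean


def performance_dropped(errors, baseline_days=28, recent_days=7, tolerance=1.5):
    """Flag a consistent drop in prediction quality.

    errors: daily prediction errors (e.g. mean absolute error), oldest first.
    Returns True only if every one of the last `recent_days` errors exceeds
    `tolerance` times the mean error of the `baseline_days` before them.
    """
    if len(errors) < baseline_days + recent_days:
        return False  # not enough history to judge
    baseline = mean(errors[-(baseline_days + recent_days):-recent_days])
    recent = errors[-recent_days:]
    return all(e > tolerance * baseline for e in recent)
```

Requiring every recent day to exceed the threshold is a deliberately conservative choice: a single bad day does not raise the flag, but a sustained shift does.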

This performance monitoring is independent of predictive systems for now – it tells us how things are doing, but will not change anything. However, I believe that we are now seeing a major push towards systems that could adjust automatically to the new rules. This is something we can call “adaptive goal-directed behaviour”, which is how we define AI at Satalia. If we can make a system adaptive, then it is going to adjust itself based on that current data when it recognizes performance dropping off. We have aspirations to do this, but we are not there just yet. In the short run, however, we can do the following:

Do not try to train a brand-new model from Day 1 of the crisis. It is pointless: you cannot predict chaos;
Gather more data points and try to understand and analyze how the model is affected by the situation;
If you have data from a previous crisis with similar characteristics, train a model on that data and test it offline to see if it works better;
Make sure your training data is always up to date. Every day, the new day goes into the data and the oldest day goes out, like a sliding window. The model will then gradually adjust itself;
Shrink the timeline of your dataset as much as possible without affecting your metrics. If your dataset covers a very long period, the model will take too long to adjust to the new reality; and
Manage client expectations. Make it clear that noise is making things very hard to predict. Computing KPIs during this time is next to impossible.
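
The sliding-window advice above can be sketched in a few lines. The class name `SlidingWindow`, its methods, and the idea of storing one list of observations per day are assumptions made for this example, and retraining the model on the returned data is left out.

```python
from collections import deque


class SlidingWindow:
    """Keep only the most recent `max_days` days of training data."""

    def __init__(self, max_days):
        # A deque with maxlen silently drops the oldest day when full.
        self._days = deque(maxlen=max_days)

    def add_day(self, observations):
        """Add today's observations; the oldest day falls out if the window is full."""
        self._days.append(list(observations))

    def training_data(self):
        """Return all observations currently in the window, oldest first."""
        return [obs for day in self._days for obs in day]
```

Choosing a smaller `max_days` corresponds to shrinking the timeline of the dataset: the window forgets the pre-crisis period sooner, at the cost of having less data to train on.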
