Speaker: Akanksha Devkar



Big Data: The Silver Bullet in Machine Learning?


Given the rapid pace of today's technological advances, it is worth questioning our approach to building highly accurate deep learning models with Big Data.
The machine learning community has always looked at Big Data as a gold mine. However, because most applications are still built using a supervised learning approach, those large volumes of data need to be labeled. This is no easy task, as most labeling is still done manually, a problem that can be referred to as the "Big Data Labeling Crisis".
No labeling process can ever be 100% accurate, even if the data itself is perfect, just as no model can be. We need to focus our attention on building high-quality data that is concise yet impactful enough to yield a highly accurate model.
Traditionally, we have believed that the more data we have, the smaller the impact of noise on model performance. But collecting training data at this scale carries a huge burden in both cost and time. At Alectio, we are working towards sustainable machine learning, and we have carried out a series of studies aimed at understanding the tradeoff between labeling quality and the size of the training set in the context of classification, as well as the impact of labeling noise on the confusion matrix.
In particular, we show that the maximum amount of noise a model can sustain without impairing its accuracy differs from class to class. We also offer thoughts on how our findings should inform the relative amount of data across classes, helping to create an efficient training set and thereby significantly reduce labeling budget and effort.


Akanksha is a machine learning engineer at Alectio working on developing active learning models. She is an amateur astronomer and roboticist. Having participated in many robotics contests, she developed a keen interest in working on a DARPA Grand Challenge style project, which led her to use deep learning techniques to develop software for self-driving cars during graduate school at Worcester Polytechnic Institute. In that work, she typically had to mine large amounts of image data to extract important features. Her interest in active learning was piqued when she learned that it can be used to reduce the amount of data and computation power needed. She loves thought experiments and deep discussions about the latest tech. She is passionate about technology and believes in the promise of sustainable machine learning.