Dataloop Drives Labeling Into the DataOps Pipeline
Posted on: Oct 17, 2020

Data is the fuel for machine learning, but the data needs to be accurately labeled for the machines to learn. To that end, data training startup Dataloop yesterday announced that it has received $11 million in Series A funding to build SaaS data pipelines that combine human supervision of the data annotation process with data management capabilities.

Today’s computer vision models are extremely powerful, and the ones based on deep learning approaches can exceed human capabilities. From self-driving cars navigating the world to programs that can accurately diagnose diseases from MRI images, the potential uses for AIs built upon convolutional neural networks are astonishingly wide.

However, there’s a catch (there always is). Deep learning models work best when presented with lots of labeled data. Because of the volume of data that deep learning requires, spending human cycles to curate all of it is extremely expensive, and is in fact one of the biggest bottlenecks preventing more widespread adoption of AI.

For example, a 2019 study by Dimensional Research concluded that “96% of companies surveyed stated they have run into training-related problems with data quality, labeling required to train the AI, and building model confidence.” That’s why 70% of the companies it surveyed relied on external firms to supply the data collection, labeling, and development services.

That’s essentially the market that Dataloop is hoping to serve. Dataloop is an Israeli company that was founded in 2017 with a focus on automating the data annotation process, primarily for computer vision projects but also for ones involving audio files.

Dataloop has developed a SaaS application that helps companies automate this data labeling process, and functions as a hub for uniting data scientists, data engineers, and the data labelers themselves.

Humans are not required for all labeling activities. Other machine learning algorithms can sometimes provide the necessary level of accuracy in data labeling. Dataloop has what it dubs “AI-assisted auto-annotation capabilities” that are built into the offering for supplying the data to train downstream vision models.

However, AI cannot be fully trusted to accurately label the images, and that's why Dataloop keeps humans in the loop: to oversee the AI processes and step in when needed. "We strongly believe that with humans in the loop, algorithms can make more accurate and reliable predictions, which ultimately leads to more accurate machine learning capabilities," the company tells Datanami.
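
One common way to keep humans in the loop is to let the model label only the items it is confident about and queue the rest for a person to review. The Python sketch below illustrates that general pattern. It is not Dataloop's code or API: the `Prediction` class, the `route_predictions` function, the sample item IDs, and the 0.9 threshold are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """A hypothetical auto-annotation result for one data item."""
    item_id: str
    label: str
    confidence: float  # model's score, assumed to lie in [0, 1]


def route_predictions(predictions, threshold=0.9):
    """Split predictions into auto-accepted labels and a human review queue.

    Items whose confidence meets the (assumed) threshold are accepted as-is;
    everything else is handed to a human annotator.
    """
    auto_accepted, needs_review = [], []
    for p in predictions:
        if p.confidence >= threshold:
            auto_accepted.append(p)
        else:
            needs_review.append(p)
    return auto_accepted, needs_review


# Made-up example data, not real model output.
preds = [
    Prediction("img_001", "car", 0.97),
    Prediction("img_002", "pedestrian", 0.62),
    Prediction("img_003", "bicycle", 0.91),
]
accepted, review = route_predictions(preds, threshold=0.9)
print([p.item_id for p in accepted])  # ['img_001', 'img_003']
print([p.item_id for p in review])    # ['img_002']
```

In practice the threshold is a trade-off: raising it sends more items to human reviewers, which costs more but catches more of the model's mistakes.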