Scale AI releases free lidar data set to power self-driving car development

Posted on: May 23, 2020

High-quality data is the fuel that powers AI algorithms. Without a continual flow of labeled data, bottlenecks can occur, and the algorithm will slowly degrade and add risk to the system.

It’s why labeled data is so critical for companies like Zoox, Cruise and Waymo, which use it to train machine learning models to develop and deploy autonomous vehicles. That need is what led to the creation of Scale AI, a startup that uses software and people to process and label image, lidar and map data for companies building machine learning algorithms. Companies working on autonomous vehicle technology make up a large swath of Scale’s customer base, although its platform is also used by Airbnb, Pinterest and OpenAI, among others.

The COVID-19 pandemic has slowed, or even halted, that flow of data as AV companies suspended testing on public roads — the means of collecting billions of images. Scale is hoping to turn the tap back on, and for free.

This week the company, in collaboration with lidar manufacturer Hesai, launched an open-source data set called PandaSet that can be used to train machine learning models for autonomous driving. The data set, which is free and licensed for academic and commercial use, includes data collected using Hesai’s forward-facing PandarGT lidar with image-like resolution, as well as its mechanical spinning lidar known as Pandar64. The data was collected while driving in urban areas of San Francisco and Silicon Valley before officials issued stay-at-home orders in the area, according to the company.

“AI and machine learning are incredible technologies with an incredible potential for impact, but also a huge pain in the ass,” Scale CEO and co-founder Alexandr Wang told TechCrunch in a recent interview. “Machine learning is definitely a garbage in, garbage out kind of framework — you really need high-quality data to be able to power these algorithms. It’s why we built Scale and it’s also why we’re using this data set today to help drive forward the industry with an open-source perspective.”

The goal with this lidar data set was to give free access to a dense and content-rich data set, which Wang said was achieved by using two kinds of lidars in complex urban environments filled with cars, bikes, traffic lights and pedestrians.

“The Zoox and the Cruises of the world will often talk about how battle-tested their systems are in these dense urban environments,” Wang said. “We wanted to really expose that to the whole community.”