Global Big Data Conference

Google taps old news archives and AI to predict flash floods

Posted on: Mar 12, 2026

Flash floods are among the world’s most dangerous weather disasters, causing more than 5,000 deaths every year. Yet they’re also notoriously hard to forecast. Google believes it may have found an unusual solution: analyzing old news reports with AI.

Although scientists have collected vast amounts of weather data, flash floods are extremely short-lived and localized. Unlike temperature trends or river flows—both of which are regularly monitored—flash floods often happen too quickly and in too small an area to be captured consistently. That lack of reliable data has made it difficult for modern deep-learning systems, which increasingly power weather forecasts, to predict them accurately.

To address the gap, Google researchers turned to Gemini, the company’s large language model. They used it to scan roughly 5 million news articles worldwide, identifying reports describing about 2.6 million separate flood events. Those reports were converted into a geo-tagged time-series dataset called “Groundsource.” According to Google Research product manager Gila Loike, it’s the first time the company has used language models in this way. The dataset and research were released publicly on Thursday.
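
As a rough illustration of what a "geo-tagged time-series dataset" can mean in practice, the sketch below bins extracted flood reports into grid cells and counts them per day. It is not Google's pipeline: the `FloodReport` schema, the `grid_cell` and `build_time_series` helpers, the 0.1-degree cell size, and the sample coordinates and dates are all assumptions made up for this example, and the language-model extraction step is taken as already done.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
import math


@dataclass(frozen=True)
class FloodReport:
    """One flood event pulled from a news article (hypothetical schema)."""
    lat: float
    lon: float
    day: date


def grid_cell(lat, lon, cell_deg=0.1):
    """Snap a coordinate to the integer index of the grid cell containing it."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def build_time_series(reports, cell_deg=0.1):
    """Count reports per (grid cell, day); missing keys mean no report."""
    counts = Counter()
    for r in reports:
        counts[(grid_cell(r.lat, r.lon, cell_deg), r.day)] += 1
    return counts


# Made-up sample records, for illustration only.
sample = [
    FloodReport(-25.97, 32.57, date(2023, 2, 9)),
    FloodReport(-25.93, 32.53, date(2023, 2, 9)),   # same cell, same day
    FloodReport(-25.97, 32.57, date(2023, 2, 10)),  # same cell, next day
    FloodReport(6.52, 3.37, date(2023, 2, 9)),      # different cell
]
series = build_time_series(sample)
for (cell, day), n in sorted(series.items()):
    print(cell, day.isoformat(), n)
```

Each key pairs a location with a date, so the counts for any one cell form a daily series that a forecasting model could be checked against. A real dataset would also need deduplication of articles covering the same event, which this sketch leaves out.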

Using Groundsource as a real-world reference, the team then trained a forecasting system built on a Long Short-Term Memory neural network (LSTM). The model processes global weather forecasts and estimates the probability of flash floods in specific areas.
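
For readers unfamiliar with LSTMs, the following is a minimal sketch of how such a network can turn a sequence of forecast values into a single probability. It is a generic textbook LSTM cell written in plain Python, not Google's model: the two input features, the hidden size, the helper names (`lstm_step`, `flood_probability`, `random_params`), and the random untrained weights are all assumptions for illustration, so the number it prints carries no meteorological meaning.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h, c, W, b):
    """One LSTM step. W[gate] is an H x (D+H) matrix, b[gate] a length-H bias."""
    z = x + h  # list concatenation: current input followed by previous hidden state

    def affine(gate):
        return [sum(w * v for w, v in zip(row, z)) + bias
                for row, bias in zip(W[gate], b[gate])]

    i = [sigmoid(a) for a in affine("i")]    # input gate
    f = [sigmoid(a) for a in affine("f")]    # forget gate
    o = [sigmoid(a) for a in affine("o")]    # output gate
    g = [math.tanh(a) for a in affine("g")]  # candidate cell state
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


def flood_probability(sequence, W, b, w_out, b_out):
    """Run the LSTM over the sequence, then map the final hidden state to (0, 1)."""
    hidden = len(w_out)
    h, c = [0.0] * hidden, [0.0] * hidden
    for x in sequence:
        h, c = lstm_step(list(x), h, c, W, b)
    logit = sum(w * v for w, v in zip(w_out, h)) + b_out
    return sigmoid(logit)


def random_params(n_features, hidden, seed=0):
    """Untrained placeholder weights; a real model would learn these from data."""
    rng = random.Random(seed)
    W = {gate: [[rng.uniform(-0.5, 0.5) for _ in range(n_features + hidden)]
                for _ in range(hidden)] for gate in "ifog"}
    b = {gate: [0.0] * hidden for gate in "ifog"}
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    return W, b, w_out, 0.0


# Made-up, normalized inputs: (forecast rainfall, soil moisture) per time step.
forecast = [(0.0, 0.1), (0.3, 0.4), (0.9, 0.8)]
W, b, w_out, b_out = random_params(n_features=2, hidden=4)
print(flood_probability(forecast, W, b, w_out, b_out))
```

The cell state `c` is what lets the network carry information across time steps, for example remembering that heavy rain fell earlier in the sequence. Training against a labeled reference such as Groundsource would adjust the weights so the output tracks observed floods.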

The resulting model is now being used on Google’s Flood Hub platform, which highlights flash-flood risks for urban regions across 150 countries. The company is also sharing the data with emergency response organizations around the world. António José Beleza, an emergency response official with the Southern African Development Community, said testing the model helped his team react to flooding events more quickly.

The system still has limitations. Its forecasts cover relatively large areas—about 20 square kilometers each—making it less detailed than some national systems. It’s also less precise than alerts from the National Weather Service in the United States, partly because Google’s model doesn’t use local radar data that allows meteorologists to track rainfall in real time.

However, the project is designed to work in regions where sophisticated weather-monitoring infrastructure is unavailable. Many countries lack the resources to maintain dense radar networks or long-term meteorological records.

By compiling millions of flood reports, the Groundsource dataset helps fill those gaps, said Juliet Rothenberg, a program manager on Google’s resilience team. The approach allows researchers to infer risks in places where traditional weather data is limited.

Rothenberg added that using large language models to convert qualitative text into quantitative datasets could also help track other hard-to-measure phenomena—such as heat waves or landslides.

Marshall Moutenot, CEO of Upstream Tech, which develops AI models to forecast river flows for clients like hydropower companies, said the project reflects a broader push to gather better training data for machine-learning weather models. Moutenot also co-founded dynamical.org, a group focused on assembling machine-learning-ready weather datasets.

“Data scarcity is one of the toughest problems in geophysics,” Moutenot said. “There’s an overwhelming amount of Earth data, but when it comes to verifying predictions, reliable ground truth is often missing. This was a particularly creative way to bridge that gap.”
