Google taught AI to predict floods using old newspaper reports
Google uses old news reports and an LLM to predict flash floods. Archival texts are converted into numerical data, solving the shortage of hydrological…
AI-processed from TechCrunch; edited by Hamidun News
Google has developed an unconventional approach to predicting flash floods: instead of expensive sensor infrastructure, the company uses archival news reports as a source of historical data. A language model converts qualitative text descriptions of natural disasters into quantitative metrics suitable for training hydrological models. The problem this method solves is well known to climatologists.
Accurate flood prediction models require multi-year data series on water levels, precipitation, and terrain. But in most developing countries, where floods are most deadly, the infrastructure to collect such data simply doesn't exist. Sensors cost money and require maintenance and electricity, all of which are in short supply in vulnerable regions.
Yet historical information about floods does exist: in newspaper archives, reports from local publications, and government summaries. The problem is that this data is qualitative. A sentence such as "a severe flood inundated three villages" is not a number that a traditional model can work with. Google proposed using an LLM as a translator: the model reads the historical text and extracts structured numerical estimates of the event's scale, duration, and geography.
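To make the "translator" step concrete, here is a minimal sketch of what the structured output could look like. It is not Google's pipeline: the `FloodEvent` fields, the 1 to 5 severity scale, the prompt wording, and the helper names `build_prompt` and `parse_event` are all assumptions made for illustration, and no real LLM is called. The model's reply is replaced by a hand-written JSON string.

```python
import json
from dataclasses import dataclass


# Hypothetical schema: the article does not say which fields Google extracts.
@dataclass
class FloodEvent:
    location: str
    severity: int              # assumed ordinal scale: 1 (minor) to 5 (catastrophic)
    duration_days: float
    settlements_affected: int


# Assumed prompt wording, for illustration only.
PROMPT_TEMPLATE = (
    "Read the news report below and return a JSON object with the keys "
    "location, severity (1-5), duration_days, settlements_affected.\n\n"
    "Report: {report}"
)


def build_prompt(report: str) -> str:
    """Wrap an archival report in the extraction instructions."""
    return PROMPT_TEMPLATE.format(report=report)


def parse_event(llm_reply: str) -> FloodEvent:
    """Parse and validate a JSON reply; raises ValueError on implausible numbers."""
    raw = json.loads(llm_reply)
    event = FloodEvent(
        location=str(raw["location"]),
        severity=int(raw["severity"]),
        duration_days=float(raw["duration_days"]),
        settlements_affected=int(raw["settlements_affected"]),
    )
    if not 1 <= event.severity <= 5:
        raise ValueError("severity outside the assumed 1-5 scale")
    if event.duration_days < 0 or event.settlements_affected < 0:
        raise ValueError("negative duration or settlement count")
    return event


# Hand-written stand-in for a model reply; the values are invented.
example_reply = (
    '{"location": "unnamed district", "severity": 4, '
    '"duration_days": 2, "settlements_affected": 3}'
)
event = parse_event(example_reply)
print(event)
```

The validation step matters in a setup like this because a language model can return numbers that look structured but fall outside any sensible range. Rejecting them early keeps obviously bad values out of the training data.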
The resulting synthetic numerical series are then used to train a flood prediction model. Essentially, archival journalism becomes a substitute for decades of instrumental measurements. This is a fundamentally new way to address data scarcity in climate modeling, and the approach is potentially applicable far beyond hydrology.
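As a rough illustration of how extracted events might become a numerical series, the sketch below collapses per-event records into one value per year and computes a naive base rate. The records, the severity scale, the threshold, and the function names `yearly_peak_severity` and `flood_year_rate` are invented for this example. The article does not describe how Google builds its series or which model it trains on them.

```python
from collections import defaultdict

# Hypothetical extracted records: (year, severity on an assumed 1-5 scale).
# These values are invented for illustration, not real flood data.
records = [(1998, 3), (1998, 5), (2001, 2), (2004, 4)]


def yearly_peak_severity(records, start_year, end_year):
    """Return one value per year: the highest reported severity, or 0 if no report."""
    peaks = defaultdict(int)
    for year, severity in records:
        peaks[year] = max(peaks[year], severity)
    # .get() is used so that missing years are read as 0 without being added to the dict.
    return [peaks.get(year, 0) for year in range(start_year, end_year + 1)]


def flood_year_rate(series, threshold=3):
    """Share of years whose peak severity reaches the threshold (a naive base rate)."""
    return sum(1 for value in series if value >= threshold) / len(series)


series = yearly_peak_severity(records, 1998, 2005)
rate = flood_year_rate(series)
print(series)  # [5, 0, 0, 2, 0, 0, 4, 0]
print(rate)    # 0.25
```

One caveat in this simplification: a year with no newspaper report is recorded as 0, although missing coverage is not the same as no flood. A production system would presumably need to account for that reporting bias.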
Google is already testing the system in regions of Africa and South Asia where flood warnings are virtually absent. The company has been developing its Flood Hub initiative since 2023, and it currently covers more than 80 countries. The new method should expand coverage to territories that previously fell outside the forecast area because no historical numerical data existed for them.
This is a telling example of how LLMs are changing the very structure of scientific data. Previously, the boundary between "having data" and "having no data" was determined by the availability of measuring equipment. Now that boundary can be pushed back using language models that can extract hidden quantitative information from texts written without any scientific intent.