Google DeepMind adapted Perch 2.0: the birdsong model recognizes whale calls
AI-processed from IEEE Spectrum AI; edited by Hamidun News
Google DeepMind found an unexpected way to study the ocean: the Perch 2.0 model, created to recognize bird songs and other terrestrial animal sounds, confidently handles whale vocalizations too. This could reduce the time spent developing separate marine models and accelerate acoustic monitoring of rare populations.
How the Perch 2.0 model was tested
Perch 2.0 is a bioacoustic foundation model trained on millions of recordings of birds, amphibians, insects, and mammals. It was originally developed to analyze terrestrial soundscapes, not the ocean.
However, teams at Google DeepMind and Google Research decided to test whether they could reuse this existing foundation for whales instead of building a new system from scratch. The logic is simple: if a foundation model transfers knowledge between different types of signals, scientists need far less computation and time than developing a separate system would require. To check this, the team took three marine audio datasets containing whale vocalizations and other underwater sounds.
Each five-second fragment was converted into a spectrogram, a visual map of how frequency and sound intensity change over time. Perch 2.0 then transformed this data into embeddings: compact numerical feature vectors that can distinguish, for example, an orca whistle from a humpback whale call.
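The windowing and spectrogram step can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not Perch 2.0's actual front end: the 32 kHz sample rate, the FFT size of 512, the hop of 256 samples, and the helper names `split_into_windows` and `log_spectrogram` are hypothetical choices, and a plain log-magnitude STFT stands in for whatever frequency representation the model really uses. The embedding step itself is not shown, since it requires the model's own interface.

```python
import numpy as np

# Assumptions (not from the article): 32 kHz mono audio, FFT size 512,
# hop 256, Hann window. Only the 5-second window length comes from the text.
SAMPLE_RATE = 32_000
WINDOW_SECONDS = 5.0

def split_into_windows(audio, sample_rate=SAMPLE_RATE, window_seconds=WINDOW_SECONDS):
    """Cut a 1-D signal into non-overlapping fixed-length windows, zero-padding the tail."""
    win = int(sample_rate * window_seconds)
    n_windows = max(1, int(np.ceil(len(audio) / win)))
    padded = np.zeros(n_windows * win, dtype=np.float32)
    padded[: len(audio)] = audio
    return padded.reshape(n_windows, win)

def log_spectrogram(window, n_fft=512, hop=256):
    """Return a (time_frames, frequency_bins) log-magnitude spectrogram."""
    n_frames = 1 + (len(window) - n_fft) // hop
    # Row t holds the sample indices of frame t.
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = window[idx] * np.hanning(n_fft)
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log(magnitude + 1e-6)  # small offset avoids log(0) on silent frames

# Usage: 12 seconds of a synthetic 2 kHz tone yields three 5-second windows.
t = np.arange(int(12 * SAMPLE_RATE)) / SAMPLE_RATE
tone = np.sin(2 * np.pi * 2000 * t).astype(np.float32)
windows = split_into_windows(tone)
spec = log_spectrogram(windows[0])
print(windows.shape, spec.shape)  # (3, 160000) (624, 257)
```

Each frequency bin here spans 32000 / 512 = 62.5 Hz, so the 2 kHz tone shows up as a bright horizontal line in bin 32.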
Researchers then trained a simple logistic regression classifier on just a handful of labeled examples: from four to 32 embeddings per dataset. Even with so few examples, quality was high, and it improved further as more examples were added.
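A classifier trained on top of frozen embeddings like this is often called a linear probe. A minimal sketch, assuming nothing about Perch 2.0's real embeddings: random vectors clustered around three hypothetical class centers stand in for embeddings of three call types, and the hypothetical `train_logistic_probe` fits a softmax (multinomial logistic regression) head with plain gradient descent. The dimension, learning rate, and class setup are illustrative assumptions, and the printed accuracies describe only this synthetic data, not the study's results.

```python
import numpy as np

def train_logistic_probe(X, y, n_classes, lr=1.0, epochs=300, l2=1e-3):
    """Fit a softmax (multinomial logistic regression) head on frozen embeddings
    using full-batch gradient descent. Returns weights W and bias b."""
    n, d = X.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot labels, shape (n, n_classes)
    for _ in range(epochs):
        logits = X @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)            # softmax probabilities
        grad = (P - Y) / n                           # d(mean cross-entropy)/d(logits)
        W -= lr * (X.T @ grad + l2 * W)              # includes the L2 penalty gradient
        b -= lr * grad.sum(axis=0)
    return W, b

def predict(X, W, b):
    return np.argmax(X @ W + b, axis=1)

# Synthetic stand-in for embeddings: three hypothetical call types in 64 dimensions.
rng = np.random.default_rng(0)
n_classes, dim = 3, 64
centers = rng.normal(size=(n_classes, dim))

def sample(k_per_class):
    X = np.concatenate([c + rng.normal(size=(k_per_class, dim)) for c in centers])
    y = np.repeat(np.arange(n_classes), k_per_class)
    return X / np.linalg.norm(X, axis=1, keepdims=True), y  # L2-normalize each vector

X_test, y_test = sample(200)
accuracies = {}
for k in (4, 8, 16, 32):  # examples per class here; the article's range is 4 to 32
    X_train, y_train = sample(k)
    W, b = train_logistic_probe(X_train, y_train, n_classes)
    accuracies[k] = float(np.mean(predict(X_test, W, b) == y_test))
print(accuracies)
```

Because the embedding model stays frozen, only the small weight matrix `W` and bias `b` are trained, which is why a few labeled examples per call type can be enough when the embeddings already separate the classes well.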
Why transfer learning worked
The key idea here is transfer learning: a model first learns to extract general acoustic patterns from a vast dataset, and those learned features are then applied to a different but related task. In the case of Perch 2.0, the transfer is particularly unexpected, since birds sing in the air while whales exchange signals underwater. Yet the model appears to capture patterns that do not depend on the transmission medium: the shape of whistles, frequency dynamics, signal duration, and fine microstructure.
"We train this model to find small details in sound landscapes."
Researchers offer several explanations. Birds and marine mammals may have evolved similar sound production mechanisms. Large models trained on diverse data also often perform well outside their original domain. Finally, recognizing bird vocalizations is itself extremely difficult, so the model is forced to notice the tiniest differences between calls, and this likely helps it underwater as well. According to the team, whistles from some orca populations even fall within spectral ranges similar to those of many bird calls.
Why this matters for biologists
For ocean researchers, this result is more than an elegant demonstration. In bioacoustics, researchers keep discovering new signal types, and some underwater sounds still lack a confident classification. If, instead of building a separate model for each species, researchers can take a powerful foundation model and quickly train a lightweight classifier on top of it, the research cycle becomes noticeably shorter. This is particularly useful for passive acoustic monitoring, where scientists analyze vast archives recorded over months by buoys, hydrophones, and autonomous stations.
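In passive acoustic monitoring, a trained classifier is typically run over a long recording window by window, and high-scoring windows are flagged for review. The sketch below, under stated assumptions, shows only that bookkeeping step: it takes per-window scores (for example, probabilities from a lightweight classifier on top of Perch 2.0 embeddings) and merges consecutive windows above a threshold into timestamped events. The 5-second window length matches the article; the 0.5 threshold and the `detect_events` helper are hypothetical.

```python
def detect_events(scores, window_seconds=5.0, threshold=0.5):
    """Merge runs of consecutive windows with score >= threshold into
    (start_seconds, end_seconds) events."""
    events = []
    start = None  # index of the first window in the current run, if any
    for i, score in enumerate(scores):
        if score >= threshold and start is None:
            start = i
        elif score < threshold and start is not None:
            events.append((start * window_seconds, i * window_seconds))
            start = None
    if start is not None:  # close a run that reaches the end of the recording
        events.append((start * window_seconds, len(scores) * window_seconds))
    return events

scores = [0.1, 0.7, 0.9, 0.2, 0.05, 0.6, 0.3]
print(detect_events(scores))  # [(5.0, 15.0), (25.0, 30.0)]
```

Merging adjacent windows keeps a single long call from being reported as several separate detections, which makes review of months-long archives less tedious.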
- Faster deployment of new models for specific whale populations
- Reduced training costs and architecture search overhead
- Strong performance even with very few labeled examples
- More flexible search for rare and undescribed signal types
It is also important that Perch 2.0 was compared not only with Google's previous whale model but also with other bioacoustic models for birds, other animals, and coral reefs. In these comparisons, it ranked either first or second in quality. So this was not a lucky one-off test but a strong result against specialized alternatives. For conservation projects, this is a good sign: a single foundation audio tool can work across multiple ecosystems.
What this means
The Perch 2.0 story shows that foundation models are beginning to benefit not just chatbots and content generation but field science as well. If transfer learning works between birds and whales, biologists have a chance to monitor populations faster, notice changes in animal behavior, and better protect vulnerable species.