Google DeepMind Takes Silver: AI Solves Olympiad Problems, But at What Cost?

While the rest of the world argues about whether ChatGPT will ever stop botching cake recipes, Google DeepMind took on one of the most prestigious contests in mathematics: the International Mathematical Olympiad (IMO). The results of AlphaProof and AlphaGeometry 2 jolted the industry: together the systems solved four of the six problems, scoring 28 of 42 points, the level of a silver medalist. It looked like a moment of singularity, when silicon finally beat carbon at pure logic.

Khamidun Zhemal

AI monitoring · Jiqizhixin (机器之心)

Feb 3, 2026· 2 min

AI-processed from Jiqizhixin (机器之心); edited by Hamidun News



But look closely at the details, and the triumph seems less like a walk in the park than a hard-won victory over circumstances. Mathematics has always been the Achilles' heel of language models. Ordinary LLMs work probabilistically, predicting the next token, and in the strict world of proofs that produces steps that sound plausible but are wrong.

To solve this problem, DeepMind engineers took a hybrid approach. AlphaProof combines a language model trained with reinforcement learning with the strict discipline of Lean, a formal proof language. Instead of merely guessing an answer, the system writes proofs that Lean's checker verifies at every logical step.
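To give a sense of what a machine-checked, step-by-step proof looks like, here is a minimal Lean 4 sketch. It is a toy illustration written for this article, not AlphaProof output; the theorem name `and_swap_example` is hypothetical, and only Lean's core library is assumed.

```lean
-- Toy example, assuming Lean 4 with the core library only (no Mathlib).
-- Claim: if p and q both hold, then q and p both hold.
theorem and_swap_example (p q : Prop) (h : p ∧ q) : q ∧ p := by
  constructor     -- split the goal `q ∧ p` into two goals: `q`, then `p`
  · exact h.2     -- first goal `q`: taken from the second half of `h`
  · exact h.1     -- second goal `p`: taken from the first half of `h`

-- A false claim does not get past the checker. Uncommenting the next
-- line makes Lean reject the file, because 2 + 2 does not reduce to 5:
-- example : 2 + 2 = 5 := rfl
```

Each `exact` line is accepted only if it closes the goal in front of it, so a plausible-sounding but invalid step is rejected rather than passed along.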

AlphaGeometry 2, in turn, got a boost from a Gemini-based language model and, according to DeepMind, a symbolic engine two orders of magnitude faster than its predecessor's. Behind the shine of the silver medal, however, lies a harsh reality. Human contestants work through the problems in two four-and-a-half-hour sessions, while the AI solved one problem within minutes and needed up to three days of computation for the others.

This highlights the main problem of modern systems: they are monumentally inefficient compared with the human brain. It is also a classic case of picking low-hanging fruit. Yes, AI has learned to work within the strict framework of formal languages, but it still burns colossal computational resources where a talented teenager needs only a sheet of paper and a few hours of thought.

The gap in energy efficiency between biological and digital intelligence remains immense. Why is this important right now? We are witnessing a fundamental shift in AI development strategy.

The industry has realized that simply scaling data no longer provides explosive quality gains on truly complex tasks. The future belongs to systems that can reason and verify their conclusions. Google is essentially creating what's called System 2 for AI — slow, deliberate thinking that complements the fast and intuitive System 1 of ordinary chatbots.

This is critical not only for pure mathematics, but also for programming, cybersecurity, and the design of complex engineering systems, where a single-bit error can lead to disaster. Nevertheless, Demis Hassabis and his team honestly acknowledge that training and operating these models remains painful. For AlphaProof to work, the problems had to be translated into Lean by hand: the AI cannot yet interpret a problem statement written in natural language with sufficient accuracy on its own.
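The translation step can be illustrated with a toy statement (not an IMO problem): "the sum of two even natural numbers is even." The sketch below assumes Lean 4 with only the core library; the theorem name `even_add_even` and the decision to spell "even" as `∃ k, n = 2 * k` are illustrative assumptions, not anything taken from DeepMind's pipeline.

```lean
-- Toy formalization, assuming Lean 4 with the core library only.
-- English: "The sum of two even natural numbers is even."
-- "n is even" is written out explicitly as: there exists k with n = 2 * k.
theorem even_add_even (a b : Nat)
    (ha : ∃ i, a = 2 * i) (hb : ∃ j, b = 2 * j) :
    ∃ k, a + b = 2 * k :=
  match ha, hb with
  | ⟨i, hi⟩, ⟨j, hj⟩ =>
    -- Witness: k = i + j. After rewriting a and b, the goal becomes
    -- 2 * i + 2 * j = 2 * (i + j), which distributivity closes.
    ⟨i + j, by rw [hi, hj, Nat.mul_add]⟩
```

Even this small example forces a choice the English sentence leaves implicit, namely what "even" means formally. A mistranslation at this stage would yield a valid proof of the wrong statement, which is one reason the formalization was done by people.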

We have acquired a powerful tool that still requires an entire army of engineer-translators to function. It is reminiscent of the first computers, which filled entire halls and ran on punch cards. The potential is enormous, but it will be years before a pocket mathematician capable of real-time discoveries appears.

The bottom line: Google DeepMind has shown that AI can produce formally verified proofs of olympiad-level problems, but the price of that infallibility is still exorbitant. Will the company be able to radically speed up the thinking of its systems by the end of the year?
