Google's AI outperformed Olympiad champions in the FirstProof math test

Q: What is the source?

Originally published on Jiqizhixin (机器之心). Hamidun News processes and adapts the material with AI.

Q: When was it published?

2026-02-26. Reading time: 3 min.


Hamidun News Editorial




Mathematics has long been regarded as one of the last bastions of human intelligence: a domain where intuition, creative thinking, and rigorous logic intertwine so tightly that reproducing the process in a machine seemed as much a philosophical problem as an engineering one. The Google DeepMind team has now shown that this bastion is under serious pressure. Its AI system has set a new record on FirstProof, a demanding benchmark for automated theorem provers whose problems are comparable in difficulty to those of the International Mathematical Olympiad.

The winners of these competitions are among the strongest young mathematicians in the world. They now have a competitor that needs no sleep and feels no anxiety in front of a blank page.

To appreciate the scale of the result, it helps to understand how FirstProof differs from familiar academic tests. It is neither an answer-guessing contest nor a race in arithmetic speed. FirstProof requires the system not merely to state the correct result but to construct a formally verified proof: a chain of logical steps, each of which can be checked automatically and admits no ambiguity. This is precisely where most language models have traditionally stumbled: they could reason plausibly, but not flawlessly. In this domain, the gap between "almost correct" and "mathematically proven" is enormous.
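As a minimal illustration of what "formally verified" means, here are two tiny statements written in Lean 4, one of the proof assistants used for this kind of work. They are not FirstProof problems, and the article does not say which formal system the benchmark uses, so the choice of Lean and the theorem names are assumptions made for the sake of the example.

```lean
-- Illustrative only: the theorem names below are hypothetical.

-- Addition of natural numbers is commutative.
-- The proof term cites a lemma from Lean's core library.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- From a proof of `p ∧ q` we can build a proof of `q ∧ p`
-- by swapping the two components.
theorem and_swap_example (p q : Prop) (h : p ∧ q) : q ∧ p :=
  ⟨h.right, h.left⟩
```

If either proof term were wrong, the checker would reject the file outright. There is no partial credit, which is the property that separates a verified proof from a plausible-sounding argument.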

The key technical decision behind the breakthrough was the integration of two fundamentally different approaches. A language model, capable of flexible, heuristic reasoning, was coupled with a formal verification system that acts as an uncompromising arbiter. The first generates hypotheses, proof strategies, and intermediate steps. The second rejects any logically unsound chain. The result resembles a creative mathematician and a meticulous reviewer working together in real time. Researchers have long explored this approach, but it was DeepMind that found the scale and architecture at which the two systems reinforce rather than hinder each other.
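The propose-and-check division of labor described above can be sketched with a toy in Python. This is not DeepMind's system: the "theorem" is merely that a goal integer can be reached from a start integer using two made-up rules, the proposer is a random guesser standing in for the language model, and every name here (`RULES`, `propose`, `verify`, `search`) is hypothetical.

```python
import random
from typing import Callable, Optional

# Toy "inference rules" (hypothetical): each maps an integer state to a new one.
RULES: dict[str, Callable[[int], int]] = {
    "double": lambda x: x * 2,
    "add_one": lambda x: x + 1,
}


def verify(start: int, goal: int, steps: list[str]) -> bool:
    """Deterministic checker: replay every step and accept only if each
    rule name is known and the final state equals the goal."""
    state = start
    for name in steps:
        rule = RULES.get(name)
        if rule is None:
            return False  # unknown rule: reject the whole candidate
        state = rule(state)
    return state == goal


def propose(rng: random.Random, max_len: int) -> list[str]:
    """Stand-in for the language model: guesses a sequence of rule names.
    A real system would guide this choice with a learned model."""
    length = rng.randint(1, max_len)
    return [rng.choice(list(RULES)) for _ in range(length)]


def search(start: int, goal: int, attempts: int = 10_000,
           max_len: int = 8, seed: int = 0) -> Optional[list[str]]:
    """Generate candidates and return the first one the checker accepts,
    or None if no candidate passes within the attempt budget."""
    rng = random.Random(seed)
    for _ in range(attempts):
        candidate = propose(rng, max_len)
        if verify(start, goal, candidate):
            return candidate
    return None


if __name__ == "__main__":
    proof = search(start=1, goal=10)
    print("accepted candidate:", proof)
```

The point of the design is that the proposer is allowed to be wrong as often as it likes, because nothing it produces is trusted until `verify` has replayed it step by step.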

The significance of this achievement extends far beyond academic rankings. Automated theorem proving is a fundamental tool in demand across many fields. In software engineering, formal verification makes it possible to guarantee code correctness mathematically, which is especially critical for systems that control aircraft, medical devices, or financial infrastructure. In cryptography, it is used to confirm that security protocols have the properties they claim. In pure mathematics, such systems can help researchers check complex constructions that would take years to verify by hand. Until now, all these applications have shared one constraint: existing tools required enormous expert effort to "translate" mathematical ideas into formal language. AI capable of working independently at this level changes that calculation.

For the broader industry, this result is an important signal about the direction of development. After several years dominated by language models that write and reason convincingly but often make elementary logical errors, researchers are increasingly turning to hybrid architectures in which neural networks work in tandem with deterministic verifiers. DeepMind's result confirms that this path works, and works impressively. OpenAI, Anthropic, and academic laboratories around the world are conducting similar research, but it is Google that currently sets the standard on the most formalized of mathematical tests.

Of course, a benchmark victory does not mean that AI is ready to replace mathematicians, even those of Olympiad caliber. Posing new problems, choosing research directions, and making the intuitive leap to the correct hypothesis all remain firmly in human territory. But the boundary is shifting steadily. What Google DeepMind demonstrated on FirstProof is not an imitation of mathematical thinking but its functional equivalent under strictly defined conditions. As those conditions expand, the question will shift from "can AI prove theorems" to "which theorems will AI prove first".

