Google DeepMind Enables an LLM to Rewrite Game Theory Algorithms and Surpass Experts
Google DeepMind applied AlphaEvolve to algorithms for games with imperfect information, such as poker. The LLM-based system rewrote code for two key…
AI-processed from MarkTechPost; edited by Hamidun News
Google DeepMind has shown that a language model can do more than help researchers write code: it can independently search for new algorithmic ideas in game theory and outperform solutions that humans have refined for years. The work concerns multi-agent reinforcement learning (MARL) in games with imperfect information, where players act in turns and cannot see each other's private information, as in poker. In such settings, an algorithm's quality often depends not only on the underlying theory but also on many engineering details: how regret is accumulated, how old signals are discounted, when to start averaging the strategy, and which method is used to find an equilibrium.
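For readers new to these terms, here is a minimal sketch of regret matching, the per-decision update inside CFR, run as self-play on rock-paper-scissors rather than on a sequential poker-like game. It shows the pieces named above in their simplest form: regret is accumulated without discounting, the current policy is proportional to positive cumulative regret, and the average policy is accumulated from the very first iteration. The game, the initial regret bias, and all function names are illustrative assumptions, not code from the paper.

```python
# Row player's payoffs in rock-paper-scissors (0 = rock, 1 = paper, 2 = scissors).
# The game is symmetric and zero-sum, so each player can use this same matrix
# with its own action as the row index.
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def regret_matching(cum_regret):
    """Current policy: play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    if total <= 0.0:
        return [1.0 / len(cum_regret)] * len(cum_regret)
    return [p / total for p in positive]


def action_values(opponent_policy):
    """Expected payoff of each pure action against the opponent's mixed policy."""
    return [sum(PAYOFF[a][b] * opponent_policy[b] for b in range(3)) for a in range(3)]


def self_play(iterations=10_000):
    # Illustrative starting bias; with all-zero regrets both players would
    # simply stay at the uniform equilibrium from the first iteration.
    cum_regret = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    strategy_sum = [[0.0] * 3, [0.0] * 3]
    for _ in range(iterations):
        policies = [regret_matching(cum_regret[p]) for p in range(2)]
        for p in range(2):
            values = action_values(policies[1 - p])
            expected = sum(pi * v for pi, v in zip(policies[p], values))
            for a in range(3):
                # Regret: how much better action a would have done than the policy.
                cum_regret[p][a] += values[a] - expected
                # Plain (undiscounted, undelayed) averaging of the current policy.
                strategy_sum[p][a] += policies[p][a]
    return [[s / iterations for s in strategy_sum[p]] for p in range(2)]


average_policies = self_play()
print([round(x, 3) for x in average_policies[0]])  # close to the uniform equilibrium
```

The discounting and averaging choices discussed in the article modify exactly the two accumulation lines inside the inner loop.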
These choices are usually made by hand, through intuition, a series of hypotheses, and lengthy experiments. In a preprint posted to arXiv on February 18, 2026, the DeepMind team proposed delegating this work to AlphaEvolve, an evolutionary coding agent that uses an LLM to rewrite code and automatically evaluates the quality of each new version. In this work, AlphaEvolve was applied to two classical algorithm families: CFR (Counterfactual Regret Minimization) and PSRO (Policy Space Response Oracles).
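The article does not spell out AlphaEvolve's internals, so the following is only a toy sketch of the general propose-evaluate-select pattern it describes, assuming a small elitist population. In the real system the `propose` step is an LLM editing source code and `evaluate` runs the candidate algorithm and scores it; here both are hypothetical stand-ins (a random numeric tweak and a quadratic score) so the loop can run on its own.

```python
import random


def evolve(initial, propose, evaluate, generations=100, population_size=8, seed=0):
    """Keep a small population of scored candidates; lower score is better."""
    rng = random.Random(seed)
    population = [(evaluate(initial), initial)]
    for _ in range(generations):
        # Choose a parent among the current best few candidates (list is kept sorted).
        parent = rng.choice(population[:3])[1]
        # Stand-in for the LLM step: produce a modified candidate.
        child = propose(parent, rng)
        # Automatic quality check for every new version.
        population.append((evaluate(child), child))
        population.sort(key=lambda item: item[0])
        del population[population_size:]
    return population[0]


# Hypothetical toy problem: a "candidate" is one tunable number and its score
# is the squared distance from an optimum the search does not know about.
def toy_propose(parent, rng):
    return parent + rng.gauss(0.0, 0.5)


def toy_evaluate(candidate):
    return (candidate - 3.0) ** 2


best_score, best_candidate = evolve(0.0, toy_propose, toy_evaluate)
print(best_score, best_candidate)
```

Because the population is elitist, the best score never gets worse from one generation to the next. What distinguishes the system in the article is that its `propose` step can change program logic, not just a number.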
The experiments used the OpenSpiel framework, and quality was measured by exploitability: how much an opponent playing a best response could gain against the found strategy (lower is better). Importantly, the researchers did not limit themselves to hyperparameter tuning. The system rewrote the logic of the Python code itself, namely the parts responsible for accumulating the regret signal, building the current policy, and averaging strategies.
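To make the metric concrete, here is a minimal sketch for the simplest case, a two-player zero-sum matrix game, rather than the sequential games used in the experiments. It follows one common convention: sum each player's gain from switching to a best response (often called NashConv) and divide by the number of players. The function names and the rock-paper-scissors example are illustrative.

```python
def expected_payoffs(payoff, opponent_policy):
    """Payoff of each pure action (row) against a mixed opponent policy."""
    return [sum(row[j] * opponent_policy[j] for j in range(len(row))) for row in payoff]


def exploitability(payoff, row_policy, col_policy):
    """NashConv / 2 for a two-player zero-sum game; payoff is the row player's."""
    # Value of the current strategy profile for the row player.
    row_values = expected_payoffs(payoff, col_policy)
    value = sum(p * v for p, v in zip(row_policy, row_values))
    # The column player's payoffs are the negated, transposed matrix.
    col_payoff = [[-payoff[i][j] for i in range(len(payoff))] for j in range(len(payoff[0]))]
    col_values = expected_payoffs(col_payoff, row_policy)
    # Each player's improvement from deviating to its best pure response.
    row_gain = max(row_values) - value
    col_gain = max(col_values) - (-value)
    return (row_gain + col_gain) / 2


RPS = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
uniform = [1 / 3, 1 / 3, 1 / 3]
always_rock = [1.0, 0.0, 0.0]

print(exploitability(RPS, uniform, uniform))          # 0.0: the equilibrium cannot be exploited
print(exploitability(RPS, always_rock, always_rock))  # 1.0: each side gains 1 by switching to paper
```

An exploitability of zero means the profile is a Nash equilibrium; the search described in the article rewards code changes that push this number down.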
The paper names Gemini 2.5 Pro as the model that drives this search loop. For the CFR family, the system discovered a new variant, VAD-CFR (Volatility-Adaptive Discounted CFR).
Instead of fixed rules for forgetting old information, the algorithm tracks the volatility of learning and discounts history more heavily when learning is unstable. AlphaEvolve also introduced an asymmetric boost that scales positive instantaneous regrets by a factor of 1.1, and an unexpected averaging rule: do not start accumulating the average policy until the 500th iteration.
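The article reports these ingredients of VAD-CFR but not the exact formulas, so the following is only a hypothetical sketch of how such rules could sit inside a regret-matching learner for a single decision point. The 1.1 boost and the iteration-500 averaging start come from the article; the volatility measure (a moving average of the largest absolute instantaneous regret), the DCFR-style t^α/(t^α+1) discount with α shrinking as volatility rises, the rock-paper-scissors test bed, and all other names and constants are assumptions for illustration, not the paper's VAD-CFR.

```python
# Rock-paper-scissors payoffs for the row player (symmetric zero-sum game).
PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]

POSITIVE_BOOST = 1.1   # from the article: positive instantaneous regrets scaled by 1.1
AVERAGING_START = 500  # from the article: no policy averaging before iteration 500
BASE_ALPHA = 1.5       # assumption: DCFR-style discount exponent
EMA_RATE = 0.1         # assumption: smoothing rate of the volatility estimate


def regret_matching(cum_regret):
    """Policy proportional to positive cumulative regret (uniform if none)."""
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    n = len(cum_regret)
    return [p / total for p in positive] if total > 0 else [1.0 / n] * n


class VolatilityAdaptiveLearner:
    """Hypothetical VAD-CFR-inspired regret learner for one decision point."""

    def __init__(self, num_actions):
        self.cum_regret = [0.0] * num_actions
        self.strategy_sum = [0.0] * num_actions
        self.volatility = 0.0

    def policy(self):
        return regret_matching(self.cum_regret)

    def update(self, instant_regret, policy, iteration):
        # Assumed volatility proxy: moving average of the largest |instantaneous regret|.
        magnitude = max(abs(r) for r in instant_regret)
        self.volatility = (1 - EMA_RATE) * self.volatility + EMA_RATE * magnitude
        # Higher volatility -> smaller alpha -> smaller weight on accumulated history.
        t = iteration + 1
        alpha = BASE_ALPHA / (1.0 + self.volatility)
        discount = t ** alpha / (t ** alpha + 1.0)
        # Asymmetric boost: only positive instantaneous regrets are amplified.
        boosted = [POSITIVE_BOOST * r if r > 0 else r for r in instant_regret]
        self.cum_regret = [discount * c + b for c, b in zip(self.cum_regret, boosted)]
        # Delayed averaging (zero-based iterations): early policies are left out.
        if iteration >= AVERAGING_START:
            for a, p in enumerate(policy):
                self.strategy_sum[a] += p

    def average_policy(self):
        total = sum(self.strategy_sum)
        n = len(self.strategy_sum)
        return [s / total for s in self.strategy_sum] if total > 0 else [1.0 / n] * n


def self_play(iterations=1000):
    learners = [VolatilityAdaptiveLearner(3), VolatilityAdaptiveLearner(3)]
    learners[0].cum_regret[0] = 1.0  # illustrative bias so play is not trivially uniform
    for it in range(iterations):
        policies = [learner.policy() for learner in learners]
        for p in range(2):
            opponent = policies[1 - p]
            values = [sum(PAYOFF[a][b] * opponent[b] for b in range(3)) for a in range(3)]
            expected = sum(pi * v for pi, v in zip(policies[p], values))
            learners[p].update([v - expected for v in values], policies[p], it)
    return learners


learners = self_play()
print([round(x, 3) for x in learners[0].average_policy()])
```

Since everything except the two reported constants is assumed, the sketch only shows where such rules plug in (regret accumulation and policy averaging); it says nothing about how VAD-CFR actually performs.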
This is notable because the evaluation horizon was 1,000 iterations, and the system arrived at the 500 threshold on its own, without any explicit instruction in the prompt. Across the full set of 11 games, VAD-CFR matched or exceeded the best known solutions in 10 of 11 cases; the only exception was 4-player Kuhn Poker. For PSRO, AlphaEvolve searched not for regret-update rules but for a meta-solver: the component that sets the probability distribution over the population of strategies.
The result was SHOR-PSRO (Smoothed Hybrid Optimistic Regret PSRO). This variant blends optimistic regret matching with a soft distribution over the best pure strategies, and gradually shifts the balance between exploration and convergence to equilibrium over the course of training. In practice, this removes some of the manual tuning that used to be mandatory: researchers no longer have to decide in advance when the system should encourage strategy diversity and when it should push harder toward equilibrium.
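In PSRO, the meta-solver takes the empirical payoff table between strategies in the current population and returns a mixture over them, which the next best-response step then trains against. The article describes SHOR-PSRO's meta-solver only at a high level, so the following is a hypothetical sketch for a symmetric two-player zero-sum meta-game. Assumptions: "optimistic" regret matching is implemented as predictive regret matching (adding the latest instantaneous regret before normalizing), the soft component is a softmax of each strategy's payoff against the regret-matching mixture, and the blend weight moves linearly from the soft component toward regret matching as training progresses. None of these names, schedules, or constants come from the paper.

```python
import math


def optimistic_regret_matching(meta_payoff, iterations=200):
    """Average mixture from predictive ("optimistic") regret matching in self-play.

    meta_payoff[i][j] is the payoff of population strategy i against strategy j
    in a symmetric zero-sum meta-game.
    """
    n = len(meta_payoff)
    cum_regret = [0.0] * n
    last_regret = [0.0] * n
    mixture_sum = [0.0] * n
    for _ in range(iterations):
        # Optimism: treat the most recent regret as a prediction of the next one.
        scores = [max(c + r, 0.0) for c, r in zip(cum_regret, last_regret)]
        total = sum(scores)
        mixture = [s / total for s in scores] if total > 0 else [1.0 / n] * n
        values = [sum(meta_payoff[i][j] * mixture[j] for j in range(n)) for i in range(n)]
        expected = sum(m * v for m, v in zip(mixture, values))
        last_regret = [v - expected for v in values]
        cum_regret = [c + r for c, r in zip(cum_regret, last_regret)]
        mixture_sum = [s + m for s, m in zip(mixture_sum, mixture)]
    return [s / iterations for s in mixture_sum]


def soft_best_responses(meta_payoff, opponent_mixture, temperature):
    """Softmax over each strategy's payoff against a mixture (favors near-best strategies)."""
    n = len(meta_payoff)
    values = [sum(meta_payoff[i][j] * opponent_mixture[j] for j in range(n)) for i in range(n)]
    top = max(values)  # subtract the max for numerical stability
    weights = [math.exp((v - top) / temperature) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


def hybrid_meta_solver(meta_payoff, progress, temperature=0.1):
    """Blend both components; progress runs from 0 (start) to 1 (end of training)."""
    equilibrium = optimistic_regret_matching(meta_payoff)
    soft = soft_best_responses(meta_payoff, equilibrium, temperature)
    soft_weight = 1.0 - progress  # assumed linear schedule
    return [soft_weight * s + (1.0 - soft_weight) * e for s, e in zip(soft, equilibrium)]


# Example meta-game: strategy 0 beats strategy 1.
dominant = [[0.0, 1.0], [-1.0, 0.0]]
print(hybrid_meta_solver(dominant, progress=0.0))
print(hybrid_meta_solver(dominant, progress=1.0))
```

Replacing a hand-picked switch point with a schedule is the kind of manual decision the article says SHOR-PSRO removes; here the linear schedule and the temperature are themselves still hand-picked assumptions.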
Across the same 11 games, SHOR-PSRO matched or exceeded the best hand-designed baselines in 8 cases. Notably, DeepMind did not only check performance on the games used during the search. Both discovered schemes were first evolved on four games (3-player Kuhn Poker, 2-player Leduc Poker, 4-card Goofspiel, and 5-sided Liar's Dice) and then, without any further retuning, tested on larger, previously unseen variants of the same game classes.
This is a stronger test than the usual demonstration on one or two toy environments: it shows at least a basic ability of the discovered algorithms to generalize beyond the games on which the search was run. The main takeaway is simple: LLMs are beginning to automate not only code writing but algorithm design itself. For researchers, this means a change of role: less manual trial and error over heuristics, and more work defining metrics, constraints, and verification systems.
At the same time, the work does not show that the model can invent anything in general: success here relies on a clear evaluation function and on domains where solution quality can be rigorously verified. Even with that caveat, the result is significant: DeepMind showed that in narrow, formalizable domains, machines can already find algorithmic ideas that experts have not discovered by hand.