KernelEvo: Russian framework automates GPU kernel generation with AI

Q: What is the source?

Originally published on Habr AI. Hamidun News processes and adapts the material with AI.

Q: When was it published?

2026-03-05. Reading time: 3 min.

The AIRI Institute’s Computational Intelligence team developed KernelEvo, a framework for automatically generating optimized GPU kernels in CUDA and Triton.

Hamidun News Editorial

Writing fast GPU kernels has long been considered the domain of a select few. A narrow circle of engineers able to juggle memory models, access patterns, and the constraints of specific hardware backends has set the pace of high-performance computing. Russia's AIRI Institute aims to change that with KernelEvo, a framework that turns the grueling process of manual GPU kernel optimization into an automated search.

The problem KernelEvo addresses is familiar to anyone who has tried to squeeze maximum performance out of a graphics accelerator. The classic kernel development cycle looks roughly like this: an engineer writes code, runs it, hits a compilation error or unexpected runtime behavior, goes back to the code, rewrites it, and checks again. This iterative process can stretch on for days or weeks, and the result depends directly on the developer's skill. Meanwhile, the gain from a well-optimized custom kernel over a generic implementation can be very large, sometimes a several-fold speedup.

The Computational Intelligence team at AIRI proposes a fundamentally different approach. Instead of relying on human intuition and expertise, KernelEvo builds an automatic search loop. The framework takes source code as input and searches on its own for efficient implementations in CUDA and Triton, the two main platforms for GPU programming. The key word here is "searches": the system does not generate a single code variant but methodically works through the space of possible solutions, testing each one for correctness and performance.
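The idea of testing each variant for correctness and performance can be illustrated with a minimal Python sketch. It is not KernelEvo's code: plain Python functions stand in for GPU kernels, and every name here (`reference`, `candidate_ok`, `candidate_wrong`, `select_best`) is hypothetical. The assumption is only that a candidate must first match a reference output and is then ranked by measured run time.

```python
import time


def reference(xs):
    # Baseline implementation that defines the correct output.
    return [x * 2.0 + 1.0 for x in xs]


def candidate_wrong(xs):
    # Incorrect variant (drops the "+ 1.0"); the search must reject it.
    return [x * 2.0 for x in xs]


def candidate_ok(xs):
    # Correct variant written differently from the reference.
    return list(map(lambda x: x * 2.0 + 1.0, xs))


def is_correct(kernel, data, tol=1e-9):
    # Compare the candidate's output with the reference element by element.
    expected = reference(data)
    got = kernel(data)
    return len(got) == len(expected) and all(
        abs(a - b) <= tol for a, b in zip(got, expected)
    )


def best_time(kernel, data, repeats=5):
    # Take the minimum over several runs to reduce timing noise.
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        kernel(data)
        best = min(best, time.perf_counter() - start)
    return best


def select_best(candidates, data):
    # Return the name of the fastest candidate that passes the correctness
    # check, or None if no candidate is correct.
    best_name, best_t = None, float("inf")
    for name, kernel in candidates.items():
        if not is_correct(kernel, data):
            continue
        t = best_time(kernel, data)
        if t < best_t:
            best_name, best_t = name, t
    return best_name


data = [float(i) for i in range(1000)]
winner = select_best({"wrong": candidate_wrong, "ok": candidate_ok}, data)
```

Checking correctness before timing matters: an incorrect variant is often faster precisely because it does less work, so ranking by speed alone would select broken code.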

Technically, the approach puts a large language model inside the optimization loop. The model generates kernel variants, the system compiles and tests them, and the results are fed back to the model for the next iteration. In essence, this is the same cycle a human engineer goes through, but it runs automatically and explores the solution space much faster. The developers note that one optimization task consumes approximately a million tokens. Translated into the cost of API calls to modern language models, that is a fairly modest amount, especially compared with paying for the time of a highly qualified CUDA engineer.
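A rough Python sketch of such a generate, compile, test, and feed-back loop is shown below. It is an illustration under stated assumptions, not the framework's implementation: `stub_model` is a hypothetical stand-in for a language model call, the "kernel" is ordinary Python source, the per-call token count of 400 is invented, and the loop stops at the first correct candidate, whereas a real search would presumably continue in order to improve performance.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    source: str
    ok: bool
    feedback: str


def stub_model(history):
    # Hypothetical stand-in for an LLM call: returns (source, tokens_used).
    # It reacts to the feedback attached to the previous attempt.
    if not history:
        # First try: unbalanced bracket, so compilation fails.
        return "def kernel(xs): return [x * 2 for x in xs", 400
    if "SyntaxError" in history[-1].feedback:
        # Second try: compiles, but computes the wrong result.
        return "def kernel(xs): return [x + 2 for x in xs]", 400
    # Third try: correct.
    return "def kernel(xs): return [x * 2 for x in xs]", 400


def evaluate(source):
    # "Compile" and test one candidate, returning (ok, feedback text).
    namespace = {}
    try:
        exec(compile(source, "<candidate>", "exec"), namespace)
    except SyntaxError as err:
        return False, f"SyntaxError: {err.msg}"
    got = namespace["kernel"]([1, 2, 3])
    if got != [2, 4, 6]:
        return False, f"wrong output: {got}"
    return True, "passed"


def optimize(model, token_budget):
    # Ask the model for candidates until one passes or the budget runs out.
    history, used = [], 0
    while used < token_budget:
        source, tokens = model(history)
        used += tokens
        ok, feedback = evaluate(source)
        history.append(Attempt(source, ok, feedback))
        if ok:
            break
    return history, used


history, used = optimize(stub_model, token_budget=10_000)
```

The token budget plays the role of the stopping rule: with a budget of 800 this stub would run out after two failed attempts, which is how a fixed allowance per task bounds the cost of the search.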

The context in which KernelEvo appears matters. The industry is seeing a boom in demand for optimized GPU computing. Training and inference of large neural networks require ever more compute, and hardware accelerators are expensive. Every percent gained at the kernel level translates into real savings in model training time, cloud infrastructure costs, or data center energy consumption. At the same time, the shortage of specialists who can write efficient low-level GPU code remains one of the industry's main bottlenecks. Automating this work is not just a convenience but a strategic necessity.

KernelEvo fits into a broader trend that has been gaining momentum over the past year and a half. Several research groups around the world are working on tools that let language models optimize low-level code. Google is actively developing similar approaches for its TPUs, and NVIDIA is investing in automated CUDA kernel optimization. However, most of these solutions remain closed and tied to specific ecosystems. An open framework from a Russian institute is notable because it extends access to such technology beyond large corporations.

Of course, automatic kernel generation will not fully replace experienced engineers. Complex architectural decisions, non-standard hardware configurations, and fundamentally new algorithms still require human understanding. But tools like KernelEvo can already take on routine optimization, which makes up a significant part of a GPU programmer's work. This shifts the engineer's role from writing code to formulating tasks and validating results, a shift seen in practically every field that generative AI reaches.

KernelEvo is further evidence that the future of high-performance computing will be determined not only by hardware power but also by the intelligence of the software tools that use it. The framework is still in its early stages, but the approach itself, an automatic search for optimal implementations driven by language models, looks like a direction that will only gain momentum.
