LILA creator presented a compact AI architecture and challenged Sam Altman's approach

Q: What is the source?

Originally published on Habr AI. Hamidun News processes and adapts the material with AI.

Q: When was it published?

May 2, 2026. Reading time: 3 min.


Hamidun News Editorial




The author of the Sovereign-Lila-Leech project has published the LILA manifesto and claimed that a geometric architecture based on the Leech lattice could radically reduce the cost of language models. The main bet is not on new data centers, but on compact models that can be run offline on client devices.

What the LILA idea is

At the core of the project is an attempt to build the fixed geometry of the Leech lattice, one of the best-known objects in 24-dimensional geometry, into a transformer. According to the repository description, the standard trainable query and key projections are replaced by a frozen orthogonal kernel, and an additional loss term pulls the hidden representations toward selected lattice directions. The author presents this as a way to move away from brute-force scaling and make model behavior more interpretable.
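To make the description above concrete, here is a minimal NumPy sketch of the two ingredients: attention with fixed orthogonal query/key maps, and an auxiliary loss that rewards hidden vectors for aligning with a set of lattice directions. It is not the repository's code. All function names are hypothetical, the sketch uses separate fixed matrices for queries and keys (a single shared orthogonal matrix would cancel out of the dot product), and the directions are the 1104 vectors of shape (±4, ±4, 0, ..., 0), which are among the Leech lattice's minimal vectors in standard coordinates. Which directions the project actually selects is not stated here.

```python
import itertools
import numpy as np

def frozen_orthogonal_kernel(dim, seed=0):
    """Return a fixed dim x dim orthogonal matrix (assumed: random, via QR)."""
    rng = np.random.default_rng(seed)
    q, r = np.linalg.qr(rng.standard_normal((dim, dim)))
    return q * np.sign(np.diag(r))  # flip column signs; result stays orthogonal

def attention_frozen_qk(x, kernel_q, kernel_k, w_v):
    """Single-head self-attention; only w_v would be trainable here."""
    q = x @ kernel_q
    k = x @ kernel_k
    scores = q @ k.T / np.sqrt(x.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ (x @ w_v)

def leech_type_directions(dim=24):
    """Unit vectors (+-1, +-1, 0, ..., 0) / sqrt(2): stand-in lattice directions."""
    dirs = []
    for i, j in itertools.combinations(range(dim), 2):
        for si, sj in itertools.product((1.0, -1.0), repeat=2):
            v = np.zeros(dim)
            v[i], v[j] = si, sj
            dirs.append(v / np.sqrt(2.0))
    return np.array(dirs)

def direction_loss(h, directions):
    """Mean of (1 - best cosine similarity) between hidden vectors and directions."""
    h_unit = h / np.linalg.norm(h, axis=-1, keepdims=True)
    best = (h_unit @ directions.T).max(axis=-1)
    return float(np.mean(1.0 - best))

# Toy usage: 5 tokens with 24-dimensional hidden states.
x = np.random.default_rng(1).standard_normal((5, 24))
out = attention_frozen_qk(
    x, frozen_orthogonal_kernel(24, 0), frozen_orthogonal_kernel(24, 1), np.eye(24)
)
aux = direction_loss(out, leech_type_directions())
```

In a real training loop, a term like `aux` would be added to the language-modeling loss with some weight, so that gradients nudge the hidden states toward the chosen directions while the query and key maps stay untouched.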

"Mathematics should not be computed — it should exist."

In the article itself, this idea is framed as a manifesto against "corporate AI": the author contrasts a few lines of code and a geometric prior with the large budgets of OpenAI and Qualcomm. Strip away the polemics, though, and the thesis is clear: not all efficiency in AI has to come from scaling up parameters, compute, and infrastructure; some of the gains can be sought in the structure of the model itself.

What the project promises

The manifesto makes highly ambitious claims: 44.9x geometric compression, native 2-bit quantization, and the ability to run models of up to 4 billion parameters on mobile devices. The author also emphasizes complete offline autonomy, zero server-side inference costs, and user data privacy. In other words, this is pitched not as just another architectural idea but as a bid for a new edge AI stack.
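A quick back-of-the-envelope calculation shows what the 2-bit claim would mean for a 4-billion-parameter model. The sketch below counts weight storage only and ignores activations, caches, and any per-block scaling metadata a real quantization scheme would carry; the helper name is hypothetical.

```python
def weight_bytes(n_params, bits_per_weight):
    """Storage needed for the weights alone, in bytes."""
    return n_params * bits_per_weight / 8

n_params = 4_000_000_000
for label, bits in (("fp16", 16), ("int8", 8), ("2-bit", 2)):
    gb = weight_bytes(n_params, bits) / 1e9
    print(f"{label:>5}: {gb:.1f} GB")
# fp16: 8.0 GB, int8: 4.0 GB, 2-bit: 1.0 GB
```

Bit width alone therefore accounts for an 8x reduction relative to fp16 (16x relative to fp32). The 44.9x figure has to rest on something beyond quantization, and the manifesto's exact definition of it is not given here.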

More concretely, the project's public materials consist of the following:

the GitHub repository describes a 20-million-parameter baseline model, Leech-Lila;
the README mentions training on TinyStories and FineWeb-edu on a single NVIDIA T4 in Google Colab;
the code and weights are released as open source under AGPLv3;
a preprint describing the approach is posted on Zenodo;
the project is positioned as research code suitable for experiments with geometric inductive biases.

The README also gives more specific figures: a first-layer stable rank of 8.55, an effective capacity of about 440 million parameters, and 0.129 bits per character on TinyStories. The author interprets these as evidence that geometric regularization can yield a disproportionate gain even in a small model. For now, though, the numbers describe a compact research setup, not a mass-market product.
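For readers unfamiliar with two of these metrics, the sketch below uses their standard textbook definitions: stable rank as the squared Frobenius norm divided by the squared spectral norm, and bits per character as total negative log-likelihood converted from nats to bits and divided by the character count. How the README actually measures them is not described here, so treat this as an illustration with hypothetical function names rather than the project's evaluation code.

```python
import math
import numpy as np

def stable_rank(w):
    """||W||_F^2 / ||W||_2^2: a smooth, noise-tolerant proxy for matrix rank."""
    s = np.linalg.svd(w, compute_uv=False)  # singular values, largest first
    return float(np.sum(s**2) / s[0] ** 2)

def bits_per_character(total_nll_nats, n_chars):
    """Convert summed negative log-likelihood (in nats) to bits per character."""
    return total_nll_nats / (math.log(2) * n_chars)

# Sanity checks on cases with known answers.
identity_rank = stable_rank(np.eye(8))                          # all directions equal
rank_one = stable_rank(np.outer(np.arange(1, 5), np.ones(6)))   # one direction
```

The stable rank always lies between 1 and the true rank of the matrix, so a value of 8.55 would mean the layer's energy is concentrated in roughly nine directions.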

Where questions arise

The main caveat is that the manifesto and the project's technical description speak with different levels of confidence. The article claims 44.9x compression and a near-revolutionary shift for mobile AI, while the repository's own README cites 22x compression, metrics on TinyStories, and an explicit Proof-of-Concept / Research Code status. This does not make the work pointless, but it shows that industrial validation and reproducible comparisons are still a long way off.
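The gap between the two sources can be put in numbers using only the figures quoted in this article. The sketch below is plain arithmetic; reading the README's 22x as "effective capacity divided by actual parameter count" is an assumption, though the numbers happen to line up.

```python
baseline_params = 20_000_000      # Leech-Lila baseline size, per the repository
effective_params = 440_000_000    # "effective capacity", per the README
manifesto_ratio = 44.9            # compression claimed in the manifesto
readme_ratio = 22.0               # compression stated in the README

implied_ratio = effective_params / baseline_params
print(f"effective / actual parameters: {implied_ratio:.0f}x")
print(f"manifesto vs README: {manifesto_ratio / readme_ratio:.2f}x apart")
# effective / actual parameters: 22x
# manifesto vs README: 2.04x apart
```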

The second point concerns the comparison with Qualcomm. The author references a Qualcomm AI Research preprint from March 11, 2026, about vector quantization on the Leech lattice, and interprets it as belated recognition of the power of this mathematics. But the two works have different goals: Qualcomm writes about LLM compression via vector quantization, while LILA proposes fixing the geometry inside the attention mechanism. It is therefore too early to declare either approach the winner. The project currently has no independent benchmarks, peer review, or comparisons on large real-world tasks.

What this means

The LILA story is interesting not as a proven "killer" of large models, but as a signal that an experimental race around edge AI and architectural efficiency is starting again. If such ideas are confirmed through reproducible tests, the market will get more local models with less dependence on the cloud. If not, the manifesto still marks an important shift: the debate in AI is no longer only about size, but also about the mathematics of a model's internal structure.
