The Wall for LLMs: Why Skeptics Got It Wrong Again
The claim that neural network training is slowing down because of a shortage of text has become a commonplace. But text is only one of six axes of development. While critics are burying LLMs, …
AI-processed from Habr AI; edited by Hamidun News
Every six months, the AI industry enters a period of "great despondency." First we were told that GPT-3 was the limit and there was nowhere left to go. Then we were assured that the next breakthrough would require trillions of tokens that simply don't exist on the internet. Now there is a new refrain: the data has run out, the transformer architecture has exhausted itself, and it's time to pack up and leave. This sounds solid and even logical if you view the world through a keyhole. But if you've been following the game for longer than one hype cycle, you understand that we haven't hit a wall. We've simply reached the end of one straight stretch and are turning onto a highway.
The problem with skeptics is that they think one-dimensionally. For them, progress means pouring more text into a model and getting more intelligence out. Yes, fresh text for classical pretraining is indeed running out. Most of the public internet has already been "digested" by neural networks. But learning from more data is just one of six axes along which progress moves. While some mourn the empty libraries, engineers at OpenAI, Google, and Anthropic are actively pulling the other five levers, which somehow get forgotten in public discussions.
The first, and perhaps most important, lever today is inference-time compute. Look at OpenAI's o1 family of models. They don't just output an answer; they "think" before writing the first word of it. This changes the paradigm: you no longer have to make a model ten times larger to make it smarter. You can let it think longer about the task. It's like in life: an intelligent person is not the one who has read the most books, but the one who knows how to analyze information deeply. We're moving from quantity of reading to quality of comprehension.
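One of the simplest ways to trade extra inference compute for accuracy is self-consistency: sample several answers and keep the most common one. The sketch below is a toy illustration of that idea, not a description of how o1 works internally. `noisy_solver` is a hypothetical stand-in for a model that answers correctly 60% of the time, and every number in it is an assumption chosen for the demo.

```python
import random
from collections import Counter

CORRECT_ANSWER = 42  # assumed ground truth for the toy task


def noisy_solver(rng, p_correct=0.6):
    """Hypothetical stand-in for one sampled model answer."""
    if rng.random() < p_correct:
        return CORRECT_ANSWER
    # Wrong answers are spread over several different values.
    return rng.choice([41, 43, 44])


def majority_vote(rng, n_samples):
    """Spend more inference compute: sample n_samples answers, keep the most common."""
    votes = Counter(noisy_solver(rng) for _ in range(n_samples))
    return votes.most_common(1)[0][0]


def accuracy(n_samples, trials=2000, seed=0):
    """Fraction of trials in which the voted answer is correct."""
    rng = random.Random(seed)
    hits = sum(majority_vote(rng, n_samples) == CORRECT_ANSWER for _ in range(trials))
    return hits / trials


for n in (1, 5, 25):
    print(f"samples per question: {n:2d}  accuracy: {accuracy(n):.3f}")
```

The "model" never changes between runs. Only the number of samples per question does, and accuracy rises with it.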
The second axis is algorithmic efficiency. Remember how everyone complained about the resource hunger of transformers? Now designs like Mixture of Experts (MoE) and architectures like Mamba are taking center stage. MoE activates only a fraction of a model's parameters for each token, and state-space models like Mamba scale linearly with sequence length rather than quadratically, as standard attention does. Both aim to deliver comparable quality at a much lower compute cost. We're learning to build more sophisticated engines rather than just enlarging the fuel tank. Add to this the third axis: multimodality. Models stop being just "text readers." They begin to see, hear, and understand the physical world. When AI learns from video and audio, the notion that "text has run out" loses much of its force. The world is an effectively endless stream of data that we're only beginning to explore.
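To make the Mixture of Experts idea concrete, here is a minimal sketch of top-k routing in plain Python. It is a toy under stated assumptions, not any particular model's implementation: the experts are simple scaling functions (a real expert would be a feed-forward network), the router weights are random, and names such as `moe_forward` and `make_expert` are hypothetical.

```python
import math
import random


def top_k_route(scores, k):
    """Pick the k highest-scoring experts; softmax their scores into mixing weights."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    peak = max(scores[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(scores[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def make_expert(scale):
    """Toy expert that just scales its input (assumption for the demo)."""
    def expert(x):
        return [scale * v for v in x]
    return expert


def moe_forward(x, experts, router, k=2):
    """Score every expert with a linear router, but run only the top k."""
    scores = [sum(w * v for w, v in zip(row, x)) for row in router]
    routed = top_k_route(scores, k)
    out = [0.0] * len(x)
    for idx, weight in routed:
        y = experts[idx](x)  # the other experts are never evaluated
        out = [o + weight * v for o, v in zip(out, y)]
    return out, routed


rng = random.Random(0)
NUM_EXPERTS, DIM = 8, 4
experts = [make_expert(s + 1) for s in range(NUM_EXPERTS)]
router = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]
token = [0.5, -1.0, 0.25, 2.0]

output, routed = moe_forward(token, experts, router, k=2)
print("experts used:", [i for i, _ in routed], "of", NUM_EXPERTS)
```

The layer holds eight experts' worth of parameters, but each token pays for only two of them. That is the sense in which MoE adds capacity without a matching rise in compute per token.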
The fourth and fifth factors are tool use and self-improvement through self-play. A model that can call a calculator, a search engine, or a code interpreter no longer has to hold every fact and every calculation in its weights. As for self-play, remember how AlphaGo defeated the world champion at Go. It didn't learn only from human games; it played against itself millions of times. This approach is now coming to LLMs. Models are starting to generate synthetic data, check it for logical consistency, and learn from their own mistakes. If AI can create tasks and solve them itself, it no longer needs humans as the sole source of knowledge. This closes the learning loop and makes it practically unbounded.
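The generate-then-verify loop described above can be sketched with a deliberately tiny example. Everything here is an assumption made for illustration: `propose_example` is a hypothetical generator that invents addition tasks and gets 30% of them wrong, and `verify` simply recomputes the sum. In a real pipeline the verifier might be a unit test suite, a proof checker, or another model.

```python
import random


def propose_example(rng, error_rate=0.3):
    """Hypothetical generator: invents an addition task and answers it, sometimes wrongly."""
    a, b = rng.randint(1, 99), rng.randint(1, 99)
    answer = a + b
    if rng.random() < error_rate:
        answer += rng.choice([-2, -1, 1, 2])  # inject a plausible mistake
    return {"question": f"{a} + {b}", "a": a, "b": b, "answer": answer}


def verify(example):
    """Verifier: accepts an example only if the stated answer checks out."""
    return example["a"] + example["b"] == example["answer"]


def build_synthetic_dataset(n_candidates, seed=0):
    """Generate candidates, then keep only those that pass verification."""
    rng = random.Random(seed)
    candidates = [propose_example(rng) for _ in range(n_candidates)]
    kept = [ex for ex in candidates if verify(ex)]
    return candidates, kept


candidates, kept = build_synthetic_dataset(1000)
print(f"kept {len(kept)} of {len(candidates)} generated examples")
```

The verifier is what lets the loop run without a human in it: the generator can be wrong often, as long as the bad examples are filtered out before they become training data.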
We are at a point where the old metrics of progress, parameter count and dataset size, stop being primary. An era of architectural flexibility and intellectual depth has arrived. Those who cry out today about "exhausted technology" simply haven't noticed that the rules of the game have changed. We haven't reached the ceiling; we've finished the foundation and started building the floors. And judging by the pace of agent and tool deployment, these floors will rise much faster than anyone expected.
The bottom line: Forget about "data shortage." The real battle now is over who will teach the model to think longer and more efficiently, not who will feed it more terabytes from Reddit.