DeepMind and Yann LeCun Push AGI Toward World Models — Why This Concerns More Than LLMs

AGI may arrive not through even more conversational LLMs, but through world models — systems that learn to understand the physical world. The text references…

AI-processed from Habr AI; edited by Hamidun News
Source: Habr AI. Collage: Hamidun News.

The author argues that the path to AGI runs not through yet another leap in conversational models but through world models: systems that learn to understand the physical world, not just the statistics of words. On this view, today's AI hallucinations look less like a dead end and more like a raw, early stage of a more general intelligence.

Why Text Alone Isn't Enough

The main criticism of current LLMs is simple: they handle language very well but have no experience of their own in interacting with reality. Such systems can confidently describe a cup falling off a table, not because they "understand" gravity, but because they have seen vast amounts of text about similar situations. The author calls this state a "brain in a vat": the model knows the world only through words, not through cause-and-effect relationships, space, and physics.

The key thesis follows from this: scaling text models alone may not be enough for AGI. If a system cannot build an internal model of the world, predict the consequences of actions, and transfer that understanding to new situations, it will remain a very powerful linguistic tool but not a general intelligence. That is why attention is shifting from language to architectures that learn from video, movement, and interaction with the environment.

What World Models Lead To

The text presents two illustrative directions. The first is JEPA (Joint Embedding Predictive Architecture), Yann LeCun's architecture, in which the model learns to predict not the next word but the state of the world. The idea is for AI to observe what is happening, as a child does, and gradually assemble an intuitive physics: what falls, what collides, what changes after an action.
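To make the shift from "next word" to "next state" concrete, here is a deliberately tiny sketch. It is not JEPA and not anything from the original text: the falling-object world, the function names, and the constants are all assumptions chosen for illustration. The "model" watches a sequence of (height, velocity) states, estimates a single acceleration from them, and then predicts a state it has not yet observed.

```python
# Toy illustration only. This is NOT JEPA; it just shows a learner whose
# prediction target is the next state of a world rather than the next word.
# The world, names, and constants below are hypothetical.

def simulate_fall(height, velocity, steps, dt=0.1, g=-9.8):
    """Hypothetical ground-truth world: returns a list of (height, velocity)."""
    states = [(height, velocity)]
    for _ in range(steps):
        velocity = velocity + g * dt
        height = height + velocity * dt
        states.append((height, velocity))
    return states

def fit_acceleration(states, dt=0.1):
    """Estimate a constant acceleration from consecutive observed velocities."""
    deltas = [(b[1] - a[1]) / dt for a, b in zip(states, states[1:])]
    return sum(deltas) / len(deltas)

def predict_next(state, accel, dt=0.1):
    """One-step prediction of the next state using the learned acceleration."""
    height, velocity = state
    velocity = velocity + accel * dt
    return (height + velocity * dt, velocity)

# "Observation" phase: the learner only sees states, never the constant g.
observed = simulate_fall(height=10.0, velocity=0.0, steps=20)
learned_g = fit_acceleration(observed)

# Predict one step past the last observation and compare with the world.
predicted = predict_next(observed[-1], learned_g)
actual = simulate_fall(observed[-1][0], observed[-1][1], steps=1)[-1]
error = abs(predicted[0] - actual[0])
```

Real world models learn far richer dynamics from video and interaction, and typically predict in a learned representation space rather than over hand-picked variables such as height and velocity.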

The second is Genie from DeepMind, which can turn a single image into an interactive 3D scene. This is already a step from describing the world to simulating it internally. If such approaches begin to combine with agent systems and robotics, the model will gain not just memory and dialogue but a cycle of perception, prediction, action, and verification of results.
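The cycle of perception, prediction, action, and verification can be sketched in a few lines. This is a minimal toy under stated assumptions, not how Genie or any real agent works: the `World`, `WorldModel`, and `run` names are hypothetical, and the environment is a one-dimensional cart whose response to a push is unknown to the agent. The agent plans with its internal model, acts, compares the outcome with its prediction, and corrects the model.

```python
# Toy sketch of a perceive / predict / act / verify loop.
# All classes, names, and numbers here are hypothetical illustrations.

class World:
    """Hidden environment: each push moves the cart by a gain unknown to the agent."""
    def __init__(self, position=0.0, gain=0.5):
        self.position = position
        self._gain = gain

    def observe(self):
        return self.position

    def apply(self, push):
        self.position += self._gain * push

class WorldModel:
    """The agent's internal belief: position changes by gain_estimate * push."""
    def __init__(self, gain_estimate=1.0):
        self.gain_estimate = gain_estimate

    def predict(self, position, push):
        return position + self.gain_estimate * push

    def update(self, position_before, push, position_after):
        # Correct the belief using what actually happened.
        if push != 0:
            self.gain_estimate = (position_after - position_before) / push

def run(world, model, target, steps=5):
    """Returns the prediction error recorded at each step."""
    errors = []
    for _ in range(steps):
        before = world.observe()                          # perception
        push = (target - before) / model.gain_estimate    # plan with the model
        expected = model.predict(before, push)            # prediction
        world.apply(push)                                 # action
        after = world.observe()
        errors.append(abs(expected - after))              # verification
        model.update(before, push, after)                 # learn from the result
    return errors

world = World()
model = WorldModel()
errors = run(world, model, target=10.0)
```

In this toy the first prediction is wrong because the agent's initial belief about the gain is wrong; after one verified action the belief matches the world and the cart reaches the target. The point is only the shape of the loop: the model is corrected by consequences of actions, not by more text.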

According to the author, the effect of such an "awakening" could manifest within a five-to-ten-year horizon. This is not about a magical leap, but about a moment when AI starts to plan not phrases, but real interventions in the environment.

  • JEPA shifts learning from words to states and events
  • Genie shows how to assemble an interactive world from a single image
  • Robotic chips like Nvidia Rubin give AI a path to a "body"
  • The combination of simulation and agent makes learning through action possible

The Risk of Awakening

Here the author draws a parallel with Vasily Golovachev's science fiction about a "sleeping genie": while the superintelligence sleeps, its impulses already change reality, but the real risk begins at the moment of awakening. Applied to AGI, this means a transition from strange chat responses to independent planning in the material world — from logistics and energy to robots that can act without constant human prompting.

"To it we may be nothing but biological noise."

This formulation captures the text's main fear: a super-efficient system does not need to be evil to become dangerous. It is enough for it to optimize a task by a logic that humans can no longer fully trace. What looks today like incoherent "delirium" from a model can, on this view, be read as early, imperfect attempts to construct an internal picture of the world. The author does not claim that such a scenario is inevitable, but warns that overconfidence about how the AGI story will unfold could turn out to be the most expensive mistake.

What This Means

The text is important not as a prediction of when AGI will appear, but as a shift in the frame of discussion. The question is no longer just how convincingly AI writes, but when it will start to understand the environment, predict its dynamics, and act in it autonomously. If the center of gravity truly shifts toward world models, then the main discussions of the coming years will not be about chatbots, but about agency, robotics, and control over systems that learn from the world itself.

Hamidun News
AI news without noise. Daily editorial selection from 400+ sources. A product by Zhemal Khamidun, Head of AI at Alpina Digital.
