World Models: How AI Learns to Understand Reality Instead of Text

MIT Technology Review held a discussion on world models, a new direction in AI. Companies are developing systems that see and understand the physical world like humans do…

AI-processed from MIT Technology Review; edited by Hamidun News

At MIT Technology Review's May conference, Editor-in-Chief Mat Honan and Senior AI Editor Will Douglas Heaven discussed world models, which could become the next big leap in artificial intelligence. They explored how companies are trying to teach neural networks not just to process text but to understand the surrounding reality.

What Are World Models

A world model is not another version of an LLM. It is a fundamentally different kind of system: one that can watch videos, analyze images, interact with an environment, and predict the consequences of actions. A person who sees a cube on the edge of a table understands that it will fall; a neural network has to acquire the same intuition without explicit instructions, by observing the physical world.

These models change the learning paradigm. Instead of the classical scheme, "here's text, answer the question," the task becomes "watch the video, predict what happens next." That calls for a different architecture, different training data, and a different way of measuring the model's errors.
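
To make the "watch, then predict what happens next" idea concrete, here is a minimal sketch in Python. It is a toy stand-in, not how any company's world model works: the "video" is just a list of heights of a falling object, and the "model" is a single learned number. The function names (simulate_fall, fit_next_frame_model, predict_next) and the frame interval are assumptions made for illustration. The model is never told the value of gravity; it recovers the effect from observed frames and is scored on how well it predicts the next one.

```python
def simulate_fall(h0, steps, dt=0.1, g=9.81):
    """Stand-in for a video: the height of a dropped object in each frame."""
    return [h0 - 0.5 * g * (t * dt) ** 2 for t in range(steps)]


def fit_next_frame_model(trajectories):
    """Learn the constant c in h[t+1] = 2*h[t] - h[t-1] + c from frames alone.

    The learner never sees g or dt; it only averages what is left over
    after extrapolating in a straight line from the last two frames.
    """
    residuals = []
    for traj in trajectories:
        for prev, cur, nxt in zip(traj, traj[1:], traj[2:]):
            residuals.append(nxt - (2 * cur - prev))
    return sum(residuals) / len(residuals)


def predict_next(prev, cur, c):
    """Predict the next frame from the two most recent frames."""
    return 2 * cur - prev + c


# "Watch" three drops from different heights.
training = [simulate_fall(h0, steps=10) for h0 in (5.0, 10.0, 20.0)]
learned_c = fit_next_frame_model(training)

# Evaluate on a drop the model has never seen.
unseen = simulate_fall(50.0, steps=5)
prediction = predict_next(unseen[2], unseen[3], learned_c)
error = abs(prediction - unseen[4])

print(f"learned per-frame correction: {learned_c:.4f}")
print(f"next-frame prediction error:  {error:.2e}")
```

Real systems replace the single constant with a large neural network and the list of heights with raw video, but the evaluation question is of the same kind: how far was the prediction from what actually happened next?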

Why Text Is Clearly Not Enough

Modern large language models are very good at processing information, but in one sense they are literally blind. They know about gravity only because people have written about it millions of times on the internet. They have never seen an object fall, felt inertia, or run a physical experiment. This creates specific blind spots:

  • Cannot predict physical interactions from first principles
  • Confuse spatial relationships between objects in videos
  • Cannot follow cause and effect across a sequence of frames
  • Cannot plan actions based on real physics
  • Mispredict trajectories and collisions

This limitation is most noticeable when an AI system has to control a robot, plan logistics, or predict the consequences of physical manipulation.
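
What "planning based on physics" means can be sketched in a few lines of Python. In this hypothetical example, a planner tries each candidate launch speed inside a model of the world, checks where the ball is predicted to land, and picks the speed that gets closest to a target. The physics here is hand-written, which is an assumption made for brevity; in a world model, the predicted_landing function would be learned from observation. All names and numbers are illustrative.

```python
import math


def predicted_landing(speed, angle_deg=45.0, dt=0.001, g=9.81):
    """Roll the model forward step by step until the ball hits the ground.

    Returns the predicted horizontal landing distance in meters.
    """
    angle = math.radians(angle_deg)
    vx, vy = speed * math.cos(angle), speed * math.sin(angle)
    x, y = 0.0, 0.0
    while True:
        x += vx * dt
        vy -= g * dt          # gravity slows the climb, then speeds the fall
        y += vy * dt
        if y <= 0.0:
            return x


def plan_throw(target, candidate_speeds):
    """Pick the action whose predicted outcome lands closest to the target."""
    return min(candidate_speeds,
               key=lambda s: abs(predicted_landing(s) - target))


# Candidate actions: launch speeds of 1 to 20 m/s, target 10 m away.
best_speed = plan_throw(target=10.0, candidate_speeds=range(1, 21))
print(best_speed, round(predicted_landing(best_speed), 2))
```

The planner never throws a real ball. Every candidate is tested inside the model, which is why the quality of the plan depends entirely on how well the model predicts trajectories.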

Who Is Working on World Models

OpenAI, DeepMind, Tesla, and other major companies are investing in world models, and their approaches vary. OpenAI and DeepMind work with video datasets from YouTube and with synthetic simulations. Tesla uses millions of hours of video from its cars' cameras to teach the system to see the road the way a human driver does. Some companies start with supervised learning on labeled videos. Others use reinforcement learning in controlled simulations, where a model can make mistakes a million times without real-world consequences and gradually improve its understanding.
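
The appeal of learning in simulation is easier to see with a small example. The Python sketch below uses tabular Q-learning, a standard reinforcement learning method chosen here for illustration; the source does not say which algorithms these companies use. The environment is a made-up six-cell corridor with a cliff at one end and a goal at the other. The agent acts at random, falls off the cliff many times, and still ends up with a sensible policy, because failure in a simulator costs nothing.

```python
import random

CLIFF, GOAL, START = 0, 5, 2   # hypothetical corridor: cells 0..5
ACTIONS = (-1, 1)              # step left or step right


def step(state, action):
    """Simulated environment: returns (next_state, reward, episode_over)."""
    nxt = state + action
    if nxt == GOAL:
        return nxt, 1.0, True
    if nxt == CLIFF:
        return nxt, -1.0, True
    return nxt, 0.0, False


def train(episodes=2000, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning with purely random exploration."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(CLIFF + 1, GOAL) for a in ACTIONS}
    falls = 0
    for _ in range(episodes):
        state, done = START, False
        while not done:
            action = rng.choice(ACTIONS)
            nxt, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(nxt, a)] for a in ACTIONS)
            # Move the estimate toward reward + discounted future value.
            q[(state, action)] += alpha * (
                reward + gamma * best_next - q[(state, action)]
            )
            if reward < 0:
                falls += 1     # a mistake that cost nothing in simulation
            state = nxt
    return q, falls


q, falls = train()
# Greedy policy: in each cell, take the action with the highest learned value.
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(1, GOAL)}
print(f"falls during training: {falls}")
print(f"learned policy: {policy}")
```

After training, the learned policy steps toward the goal from every cell. The same trial-and-error loop run on a physical robot would mean one real crash for every fall counted here.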

What Does This Mean

If companies can scale world models as successfully as they scaled LLMs, AI could shift from symbolic information processing to something closer to a genuine understanding of physical reality. Robotics could move out of the laboratory, autonomous systems could become more reliable, and the planning of complex processes could speed up.

But these are still early days. MIT Technology Review is drawing attention to the topic because world models are probably the most important direction in AI for the coming years. Companies that are first to teach neural networks to see and understand the world will gain a huge competitive advantage.

Hamidun News
AI news without noise. Daily editorial selection from 400+ sources. A product by Zhemal Khamidun, Head of AI at Alpina Digital.
