World Models: Why Video Generators Are Not About Cinema but About the Physics of Reality
AI-processed from Jiqizhixin (机器之心); edited by Hamidun News
When OpenAI released Sora, everyone rushed to discuss how soon Hollywood would be consigned to the dustbin of history. But strip away the excitement over the mammoth fur and Tokyo's neon signs, and what remains is something far more fundamental. We are witnessing a transition from simply predicting the next pixel to building full-fledged world models. This is not merely a change in terminology but a tectonic shift in how machines perceive our reality. For a long time, AI lived in a world of text and static images; now it is trying to master time and cause and effect.
Why model the world at all? Imagine you want to teach a robot to make coffee. Until now, you either had to write thousands of lines of code or let the machine fail millions of times in the real world, breaking cups and flooding the floor. A world model lets the AI "play out" these scenarios in its head, using a universal simulator of the world. It is a kind of digital imagination grounded not in fantasy but in learned laws of physics. Remarkably, the AI derives these laws on its own, simply by watching terabytes of video, without a single formula from Newton's textbook.
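To make the idea of deriving physics "just by watching" concrete, here is a deliberately tiny sketch in Python. It is not how Sora or any production world model works: the "video" is an assumed synthetic 1-D trajectory of a falling ball (height and velocity per frame), and the "world model" is a linear map fitted by ordinary least squares. The function names (`record_fall`, `fit_world_model`, `imagine`) and the constants are hypothetical. The point is only that the learner recovers the effect of gravity from data and can then roll the model forward to imagine frames it never observed.

```python
# Toy "world model": learn 1-D falling-ball dynamics from observed frames,
# then "imagine" future frames. The learner is never given the physics formula.
# All names and constants here are hypothetical, chosen for illustration.

DT = 0.1        # seconds between observed frames (assumed)
GRAVITY = 9.81  # used ONLY to synthesize the "video"; the learner never sees it


def record_fall(y0, v0, steps):
    """Synthesize an observed trajectory of (height, velocity) states."""
    states = [(y0, v0)]
    for _ in range(steps):
        y, v = states[-1]
        states.append((y + v * DT, v - GRAVITY * DT))
    return states


def solve(a, b):
    """Solve the square system a @ x = b by Gauss-Jordan elimination with pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * p for x, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_world_model(trajectories):
    """Least-squares fit of next_state = W @ [y, v, 1] from consecutive frame pairs."""
    feats, targets = [], []
    for traj in trajectories:
        for (y, v), nxt in zip(traj, traj[1:]):
            feats.append([y, v, 1.0])
            targets.append(nxt)
    # Normal equations (X^T X) w = X^T t, solved once per state dimension.
    xtx = [[sum(f[i] * f[j] for f in feats) for j in range(3)] for i in range(3)]
    weights = []
    for out in range(2):
        xtt = [sum(f[i] * t[out] for f, t in zip(feats, targets)) for i in range(3)]
        weights.append(solve(xtx, xtt))
    return weights


def imagine(weights, state, steps):
    """Roll the learned model forward without looking at any new observations."""
    frames = [state]
    for _ in range(steps):
        y, v = frames[-1]
        frames.append(tuple(w[0] * y + w[1] * v + w[2] for w in weights))
    return frames


if __name__ == "__main__":
    observed = [record_fall(10.0, 0.0, 20), record_fall(5.0, 3.0, 20)]
    model = fit_world_model(observed)
    print("learned height row:  ", [round(w, 3) for w in model[0]])
    print("learned velocity row:", [round(w, 3) for w in model[1]])
    print("imagined heights:", [round(y, 2) for y, _ in imagine(model, (8.0, 1.0), 5)])
```

Run as a script, the fitted velocity row should come out close to [0, 1, -0.981], that is, "each frame the ball loses about 0.98 m/s", a rule the code never states to the learner. Real world models replace the linear map with large neural networks over pixels or latent representations, but the learn-then-imagine loop is similar in spirit.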
The problem is that current models are still prone to "physical hallucinations." You have surely seen videos in which people walk through walls or objects vanish without a trace. This happens because neural networks do not yet grasp what an object is; they are merely masters of probability. New research aims to build an understanding of space and time into the model architecture itself through latent (hidden) representations. The goal is an AI that does not just draw frames but understands that a ball rolling toward the edge of a table will fall off rather than turn into a butterfly.
For the industry, this means the end of the era of "black boxes" that simply produce results. We are moving toward systems that can justify their actions by simulating their consequences. Companies such as Wayve and Tesla already use early versions of world models in their self-driving systems, but researchers' ambitions go further. They want to build a unified environment in which AI can test scientific hypotheses or design new materials, checking their strength in a virtual world that faithfully reproduces our own.
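The idea of justifying an action by simulating its consequences can be sketched as a toy planner. Everything here is a hypothetical assumption for illustration: a robot pushes an object along a 1 m table, the "world model" is a hand-written friction simulator standing in for a learned one, and the planner simply tries a grid of push speeds (a brute-force search, not any specific company's method). The winning imagined outcome is returned alongside the action, which is the sense in which the choice is "justified."

```python
# Toy planning by imagination: simulate each candidate action inside a world
# model, discard actions whose imagined consequence is a fall, and return the
# best imagined outcome as the justification for the chosen action.
# All names and constants are hypothetical.
from dataclasses import dataclass

TABLE_EDGE = 1.0  # metres from the start position to the table edge (assumed)
FRICTION = 2.0    # constant sliding deceleration, m/s^2 (assumed)
DT = 0.01         # simulation time step, seconds


@dataclass
class Outcome:
    push_speed: float  # initial speed the robot gives the object, m/s
    final_x: float     # where the object is when it stops or falls, metres
    fell_off: bool     # True if the object crossed the table edge


def world_model(push_speed):
    """Stand-in for a learned model: the object slides with friction until it stops or falls."""
    x, v = 0.0, push_speed
    while v > 0.0:
        x += v * DT
        v -= FRICTION * DT
        if x > TABLE_EDGE:
            return Outcome(push_speed, x, True)
    return Outcome(push_speed, x, False)


def plan(target_x, candidates):
    """Imagine every candidate push and keep the safe one that lands closest to target_x."""
    imagined = [world_model(speed) for speed in candidates]
    safe = [o for o in imagined if not o.fell_off]
    if not safe:
        return None  # every imagined future ends with the object on the floor
    return min(safe, key=lambda o: abs(o.final_x - target_x))


if __name__ == "__main__":
    candidates = [i * 0.1 for i in range(1, 31)]  # 0.1 .. 3.0 m/s
    choice = plan(target_x=0.9, candidates=candidates)
    print(f"push at {choice.push_speed:.1f} m/s; imagined stop at {choice.final_x:.3f} m")
```

With these assumed numbers the planner should settle on a push of about 1.9 m/s (imagined stop near 0.91 m), rejecting 2.0 m/s and faster because in its imagination the object goes over the edge. Swapping the grid search for a smarter optimizer, or the hand-written simulator for a learned model, would not change the basic imagine-then-choose loop.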
What does this mean for us? Quite likely, the next couple of years will bring explosive growth in robotics. Robots will stop being clumsy hunks of metal because they will arrive in our world already "experienced," having lived thousands of virtual lives in simulators. Video generation will remain a nice bonus for content creators, but the real breakthrough will come where AI starts predicting the behavior of complex systems, from climate change to protein folding. We are finally teaching machines not just to imitate us but to understand how the stage on which we all perform is built.
The bottom line: will AI become a full-fledged "digital god," or will it remain an advanced video player prone to hallucinations? The answer depends on whether we can teach it not just to watch, but to understand inertia, friction, and gravity.