Models

World Model

A world model is an internal representation that an AI system learns of its environment's dynamics, enabling it to predict the consequences of actions and simulate future states without directly interacting with the real world.

A world model is a learned, compact representation of an environment's transition dynamics — how states evolve in response to actions, what observations are likely at each state, and what rewards result. Rather than mapping observations directly to actions (a reactive policy), an agent with a world model can mentally simulate hypothetical futures: imagining what would happen under action A versus action B and choosing based on the simulated outcomes. The concept originates in cognitive science, where the ability to mentally simulate the environment is considered central to human planning and causal reasoning.
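One common way to write the three components down, offered here as an illustrative formalization rather than a notation this entry prescribes, uses a parameterized model over states, observations, and rewards. The symbols (state s_t, action a_t, observation o_t, reward r_t, parameters θ, horizon H, discount γ) are assumptions introduced for this sketch.

```latex
\begin{align*}
s_{t+1} &\sim p_\theta(s_{t+1} \mid s_t, a_t) && \text{(transition model)} \\
o_t     &\sim p_\theta(o_t \mid s_t)          && \text{(observation model)} \\
r_t     &\sim p_\theta(r_t \mid s_t, a_t)     && \text{(reward model)} \\[4pt]
a^\ast  &= \arg\max_{a \in \{A, B\}}
           \mathbb{E}_{p_\theta}\!\left[ \sum_{k=0}^{H-1} \gamma^{k} r_{t+k} \,\middle|\, s_t,\ a_t = a \right]
        && \text{(choosing by simulated outcome)}
\end{align*}
```

The last line expresses the "action A versus action B" comparison: the expectation is taken over futures sampled from the model, not over real interactions with the environment.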

World models are typically implemented as neural networks trained to predict future latent states, or raw observations, given a history of past observations and actions. DreamerV3 (Google DeepMind, 2023) learns a compact latent-space dynamics model and trains its policy and value function on rollouts imagined inside that model, substantially reducing the number of real environment interactions required to master a task. In the visual domain, large generative video models, including OpenAI's Sora (2024) and Google DeepMind's Genie (2024), have been described as implicit world models: trained to predict plausible future video frames, they appear to capture aspects of physical plausibility, object permanence, and scene dynamics as emergent properties, though not always reliably. Google DeepMind and others have explicitly framed next-frame video prediction as a tractable path toward general-purpose world models for embodied agents.
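The learn-then-imagine loop can be illustrated with a deliberately tiny stand-in. The sketch below is not DreamerV3 or any published system: it swaps the neural network for a least-squares linear fit and the latent state for a single scalar, and every name in it (`true_step`, `collect_transitions`, `fit_linear_dynamics`, `imagine_rollout`) is hypothetical. It only shows the shape of the idea: fit a dynamics model on logged transitions, then roll it forward without touching the environment.

```python
import random


def true_step(state, action):
    """Hidden environment dynamics (an assumption made up for this toy example)."""
    return 0.9 * state + 0.5 * action


def collect_transitions(n, seed=0):
    """Log (state, action, next_state) tuples from real interaction."""
    rng = random.Random(seed)
    data = []
    state = 1.0
    for _ in range(n):
        action = rng.uniform(-1.0, 1.0)
        next_state = true_step(state, action)
        data.append((state, action, next_state))
        state = next_state
    return data


def fit_linear_dynamics(data):
    """Least-squares fit of next_state = a * state + b * action.

    Solves the 2x2 normal equations in closed form; a real world model
    would train a neural network here instead.
    """
    sxx = sum(s * s for s, _, _ in data)
    sxu = sum(s * u for s, u, _ in data)
    suu = sum(u * u for _, u, _ in data)
    sxy = sum(s * y for s, _, y in data)
    suy = sum(u * y for _, u, y in data)
    det = sxx * suu - sxu * sxu
    a = (sxy * suu - suy * sxu) / det
    b = (suy * sxx - sxy * sxu) / det
    return a, b


def imagine_rollout(model, state, actions):
    """Predict a trajectory using only the learned model (no environment calls)."""
    a, b = model
    trajectory = []
    for action in actions:
        state = a * state + b * action
        trajectory.append(state)
    return trajectory


data = collect_transitions(200)
model = fit_linear_dynamics(data)
imagined = imagine_rollout(model, 1.0, [0.5, -0.5, 0.0])
print("learned (a, b):", model)
print("imagined states:", imagined)
```

Because the toy environment is noise-free and exactly linear, the fit recovers the hidden coefficients almost exactly; with real sensors and nonlinear dynamics, model error compounds over the length of the imagined rollout, which is one reason practical systems keep rollouts short.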

World models matter for several reasons. First, they enable sample-efficient learning: an agent that simulates its environment internally needs far fewer costly or dangerous real-world interactions. Second, they can support more interpretable planning, because an agent can in principle report which simulated future justified its chosen action, a property valuable in safety-critical domains. Third, world models can generalize better to novel situations when they capture causal structure rather than stimulus-response mappings, which lets them extrapolate to state-action combinations not seen during training.

As of 2026, world models are a primary research focus in robotics, autonomous driving, and game-playing AI. In robotics, Physical Intelligence (pi0), Google DeepMind's robotics division, and Figure use world-model-style video pretraining to transfer manipulation skills across diverse objects and environments. In autonomous driving, Waymo and Wayve train learned simulation environments that can stand in for expensive real-world test miles. The boundary between world models and general-purpose video generation has become productively ambiguous: systems that produce physically consistent video are actively being repurposed as environment simulators for training embodied agents.

Example

An autonomous robotic arm in a warehouse uses a learned world model to mentally simulate twelve candidate grasp trajectories in under 200 milliseconds, selecting the one its model predicts will succeed against an irregularly shaped package, before executing any physical movement.
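The selection step in this scenario can be sketched in a few lines, under heavy simplification. Everything here is hypothetical: `predict_next` is a hand-written stand-in for a learned dynamics model, the gripper is a point in 2-D, the twelve candidates are random move sequences, and "predicted to succeed" is approximated by predicted final distance to the target.

```python
import random


def predict_next(position, move):
    """Hypothetical learned model: the gripper covers 80% of each commanded move."""
    x, y = position
    dx, dy = move
    return (x + 0.8 * dx, y + 0.8 * dy)


def simulate(position, moves):
    """Roll a candidate trajectory forward inside the model only."""
    for move in moves:
        position = predict_next(position, move)
    return position


def predicted_error(position, target):
    """Euclidean distance between the predicted end point and the grasp target."""
    return ((position[0] - target[0]) ** 2 + (position[1] - target[1]) ** 2) ** 0.5


def choose_trajectory(start, target, candidates):
    """Return (index, predicted error) of the candidate that ends closest to the target."""
    scored = [
        (predicted_error(simulate(start, moves), target), index)
        for index, moves in enumerate(candidates)
    ]
    best_error, best_index = min(scored)
    return best_index, best_error


rng = random.Random(0)
# Twelve candidate trajectories of five moves each, sampled at random for illustration.
candidates = [
    [(rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)) for _ in range(5)]
    for _ in range(12)
]
start, target = (0.0, 0.0), (1.0, 0.5)
best_index, best_error = choose_trajectory(start, target, candidates)
print("execute candidate", best_index, "with predicted error", best_error)
```

Only the chosen trajectory would be sent to the physical arm; the other eleven are evaluated and discarded entirely in simulation.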
