AMI Labs Bets on World Models Beyond LLMs and Sees a Path to Products Through VLA
AMI Labs, a Yann LeCun project, advances world models as the next step after LLMs: instead of predicting tokens, understanding the environment and action…
AI-processed from Habr AI; edited by Hamidun News
After the LLM boom, AMI Labs proposes shifting AI's center of gravity from language to understanding the physical environment: a machine cannot simply continue text if it must act safely in the real world, plan its steps, and assess the consequences of its decisions in advance. AMI Labs is a research company founded by Yann LeCun, one of the key pioneers of deep learning.
The project raised $1.03 billion at a $3.5 billion pre-money valuation, a sign that interest in world models has moved beyond academic discussion. The company starts from a simple premise: data from cameras, sensors, and instruments are structured differently from text.
They are continuous, noisy, and multidimensional, and they fit poorly with the logic of "predict the next token." Instead of adapting LLMs to every scenario, AMI relies on a different foundational layer: a world model. Here, a world model is neither a video generator nor simply a multimodal system that takes images, text, and actions as input.
It refers to a model that builds a hidden internal representation of the environment, identifies stable relationships, and discards random details. What matters is not every pixel of the future frame, but the structure of what is happening: where objects are located, how they move, what constraints the environment has, and what will change after the agent acts. Such an architecture should answer not only "what do I see" but also "what will happen if I do this."
This is precisely why JEPA (Joint Embedding Predictive Architecture) is central to the approach. Here, the model predicts neither raw data nor a sequence of tokens, but the state of the scene in a representation space. This lets the system avoid spending computation on noise and random variation and instead learn from the features of the scene that actually matter.
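To make the representation-space idea concrete, here is a toy Python sketch, an illustration under stated assumptions rather than AMI's or Meta's code. It assumes a hypothetical fixed `encode` that keeps an object's 2D position and drops 50 noise coordinates, plus a hypothetical constant-velocity `predict`, and compares a JEPA-style loss measured between embeddings with a generative loss that must reconstruct every input coordinate.

```python
import random

def encode(obs):
    # Hypothetical fixed encoder: keeps the first two coordinates (object
    # position) and drops the rest, which stand in for pixel-level noise.
    return obs[:2]

def predict(z, velocity):
    # Hypothetical predictor: one constant-velocity step in latent space.
    return [z[0] + velocity[0], z[1] + velocity[1]]

def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def make_obs(pos, rng, n_noise=50, noise=1.0):
    # An "observation" = true position followed by unpredictable noise values.
    return list(pos) + [rng.gauss(0.0, noise) for _ in range(n_noise)]

def losses(seed=0):
    rng = random.Random(seed)
    velocity = (1.0, 0.5)
    obs_t = make_obs((0.0, 0.0), rng)
    obs_next = make_obs((1.0, 0.5), rng)
    # JEPA-style: compare the predicted embedding with the next frame's embedding.
    z_pred = predict(encode(obs_t), velocity)
    latent_loss = sq_dist(z_pred, encode(obs_next))
    # Generative baseline: reconstruct the whole next observation; the noise
    # cannot be predicted, so the best guess for it is its mean, zero.
    obs_pred = z_pred + [0.0] * (len(obs_next) - 2)
    pixel_loss = sq_dist(obs_pred, obs_next)
    return latent_loss, pixel_loss
```

Running `losses()` gives a latent loss of exactly 0.0, while the input-space loss is dominated by the noise coordinates no model could predict. In real JEPA training the encoder and predictor are learned networks, and the target embeddings come from a separate target encoder (in I-JEPA and V-JEPA, an exponential moving average of the context encoder) so that the representations do not collapse.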
Practical support for this approach has already appeared in Meta's V-JEPA 2 research: the system was first pretrained on more than a million hours of internet video, and an action-conditioned version was then trained on less than 62 hours of unlabeled robot video. The resulting model worked zero-shot with Franka robot arms in new labs, grasping and moving objects without any data collected in those environments and without a reward function. But a world model on its own is not yet a complete agent.
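Planning without a reward function can be sketched as search in representation space: sample candidate action sequences, roll each through an action-conditioned predictor, and keep those whose predicted final embedding lies closest to the embedding of a goal image. The V-JEPA 2 paper reportedly plans along these lines, using the cross-entropy method (CEM) with an L1 distance to the goal representation; the 2D latent state, the clipped additive `step` dynamics, and all parameter values below are hypothetical.

```python
import random

def step(z, a):
    # Hypothetical action-conditioned predictor: each action nudges the latent
    # state, clipped to at most 0.2 per dimension per step.
    return [zi + max(-0.2, min(0.2, ai)) for zi, ai in zip(z, a)]

def rollout(z, actions):
    # Imagine the whole action sequence without touching the real robot.
    for a in actions:
        z = step(z, a)
    return z

def energy(z, z_goal):
    # L1 distance between predicted and goal embeddings; lower is better.
    return sum(abs(x - g) for x, g in zip(z, z_goal))

def cem_plan(z0, z_goal, horizon=5, samples=200, elites=20, iters=10, seed=0):
    rng = random.Random(seed)
    dim = len(z0)
    mean = [[0.0] * dim for _ in range(horizon)]
    std = [[0.5] * dim for _ in range(horizon)]
    best_plan, best_energy = None, float("inf")
    for _ in range(iters):
        scored = []
        for _ in range(samples):
            plan = [[rng.gauss(mean[t][d], std[t][d]) for d in range(dim)]
                    for t in range(horizon)]
            scored.append((energy(rollout(z0, plan), z_goal), plan))
        scored.sort(key=lambda pair: pair[0])
        if scored[0][0] < best_energy:
            best_energy, best_plan = scored[0]
        top = [plan for _, plan in scored[:elites]]
        # Refit the Gaussian sampling distribution to the elite sequences.
        for t in range(horizon):
            for d in range(dim):
                vals = [p[t][d] for p in top]
                m = sum(vals) / len(vals)
                var = sum((v - m) ** 2 for v in vals) / len(vals)
                mean[t][d] = m
                std[t][d] = max(var ** 0.5, 1e-3)
    return best_plan
```

In a model-predictive-control loop, the robot would execute only the first action of the returned plan, observe the new state, and replan. Returning the lowest-energy sampled sequence instead of the final mean is a simplification chosen here for readability.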
It can predict how a situation will unfold, but something has to turn that understanding into concrete action. This is where VLA (vision-language-action) models come in: a layer that connects perception, user intent, language commands, and the actions the system is allowed to take. An important thesis of AMI and related work is that VLA and world models do not compete.
Rather, without internal prediction, a VLA remains too reactive: it can output the right action for the current moment but struggles with long, fragile, physically sensitive tasks in which the consequences of a touch, movement, collision, or error have to be simulated in advance. That is why the most obvious markets for this approach are not chat interfaces but industries where failure is costly: industrial automation, robotics, wearables, and healthcare. If a text model makes an error when summarizing an article, the damage is limited.
If an intelligent system misreads the state of equipment, misjudges a medical risk, or miscalculates a robot's trajectory, the consequences are physical. Notably, Nabla, a digital medicine company, has been named AMI's first partner. This does not mean the company has already solved the problem of reliable AI in clinical settings, but it shows the direction: less focus on flashy demos and more on controllability, predictability, and simulating the environment internally before taking action.
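The division of labor described above, with a VLA-style policy proposing actions and a world model checking them before execution, can be sketched as a propose-then-simulate loop. Every name here is hypothetical: `propose_actions` stands in for a VLA policy and simply returns two fixed candidate plans, `predict` is a toy additive world model, and `is_safe` encodes a made-up obstacle. None of this describes an actual AMI or Nabla system.

```python
from typing import List, Optional, Tuple

State = Tuple[float, float]
Action = Tuple[float, float]

def propose_actions(instruction: str) -> List[List[Action]]:
    # Hypothetical VLA stand-in: a real policy would map camera input plus the
    # instruction to candidate actions; here two plans are hard-coded.
    direct = [(0.25, 0.0)] * 4
    detour = [(0.0, 0.3), (0.35, 0.0), (0.35, 0.0), (0.3, 0.0), (0.0, -0.3)]
    return [direct, detour]

def predict(state: State, action: Action) -> State:
    # Hypothetical world model: the action shifts the predicted position.
    return (state[0] + action[0], state[1] + action[1])

def is_safe(state: State) -> bool:
    # Hypothetical constraint: an obstacle fills 0.4 <= x <= 0.6, -0.1 <= y <= 0.1.
    x, y = state
    return not (0.4 <= x <= 0.6 and -0.1 <= y <= 0.1)

def choose_plan(start: State, goal: State, instruction: str) -> Optional[List[Action]]:
    best_plan, best_dist = None, float("inf")
    for plan in propose_actions(instruction):
        state, safe = start, True
        # Simulate the plan step by step before anything is executed.
        for action in plan:
            state = predict(state, action)
            if not is_safe(state):
                safe = False
                break
        if not safe:
            continue  # Veto plans whose imagined trajectory hits the obstacle.
        dist = abs(state[0] - goal[0]) + abs(state[1] - goal[1])
        if dist < best_dist:
            best_plan, best_dist = plan, dist
    return best_plan
```

Calling `choose_plan((0.0, 0.0), (1.0, 0.0), "move the cup to the right")` rejects the direct path, whose second predicted state lands inside the obstacle, and returns the detour. The toy checks predicted states only at step boundaries; a real safety check would also have to cover the motion between them and the world model's own prediction errors.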
The main conclusion is that, after the LLM era, the conversation about AI is gradually shifting from describing the world in language to modeling it internally. AMI's approach remains a research program rather than a ready replacement for large language models: the term "world model" is already becoming blurred, and transfer to new environments has yet to be proven. But if this line of research works out, the next practical breakthrough in AI may come not from another chatbot but from systems that first understand physical reality and only then act within it.