SIMA 2 from DeepMind: The First Thinking Agent for Video Games and Robotics
AI-processed from DeepMind Blog; edited by Hamidun News
DeepMind has presented SIMA 2, an agent for virtual 3D worlds that has evolved from simple instruction-following into an interactive assistant capable of reasoning, conversing, and improving itself. DeepMind describes it as significant progress toward more general artificial intelligence.
From Obedience to Thinking
A year ago, DeepMind launched the first SIMA, an agent that could perform over 600 skills in video games, such as "turn left," "climb the stairs," and "open the map." Like a human player, it viewed the screen and controlled a virtual keyboard and mouse, with no access to the games' internal mechanics.
SIMA 2 represents a qualitative leap in architecture: it is now built around a Gemini model, which gives the agent reasoning capabilities. Instead of simply executing a command such as "find the campfire," the agent can now understand high-level goals, break them down into sub-tasks, analyze its surroundings, and plan its actions.
What SIMA 2 Can Do
The agent was trained on two types of data: video recordings of real human gameplay with detailed annotations, and labels generated automatically by Gemini itself. This hybrid approach enabled SIMA 2 to develop new capabilities:
- Break down complex user goals into logical sub-steps and execute them in the correct sequence
- Explain its intentions and the reasoning behind its actions
- Answer clarifying questions from users and engage in dialogue
- Learn from its mistakes and improve over repeated attempts
- Transfer skills to games it has never encountered before
In demonstrations, SIMA 2 successfully found the campfire in unfamiliar games where the first version would simply get stuck. The agent generalizes an abstract understanding of the task rather than mechanically repeating memorized commands.
On the Path to Physical Robots
DeepMind emphasizes that this research extends far beyond video games. The SIMA 2 architecture (perceiving the screen, reasoning about goals, and acting through interface controls) closely matches what real robots need. In the physical world, a robot would use a camera instead of a screen, but the task remains the same: understand the environment, plan actions, and interact with objects.
The first SIMA already demonstrated transfer from video games to real-world simulators. With its reasoning capabilities, SIMA 2 could become an even more versatile tool for robotics.
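To make the perceive, reason, act loop more concrete, here is a minimal Python sketch. It is not SIMA 2's code. The `ToyCorridor` environment, the `Action` type, the "light the campfire" goal, and the hard-coded `plan` function are all hypothetical: `plan` stands in for Gemini's reasoning, and a one-dimensional corridor stands in for the rendered screen. The sketch only shows the general shape of such an agent. It splits a goal into sub-goals, observes the world, sends one keyboard or mouse input at a time, and moves on once a sub-goal is satisfied.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Action:
    """A low-level input sent through ordinary game controls (hypothetical format)."""
    kind: str   # "key" or "mouse"
    value: str  # e.g. "w" to walk forward, "click" to interact


class ToyCorridor:
    """Hypothetical stand-in for a game: walk along a line to a campfire, then light it."""

    def __init__(self, campfire_at: int = 3):
        self.position = 0
        self.campfire_at = campfire_at
        self.fire_lit = False

    def observe(self) -> dict:
        # A real agent would see screen pixels; this toy exposes a tiny summary instead.
        return {"distance": self.campfire_at - self.position, "fire_lit": self.fire_lit}

    def apply(self, action: Action) -> None:
        if action.kind == "key" and action.value == "w":
            self.position += 1
        elif action.kind == "mouse" and action.value == "click" and self.position == self.campfire_at:
            self.fire_lit = True


def plan(goal: str) -> List[str]:
    # Hypothetical planner: a fixed decomposition standing in for model-based reasoning.
    if goal == "light the campfire":
        return ["reach campfire", "interact with campfire"]
    raise ValueError(f"no plan for goal: {goal}")


def choose_action(subgoal: str, obs: dict) -> Optional[Action]:
    """Pick one input for the active sub-goal, or None if the sub-goal is already done."""
    if subgoal == "reach campfire":
        return Action("key", "w") if obs["distance"] > 0 else None
    if subgoal == "interact with campfire":
        return Action("mouse", "click") if not obs["fire_lit"] else None
    raise ValueError(f"unknown sub-goal: {subgoal}")


def run_agent(env: ToyCorridor, goal: str, max_steps: int = 20) -> List[Action]:
    """Observe, act for the current sub-goal, repeat; give each sub-goal at most max_steps inputs."""
    trace: List[Action] = []
    for subgoal in plan(goal):
        for _ in range(max_steps):
            action = choose_action(subgoal, env.observe())
            if action is None:  # sub-goal satisfied, move on to the next one
                break
            env.apply(action)
            trace.append(action)
    return trace


if __name__ == "__main__":
    env = ToyCorridor(campfire_at=3)
    trace = run_agent(env, "light the campfire")
    print([a.value for a in trace])  # ['w', 'w', 'w', 'click']
    print(env.fire_lit)              # True
```

Keeping the planner separate from the action chooser mirrors the split described above: a reasoning component decides *what* to do next, while a lower-level loop turns that decision into individual key presses and clicks. In a robot, `observe` would read a camera instead of a screen, and `apply` would drive motors instead of a virtual keyboard.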
The developers call this a significant step toward AGI (artificial general intelligence). Generalization, meaning the ability to apply learned knowledge to entirely new situations, has long been a stumbling block in AI. SIMA 2 demonstrates concrete progress on this problem: the agent can adapt to unfamiliar environments and goals.
What This Means
The boundary between narrow, task-oriented AI and general reasoning is blurring. SIMA 2 is not merely a command executor but an interactive assistant that understands context, can discuss strategy, and learns on the fly. For robotics, this suggests that the key technologies (vision, reasoning, and adaptation) are getting close to practical application.