
Google DeepMind Introduces Gemini Robotics-ER 1.6 for Autonomous Real-World Tasks

Google DeepMind upgraded Gemini Robotics-ER to version 1.6 and focused on real-world scenarios: from object recognition and task verification to sensor and…

AI-processed from DeepMind Blog; edited by Hamidun News
Source: DeepMind Blog. Collage: Hamidun News.

Google DeepMind presented Gemini Robotics-ER 1.6 on April 14, 2026. It is an updated reasoning model for robots, designed not merely to execute commands but to understand the physical environment. The company is betting on embodied reasoning: the ability of a system to link visual perception, task context, and real-world action.

The new version emphasizes more precise spatial reasoning, multi-camera scene understanding, detection of task completion, and reading of industrial instruments. In essence, it is a high-level "brain" for the robot that can call external tools, VLA (vision-language-action) models, and custom functions to carry out complex real-world scenarios.

One of the key improvements concerns spatial tasks. DeepMind explains that for a robot, a basic operation like pointing at an object is not a trifle but the foundation for more complex behavior. Through points, the model can not only find objects but also count them, compare sizes, establish relationships between objects, select optimal grasp points, and verify constraints stated in the prompt. For example, if the system is asked to show all objects that will fit in a blue cup, it must simultaneously recognize the shape, size, and relative position of the items.
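As a minimal sketch of how pointing output might feed counting, assume the model replies with a JSON list of labeled points in `[y, x]` order, normalized to a 0-1000 range. The article does not specify an output format, so that shape, the sample reply, and the helper names `to_pixels` and `count_by_label` are illustrative assumptions.

```python
import json
from collections import Counter

# Hypothetical model reply (assumed format): points are [y, x], normalized to 0-1000.
reply = """[
  {"point": [420, 130], "label": "hammer"},
  {"point": [455, 610], "label": "screwdriver"},
  {"point": [700, 640], "label": "screwdriver"}
]"""


def to_pixels(point, width, height):
    """Convert a normalized [y, x] point to (x, y) pixel coordinates."""
    y, x = point
    return round(x / 1000 * width), round(y / 1000 * height)


def count_by_label(points):
    """Count how many points were returned for each label."""
    return Counter(p["label"] for p in points)


points = json.loads(reply)
counts = count_by_label(points)
pixels = [to_pixels(p["point"], width=1280, height=720) for p in points]

print(counts["screwdriver"])  # number of screwdrivers pointed at
print(counts["pliers"])       # 0: nothing was pointed at for an absent object
print(pixels[0])              # pixel location of the first point
```

Counting from points rather than from free text keeps the count tied to concrete image locations, which a downstream planner can inspect or reject.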

In demonstrations, Gemini Robotics-ER 1.6 more accurately determined the number of tools in the frame, did not point to objects that were absent from the scene, and overall performed significantly better at such tasks than Gemini Robotics-ER 1.5 and Gemini 3.0 Flash.

The second important component is multi-view scene understanding and so-called success detection, that is, the ability to determine whether a task has actually been completed. For autonomous robotics, this is critical: it is not enough for a robot to start an action; it must understand whether a retry is needed or whether it can proceed to the next step of the plan.
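The retry-or-proceed decision can be sketched as a small control loop. This is only an illustration under stated assumptions: `run_step`, `act`, and `is_done` are hypothetical names, and `is_done` stands in for whatever success detector (for example, a model call over several camera views) a real system would use.

```python
from typing import Callable, Sequence


def run_step(act: Callable[[], None],
             is_done: Callable[[Sequence[str]], bool],
             views: Sequence[str],
             max_attempts: int = 3) -> int:
    """Repeat act() until is_done(views) is true; return the attempts used.

    Raises RuntimeError if the step is still incomplete after max_attempts.
    """
    for attempt in range(1, max_attempts + 1):
        act()
        if is_done(views):
            return attempt
    raise RuntimeError(f"step not completed after {max_attempts} attempts")


# Toy stand-ins for a robot action and a success detector (both hypothetical).
state = {"tries": 0}


def place_pen() -> None:
    state["tries"] += 1


def pen_in_holder(views: Sequence[str]) -> bool:
    # Pretend the first attempt fails and the second one succeeds.
    return state["tries"] >= 2


attempts = run_step(place_pen, pen_in_holder, views=["overhead", "wrist"])
print(attempts)
```

Bounding the number of attempts is a design choice: a robot that cannot confirm success should escalate rather than retry indefinitely.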

In real-world setups, this is especially challenging because the scene is often viewed simultaneously from an overhead camera and from a camera on the manipulator, some objects may be occluded, and lighting and background change. Gemini Robotics-ER 1.6 better aligns multiple video streams and assembles a coherent picture from them.

As an example, DeepMind shows a scenario in which the system uses several views to determine whether the task "place the blue pen in the black pen holder" is complete.

The most practical innovation is instrument reading. DeepMind developed this capability together with Boston Dynamics, drawing on industrial facility inspection tasks.

At factories and technical facilities, robots need to regularly check thermometers, pressure gauges, chemical level gauges, sight glasses, and digital displays. For this, recognizing an image is not enough: the system must understand the needle position, the fluid level, the scale boundaries, the tick marks, and the unit labels, and sometimes combine readings from multiple needles that correspond to different digits. For a sight glass, it must also account for distortion caused by the camera angle.
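The arithmetic behind reading an analog gauge can be illustrated with linear interpolation between the ends of the scale. This is a sketch of the general idea, not DeepMind's method: it assumes a uniform scale and that the needle and scale-end angles have already been estimated, and the functions `gauge_value` and `combine_digits` are hypothetical.

```python
def gauge_value(needle_deg, min_deg, max_deg, min_value, max_value):
    """Linearly map a needle angle to a reading on a uniform scale."""
    if max_deg == min_deg:
        raise ValueError("scale endpoints must differ")
    fraction = (needle_deg - min_deg) / (max_deg - min_deg)
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("needle is outside the marked scale")
    return min_value + fraction * (max_value - min_value)


def combine_digits(digits):
    """Combine per-needle digits, most significant first, into one number."""
    value = 0
    for digit in digits:
        if not 0 <= digit <= 9:
            raise ValueError("each needle must yield a single digit")
        value = value * 10 + digit
    return value


# Hypothetical pressure gauge: 0 bar at 45 degrees, 10 bar at 315 degrees.
reading = gauge_value(needle_deg=180, min_deg=45, max_deg=315,
                      min_value=0.0, max_value=10.0)
print(reading)                     # halfway along the scale

# Hypothetical multi-needle meter whose dials read 3, 7 and 2.
print(combine_digits([3, 7, 2]))
```

A non-linear scale or a strongly tilted camera view would need a correction step before this mapping applies.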

According to DeepMind, instrument reading accuracy rose from 23% in Gemini Robotics-ER 1.5 and 67% in Gemini 3.0 Flash to 86% in Gemini Robotics-ER 1.6. With agentic vision mode enabled, the figure reaches 93%: the model first zooms in on the relevant area, then uses pointing at key points and code execution to estimate proportions and intervals, and finally interprets the value.

DeepMind separately emphasizes safety. The company calls Gemini Robotics-ER 1.6 its safest robotics model to date. It adheres better to Gemini policies in adversarial spatial tasks and respects physical constraints significantly more accurately, for example when the system must not work with liquids or lift objects heavier than 20 kilograms.
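To make the idea of physical constraints concrete, here is a small sketch of a check that a planner could run before acting. The 20 kg limit and the liquid restriction come from the article's example; the `Limits` and `Target` classes and the `violations` function are hypothetical, and the article does not describe how the model enforces such constraints internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    max_payload_kg: float = 20.0      # limit taken from the article's example
    may_handle_liquids: bool = False  # restriction taken from the article's example


@dataclass(frozen=True)
class Target:
    name: str
    mass_kg: float
    contains_liquid: bool = False


def violations(target: Target, limits: Limits) -> list[str]:
    """Return a list of constraint violations; an empty list means allowed."""
    problems = []
    if target.mass_kg > limits.max_payload_kg:
        problems.append(f"{target.name}: {target.mass_kg} kg exceeds "
                        f"{limits.max_payload_kg} kg payload limit")
    if target.contains_liquid and not limits.may_handle_liquids:
        problems.append(f"{target.name}: liquids are not allowed")
    return problems


limits = Limits()
print(violations(Target("toolbox", mass_kg=12.0), limits))
print(violations(Target("water drum", mass_kg=25.0, contains_liquid=True), limits))
```

Returning every violation instead of stopping at the first gives an operator the full reason a request was refused.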

Moreover, in risk-recognition scenarios based on real injury reports, Gemini Robotics-ER outperforms Gemini 3.0 Flash by 6 percentage points on text tasks and by 10 percentage points on video tasks.

For developers, the model is already available through the Gemini API and Google AI Studio. Alongside the release, DeepMind published a Colab example and invited partners to send annotated images of typical errors to help improve future versions.

This update shows where competition in robotics is shifting: mechanics alone matters less, and the reasoning layer above it matters more. If a model can see a scene from multiple viewpoints, use tools, read instruments, verify results, and account for safety constraints at the same time, a robot becomes not just an execution device but a system that can adapt to the situation. For industrial inspections, warehouses, and service scenarios, this is one of the most practical signals that large AI models are moving closer to real autonomy outside the laboratory.
