
Google DeepMind Presents Gemini Robotics-ER 1.6 for Robot Autonomy and Instrument Reading

Google DeepMind updated Gemini Robotics-ER to version 1.6 — a cognitive layer for robots that better understands space, determines task completion, and can…

AI-processed from MarkTechPost; edited by Hamidun News
Source: MarkTechPost. Collage: Hamidun News.

On April 14, 2026, Google DeepMind unveiled Gemini Robotics-ER 1.6, an update to its reasoning model that serves as the top cognitive layer for robots in the physical world. The key idea of version 1.6 is not to add another vision-language-action (VLA) model, but to give the robot more precise spatial reasoning: the model better understands the scene, counts objects, determines whether a task has been completed, and for the first time confidently reads complex instruments such as pressure gauges, level indicators, and digital displays.

DeepMind describes Gemini Robotics-ER as a reasoning-first model for embodied AI. It is meant for cases where robots need more than object recognition: they must understand relationships between objects, select a grasping point, check constraints, and decide what to do next.

In version 1.6, DeepMind notably strengthened pointing: the ability to indicate objects and use those points as an intermediate step in reasoning. This helps the model count objects more accurately, compare sizes, construct trajectories, and follow instructions with spatial conditions.
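A minimal sketch of how pointing output might feed a counting step. It assumes the model returns a JSON list of objects with a `point` field in `[y, x]` order, normalized to 0-1000, and a `label` field, which is the format documented for earlier Gemini Robotics-ER releases; whether version 1.6 keeps exactly this format is an assumption. The helper names and the sample response are hypothetical.

```python
import json
from collections import Counter


def parse_points(raw_json, width, height):
    """Convert normalized [y, x] points (0-1000) into (label, x_px, y_px) tuples.

    The input format is an assumption based on earlier Robotics-ER releases.
    """
    points = []
    for item in json.loads(raw_json):
        y_norm, x_norm = item["point"]
        x_px = round(x_norm / 1000 * width)
        y_px = round(y_norm / 1000 * height)
        points.append((item["label"], x_px, y_px))
    return points


def count_by_label(points):
    """Count how many points were returned for each label."""
    return Counter(label for label, _, _ in points)


# Hypothetical model response for a 640x480 image.
sample = (
    '[{"point": [500, 250], "label": "screw"},'
    ' {"point": [100, 900], "label": "screw"},'
    ' {"point": [750, 500], "label": "wrench"}]'
)
points = parse_points(sample, width=640, height=480)
counts = count_by_label(points)
```

Counting from explicit points rather than from a free-text answer makes the result easy to verify: each counted object has a pixel location that can be drawn on the image.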

The model also received improved multi-view understanding: it better assembles a unified picture from multiple cameras, such as one mounted overhead and another on the manipulator.

For developers, Gemini Robotics-ER 1.6 is already available through the Gemini API and Google AI Studio, along with Colab examples for configuration and prompt engineering.

The most notable new ability is instrument reading, which emerged from collaboration with Boston Dynamics. In industrial environments, robots regularly encounter thermometers, circular pressure gauges, sight glasses, and vertical level indicators, where the task is not just image classification but precise value extraction. For this, Gemini Robotics-ER 1.6 uses agentic vision, a combination of visual reasoning and code execution. The model first zooms in on the relevant fragment, then marks key points, evaluates intervals and proportions, and finally matches the result against the scale, units of measurement, and context.

According to Google DeepMind data, on the instrument reading task Robotics-ER 1.5 scored 23%, Gemini 3.0 Flash scored 67%, Robotics-ER 1.6 alone scored 86%, and Robotics-ER 1.6 with agentic vision scored 93%.
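The "intervals and proportions" step can be illustrated with plain linear interpolation on a circular gauge. This is only a sketch of the geometry, not DeepMind's implementation: it assumes that key points for the dial center, the needle tip, and the minimum and maximum ticks have already been located, and that the scale is linear and sweeps clockwise. All function and parameter names are hypothetical.

```python
import math


def angle_cw(center, point):
    """Clockwise angle in degrees of `point` around `center`.

    Uses image coordinates (y grows downward), so 0 is right, 90 is down,
    180 is left, and 270 is up.
    """
    dx = point[0] - center[0]
    dy = point[1] - center[1]
    return math.degrees(math.atan2(dy, dx)) % 360


def read_gauge(center, needle_tip, min_tick, max_tick, min_value, max_value):
    """Estimate a reading by interpolating the needle angle along the scale."""
    start = angle_cw(center, min_tick)
    sweep = (angle_cw(center, max_tick) - start) % 360
    needle = (angle_cw(center, needle_tip) - start) % 360
    if sweep == 0:
        raise ValueError("min and max ticks coincide")
    if needle > sweep + 1e-6:
        # The needle lies in the gap between the max and min ticks.
        raise ValueError("needle is outside the scale")
    fraction = min(needle / sweep, 1.0)
    return min_value + fraction * (max_value - min_value)


# Hypothetical 0-10 bar gauge: min tick at lower left, max tick at lower
# right, needle pointing straight up (halfway along a 270-degree sweep).
reading = read_gauge(
    center=(100, 100),
    needle_tip=(100, 40),
    min_tick=(50, 150),
    max_tick=(150, 150),
    min_value=0.0,
    max_value=10.0,
)
```

With the needle halfway along the sweep, the reading comes out at approximately 5.0 bar. A real gauge may have a nonlinear scale or perspective distortion, which this sketch ignores.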

This is no longer a demonstration that the robot "sees" the instrument, but a step toward a scenario where it conducts inspection rounds on its own, reads measurements, and understands what they mean.

Another important component is determining whether an action succeeded and whether it is safe. For an autonomous robot, it is not enough to start a task; it must recognize when the task has truly been completed and when the attempt needs to be repeated.
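The attempt-check-repeat behavior described here amounts to a simple control loop. The sketch below is illustrative only: `attempt` and `check_success` are hypothetical callables standing in for a robot skill and for a success-detection query to a reasoning model; neither corresponds to a real API.

```python
from typing import Callable


def run_until_success(
    attempt: Callable[[], None],
    check_success: Callable[[], bool],
    max_attempts: int = 3,
) -> int:
    """Repeat `attempt` until `check_success` returns True.

    Returns the number of attempts used and raises RuntimeError if the
    task still has not succeeded after `max_attempts` tries.
    """
    for n in range(1, max_attempts + 1):
        attempt()
        if check_success():
            return n
    raise RuntimeError(f"task not completed after {max_attempts} attempts")


# Toy stand-ins: the simulated task succeeds on the second try.
state = {"tries": 0}


def fake_attempt() -> None:
    state["tries"] += 1


def fake_check() -> bool:
    return state["tries"] >= 2


attempts_used = run_until_success(fake_attempt, fake_check)
```

Capping the number of attempts matters in practice: a success detector that never returns true should end in an explicit failure rather than an endless retry.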

DeepMind reports that the model handles success detection better even in dynamic scenarios, with partial occlusions and ambiguous viewing angles. In parallel, it improved compliance with physical constraints: for example, the system should more reliably respect restrictions such as "do not grasp liquids" or "do not lift objects heavier than 20 kg." In tests for recognizing dangerous situations from text and video, the Gemini Robotics-ER family improved on Gemini 3.0 Flash by 6% and 10% respectively.

At the same time, Google separately notes a limitation: the model is not intended for safety-critical applications such as medicine, transportation, and other environments where an error could lead to injury or damage.

The practical significance of the release is that Google is gradually transforming embodied reasoning from a research topic into an infrastructure layer for robotics. Gemini Robotics-ER 1.6 does not directly control hardware, but it gives robots a more powerful top-level reasoning capability that can be integrated with VLA models, search, and external functions. For the industry, this is also a signal that the interface between language models and robots is approaching commercial application.
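One way to picture this layering: the reasoning model proposes a high-level action, a guard checks it against declared physical constraints, and only then is it handed to a lower-level executor such as a VLA policy. The sketch below is a toy illustration under that assumption; the `Action` type, its fields, and the `dispatch` function are hypothetical and do not correspond to any published interface. The 20 kg limit and the ban on grasping liquids come from the examples quoted above.

```python
from dataclasses import dataclass
from typing import Callable, Optional

MAX_LIFT_KG = 20.0  # limit taken from the example constraint in the article


@dataclass(frozen=True)
class Action:
    """A hypothetical high-level action proposed by the reasoning layer."""
    verb: str
    target: str
    target_mass_kg: float
    target_is_liquid: bool = False


def violation(action: Action) -> Optional[str]:
    """Return the violated constraint as text, or None if the action is allowed."""
    if action.verb == "grasp" and action.target_is_liquid:
        return "do not grasp liquids"
    if action.verb == "lift" and action.target_mass_kg > MAX_LIFT_KG:
        return "do not lift objects heavier than 20 kg"
    return None


def dispatch(action: Action, executor: Callable[[Action], None]) -> str:
    """Pass the action to the low-level executor only if no constraint is violated."""
    reason = violation(action)
    if reason is not None:
        return f"rejected: {reason}"
    executor(action)
    return "dispatched"


# Toy executor that just records what it was asked to do.
executed = []
heavy = dispatch(Action("lift", "crate", 25.0), executed.append)
light = dispatch(Action("lift", "toolbox", 8.0), executed.append)
spill = dispatch(Action("grasp", "coolant", 1.0, target_is_liquid=True), executed.append)
```

Keeping the guard outside the executor means the same constraints apply no matter which low-level controller ends up carrying out the action.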

If this combination proves itself outside the laboratory, the market will get robots that not only move according to a script, but also interpret the environment, verify the result, and read real instruments without a human in the loop.

