
Robot with Google LLM: how a 270M-parameter model was trained to control movement

Source: Habr AI. Collage: Hamidun News.

An engineer integrated Google's compact open-source language model (270 million parameters) into a tracked robot with a manipulator and trained it, entirely in simulation, to control the robot's movements. The experiment suggests that compact LLMs can learn to control complex physical systems without enormous computational resources.

Why a compact LLM

Google's Gemma family of open models includes compact variants designed for devices with limited computing resources, the smallest being Gemma 3 270M. A 270-million-parameter model is no GPT-4-scale giant; it is a small tool that can run directly on a robot's embedded hardware without calling cloud servers. That brings several advantages for robotics.

First, the model runs locally and needs no internet connection, so the robot is fully autonomous. Second, it responds without network latency, which matters when milliseconds determine the outcome of an operation. Third, power consumption is low: the robot's battery lasts longer and the electronics don't overheat.

The author chose this model because, in his assessment, its capacity is sufficient for decision-making in physical system control. Compact models are also faster and cheaper to train than giant LLMs.

Training in a virtual environment

The entire experiment was conducted in simulation, a virtual environment whose physics and dynamics approximate reality. The tracked robot with its manipulator moved not in a real room but in a computer model. This let the author rapidly test thousands of behavior variants without risking damage to expensive real equipment.

The language model received information about the robot's state (track position, rotation angle, manipulator coordinates, images from the virtual camera) and learned to decide on its own how to act. It learned the following skills:

  • Track control: when to activate, in which direction, and at what speed
  • Navigation and orientation: how to turn and orient itself in space
  • Manipulation: how the manipulator should approach, grasp, and move objects
  • Coordination: how to coordinate body and arm movement for complex tasks
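One way such a text-based control interface might look is sketched below, assuming the robot state is serialized into a short prompt and the model replies with a one-line command. The field names, the command format (`TRACKS left right; ARM x y z; GRIP open|close`), and the helper functions are hypothetical illustrations, not taken from the author's project; camera images are left out for brevity.

```python
import re
from dataclasses import dataclass


@dataclass
class RobotState:
    x: float            # hypothetical units: metres
    y: float
    heading_deg: float  # rotation angle of the chassis
    arm_xyz: tuple      # manipulator tip coordinates


@dataclass
class Command:
    left_track: float   # normalized speed in [-1, 1]
    right_track: float
    arm_xyz: tuple      # target position for the manipulator tip
    grip_closed: bool


def state_to_prompt(state, task):
    """Serialize the robot state into a text prompt (assumed format)."""
    ax, ay, az = state.arm_xyz
    return (
        f"TASK: {task}\n"
        f"POSE: x={state.x:.2f} y={state.y:.2f} heading={state.heading_deg:.1f}\n"
        f"ARM: {ax:.2f} {ay:.2f} {az:.2f}\n"
        "ACTION:"
    )


_NUM = r"(-?\d+(?:\.\d+)?)"
_PATTERN = re.compile(
    rf"TRACKS\s+{_NUM}\s+{_NUM}\s*;\s*"
    rf"ARM\s+{_NUM}\s+{_NUM}\s+{_NUM}\s*;\s*"
    r"GRIP\s+(open|close)"
)


def clamp(value, low=-1.0, high=1.0):
    return max(low, min(high, value))


def parse_command(text):
    """Parse the model's reply; return None if it is malformed."""
    match = _PATTERN.search(text)
    if match is None:
        return None  # the caller should stop the robot on unparseable output
    left, right, ax, ay, az = (float(g) for g in match.groups()[:5])
    return Command(clamp(left), clamp(right), (ax, ay, az),
                   match.group(6) == "close")
```

Clamping the track speeds and rejecting malformed replies keeps an unexpected model output from turning directly into an unsafe motor command.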

The model learned through trial and error: it attempted an action, saw the result in simulation, and corrected its behavior. The process is not instantaneous, but after hundreds of thousands of iterations the model found efficient control strategies. In the end, it learned to perform purposeful manipulations (grasping objects, moving them, stacking them) just as a physical robot would.
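The trial-and-error loop can be illustrated with a toy stand-in. The source does not say which training algorithm the author used, so the sketch below swaps in plain random-search hill climbing: a two-parameter policy drives a simplified 2D tracked-vehicle model toward a target, and a perturbed policy is kept only when it ends the episode closer. The vehicle model, the gains `k_forward` and `k_turn`, and all constants are assumptions made for illustration.

```python
import math
import random


def simulate(params, target=(2.0, 1.0), steps=200, dt=0.05, track_width=0.4):
    """Run one episode of a toy tracked vehicle; return final distance to target."""
    k_forward, k_turn = params
    x = y = heading = 0.0
    for _ in range(steps):
        dx, dy = target[0] - x, target[1] - y
        dist = math.hypot(dx, dy)
        # Heading error, wrapped to [-pi, pi].
        err = math.atan2(dy, dx) - heading
        err = math.atan2(math.sin(err), math.cos(err))
        # Policy: two gains map the observation to clamped track speeds.
        forward = k_forward * dist
        turn = k_turn * err
        left = max(-1.0, min(1.0, forward - turn))
        right = max(-1.0, min(1.0, forward + turn))
        # Differential-drive kinematics for a tracked chassis.
        v = (left + right) / 2.0
        omega = (right - left) / track_width
        x += v * math.cos(heading) * dt
        y += v * math.sin(heading) * dt
        heading += omega * dt
    return math.hypot(target[0] - x, target[1] - y)


def train(iterations=300, seed=0):
    """Hill climbing: try a perturbed policy, keep it only if it does better."""
    rng = random.Random(seed)
    best = (0.0, 0.0)            # start with a policy that does nothing
    best_cost = simulate(best)
    for _ in range(iterations):
        candidate = (best[0] + rng.gauss(0, 0.2), best[1] + rng.gauss(0, 0.2))
        cost = simulate(candidate)   # attempt, observe the result
        if cost < best_cost:         # correct the behavior
            best, best_cost = candidate, cost
    return best, best_cost


if __name__ == "__main__":
    policy, cost = train()
    print(f"gains={policy}, final distance to target={cost:.3f}")
```

A real setup would replace the two gains with the language model's weights and the toy model with a physics simulator, but the attempt, observe, correct cycle has the same shape.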

Cyberpunk instead of marketing

The author calls his project "cyberpunk": an experiment that is both technically interesting and provocative. The question behind it is simple: if you take an open-source model, load it into a robot, and let it learn in simulation, can the robot do useful work? Skeptics usually answer "no", arguing that robotics requires special architectures, billions of parameters, and vast amounts of data.

The author's answer is that it can, and that it works without enormous parameter counts, special architectures, or endless data.

A compact model with 270 million parameters is enough to learn, in simulation, to control a non-trivial mechanical system: a tracked robot with a manipulator handling objects in three-dimensional space. The paradox, as the author sees it, is that a compact general-purpose model can turn out to be more versatile than a specialized tool.

What this means

The experiment blurs the boundary between "pure" language models and physical system control. Compact LLMs may soon control industrial manipulators, autonomous platforms, and mobile robots on-site: in workshops, warehouses, agriculture, and logistics. They would do so without calling the cloud, without network delays, under open licenses, and at affordable prices. All of this depends on models learning to reliably transfer what they learn in simulation to reality, the so-called sim-to-real transfer. That work has already begun.

Hamidun News
AI news without noise. Daily editorial selection from 400+ sources. A product by Zhemal Khamidun, Head of AI at Alpina Digital.