
OpenGrall Presented an Architecture for AI Robots Where a Language Model Handles Strategy

OpenGrall proposes not giving motor control directly to the language model: it makes only high-level decisions, while execution and emergency reflexes are…

AI-processed from Habr AI; edited by Hamidun News

OpenGrall proposes a simple but important shift in robotics: an LLM should not control a robot at the level of motors and instant reactions. The language model is responsible only for meaning, planning, and choosing the next step, while safety, movement, and low-level reflexes live in a separate feedback loop. With this split, the project aims to eliminate the main problem of most "GPT robot" demonstrations, where the machine talks fluently but then freezes for several seconds before each action.

The authors start with the most painful issue: safety. An LLM is non-deterministic, so the same request can produce different answers, which makes trusting it with direct motor control dangerous. OpenGrall addresses this with a hybrid scheme in which the role of the "spinal cord" is played by a TinyML model or another rigid execution loop on a microcontroller.

This execution loop is what understands the physics of the specific platform, manages suspension and obstacle handling, and holds veto power over any command. If an operator or the LLM commands forward motion while a rangefinder detects an object closer than 10 centimeters, the command is simply not executed. The declared emergency-stop reaction time is under 10 milliseconds.
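
The veto rule can be illustrated with a minimal sketch. It is not OpenGrall's firmware: the `Command` type, the `apply_veto` function, and the choice to keep turning allowed while forward motion is blocked are all assumptions made for illustration. Only the 10 cm threshold comes from the article.

```python
from dataclasses import dataclass

MIN_CLEARANCE_M = 0.10  # the 10 cm threshold quoted in the article


@dataclass
class Command:
    linear: float   # m/s, positive means forward (assumed convention)
    angular: float  # rad/s


def apply_veto(cmd: Command, front_range_m: float) -> Command:
    """Return the command the motors may actually receive."""
    # "not >=" instead of "<" so that a NaN reading also counts as blocked.
    blocked = not (front_range_m >= MIN_CLEARANCE_M)
    if blocked and cmd.linear > 0:
        # Forward motion is refused; turning and reversing stay allowed (assumption).
        return Command(linear=0.0, angular=cmd.angular)
    return cmd


# The same check applies whether the command came from an operator or the LLM.
print(apply_veto(Command(linear=0.5, angular=0.0), front_range_m=0.05))
```

The point of the design is that this check is deterministic and runs on the executor side, so its latency does not depend on how long the language model takes to answer.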

The logic here is pragmatic: the LLM thinks, while the execution module acts and can stop a dangerous action at any moment.

The second major strength of OpenGrall is modularity. The project separates "thinking" and "doing" so that both parts can be changed independently. The brain role can be filled by a local LLM, a VLM, or a cloud model if a more complex task such as multi-step planning or web search is needed. The executor role can be filled by TinyML, a VLA model, or even an ordinary hard-coded algorithm if the platform is simple. The entire system is connected via a WebSocket server, and devices connect as regular clients with roles like agent, operator, lidar, or esp.
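
A minimal sketch of role-based routing, assuming a JSON message with hypothetical `from` and `to` fields. The network transport is left out on purpose: each client is represented by a plain `send` callable standing in for a WebSocket connection, and the `Hub` class is illustrative rather than OpenGrall's actual server.

```python
import json
from collections import defaultdict
from typing import Callable

Send = Callable[[str], None]  # stands in for a WebSocket connection's send()


class Hub:
    """Routes JSON messages between clients by role instead of by address."""

    def __init__(self) -> None:
        self._by_role: dict[str, list[Send]] = defaultdict(list)

    def register(self, role: str, send: Send) -> None:
        # Several clients may share a role, e.g. two bodies both acting as "esp".
        self._by_role[role].append(send)

    def route(self, raw: str) -> int:
        """Deliver a message to every client holding the target role."""
        msg = json.loads(raw)
        targets = self._by_role.get(msg["to"], [])
        for send in targets:
            send(raw)
        return len(targets)


hub = Hub()
esp_inbox: list[str] = []
hub.register("esp", esp_inbox.append)
hub.route(json.dumps({"from": "agent", "to": "esp", "action": "forward"}))
print(esp_inbox)
```

Because routing depends only on the role name, adding a new sensor in this sketch means registering one more client; nothing in the hub itself changes.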

This makes it possible to add new sensors without rewriting the core, and even to build a setup where one agent works with multiple bodies simultaneously, for example a wheeled platform and a drone. For an open-source project aimed at weak hardware, this matters: the architecture is not tied to one type of robot or one specific model.

The key engineering block is the pairing of SensorMemory and WeightCalculator. Instead of indiscriminately sending all raw streams to the LLM, the system collects data asynchronously, evaluates its freshness and trustworthiness, and then turns it into a short prompt. If a lidar gets dirty or a VLM is blinded by the sun, its weight decreases before the decision is made. If one sensor is slow, it does not block the others.
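
The idea can be sketched as follows. Only the names SensorMemory and WeightCalculator come from the article; the `Reading` fields, the exponential half-life decay in `weight`, and the prompt line format are assumptions chosen to keep the example small. Locking for truly concurrent writers is omitted.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reading:
    source: str
    value: str     # already summarised as text for the prompt
    trust: float   # 0..1, lowered when the sensor is degraded
    stamp: float   # monotonic timestamp at write time


class SensorMemory:
    """Keeps only the latest reading per source; writers never wait on each other."""

    def __init__(self) -> None:
        self._latest: dict[str, Reading] = {}

    def write(self, source: str, value: str, trust: float = 1.0,
              now: Optional[float] = None) -> None:
        stamp = time.monotonic() if now is None else now
        self._latest[source] = Reading(source, value, trust, stamp)

    def snapshot(self) -> list[Reading]:
        return list(self._latest.values())


def weight(reading: Reading, now: float, half_life_s: float = 1.0) -> float:
    """Trust multiplied by an exponential freshness decay (assumed formula)."""
    age = max(0.0, now - reading.stamp)
    return reading.trust * 0.5 ** (age / half_life_s)


def build_prompt(memory: SensorMemory, now: float, min_weight: float = 0.05) -> str:
    """One short line per source that is still fresh and trusted enough."""
    lines = []
    for r in memory.snapshot():
        w = weight(r, now)
        if w >= min_weight:
            lines.append(f"{r.source} age={now - r.stamp:.1f}s w={w:.2f}: {r.value}")
    return "\n".join(lines)
```

With a fresh lidar reading and a stale, low-trust VLM reading in memory, `build_prompt` in this sketch keeps the lidar line and drops the VLM line, which mirrors the behaviour the article describes for a dirty or blinded sensor.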

The article gives an illustrative example: a lidar point cloud is collapsed into eight sectors, and nearby objects are described by angle, distance, size, and speed. For the LLM, this is no longer noise but a structured situation.

An important nuance is that OpenGrall does not attempt to hand-write complex data fusion rules. The LLM itself acts as the arbitrator: it sees the source, age, and weight of each signal and chooses an action in JSON format on that basis. The system prompt is baked into the model in advance, for example through Ollama, so during operation only the "bare" operational part goes into the request. According to the author's estimate, this reduces the prompt from approximately 450 to 150 tokens.
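
Both ends of that exchange can be sketched in a few lines: compressing lidar points into eight sectors on the way in, and validating the JSON action on the way out. This is an illustration under assumptions, not the project's code. The sector layout, the function names, the `ALLOWED_ACTIONS` set, and the fallback to `stop` are all hypothetical, and the sketch keeps only the nearest distance per sector, whereas the article also mentions size and speed.

```python
import json
import math

SECTORS = 8
ALLOWED_ACTIONS = {"forward", "left", "right", "stop"}  # hypothetical action set


def nearest_per_sector(points):
    """Collapse (x, y) lidar points in metres into the nearest range per 45-degree sector.

    Sector 0 is centred on the forward x axis; indices grow counter-clockwise.
    """
    nearest = [math.inf] * SECTORS
    width = 2 * math.pi / SECTORS
    for x, y in points:
        angle = math.atan2(y, x) % (2 * math.pi)
        # Shift by half a sector so sector 0 straddles the forward direction.
        idx = int(((angle + width / 2) % (2 * math.pi)) // width) % SECTORS
        nearest[idx] = min(nearest[idx], math.hypot(x, y))
    return nearest


def parse_action(reply: str) -> dict:
    """Accept only a JSON object with a known action; anything else becomes 'stop'."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return {"action": "stop"}
    action = data.get("action") if isinstance(data, dict) else None
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        return {"action": "stop"}
    return data


sectors = nearest_per_sector([(1.0, 0.0), (0.5, 0.0), (0.0, 2.0), (-3.0, 0.0)])
print(sectors)
print(parse_action('{"action": "left"}'))
```

Treating any malformed reply as `stop` is a conservative choice for the sketch; it fits the article's premise that LLM output is non-deterministic and should not be trusted blindly.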

A separate focus of the article is fighting "sluggishness". In many classical frameworks, the robot waits for the slowest sensor, and because of this, fast telemetry effectively sits idle. OpenGrall rejects such synchronization: the lidar, VLM, and odometry write data to memory independently, and the agent takes the freshest and most reliable values at the current moment.

Even so, the LLM still thinks for hundreds of milliseconds, so inertial motion has been added on the ESP32 side: if no new command has arrived yet, the robot does not freeze but smoothly continues its last safe action with velocity damping.

Another optimization layer is caching decisions by context hash. If the robot faces the same empty corridor again, the system does not call the model a second time but takes the already verified solution from the cache.
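
Both latency measures can be sketched together. Everything here is an assumption made for illustration: the exponential decay and its time constant `tau_s`, the way the context is quantised before hashing, and the `DecisionCache` class. The article does not specify how OpenGrall damps velocity or what goes into the hash, and the real damping runs on the ESP32 rather than in Python.

```python
import hashlib
import json
import math


def damped_velocity(last_v: float, since_cmd_s: float, tau_s: float = 0.5) -> float:
    """Exponentially decay the last safe velocity while no new command has arrived."""
    return last_v * math.exp(-max(0.0, since_cmd_s) / tau_s)


def context_key(sectors, goal: str, step_m: float = 0.25) -> str:
    """Hash a coarse, quantised view of the scene so near-identical contexts match."""
    buckets = [None if math.isinf(d) else round(d / step_m) for d in sectors]
    payload = json.dumps({"goal": goal, "sectors": buckets}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


class DecisionCache:
    def __init__(self, ask_llm):
        self._ask_llm = ask_llm  # callable(sectors, goal) -> action dict
        self._store = {}
        self.hits = 0

    def decide(self, sectors, goal: str):
        key = context_key(sectors, goal)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        # Cache miss: pay the LLM latency once, then reuse the answer.
        action = self._ask_llm(sectors, goal)
        self._store[key] = action
        return action
```

Quantising distances before hashing is what lets two slightly different scans of the same corridor share one cache entry. The sketch caches every answer, while the article speaks of verified solutions, so a real system would presumably add a verification step before storing. A cached action would still pass through the executor, which the article says has veto power over any command.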

The idea then extends to learned high-level habits and reflexes: successful strategies can be executed without calling the LLM at all, and human feedback increases their weight. In addition, the project stores an episodic memory of human instructions and even allows autonomous goal-setting, in which the robot chooses for itself what to explore, what to remember, or whom to start a dialogue with during idle time.

More broadly, OpenGrall is interesting not as yet another attempt to "bolt GPT onto a robot," but as an attempt to bring LLM robotics to a more mature architecture. There is no promise of a magical universal brain, but there is a clear division of responsibility, support for limited hardware, protection from dangerous actions, and a path toward gradual learning without retraining the entire system. For developers, this means a more realistic way to build robots on top of modern models: use the LLM where it is strong, and do not force it to do what is better suited to a small, fast, and predictable execution loop.

