MTS showed how OpenClaw was connected to a robot and brought an AI agent into the physical world
MWS tested OpenClaw not only in browsers and applications, but also on real hardware. The team connected an agent to a Unitree G1 robot through a software…
AI-processed from Habr AI; edited by Hamidun News
OpenClaw has demonstrated that an autonomous agent can be moved from the browser to the physical world much faster than commonly believed. The MWS team took an open-source orchestrator that typically manages a computer and applications, and connected it to a robot through a simple software layer. As a result, the agent began not only to perform digital tasks but also to issue commands to a real device, orienting itself by video stream and multimodal models.
The idea is not to replace full-fledged robotics, but to quickly and cheaply assemble a working Physical AI prototype. OpenClaw itself became a notable project in November 2025 when developer Peter Steinberger presented an orchestration layer for autonomous computer operation. The agent receives an instruction in a messenger, breaks it down into steps, switches between applications, maintains context, and when necessary uses any compatible LLM.
This independence from a specific model makes the system convenient for experiments: you can change the reasoning engine without rewriting the automation layer. For developers, this is a rare combination: low entry threshold, model flexibility, and already ready logic for executing multi-component tasks. Such an approach has an obvious risk: if you run OpenClaw directly on a personal laptop, the agent effectively gains broad access to the entire system.
MWS proposed a safer option — running it in an isolated virtual machine in the cloud. For this, they use a ready-made Ubuntu image with preconfigured OpenClaw, and for a basic scenario, a configuration with 2 vCPU and 8 GB RAM is sufficient. What remains is to issue a service account, set up an API key, and connect an LLM through an OpenAI-compatible endpoint via GPT Model Hub.
In other words, instead of manually setting up the environment, a developer gets an almost one-click start. This is important not just for convenience: in a separate environment it's safer to test scenarios where the agent can open processes, change system state, and maintain context of work sessions. The most interesting moment started where articles about agent interfaces usually end: at the hardware level.
MWS took a humanoid Unitree G1, which can walk, maintain balance, and react to the environment, but is not itself an "intelligent" agent. Instead of a complex VLA architecture, the team simply intercepted the remote control logic: OpenClaw sends commands to an intermediate layer, which converts them into signals the robot understands. Using the same scheme, you can connect not only an expensive humanoid robot, but any device with an API or radio remote — from a cart to a toy dog.
The key idea here is that the intellectual layer is separated from the execution mechanism, which means the same agent can be transferred between different types of devices. So the agent doesn't act blindly, a video stream from the robot's camera was fed into OpenClaw, and the scene interpretation was entrusted to a multimodal model kimi-2.5.
It recognizes objects, assesses the situation, and helps choose the next action: move forward, stop, avoid an obstacle, or execute a simple command in space. Importantly, the demonstration didn't require MCP servers, heavyweight reasoning chains, or a separate robotics platform. Essentially, the team assembled a minimal bridge between an LLM agent, vision, and an execution device, showing that the entry threshold into Physical AI is now noticeably lower than many expected.
And that's exactly the value of this case: it shows not an academic model of the future, but an engineering recipe that can be replicated with available components right now. The practical conclusion is simple: OpenClaw can already be used not only for emails, files, and web interfaces, but also as a universal control layer for physical devices. This is not yet a replacement for full Vision-Language-Action systems and not a path to reliable industrial autonomy, but a very fast way to test a scenario, assemble a demo, or launch an applied prototype.
For the market, this is an important signal: the combination of cloud-based LLM, video stream, and simple control API is gradually turning Physical AI from research exotica into an engineering tool.
Want to stop reading about AI and start using it?
AI News is a curated feed of AI/tech news. Hamidun Academy teaches you to use AI systematically in your work.