AI labs are paying XDOF to collect robot training data, and it's dirty, tedious work

For robots to learn to move and handle objects, they need millions of hours of real-world demonstrations, much as LLMs learned from the internet. Collecting these…

Hamidun News Editorial

AI monitoring · TechCrunch

Jun 28, 2026· 3 min

AI-processed from TechCrunch; edited by Hamidun News

AI labs are paying XDOF to collect robot training data, and it's dirty, tedious work. Source: TechCrunch. Collage: Hamidun News.


Physical AI faces the same problem that would have stalled language models had humanity not already accumulated the internet: a lack of training data. For now, labs are solving it by hand, and paying companies like XDOF real money to do it.

Why there's not enough data

Large language models were trained on trillions of words that humanity has accumulated online. Robots have no such resource: movement, grasping, balancing, and handling fragile objects all have to be demonstrated live, again and again, across dozens of different scenarios. Even one hour of quality demonstrations takes considerable effort: an operator wears an exoskeleton or teleoperates a robot with a joystick, repeating the same movement hundreds of times under different lighting, with different objects, and in different poses.

As a result, the data is expensive, slow to collect, and tied to physical space; the internet won't help here. That is why leading physical AI teams, from Physical Intelligence to groups inside Google DeepMind and humanoid robot developers, have hit the same wall: models can be improved endlessly, but without enough quality demonstrations they won't learn.

What XDOF does

XDOF is one of the companies AI labs hire to outsource this work. It runs the entire process: hiring operators, setting up equipment, checking annotation quality, and scaling production to each client's needs. The model resembles the early days of Scale AI, which hired an army of annotators to label text and images, except that here the work is physical labor in real space.

A typical data collection session looks like this:

An operator controls the robot manually — the system captures movement trajectories and force data
Several cameras simultaneously capture the scene from different angles
Each attempt is marked: success, failure, edge case
The scenario is repeated under different lighting, with different objects, and in different poses
Final verification filters out defective demonstrations
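The session steps above can be sketched in code. This is a minimal illustration under stated assumptions, not XDOF's actual tooling: the names `Episode`, `Outcome`, `scenario_grid`, `is_valid`, `final_verification`, and the `min_cameras` threshold are all hypothetical. It assumes each demonstration is stored as a trajectory and a force stream sampled at the same timesteps, plus per-camera frame counts, and treats an episode as "defective" if those streams are empty or out of sync. Note that a failed attempt is still kept: the final check removes broken recordings, not labeled failures.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from itertools import product


class Outcome(Enum):
    # Labels from the session description: success, failure, edge case.
    SUCCESS = "success"
    FAILURE = "failure"
    EDGE_CASE = "edge_case"


@dataclass
class Episode:
    # Hypothetical record of one demonstration attempt.
    scenario: tuple[str, str, str]   # (lighting, object, pose)
    trajectory: list[list[float]]    # joint positions, one row per timestep
    forces: list[float]              # one force reading per timestep
    camera_frames: dict[str, int]    # camera id -> number of frames captured
    outcome: Outcome


def scenario_grid(lighting: list[str], objects: list[str], poses: list[str]) -> list[tuple[str, str, str]]:
    # Every combination of lighting, object, and pose to repeat the task under.
    return list(product(lighting, objects, poses))


def is_valid(ep: Episode, min_cameras: int = 2) -> bool:
    # Assumed defect check: non-empty, force data aligned with the trajectory,
    # and at least `min_cameras` cameras with one frame per timestep.
    n = len(ep.trajectory)
    if n == 0 or len(ep.forces) != n:
        return False
    synced = [cam for cam, frames in ep.camera_frames.items() if frames == n]
    return len(synced) >= min_cameras


def final_verification(episodes: list[Episode]) -> list[Episode]:
    # Drop defective recordings; keep failures and edge cases, which are still data.
    return [ep for ep in episodes if is_valid(ep)]


if __name__ == "__main__":
    grid = scenario_grid(["bright", "dim"], ["cup", "egg"], ["standing", "seated"])
    print(len(grid))  # 8

    good = Episode(grid[0], [[0.0, 0.1], [0.1, 0.2]], [1.0, 1.2],
                   {"front": 2, "wrist": 2}, Outcome.FAILURE)
    broken = Episode(grid[1], [[0.0, 0.1], [0.1, 0.2]], [1.0],  # force data cut short
                     {"front": 2, "wrist": 2}, Outcome.SUCCESS)
    kept = final_verification([good, broken])
    print(len(kept), kept[0].outcome.value)  # 1 failure
```

The grid makes the cost concrete: every extra lighting condition, object, or pose multiplies the number of sessions an operator has to perform.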

This work doesn't require an engineering degree, but it does demand attention, patience, and physical stamina. This is the "dirty, unglamorous work" the industry talks about.

Who pays and why it matters

Scaling data collection runs up against physics: millions of robot movements can't be downloaded from the web, and replacing them with synthetic data risks degrading the model. Outsourcing lets labs focus on architecture and training while specialists handle the routine. At the same time, it creates a new kind of "hidden labor" in the AI industry, invisible to the general public but critically important. As with content moderation for LLMs, the market for collecting robot demonstrations is likely to grow quickly, and just as quickly to become a focus of debate over working conditions and quality standards.

What this means

The era of physical AI will need the same kind of data infrastructure that the internet provided for language models. Companies that are first to build efficient pipelines for collecting and annotating physical demonstrations will gain a structural advantage, regardless of who develops the models themselves.

