Micro1 hires people globally to train humanoids as AI market demands new benchmarks
As the humanoid market accelerates, Micro1, Scale AI, and DoorDash are already paying people to film ordinary household chores from ironing to dishwashing…
AI-processed from MIT Technology Review; edited by Hamidun News
The market for embodied AI is running into a constraint that goes beyond hardware: human data. Thousands of people around the world are filming themselves doing laundry, ironing, and cleaning their homes to train humanoids. Against this backdrop, researchers increasingly argue that traditional AI tests reveal little about how such systems will actually perform in real work.
How humanoids are trained
One of the most prominent companies in this new market layer is Micro1. It hires contractors in more than 50 countries, including Nigeria, India, and Argentina, to record everyday actions from a first-person perspective: folding clothes, washing dishes, wiping tables, pouring water, opening refrigerators. To do this, people mount an iPhone on their head and film short videos with their hands in frame. The videos are then verified and annotated before ending up in datasets that robotics companies purchase.
The logic is straightforward: large language models had the internet, but humanoid robots need the real physical world. Simulations help work through movements, but they capture poorly the chaos of a typical apartment: different lighting, cramped kitchens, slippery surfaces, dozens of object types and ways to interact with them. This is why the market now includes not only Micro1 and Scale AI, which has collected over 100,000 hours of such material, but also new channels like DoorDash Tasks.
On March 19, 2026, DoorDash officially launched a pilot in which workers are paid to film everyday actions and make voice recordings for AI and robotics. Even hundreds of thousands of hours of video do not yet seem to be the market's limit.
"This will take longer than many people think," says roboticist Ken Goldberg.
The cost of such data
For many contractors, this is decent side work: rates of around $15 an hour look competitive in several countries. But the work quickly becomes monotonous. Participants have to film similar actions over and over, come up with new scenarios within a small apartment, and make sure each recording matches the instructions. One worker in Delhi described spending nearly an hour to produce 15 minutes of usable video, simply because there aren't that many different tasks to film in his home.
The recording instructions typically include the following:
- Mount an iPhone or other compatible smartphone at head level
- Record videos from a first-person perspective, usually 1–2 minutes each
- Hands and the object must remain in frame almost the entire time
- Variations in lighting, rooms, surfaces, and objects are needed
- Faces, names, and other personal data are avoided when possible
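As a rough illustration, the checklist above could be turned into an automated pre-screening step. The sketch below is hypothetical: the `Clip` fields, the 0.9 hands-in-frame threshold, and the 60 to 120 second window are assumptions chosen to mirror the list, not a description of how Micro1 or any other company actually verifies submissions.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    """Hypothetical metadata for one submitted recording."""
    duration_s: float            # clip length in seconds
    first_person: bool           # filmed from a head-mounted phone
    hands_in_frame_ratio: float  # share of frames showing hands and object
    faces_detected: int          # faces found by an assumed upstream detector


def screen_clip(clip: Clip) -> list[str]:
    """Return a list of problems; an empty list means the clip passes."""
    problems = []
    if not clip.first_person:
        problems.append("not first-person")
    if not 60 <= clip.duration_s <= 120:  # "usually 1-2 minutes"
        problems.append("duration outside 1-2 minutes")
    if clip.hands_in_frame_ratio < 0.9:   # assumed cutoff for "almost the entire time"
        problems.append("hands or object out of frame too often")
    if clip.faces_detected > 0:
        problems.append("face visible")
    return problems


print(screen_clip(Clip(90.0, True, 0.97, 0)))   # []
print(screen_clip(Clip(200.0, True, 0.5, 1)))
```

The variety requirement (lighting, rooms, surfaces, objects) is left out because it can only be judged across a contractor's whole set of clips, not for one clip at a time.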
The main concern here is not monotony but privacy. Even if no face appears in frame, the video shows interior design, medications left in the kitchen, children's belongings, daily routines, and neighbors accidentally caught in the background. Meanwhile, the contractors themselves often don't know exactly who their recordings are sold to, how long they're stored, or whether they can request deletion.
Researcher Yasmin Kotturi argues that companies should explain to people in advance where such technology might end up and how it will affect them in the future.
Why tests break
In parallel with the data race, another debate is intensifying: how to measure AI quality at all. Researcher Angela Aristidou believes the industry has relied too long on the logic of a school exam, where a model is compared to a human on an isolated task with a right or wrong answer. In real life, this rarely happens.
AI is embedded in teams, regulations, and long processes where it matters not only how accurate and fast the system is, but also how it affects people's coordination, workload, trust, and error rates in subsequent steps.
Instead, Aristidou proposes the HAIC approach (Human–AI, Context-Specific Evaluation). The idea is to test not a model in a vacuum, but how the system works inside an organization over a long horizon.
In her examples, medical AIs could look good on tests but slow down work in hospitals because doctors had to adjust their conclusions to local reporting standards and regulatory requirements.
This approach shifts focus along several lines:
- from individual task to team work and workflow
- from a one-time test to long-term effect
- from raw accuracy to coordination quality and error detection
- from a single answer to consequences before and after its use
For business, this is an uncomfortable but useful thought. A high benchmark score doesn't yet mean the tool will speed up a hospital, warehouse, support service, or humanitarian organization.
In one case Aristidou describes, a system was evaluated for 18 months within real processes, with separate tracking of how easily people noticed and corrected its errors. Only such testing revealed what safeguards were needed before large-scale deployment.
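The kind of tracking described in that case, how easily people notice and correct a system's errors, can be reduced to a couple of simple numbers. The sketch below is only an illustration: the `ErrorEvent` record and the `summarize` function are hypothetical names, and the article does not say which metrics the evaluation actually used.

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional


@dataclass
class ErrorEvent:
    """One AI error observed during deployment (hypothetical log record)."""
    noticed: bool                           # did a person catch the error?
    minutes_to_fix: Optional[float] = None  # time until corrected, if it was


def summarize(events: list[ErrorEvent]) -> dict:
    """Share of errors that people caught, and how long fixes took."""
    caught = [e for e in events if e.noticed]
    fix_times = [e.minutes_to_fix for e in caught if e.minutes_to_fix is not None]
    return {
        "errors": len(events),
        "detection_rate": len(caught) / len(events) if events else None,
        "median_minutes_to_fix": median(fix_times) if fix_times else None,
    }


log = [
    ErrorEvent(True, 5.0),
    ErrorEvent(True, 15.0),
    ErrorEvent(False),       # an error nobody noticed
    ErrorEvent(True, 10.0),
]
print(summarize(log))
# {'errors': 4, 'detection_rate': 0.75, 'median_minutes_to_fix': 10.0}
```

A one-time benchmark produces none of these figures, since they only exist once the system has been running inside a real workflow for some time.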
What this means
The story about home trainers for robots and the debate over new benchmarks point to one common conclusion: the AI industry increasingly relies not on flashy demos but on the quality of hidden infrastructure. The winners will be not only those with the most impressive robots or the highest test scores, but those who can gather real data ethically, clearly explain the rules for working with it, and prove the system's value within actual processes, not just on stage.