AWS and NVIDIA launched large-scale training for the Unitree H1 robot on SageMaker AI
AI-processed from AWS Machine Learning Blog; edited by Hamidun News
AWS and NVIDIA have demonstrated a complete pipeline for training control policies for the Unitree H1 humanoid robot entirely in the cloud, with no need to own a GPU cluster.
Why the cloud for robots
Reinforcement learning for physical robots requires billions of simulation steps, and that is no exaggeration. For a humanoid to learn to walk forward without falling, the neural network must work through tens of billions of interactions between a virtual agent and its environment. Doing this in the real world is expensive and dangerous: a single failed experiment can mean repairs costing thousands of dollars, and the process would take years instead of hours.
This is why the industry is betting on physical simulation. The race for a "Moore's Law for robots" has already begun: Tesla, Figure, Boston Dynamics, and dozens of startups are investing hundreds of millions in creating synthetic environments for training. NVIDIA Isaac Lab is a GPU-accelerated simulator capable of running thousands of copies of a virtual environment on a single node simultaneously.
Previously, it was used primarily in large corporate and university laboratories with expensive hardware. Now Isaac Lab is directly integrated with Amazon SageMaker AI. This means that a request for hundreds of GPUs is fulfilled in minutes, and an engineer doesn't need to think about infrastructure — only about policy code and task configuration.
Two deployment options
AWS offers two modes for different use cases:
- SageMaker HyperPod: a persistent managed cluster. Infrastructure persists between runs, which suits multi-week research and iterative hyperparameter tuning.
- SageMaker Training Jobs: a one-off managed run. Resources are allocated only for the job and released automatically on completion, which simplifies budget control.
Other details of the setup:
- The p4d and p5 instance families, with NVIDIA A100 and H100 GPUs respectively, are supported.
- Isaac Lab is deployed in a standard Docker container; model weights and checkpoints are saved to Amazon S3 automatically.
- Training metrics (reward, episode length, entropy loss) are streamed to Amazon CloudWatch in real time.
The key advantage of both options is that they remove the operational burden. There is no need to configure Kubernetes by hand, manage InfiniBand networking between nodes, or balance GPU workloads manually.
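As a rough illustration of the Training Jobs mode, the sketch below builds the request dictionary that boto3's SageMaker `create_training_job` call accepts. The top-level keys follow that API, but everything else is an assumption made for illustration: the helper name, the job name, the container image URI, the IAM role, the S3 path, the volume size, and the log format matched by the metric regexes are all hypothetical and do not come from the AWS post.

```python
"""Illustrative sketch: a SageMaker Training Job request for an Isaac Lab run."""


def build_training_job_request(
    job_name: str,
    image_uri: str,
    role_arn: str,
    s3_output_path: str,
    instance_type: str = "ml.p5.48xlarge",  # H100 instances; ml.p4d.24xlarge has A100s
    instance_count: int = 1,
    max_runtime_hours: int = 24,
) -> dict:
    """Return keyword arguments for boto3's SageMaker `create_training_job`."""
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,  # Docker image containing Isaac Lab
            "TrainingInputMode": "File",
            # SageMaker extracts CloudWatch metrics from log lines with regexes.
            # The "name=value" log format below is an assumed convention.
            "MetricDefinitions": [
                {"Name": "mean_reward",
                 "Regex": r"mean_reward=([-+]?[0-9]*\.?[0-9]+)"},
                {"Name": "mean_episode_length",
                 "Regex": r"mean_episode_length=([0-9]*\.?[0-9]+)"},
            ],
        },
        "RoleArn": role_arn,
        "OutputDataConfig": {"S3OutputPath": s3_output_path},
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
            "VolumeSizeInGB": 200,  # arbitrary illustrative value
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": max_runtime_hours * 3600},
    }


# All values below are placeholders, not real resources.
request = build_training_job_request(
    job_name="h1-walk-demo",
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/isaac-lab:example",
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    s3_output_path="s3://example-bucket/h1-checkpoints/",
    instance_count=2,
)

# With AWS credentials configured, the request could then be submitted as:
#   import boto3
#   boto3.client("sagemaker").create_training_job(**request)
```

Because the job is described declaratively, moving from A100 to H100 hardware or from one node to several is a change to `instance_type` and `instance_count` and does not require touching the cluster.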
How Unitree H1 training works
Unitree H1 is one of the most accessible mass-produced humanoids: approximately 180 cm tall, weighing 47 kg, with 19 degrees of freedom. This makes it a popular platform for academic research in motion control. In Isaac Lab simulation, thousands of virtual copies of this robot learn to walk in parallel using the Proximal Policy Optimization (PPO) algorithm: they fall, stand up, adjust balance, and receive rewards for stable forward movement.
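The core of a PPO update is a clipped surrogate objective that limits how far the new policy can move from the policy that collected the data. The sketch below is a minimal pure-Python version for illustration. The function name and the `clip_eps=0.2` default are assumptions, not values from the AWS post, and a real training run would rely on the vectorized GPU implementation in its RL library.

```python
import math


def ppo_clipped_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Return the PPO clipped surrogate loss (to be minimized) over a batch.

    Each argument is a sequence of floats with one entry per sampled action.
    """
    if not (len(new_log_probs) == len(old_log_probs) == len(advantages)):
        raise ValueError("all inputs must have the same length")
    if not advantages:
        raise ValueError("batch must not be empty")

    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the updated policy and the sampling policy.
        ratio = math.exp(new_lp - old_lp)
        # Keep the ratio inside [1 - eps, 1 + eps].
        clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Take the more pessimistic of the two surrogate terms.
        total += min(ratio * adv, clipped_ratio * adv)

    # The objective is maximized, so the loss is its negated batch mean.
    return -total / len(advantages)


# An action that became twice as likely and had a positive advantage:
# the gain is capped at (1 + clip_eps) * advantage.
loss_positive = ppo_clipped_loss([math.log(2.0)], [0.0], [1.0])

# The same ratio with a negative advantage is not clipped, so the penalty is full.
loss_negative = ppo_clipped_loss([math.log(2.0)], [0.0], [-1.0])
```

The asymmetry in the two usage lines is the point of the clipping: improvements are capped, while moves that make a bad action more likely are penalized in full.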
The quality of the trained policy depends on how accurately the reward function describes the desired behavior. On a single H100 node, Isaac Lab can run up to 4096 simulations in parallel. When scaling to multiple nodes, training is distributed with PyTorch DDP (DistributedDataParallel), which synchronizes gradients between GPUs automatically.
"Scaling to hundreds of GPUs through SageMaker reduces training time from several days to several hours," note the authors of the AWS blog post.
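To make the gradient synchronization step concrete, the sketch below shows in plain Python what data-parallel training computes conceptually: each worker derives gradients from its own batch of simulated experience, and the gradients are averaged so that every replica applies the same update. This is an illustration only, not PyTorch's implementation, which performs the averaging with collective all-reduce operations on the GPUs. The helper names are hypothetical, and the 4096-environments-per-node default simply reuses the figure quoted above.

```python
def all_reduce_mean(per_worker_grads):
    """Average gradient vectors element-wise across workers.

    `per_worker_grads` is a list with one gradient vector (list of floats)
    per worker; every vector must have the same length.
    """
    if not per_worker_grads:
        raise ValueError("need at least one worker")
    size = len(per_worker_grads[0])
    if any(len(grads) != size for grads in per_worker_grads):
        raise ValueError("all workers must report gradients of the same length")
    num_workers = len(per_worker_grads)
    return [
        sum(grads[i] for grads in per_worker_grads) / num_workers
        for i in range(size)
    ]


def total_parallel_envs(num_nodes, envs_per_node=4096):
    """Total simulated robots when each node runs `envs_per_node` environments."""
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    return num_nodes * envs_per_node


# Two workers, each with a gradient for a two-parameter model.
averaged = all_reduce_mean([[1.0, 2.0], [3.0, 4.0]])

# Eight nodes at the quoted per-node limit.
envs_on_eight_nodes = total_parallel_envs(8)
```

Averaging keeps the update equivalent to one large batch, which is why adding nodes increases the amount of experience gathered per update without changing the policy code.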
Upon completion, the trained policy is exported in ONNX or TorchScript format and can be deployed on real hardware via NVIDIA Isaac ROS.
What this means
Cloud-based reinforcement learning for robots is moving beyond laboratories with multimillion-dollar equipment budgets. Any small team with an AWS account can now run a serious humanoid training experiment without major infrastructure investment. This changes the economics of robotics: the barrier to entry falls, the pace of iteration rises, and the next breakthroughs in physical robot control may well come from unexpectedly small teams.