
Nvidia unveiled the first open dataset and foundation AI models for medical robots

Nvidia and its partners released Open-H-Embodiment, the first large open dataset for medical robotics. It includes 778 hours of data on surgery, ultrasound…

AI-processed from Hugging Face Blog; edited by Hamidun News

Nvidia, together with the research community, has presented Open-H-Embodiment, the first major open dataset for medical robotics, along with two foundation models for surgical scenarios. The package was published on Hugging Face and is meant to move medical AI beyond image analysis toward systems that can act in the physical world.

What was released

The main idea behind the release is straightforward: in medicine, models that only recognize images, segment tissue, or classify pathologies are no longer enough. In the operating room, during an ultrasound exam, or during a colonoscopy, a machine has to handle instruments, sense tissue contact, account for robot kinematics, and close the control loop with feedback. Open-H-Embodiment was assembled for exactly this purpose: to provide a common foundation for training and evaluating Physical AI in medical robotics.

  • 778 hours of training data under the CC-BY-4.0 license
  • 35 participating organizations from universities, clinics, and industry
  • scenarios from surgery, ultrasound, and autonomous colonoscopy
  • data from simulation, training exercises, and real procedures
  • support for commercial and research robotic platforms

For the market, volume matters, but so does format. The dataset combines vision, force, kinematics, and multiple robot embodiments in a single open collection, so teams can compare approaches on shared data rather than on closed in-house collections. Participants include Nvidia, Johns Hopkins, the Technical University of Munich, Stanford, and dozens of other teams, which makes this less a one-off publication than an attempt to set an industry standard.
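To make the idea of one collection spanning several robots, modalities, and data sources concrete, here is a minimal sketch of how such episodes could be represented in Python. It is not the published Open-H-Embodiment schema: the classes `Step` and `Episode`, the helper `by_embodiment`, and every field name and unit are hypothetical placeholders, and treating force as optional is an assumption (not every platform has a force sensor).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Step:
    """One synchronized timestep (hypothetical field names and units)."""
    t: float                          # seconds since episode start
    images: Dict[str, str]            # camera name -> frame file path
    joint_pos: List[float]            # robot kinematics (joint angles)
    force: Optional[List[float]]      # force/torque, None if the robot has no sensor
    action: List[float]               # command sent to the robot


@dataclass
class Episode:
    embodiment: str                   # which robot platform recorded it
    task: str                         # e.g. "suturing", "ultrasound"
    source: str                       # "simulation", "training" or "clinical"
    steps: List[Step] = field(default_factory=list)

    def has_force(self) -> bool:
        """True only if the episode has steps and every step carries a force reading."""
        return bool(self.steps) and all(s.force is not None for s in self.steps)


def by_embodiment(episodes: List[Episode]) -> Dict[str, List[Episode]]:
    """Group a mixed collection so approaches can be compared per robot."""
    groups: Dict[str, List[Episode]] = {}
    for ep in episodes:
        groups.setdefault(ep.embodiment, []).append(ep)
    return groups


# Usage: two episodes from different (hypothetical) robots and sources.
a = Episode("robot_a", "suturing", "clinical",
            [Step(0.0, {"endoscope": "f0.png"}, [0.1, 0.2], [0.0, 0.0, 1.2], [0.01, 0.0])])
b = Episode("robot_b", "ultrasound", "simulation",
            [Step(0.0, {"probe": "u0.png"}, [0.3], None, [0.02])])
print(sorted(by_embodiment([a, b])))   # ['robot_a', 'robot_b']
print(a.has_force(), b.has_force())    # True False
```

Keeping the embodiment and source on each episode is what lets heterogeneous recordings live in one collection while still being filtered or compared per robot.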

How the models work

Along with the dataset, Nvidia released GR00T-H, a Vision-Language-Action model for surgical robotics trained on approximately 600 hours of Open-H-Embodiment data. In essence it is a policy model: it takes visual and textual context as input and translates it into robot actions. The authors highlight three design choices: a common normalized action space shared across different robots, specialized projections for each robot's kinematics, and training on relative instrument movements. The prototype has already completed a full suture in the SutureBot benchmark, a task that requires not a single short gesture but a long sequence of precise actions.
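The three design choices described for GR00T-H can be illustrated with a small NumPy sketch. It is not GR00T-H's code: the robot names, the 4-D action layout (tip displacement plus gripper), the bounds, `SHARED_DIM`, and the fixed random projections are all hypothetical. In a trained model the per-embodiment projections would be learned layers; here a random matrix and its pseudo-inverse stand in for them just to show the data flow.

```python
import numpy as np

# Hypothetical native action layout for two robots: (dx, dy, dz, gripper).
# Translation limits in metres differ per robot; the gripper is in [0, 1].
BOUNDS = {
    "robot_a": (np.array([-0.05, -0.05, -0.05, 0.0]), np.array([0.05, 0.05, 0.05, 1.0])),
    "robot_b": (np.array([-0.02, -0.02, -0.02, 0.0]), np.array([0.02, 0.02, 0.02, 1.0])),
}
SHARED_DIM = 16  # hypothetical width of the shared action space

rng = np.random.default_rng(0)
# One projection per embodiment (stand-in for learned, kinematics-specific layers).
PROJ = {name: rng.standard_normal((SHARED_DIM, 4)) for name in BOUNDS}


def to_relative(prev_xyz, next_xyz, gripper):
    """Relative command: instrument tip displacement plus an absolute gripper value."""
    return np.append(np.asarray(next_xyz) - np.asarray(prev_xyz), gripper)


def normalize(action, robot):
    """Map each native action dimension of this robot to [-1, 1]."""
    lo, hi = BOUNDS[robot]
    return 2.0 * (action - lo) / (hi - lo) - 1.0


def denormalize(x, robot):
    """Inverse of normalize for the given robot."""
    lo, hi = BOUNDS[robot]
    return (x + 1.0) / 2.0 * (hi - lo) + lo


def encode(action, robot):
    """Native action -> normalized -> shared action space."""
    return PROJ[robot] @ normalize(action, robot)


def decode(z, robot):
    """Shared space -> normalized (least-squares inverse) -> native action."""
    x = np.linalg.pinv(PROJ[robot]) @ z
    return denormalize(x, robot)


# Usage: the same relative motion, expressed for robot_a and recovered again.
a = to_relative([0.10, 0.20, 0.30], [0.11, 0.19, 0.30], gripper=0.5)
z = encode(a, "robot_a")
print(z.shape)                               # (16,)
print(np.allclose(decode(z, "robot_a"), a))  # True
```

Per-robot normalization puts commands with very different physical ranges on a comparable scale, and relative deltas make the training target independent of where the instrument sits in the workspace; the real model's layers are learned and will differ in detail.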

The second part of the stack is Cosmos-H-Surgical-Simulator, a world foundation model for action-conditioned surgical simulation. It was fine-tuned on Open-H-Embodiment to generate realistic surgical video directly from the robot's kinematic actions, including complex effects such as soft-tissue deformation, glare, blood, and smoke. The practical gain is substantial: 600 runs in the simulator take about 40 minutes, compared with roughly two days on physical test benches. Fine-tuning used 64 A100 GPUs and about 10,000 GPU-hours, so this is a serious infrastructure effort rather than a lab demo.
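The throughput and compute figures quoted for the simulator can be turned into rough ratios. This back-of-the-envelope sketch assumes that "roughly two days" means 48 hours of bench time and that all 64 GPUs were busy for the whole fine-tuning run; both are simplifications, so the results are ballpark numbers only.

```python
# Figures quoted in the article, plus one labeled assumption.
RUNS = 600
SIM_MINUTES = 40
BENCH_HOURS = 48          # assumption: "roughly two days" = 48 h
GPU_HOURS = 10_000        # "about 10,000 GPU-hours"
NUM_GPUS = 64             # A100s used for fine-tuning

speedup = BENCH_HOURS * 60 / SIM_MINUTES        # bench time / simulator time
sim_seconds_per_run = SIM_MINUTES * 60 / RUNS   # average simulator time per run
wall_clock_days = GPU_HOURS / NUM_GPUS / 24     # assumes all GPUs ran in parallel throughout

print(f"speedup ~{speedup:.0f}x")                                   # speedup ~72x
print(f"~{sim_seconds_per_run:.0f} s per simulated run")             # ~4 s per simulated run
print(f"fine-tuning ~{wall_clock_days:.1f} days of wall-clock time")  # fine-tuning ~6.5 days of wall-clock time
```

Under these assumptions the simulator is roughly 72 times faster than bench testing, and the fine-tuning budget corresponds to about a week on the stated cluster.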

What's next

The most interesting part of this story is the attempt to move medical robotics from a "the model sees" mode to a "the model acts and generalizes" mode. An open dataset plus two foundation models give researchers a common stack for sim-to-real experiments, synthetic data generation, and skill transfer between different robots. This matters especially in surgery, where collecting large, high-quality datasets is expensive and a control error costs far more than a mistake in ordinary computer vision.

"Surgical robotics needs its own ChatGPT moment."

That is how the authors describe the goal of the second version of Open-H-Embodiment. The next stage is not just better instrument control but autonomy with elements of reasoning: systems should be able to explain their steps, plan long procedures, adapt to failures, and learn from trajectories annotated with intent, outcome, and error type. If the community actually assembles such reasoning-ready data, medicine could get not another narrow algorithm but a platform for more general-purpose robotic assistants.

What it means

For the AI market, this is an important shift: in medicine, the field is starting to openly release not just recognition models but a foundation layer for Physical AI, with data, policy models, and simulators shipped as one package. If the approach takes off, startups, labs, and robot manufacturers will have a common starter kit for speeding up research, cutting testing costs, and moving faster from prototypes to clinically useful systems.

Hamidun News
AI news without noise. Daily editorial selection from 400+ sources. A product by Zhemal Khamidun, Head of AI at Alpina Digital.
