NVIDIA Shows Efficient Method to Train Cosmos for Robot Video Generation Using LoRA

Q: What is the source?

Originally published on Hugging Face Blog. Hamidun News processes and adapts the material with AI.

Q: When was it published?

2026-05-21. Reading time: 3 min.


Hamidun News Editorial




NVIDIA has published a practical guide to fine-tuning its Cosmos Predict 2.5 model with LoRA and DoRA, two parameter-efficient adaptation methods. The approach turns expensive full retraining into a job that a team can run on a single GPU.

Why This Matters

Cosmos Predict 2.5 is a powerful 2-billion-parameter video model that generates physically plausible videos from text, images, or other videos. Standard full retraining of such a model requires enormous computational resources and often leads to catastrophic forgetting—the model loses general knowledge when adapting to a specific task.

LoRA (Low-Rank Adaptation) addresses both problems: the 2 billion base parameters stay frozen, and only small adapter matrices attached to the attention and feedforward layers are trained. This cuts memory consumption by roughly an order of magnitude and makes training feasible on a single GPU.
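To make the idea concrete, here is a minimal NumPy sketch of a single LoRA-adapted linear layer. It is an illustration of the general technique, not NVIDIA's training code: the class name `LoRALinear`, the hidden size of 2048, and the `alpha` value are assumptions chosen for the example, and only the rank of 32 comes from the article.

```python
import numpy as np


class LoRALinear:
    """Frozen dense layer y = x @ W.T plus a trainable low-rank update.

    Hypothetical illustration class; not part of any Cosmos or NVIDIA API.
    """

    def __init__(self, weight, rank, alpha, seed=0):
        rng = np.random.default_rng(seed)
        out_features, in_features = weight.shape
        self.weight = weight  # frozen base weights, never updated
        # Trainable adapter factors: A is small random, B starts at zero,
        # so the adapted layer initially matches the base layer exactly.
        self.A = rng.normal(0.0, 0.01, size=(rank, in_features))
        self.B = np.zeros((out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        base = x @ self.weight.T
        update = (x @ self.A.T) @ self.B.T  # low-rank path, rank << features
        return base + self.scale * update

    def trainable_params(self):
        return self.A.size + self.B.size


d = 2048  # assumed hidden size, for illustration only
rng = np.random.default_rng(1)
W = rng.normal(size=(d, d))
layer = LoRALinear(W, rank=32, alpha=32)

x = rng.normal(size=(4, d))
y = layer.forward(x)

full = W.size
lora = layer.trainable_params()
print(f"full layer: {full:,} params, LoRA adapter: {lora:,} params")
```

For this assumed layer size, the adapter trains 131,072 values in place of 4,194,304. Real savings depend on which layers receive adapters and on the model's actual dimensions.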

How It Works in Practice

Using the GR1-100 dataset (92 robot manipulation videos), NVIDIA demonstrated the following results:

Training on 1× H100 GPU: 17 hours
Training on 8× H100 GPUs: 2.5 hours
Adapters occupy only a few MB (versus many GB for full checkpoints)
Adapters are easily swappable—different versions for different domains
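The two training times above imply how well the run scales across GPUs. The short calculation below is a sketch using only the reported 17-hour and 2.5-hour figures; the function name `scaling_summary` is made up for this example.

```python
def scaling_summary(single_gpu_hours, multi_gpu_hours, n_gpus):
    """Derive speedup, parallel efficiency, and GPU-hours from wall-clock times."""
    speedup = single_gpu_hours / multi_gpu_hours
    efficiency = speedup / n_gpus
    return {
        "speedup": speedup,
        "efficiency": efficiency,
        "gpu_hours_single": single_gpu_hours,
        "gpu_hours_multi": multi_gpu_hours * n_gpus,
    }


# Wall-clock figures reported in the article for 1 and 8 H100 GPUs.
summary = scaling_summary(17.0, 2.5, 8)
print(f"speedup: {summary['speedup']:.1f}x")
print(f"parallel efficiency: {summary['efficiency']:.0%}")
print(f"GPU-hours: {summary['gpu_hours_single']:.0f} vs {summary['gpu_hours_multi']:.0f}")
```

The 8-GPU run finishes about 6.8 times faster, which is roughly 85% parallel efficiency. It also consumes 20 GPU-hours against 17 for the single-GPU run, so the faster turnaround costs slightly more compute.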

The model was trained for 500 epochs on manipulation videos such as moving objects from a mat into a bowl or bringing juice to a green cup. Each video was paired with a text instruction describing the action, which told the model what it was expected to generate.

What Training Delivered

The base model struggled: it generated human hands instead of robot hands, produced shaky video, and moved objects implausibly. After fine-tuning via LoRA/DoRA:

Fine-tuned models (LoRA r=32, DoRA r=32) use the specified hand correctly, eliminate jitter, and improve video stability.

Qualitatively: hallucinations disappeared, the model consistently uses the correct hand, objects move with physical plausibility, and instructions are followed more precisely.

Quantitatively: scores for geometric stability (Sampson Error), physical plausibility, and instruction-following improved in every configuration tested: LoRA rank 8, LoRA rank 32, and DoRA rank 32. Rank 32 follows instructions more accurately, while rank 8 requires less memory.
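For readers unfamiliar with the geometric metric, the sketch below computes the textbook Sampson error for point correspondences under a fundamental matrix. The article does not say how NVIDIA estimates the matrix or aggregates the score, so the function name, the example matrix, and the sample points are all assumptions used only to illustrate the formula.

```python
import numpy as np


def sampson_error(F, pts1, pts2):
    """First-order geometric error of matches under fundamental matrix F.

    pts1, pts2: (N, 2) pixel coordinates in two frames; a perfect match
    satisfies x2^T F x1 = 0 in homogeneous coordinates.
    """
    n = pts1.shape[0]
    x1 = np.hstack([pts1, np.ones((n, 1))])
    x2 = np.hstack([pts2, np.ones((n, 1))])
    Fx1 = x1 @ F.T   # row i holds F @ x1_i
    Ftx2 = x2 @ F    # row i holds F.T @ x2_i
    numer = np.sum(x2 * Fx1, axis=1) ** 2
    denom = Fx1[:, 0] ** 2 + Fx1[:, 1] ** 2 + Ftx2[:, 0] ** 2 + Ftx2[:, 1] ** 2
    return numer / denom


# Assumed example: a camera translating purely along the x axis, so
# matching points must keep the same y coordinate between frames.
F = np.array([[0.0, 0.0, 0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]])
pts1 = np.array([[10.0, 20.0], [30.0, 40.0]])
pts2 = np.array([[15.0, 20.0], [33.0, 42.0]])  # second match drifts 2 px in y

errors = sampson_error(F, pts1, pts2)
print(errors)
```

The first match lies exactly on its epipolar line and scores zero, while the second drifts vertically and scores 2.0. Lower values indicate that the frames are more geometrically consistent with each other.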

What This Means

Synthetic robot video is in high demand because real manipulation data is expensive and slow to collect. With Cosmos and LoRA, robotics teams can generate thousands of examples overnight on a single GPU. That is cheaper and faster than recording real footage, and it exposes real robots to a wider variety of movements during training.

NVIDIA released the complete code, recipes, and ready-made adapters, so teams can copy and run them.
