Nvidia released Nemotron 3 Nano Omni — an open multimodal model for edge agents

Q: What is the source?

Originally published on TNW. Hamidun News processes and adapts the material with AI.

Q: When was it published?

Apr 28, 2026. Reading time: 3 min.

Nvidia presented Nemotron 3 Nano Omni — an open multimodal model for edge agents that combines text, images, audio, video, and documents in a single…

Hamidun News Editorial

AI monitoring · TNW

Apr 28, 2026· 3 min

AI-processed from TNW; edited by Hamidun News

Nvidia released Nemotron 3 Nano Omni — an open multimodal model for edge agents — Source: TNW. Collage: Hamidun News.

◐ Listen to article

Nvidia on April 28, 2026 unveiled Nemotron 3 Nano Omni — an open multimodal model designed for autonomous AI agents on edge devices. This is not just another release for the CUDA ecosystem: the company is demonstrating that it wants to earn money not only on hardware but also on the models themselves.

What the model can do

Nemotron 3 Nano Omni combines understanding of text, images, audio, and video in a single architecture. The model also works with documents, diagrams, and graphical interfaces, and outputs text responses. Essentially, Nvidia is offering not a bundle of several separate models for vision, speech, and documents, but a single unified engine for tasks where an agent needs to simultaneously see the screen, read a file, listen to a voice command, and respond without unnecessary delays between services.

The key idea is that the model is large in total volume but relatively lightweight in operation. Nemotron 3 Nano Omni has 30 billion parameters, but only 3 billion are activated on each inference step thanks to the mixture-of-experts architecture. Nvidia claims this approach delivers up to ninefold throughput gains compared to comparable open multimodal models, and the model leads in six benchmarks for working with documents, video, and audio.

The base text component was trained on 25 trillion tokens and supports a context window of up to 256 thousand tokens. Internally, the model uses a hybrid Mamba-Transformer scheme. According to Nvidia's description, it combines 23 Mamba-2 layers, 23 mixture-of-experts layers, and six grouped-query attention layers.

Each token is routed to only six of 128 experts plus a shared expert, so computations don't balloon. For video, three-dimensional convolutions are applied that account for movement between frames, rather than simply parsing the video as a set of static images. This engineering approach is what should make the model suitable for real-time agents on a single GPU.

30 billion parameters total, 3 billion active on inference
deployment possible on a single GPU, without a server cluster
commercial use permitted under Nvidia Open Model Agreement
model available on Hugging Face and via Nvidia NIM
inputs include text, images, audio, video, documents, and graphical interfaces

Why this for Nvidia

Over the past two years, Nvidia has won primarily as an infrastructure provider: GPUs, networks, CUDA, and all the software around them. But the Nemotron family has already become a separate direction, and now the company is making a bolder move — positioning its own model as a ready-made foundation for industrial AI agents. The logic is simple: if the model is optimized for Nvidia hardware, and the hardware is optimized for Nvidia models, the company gains control over almost the entire stack, like Google, Amazon, or Microsoft in their cloud ecosystems.

This is why the release is presented not as a demonstration of laboratory capabilities but as a product for deployment. Among early users and partners, Nvidia names Foxconn, Palantir, Aible, ASI, Eka Care, and H Company; Dell, DocuSign, Infosys, Oracle, and Zefr are evaluating the model for production. The scenarios are not consumer-focused either: visual inspection on factories, document processing, voice agents, and screen understanding for computer-use systems.

The model can be deployed through Amazon SageMaker JumpStart, OpenRouter, vLLM, SGLang, Ollama, llama.cpp, and TensorRT-LLM — so Nvidia wants to make it available in any familiar stack. Against competitors, the bet looks quite precise.

Google has multimodal Gemini and on-device Gemini Nano, Meta has a strong Llama lineup, OpenAI remains the commercial benchmark with GPT models. But Nvidia is trying to assemble a rare combination of four properties at once: multimodality, open weights, commercial license, and operation on edge hardware without cloud-scale requirements. If this works, the company will be able to capture value at three levels simultaneously — hardware, inference tools, and the model itself.

What it means

Nemotron 3 Nano Omni is a bet that the next wave of AI agents will operate not only in the cloud but also locally, closer to data, cameras, microphones, and corporate interfaces. If Nvidia confirms the stated figures in real-world deployments, it will become not just a supplier of "picks and shovels" for the AI boom, but one of the strongest players in the models market itself.

Hamidun News

AI news without noise. Daily editorial selection from 400+ sources. A product by Zhemal Khamidun, Head of AI at Alpina Digital.

Telegram channel RSS hamidun.com

Want to stop reading about AI and start using it?

AI News is a curated feed of AI/tech news. Hamidun Academy teaches you to use AI systematically in your work.

🎓 Academy — 7 days free Free consultation