NVIDIA Releases Nemotron 3 Nano Omni on Amazon SageMaker JumpStart on Release Day

Q: What is the source?

Originally published on AWS Machine Learning Blog. Hamidun News processes and adapts the material with AI.

Q: When was it published?

Apr 28, 2026. Reading time: 3 min.

NVIDIA made Nemotron 3 Nano Omni available on Amazon SageMaker JumpStart on release day. The model combines text, image, audio, and video processing in a…

Hamidun News Editorial

AI monitoring · AWS Machine Learning Blog

Apr 28, 2026· 3 min

AI-processed from AWS Machine Learning Blog; edited by Hamidun News

NVIDIA Releases Nemotron 3 Nano Omni on Amazon SageMaker JumpStart on Release Day — Source: AWS Machine Learning Blog. Collage: Hamidun News.

◐ Listen to article

On April 28, 2026, NVIDIA added the multimodal Nemotron 3 Nano Omni model to Amazon SageMaker JumpStart on its release day. For teams on AWS, this shortens the path from model announcement to pilot: the service is already ready for deployment and inference runs.

What is this model

Nemotron 3 Nano Omni is an open multimodal LLM with 30 billion total parameters and 3 billion active ones. It is built on a hybrid Mamba2 Transformer Hybrid Mixture of Experts architecture. NVIDIA assembled the model from three components: the Nemotron 3 Nano language core, the CRADIO v4-H visual encoder for images and video, and the Parakeet speech encoder for audio.

The model accepts video, audio, images, and text as input and returns text responses as output. According to AWS documentation, the model is designed not only for chat but also for agentic scenarios. It supports a context window of up to 131 thousand tokens, reasoning, tool calling, JSON responses, and word-level timestamps for transcription.

In SageMaker JumpStart, the model is available in FP8, emphasizing the balance between quality and efficiency. On the licensing front, which matters for commercial use, Nemotron 3 Nano Omni is distributed under the NVIDIA Open Model Agreement.

The model is meant to "see, hear, and reason" across multiple

modalities in a single inference pass.

Where the model is useful

The main idea of the announcement is to eliminate the zoo of separate models for vision, speech, and text. In a typical enterprise agentic system, each such module adds latency, complicates orchestration, and breaks overall context. AWS and NVIDIA propose using Nemotron 3 Nano Omni as a single perception layer: the model reads the screen, understands documents, transcribes speech, and analyzes video, while the rest of the agent logic operates on top of one unified picture.

Computer agents that navigate interfaces, dashboards, and browsers
Document intelligence for contracts, SOWs, financial documents, tables, and screenshots
Analysis of calls, meetings, and other audio-video content in support services
Visual event verification, such as deliveries or orders, where OCR and temporal context are needed

The model has fairly clear input limits, and they already look practical for pilots. Video — MP4 up to 2 minutes and up to 256 frames, audio — WAV or MP3 up to one hour in duration, images — JPEG and PNG, text — up to 131 thousand tokens. This is not a universal unlimited machine, but for internal assistants, review pipelines, and operational task automation, the range is more than workable. In conclusion, AWS separately claims up to 9 times higher throughput compared to alternative open omni-models.

How to run the model

SageMaker JumpStart presents this release as one-click deployment. The basic scenario is straightforward: open SageMaker Studio, go to the JumpStart section, find Nemotron 3 Nano Omni, select the model card, and click Deploy. Before that, AWS asks you to check three things: account availability, JumpStart access permissions, and GPU instance quotas such as ml.

p4d.24xlarge or ml.p5.

48xlarge. So there is a quick start, but it still depends on enterprise infrastructure readiness and GPU budget. For teams deploying models via code, there is also a path through the SageMaker Python SDK with a ready model_id.

After deployment, the endpoint accepts multimodal requests: you can describe an image, summarize a meeting recording, or transcribe a call with action items highlighted. AWS also recommends two inference modes: thinking for complex reasoning with temperature 0.6, top_p 0.

95, and max_tokens 20480, and instruct for more direct tasks where speed matters. After experiments, it is best to delete the endpoint right away to avoid accruing extra costs.

What this means

The appearance of Nemotron 3 Nano Omni in JumpStart on release day shows that AWS is accelerating the delivery of fresh open models straight into the production workflow. For business, this is a positive signal: multimodal agents are gradually transitioning from a set of disparate components into a more cohesive product stack that can be tested on your own data without lengthy from-scratch assembly.

Hamidun News

AI news without noise. Daily editorial selection from 400+ sources. A product by Zhemal Khamidun, Head of AI at Alpina Digital.

Telegram channel RSS hamidun.com

Want to stop reading about AI and start using it?

AI News is a curated feed of AI/tech news. Hamidun Academy teaches you to use AI systematically in your work.

🎓 Academy — 7 days free Free consultation