NVIDIA Nemotron: Diffusion Models Generate Text 6× Faster

NVIDIA Nemotron generates 32 tokens at once instead of one at a time, using diffusion rather than autoregression. A single model offers three modes: standard autoregressive decoding, fast diffusion decoding, and self-speculation, which reaches a 6× speedup on B200 GPUs. The 3B, 8B, and 14B models are already open source.
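
To make the self-speculation idea concrete, here is a minimal toy sketch of a generic draft-and-verify loop. The article does not describe Nemotron's actual mechanism, so this is an illustration under stated assumptions, not NVIDIA's implementation: it assumes the diffusion mode proposes a block of up to 32 tokens and the autoregressive mode checks them. The names `speculative_generate`, `toy_next_token`, `toy_draft`, and `flawed_draft` are hypothetical, and the "models" are simple arithmetic stand-ins.

```python
from typing import Callable, List, Tuple

BLOCK_SIZE = 32  # block length quoted in the article


def speculative_generate(
    prompt: List[int],
    draft_block: Callable[[List[int], int], List[int]],
    next_token: Callable[[List[int]], int],
    max_new_tokens: int,
    block_size: int = BLOCK_SIZE,
) -> Tuple[List[int], int]:
    """Return (generated tokens, number of draft/verify rounds).

    draft_block stands in for the parallel (diffusion-style) drafter and must
    return exactly the requested number of tokens; next_token stands in for
    the autoregressive verifier. Both are hypothetical toy callables.
    """
    out = list(prompt)
    rounds = 0
    while len(out) - len(prompt) < max_new_tokens:
        remaining = max_new_tokens - (len(out) - len(prompt))
        draft = draft_block(out, min(block_size, remaining))
        rounds += 1
        # Accept drafted tokens while they match the verifier's choice.
        # A real system would score the whole draft in one forward pass;
        # this toy calls the verifier once per position for clarity.
        for tok in draft:
            expected = next_token(out)
            out.append(expected)  # always keep the verifier's token
            if tok != expected:
                break  # first mismatch ends this round
    return out[len(prompt):], rounds


def toy_next_token(ctx: List[int]) -> int:
    """Toy verifier: the next token is the previous one plus 1, modulo 100."""
    return (ctx[-1] + 1) % 100


def toy_draft(ctx: List[int], n: int) -> List[int]:
    """Toy drafter that always agrees with the verifier."""
    toks, last = [], ctx[-1]
    for _ in range(n):
        last = (last + 1) % 100
        toks.append(last)
    return toks


def flawed_draft(ctx: List[int], n: int) -> List[int]:
    """Toy drafter that gets the 8th token of every block wrong."""
    toks = toy_draft(ctx, n)
    if n > 7:
        toks[7] = (toks[7] + 1) % 100
    return toks


perfect_tokens, perfect_rounds = speculative_generate([0], toy_draft, toy_next_token, 64)
flawed_tokens, flawed_rounds = speculative_generate([0], flawed_draft, toy_next_token, 64)
print(perfect_rounds, flawed_rounds)
```

In this toy, both runs produce the same 64 tokens that plain one-at-a-time decoding would, because every kept token comes from the verifier. Only the number of rounds differs: the more of each drafted block the verifier accepts, the fewer rounds are needed, which is the general intuition behind speculative speedups.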