NVIDIA developed a method for training neural networks at 4-bit precision

Q: What is the source of this material?

Original publication on MarkTechPost. Hamidun News processes and adapts materials with the help of AI.

Q: When was it published?

2026-05-19. Reading time: 4 min.


Hamidun News Editorial

AI monitoring · MarkTechPost


[Image: NVIDIA developed a method for training neural networks at 4-bit precision. Source: MarkTechPost. Collage: Hamidun News.]


NVIDIA has introduced NVFP4, a new methodology for training neural networks at 4-bit precision. It allows significant savings in memory and compute when training large models.

How It Works

The standard approach stores intermediate results and gradients during training in 8-bit (FP8) or 16-bit (BF16) precision. By moving to the 4-bit NVFP4 format, NVIDIA halves these memory requirements relative to FP8.
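To make the memory arithmetic concrete, here is a rough back-of-the-envelope sketch for a single copy of 12 billion values at each bit width. Only the bit widths and the 12-billion figure come from the article. The per-block scale overhead (one 8-bit scale per 16 values) is an assumption added for illustration, and real training memory also includes optimizer state and activations that are not modeled here.

```python
# Rough storage estimate for one copy of 12 billion values at different
# numeric formats. The block-scale overhead below is an ASSUMPTION for
# illustration, not a figure from the article.

PARAMS = 12_000_000_000  # model size mentioned in the article


def gigabytes(num_values: int, bits_per_value: float) -> float:
    """Return storage in GB (10**9 bytes) for num_values values."""
    return num_values * bits_per_value / 8 / 1e9


# Assumed layout: one 8-bit scale factor shared by each block of 16 values.
BLOCK_SIZE = 16
SCALE_BITS = 8
fp4_effective_bits = 4 + SCALE_BITS / BLOCK_SIZE  # 4.5 bits per value

sizes = {
    "BF16": gigabytes(PARAMS, 16),
    "FP8": gigabytes(PARAMS, 8),
    "4-bit (raw)": gigabytes(PARAMS, 4),
    "4-bit (+ block scales)": gigabytes(PARAMS, fp4_effective_bits),
}

for name, gb in sizes.items():
    print(f"{name:>24}: {gb:6.2f} GB")
```

Under the assumed layout, the saving relative to FP8 is slightly under 2x once the scale factors are counted.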

The method does not simply reduce precision. It combines several techniques: selective use of higher-precision BF16 in critical model layers, 16×16 random Hadamard transforms applied to the inputs of gradient computations, and stochastic rounding during computation.
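As an illustration of why a random Hadamard transform helps before low-precision quantization, the sketch below builds a 16×16 Hadamard matrix with random column signs and applies it to a block that contains one large outlier. This is a minimal pure-Python sketch, not NVIDIA's implementation: the helper names (`hadamard`, `random_hadamard`, `matvec`, `transpose`) and the test vector are hypothetical.

```python
import random


def hadamard(n: int) -> list[list[int]]:
    """Sylvester construction of an n x n Hadamard matrix (n a power of 2)."""
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def random_hadamard(n: int, rng: random.Random) -> list[list[float]]:
    """Orthogonal matrix H * diag(signs) / sqrt(n) with random +/-1 signs."""
    signs = [rng.choice((-1, 1)) for _ in range(n)]
    scale = n ** -0.5
    return [[h_ij * s * scale for h_ij, s in zip(row, signs)]
            for row in hadamard(n)]


def matvec(m, x):
    """Multiply matrix m (list of rows) by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


rng = random.Random(0)
N = 16
rht = random_hadamard(N, rng)

# A block with one large outlier, which would dominate a narrow 4-bit range.
x = [0.1] * N
x[3] = 8.0

y = matvec(rht, x)                  # outlier energy is spread over all entries
x_back = matvec(transpose(rht), y)  # orthogonal, so the transpose undoes it

print("max |x| before:", max(abs(v) for v in x))
print("max |y| after: ", round(max(abs(v) for v in y), 3))
```

Because the transform is orthogonal, it can be undone exactly in full precision. The benefit is that the transformed block has a smaller dynamic range, so fewer values are crushed when it is mapped onto a coarse grid.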

Traditionally, 4-bit training was considered risky: over a long training run, rounding errors accumulate and degrade the model. NVIDIA tested NVFP4 on a hybrid Mamba-Transformer model with 12 billion parameters, training it on 10 trillion tokens, the longest public 4-bit training experiment to date. The result suggests that, with the right methodology, numerical errors do not accumulate catastrophically.
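The accumulation problem, and the reason stochastic rounding is a common remedy, can be shown with a toy example. The sketch below repeatedly adds a small update to a value stored on a coarse grid. It is a simplification under stated assumptions: the grid is uniform with a hypothetical spacing `STEP`, whereas a real 4-bit floating-point format has non-uniform spacing, and the update size is arbitrary.

```python
import math
import random


def round_nearest(x: float, step: float) -> float:
    """Round x to the nearest multiple of step."""
    return round(x / step) * step


def round_stochastic(x: float, step: float, rng: random.Random) -> float:
    """Round x down or up to a multiple of step, choosing 'up' with
    probability equal to the fractional position, so the result equals
    x on average."""
    lower = math.floor(x / step)
    frac = x / step - lower  # in [0, 1)
    return (lower + (rng.random() < frac)) * step


STEP = 0.25      # hypothetical grid spacing of a coarse format
UPDATE = 0.01    # small update, well under half a grid step
N_STEPS = 10_000

rng = random.Random(0)
w_nearest = 0.0
w_stochastic = 0.0
for _ in range(N_STEPS):
    w_nearest = round_nearest(w_nearest + UPDATE, STEP)
    w_stochastic = round_stochastic(w_stochastic + UPDATE, STEP, rng)

exact = UPDATE * N_STEPS
print("exact sum:        ", exact)
print("round-to-nearest: ", w_nearest)
print("stochastic:       ", w_stochastic)
```

With round-to-nearest every small update is discarded, so the stored value never moves. Stochastic rounding is noisy on any single step but unbiased, so the accumulated value tracks the exact sum on average.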

Results Exceeded Expectations

The key metric was accuracy on MMLU-Pro, a broad knowledge benchmark covering mathematics, natural sciences, humanities, and other fields. The NVFP4 model scored 62.58%, only 0.04 percentage points below a model trained with the traditional FP8 method (62.62%). For practical purposes the difference is negligible and plausibly within measurement noise.
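The gap between the two scores can be expressed either in absolute percentage points or relative to the FP8 score. The short calculation below uses only the two accuracy figures reported in the article; the variable names are illustrative.

```python
nvfp4_acc = 62.58  # MMLU-Pro accuracy in percent (from the article)
fp8_acc = 62.62    # FP8 baseline accuracy in percent (from the article)

gap_points = fp8_acc - nvfp4_acc           # absolute gap, percentage points
gap_relative = gap_points / fp8_acc * 100  # gap as a percent of the FP8 score

print(f"absolute gap: {gap_points:.2f} percentage points")
print(f"relative gap: {gap_relative:.3f}% of the FP8 score")
```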

Combined with the twofold memory saving, this is a rare case where lowering numerical precision did not noticeably reduce result quality. On this benchmark, at least, NVFP4 does not trade accuracy for resource savings.

Memory reduction: 2x compared to FP8
Accuracy loss on benchmark: less than 0.1 percentage points (62.58% vs. 62.62% on MMLU-Pro)
Experiment scale: 10 trillion tokens
Architecture: hybrid Mamba-Transformer model with 12 billion parameters

What This Means for the Industry

The result matters for companies that train models from scratch. A twofold memory saving means the same workload can run faster or cheaper, or the freed resources can go toward training larger models. As a rough illustration, a run that costs 1000 GPU-days could in the best case approach 500 GPU-days at similar quality, although memory savings do not automatically translate into a proportional speedup and depend on hardware with native 4-bit support.

For researchers, this opens up room to experiment with architectures, data volumes, and hyperparameters. Shorter training cycles make it more practical to test new ideas on larger models instead of scaling experiments down to fit a budget.

However, the method still needs validation on other model types, particularly pure transformers and other architectures. So far NVIDIA has shown results only on the hybrid Mamba-Transformer architecture. It is also worth noting that 4-bit training is a specialized technique that requires specific software optimizations and hardware support (full support currently exists only on NVIDIA GPUs).

Hamidun News

AI news without noise. Daily editorial selection from 400+ sources. A product by Zhemal Khamidun, Head of AI at Alpina Digital.
