
NVIDIA developed a method for training neural networks at 4-bit precision


Source: MarkTechPost. Collage: Hamidun News.

NVIDIA has introduced NVFP4, a 4-bit number format with an accompanying methodology for training neural networks at 4-bit precision. The approach promises substantial savings in memory and compute when training large models.

How It Works

The standard approach stores intermediate results and gradients during training in 16-bit (BF16) or 8-bit (FP8) precision. By moving to the 4-bit NVFP4 format, NVIDIA halves those memory requirements relative to FP8.
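As a rough illustration of that arithmetic, the sketch below estimates the storage for a single copy of 12 billion values at each precision. The function name `storage_gib` is hypothetical, and the layout of one 8-bit scale per block of 16 values is an assumption made here for illustration; the article does not describe NVFP4's scaling layout. Real training also holds optimizer state and activations, which this ignores.

```python
def storage_gib(num_values, bits_per_value, block_size=0, scale_bits=0):
    """GiB needed to store num_values, plus one scale per block if block_size > 0."""
    total_bits = num_values * bits_per_value
    if block_size:
        num_blocks = -(-num_values // block_size)  # ceiling division
        total_bits += num_blocks * scale_bits
    return total_bits / 8 / 2**30


NUM_VALUES = 12_000_000_000  # one value per parameter of a 12B model

bf16 = storage_gib(NUM_VALUES, 16)
fp8 = storage_gib(NUM_VALUES, 8)
fp4_raw = storage_gib(NUM_VALUES, 4)
# ASSUMPTION: one 8-bit scale shared by each block of 16 four-bit values.
fp4_scaled = storage_gib(NUM_VALUES, 4, block_size=16, scale_bits=8)

for name, gib in [("BF16", bf16), ("FP8", fp8),
                  ("4-bit, no scales", fp4_raw),
                  ("4-bit, block scales", fp4_scaled)]:
    print(f"{name:>20}: {gib:6.2f} GiB")
```

Under these assumptions the raw 4-bit payload is exactly half the FP8 size, while per-block scale factors push the effective saving somewhat below 2x.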

The method does not simply lower precision. It combines several techniques: keeping the more precise BF16 format on critical layers of the model, applying 16×16 random Hadamard transforms to the inputs of gradient computations, and using stochastic rounding during computation.
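To show why a random Hadamard transform helps before quantizing, here is a minimal pure-Python sketch of a 16-element version. It is not NVIDIA's implementation: the helper names, the Sylvester construction, and the example vector are all assumptions chosen for illustration. The idea is that an orthogonal rotation spreads a single outlier across all 16 entries, so the block needs a smaller dynamic range, and the rotation can be undone exactly.

```python
import math
import random


def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n a power of two)."""
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def random_hadamard_transform(x, signs):
    """Flip signs at random, then rotate with the orthonormal Hadamard matrix."""
    n = len(x)
    scale = 1.0 / math.sqrt(n)
    flipped = [s * v for s, v in zip(signs, x)]
    return [scale * sum(h * v for h, v in zip(row, flipped)) for row in hadamard(n)]


def inverse_transform(y, signs):
    """Undo the transform: the scaled Sylvester matrix is its own inverse."""
    n = len(y)
    scale = 1.0 / math.sqrt(n)
    unrotated = [scale * sum(h * v for h, v in zip(row, y)) for row in hadamard(n)]
    return [s * v for s, v in zip(signs, unrotated)]


rng = random.Random(0)
signs = [rng.choice((-1, 1)) for _ in range(16)]

x = [0.1] * 16
x[3] = 8.0  # a single large outlier (hypothetical data)

y = random_hadamard_transform(x, signs)
recovered = inverse_transform(y, signs)

print("largest magnitude before:", max(abs(v) for v in x))
print("largest magnitude after: ", max(abs(v) for v in y))
```

The transform preserves the vector's norm, so no information is lost; only the distribution of magnitudes within the block changes.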

Four-bit training has traditionally been considered risky: over a long run, rounding errors accumulate and degrade the model. The company tested NVFP4 on a 12-billion-parameter hybrid Mamba-Transformer model trained on 10 trillion tokens, the longest publicly documented 4-bit training run to date. The result suggests that, with the right methodology, numerical errors do not accumulate catastrophically.
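The accumulation problem, and the way stochastic rounding addresses it, can be sketched in a few lines. This is a generic illustration, not NVIDIA's training recipe: the grid of 4-bit magnitudes, the function names, and the gradient value of 0.1 are assumptions made for the example. Round-to-nearest maps every small value to zero, so the error is systematic; stochastic rounding rounds up with probability proportional to the remainder, so the errors average out.

```python
import random
from bisect import bisect_right

# ASSUMPTION: magnitudes representable by a 4-bit float with 2 exponent bits
# and 1 mantissa bit; the article does not list them.
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def round_nearest(x):
    """Round to the closest grid magnitude, clipping at the largest one."""
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), FP4_GRID[-1])
    return sign * min(FP4_GRID, key=lambda g: abs(g - mag))


def round_stochastic(x, rng):
    """Round up or down at random so that the expected result equals x."""
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), FP4_GRID[-1])
    hi_idx = bisect_right(FP4_GRID, mag)
    if hi_idx == len(FP4_GRID):  # already at the top of the grid
        return sign * FP4_GRID[-1]
    lo, hi = FP4_GRID[hi_idx - 1], FP4_GRID[hi_idx]
    p_up = (mag - lo) / (hi - lo)
    return sign * (hi if rng.random() < p_up else lo)


rng = random.Random(0)
small_gradient = 0.1  # hypothetical value, below half the first grid step
steps = 20_000

exact_sum = small_gradient * steps
nearest_sum = sum(round_nearest(small_gradient) for _ in range(steps))
stochastic_sum = sum(round_stochastic(small_gradient, rng) for _ in range(steps))

print("exact sum:           ", exact_sum)
print("round-to-nearest sum:", nearest_sum)
print("stochastic sum:      ", stochastic_sum)
```

Each individual stochastic result is noisier than its round-to-nearest counterpart, but the sum stays close to the exact value because the rounding is unbiased.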

Results Exceeded Expectations

The key metric was accuracy on MMLU-Pro, a broad knowledge benchmark covering mathematics, natural sciences, the humanities, and other fields. The NVFP4 model scored 62.58%, just 0.04 percentage points below a model trained with the traditional FP8 method (62.62%). For practical purposes the difference is negligible and likely within the benchmark's measurement noise.
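The distinction between a gap in percentage points and a relative gap is easy to blur, so the short sketch below computes both from the two scores quoted above. The function name `accuracy_gap` is hypothetical.

```python
def accuracy_gap(baseline_pct, candidate_pct):
    """Return (gap in percentage points, gap relative to the baseline in %)."""
    absolute_pp = baseline_pct - candidate_pct
    relative_pct = 100.0 * absolute_pp / baseline_pct
    return absolute_pp, relative_pct


# Scores reported in the article: FP8 baseline vs. NVFP4.
gap_pp, gap_rel = accuracy_gap(62.62, 62.58)

print(f"absolute gap: {gap_pp:.2f} percentage points")
print(f"relative gap: {gap_rel:.3f}% of the baseline score")
```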

Given the twofold memory saving, this is a rare case where lowering numerical precision did not noticeably hurt result quality. On this benchmark, at least, NVFP4 does not trade accuracy for resource savings.

  • Memory reduction: 2x compared to FP8
  • Accuracy loss on MMLU-Pro: 0.04 percentage points (62.58% vs. 62.62%)
  • Experiment scale: 10 trillion tokens
  • Architecture: hybrid Mamba-Transformer model with 12 billion parameters

What This Means for the Industry

The result matters for companies that train models from scratch. A twofold memory saving means the same computation can potentially run faster or cheaper, or the freed resources can go toward training larger models. As a best-case illustration, a run budgeted at 1,000 GPU-days could approach 500 GPU-days at comparable quality. A twofold memory saving does not by itself guarantee a twofold speedup, however, and the gains require GPUs with native 4-bit support, which the A100 does not have.
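How close a real run gets to the best case depends on how much of the training time is actually spent in operations that benefit from 4-bit arithmetic. The sketch below applies the standard Amdahl's-law estimate; the function name and the example fractions are hypothetical and not figures from NVIDIA or the article.

```python
def gpu_days_after_speedup(baseline_gpu_days, accelerated_fraction, kernel_speedup):
    """Amdahl's-law estimate of cost when only part of the run gets faster."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if kernel_speedup <= 0:
        raise ValueError("kernel_speedup must be positive")
    remaining = (1.0 - accelerated_fraction) + accelerated_fraction / kernel_speedup
    return baseline_gpu_days * remaining


# Best case: the whole run speeds up by 2x.
best_case = gpu_days_after_speedup(1000, accelerated_fraction=1.0, kernel_speedup=2.0)

# HYPOTHETICAL: only 70% of the time is spent in operations that speed up.
partial_case = gpu_days_after_speedup(1000, accelerated_fraction=0.7, kernel_speedup=2.0)

print(f"best case:    {best_case:.0f} GPU-days")
print(f"partial case: {partial_case:.0f} GPU-days")
```

With the hypothetical 70% fraction, the 1,000 GPU-day run drops to roughly 650 GPU-days rather than 500.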

For researchers, this opens up room to experiment with architectures, data volumes, and hyperparameters. Ideas that previously had to be tested on a smaller model, or that took far longer to evaluate, become practical to try at a larger scale.

However, the method still needs validation on other model types, particularly pure transformers and other architectures. NVIDIA has so far shown results only on the hybrid Mamba-Transformer architecture. It is also important to understand that 4-bit training is a specialized technique that requires specific software optimizations and hardware support (full support currently exists only on NVIDIA GPUs).

Hamidun News
AI news without noise. Daily editorial selection from 400+ sources. A product by Zhemal Khamidun, Head of AI at Alpina Digital.