University of Twente Reduces Energy Consumption of LLM Training by 14%

AI-processed from IEEE Spectrum AI; edited by Hamidun News
Source: IEEE Spectrum AI. Collage: Hamidun News.

Researchers at the University of Twente in the Netherlands have developed a method that saves up to 14% of the energy used to train large language models while slowing training by only about 0.6%. The technique relies on dynamic control of GPU clock frequencies, and this is the first time it has been applied at such a fine level of granularity.

How DVFS Works

DVFS (dynamic voltage and frequency scaling) is a well-known technique that adjusts a chip's clock frequency, and with it the supply voltage, to match the current computational load. Every operation on the chip is triggered by a clock pulse, so the pulse frequency determines both how fast the GPU runs and how much energy it consumes. Modern GPUs have two independent clock domains: one for the compute cores and one for the memory.

When the cores are computing intensively, the core clock runs at a high frequency and the memory clock can be slowed down. When the cores are waiting for data from memory, the opposite applies: the core clock can slow down while the memory clock speeds up. Balancing the two reduces overall energy consumption with little or no loss of performance.
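
The trade-off described above can be illustrated with a toy model. The sketch below is not the researchers' model. It assumes that a kernel's run time is set by whichever side (compute or memory) is the bottleneck, and that each clock domain's power grows with the cube of its normalized frequency, a common textbook simplification. All function names and numbers are hypothetical.

```python
def kernel_time(compute_work, memory_work, core_clock, mem_clock):
    """Run time of one kernel: the slower of the compute and memory sides."""
    return max(compute_work / core_clock, memory_work / mem_clock)


def gpu_power(core_clock, mem_clock, static_power=0.2):
    """Toy power model (assumption): cubic in each normalized clock."""
    return static_power + core_clock ** 3 + mem_clock ** 3


def kernel_energy(compute_work, memory_work, core_clock, mem_clock):
    """Energy = power x time for one kernel at the given clocks."""
    t = kernel_time(compute_work, memory_work, core_clock, mem_clock)
    return gpu_power(core_clock, mem_clock) * t


# A compute-bound kernel: twice as much compute work as memory work.
full_speed = kernel_energy(1.0, 0.5, core_clock=1.0, mem_clock=1.0)
slow_memory = kernel_energy(1.0, 0.5, core_clock=1.0, mem_clock=0.5)

print(f"both clocks at maximum: {full_speed:.3f}")
print(f"memory clock halved:    {slow_memory:.3f}")
```

In this toy setting the kernel takes the same time in both cases, because compute remains the bottleneck, while the energy drops because the memory clock no longer runs faster than the kernel needs.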

Why Previous Methods Didn't Work

DVFS has existed since the 1990s, but applying it to LLM training has proved harder than expected. Earlier attempts either slowed computation too much or were not flexible enough. The main problem was that most methods adjusted frequency only at the level of entire training iterations (a forward pass plus backpropagation), which is too coarse for effective optimization.

Innovation at the Kernel Level

Jeffrey Spaan's team decided to change the frequency at a much finer level: that of individual kernels, the elementary units of work a GPU executes. GPU computations are broken down into many small operations; a single vector multiplication, for example, is one kernel. Training one neural network layer launches roughly 40 such kernels. By setting the frequency for each kernel separately, the team found considerably larger energy savings:

  • Frequency is adjusted per kernel instead of per training iteration
  • Predicting the next kernel makes it possible to set the required frequency in advance
  • Energy savings average 14%, with a slowdown of only 0.6%
  • The GPU's built-in automatic DVFS performs worse because it cannot predict which kernel comes next
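
The article does not describe how the team predicts the next kernel. As a minimal sketch of the idea, assuming that training launches the same kernels in the same order layer after layer, a predictor can simply remember which kernel usually follows which and look up clocks for the upcoming one. The kernel names, the clock table, and every function below are hypothetical, and no real GPU interface is called.

```python
from collections import Counter, defaultdict

# Hypothetical per-kernel clock preferences (normalized core, memory clocks).
CLOCK_TABLE = {
    "qkv_matmul": (1.0, 0.6),   # compute-bound: keep core fast, slow memory
    "softmax": (0.6, 1.0),      # memory-bound: slow core, keep memory fast
    "out_matmul": (1.0, 0.6),
    "layernorm": (0.6, 1.0),
}
DEFAULT_CLOCKS = (1.0, 1.0)     # fall back to full speed when unsure


class NextKernelPredictor:
    """Learns which kernel usually follows which from an observed trace."""

    def __init__(self):
        self._followers = defaultdict(Counter)
        self._previous = None

    def observe(self, kernel):
        """Record one kernel launch, in launch order."""
        if self._previous is not None:
            self._followers[self._previous][kernel] += 1
        self._previous = kernel

    def predict_next(self, kernel):
        """Most frequent successor of `kernel`, or None if never seen."""
        followers = self._followers.get(kernel)
        if not followers:
            return None
        return followers.most_common(1)[0][0]


def clocks_to_prepare(predictor, running_kernel):
    """Clocks to request while `running_kernel` is still executing."""
    upcoming = predictor.predict_next(running_kernel)
    return CLOCK_TABLE.get(upcoming, DEFAULT_CLOCKS)


layer_trace = ["qkv_matmul", "softmax", "out_matmul", "layernorm"]
predictor = NextKernelPredictor()
for kernel in layer_trace * 2:      # two layers' worth of launches
    predictor.observe(kernel)

for kernel in layer_trace:
    print(kernel, "->", predictor.predict_next(kernel),
          clocks_to_prepare(predictor, kernel))
```

Falling back to full speed for an unknown kernel is a deliberately conservative choice in this sketch: a wrong guess then costs some energy rather than training speed.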

Results and Limitations

The experiment used the GPT-3-XL model (1.3 billion parameters) running on an Nvidia RTX 3080 Ti GPU. The result was an average energy saving of 14% with a slowdown of only 0.6%.

"We optimize energy savings without losing performance. In the real world, performance is the holy grail," says Jeffrey Spaan.
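
Because energy is average power multiplied by time, the two reported figures (14% less energy, 0.6% more time) also imply how much the average power draw falls. The short calculation below works that out; the variable names are illustrative, and the only inputs are the two numbers from the article.

```python
energy_saving = 0.14   # reported average energy reduction
slowdown = 0.006       # reported increase in training time

energy_ratio = 1.0 - energy_saving        # new energy / old energy
time_ratio = 1.0 + slowdown               # new time / old time
power_ratio = energy_ratio / time_ratio   # new average power / old average power

print(f"average power falls by about {(1 - power_ratio) * 100:.1f}%")
```

The implied drop in average power is slightly larger than the drop in energy, because the run takes a little longer.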

One limitation: switching frequencies takes time, although less than fully shutting down and restarting a core. The researchers' calculations did not account for this switching delay, so the 14% figure is a best-case scenario. Newer GPUs, such as Nvidia's Blackwell generation, switch much faster and should be able to realize more of these savings.
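
To see why switching time matters, the sketch below estimates how much of the best-case saving survives once switching delays are counted. It is a rough model under stated assumptions, not the researchers' calculation: every clock switch is assumed to stall the GPU for a fixed latency while it keeps drawing its reduced average power. The article gives no switch rates or latencies, so the values used here are purely hypothetical.

```python
def net_energy_saving(ideal_saving, slowdown, switches_per_second, switch_latency_s):
    """Fraction of energy saved once clock-switch stalls are counted.

    `switches_per_second` is the number of clock changes per second of
    baseline (unscaled) training time. Assumption: each switch stalls the
    GPU for `switch_latency_s` seconds at the reduced average power.
    """
    # Average power relative to baseline, from energy = power x time.
    power_ratio = (1.0 - ideal_saving) / (1.0 + slowdown)
    # Run time relative to baseline: the slowdown plus time lost to stalls.
    time_ratio = 1.0 + slowdown + switches_per_second * switch_latency_s
    return 1.0 - power_ratio * time_ratio


# Hypothetical latencies: none, 10 microseconds, 100 microseconds.
for latency in (0.0, 1e-5, 1e-4):
    saving = net_energy_saving(0.14, 0.006, switches_per_second=1000,
                               switch_latency_s=latency)
    print(f"latency {latency * 1e6:6.1f} us -> net saving {saving * 100:.1f}%")
```

With zero latency the function returns the best-case 14%, and the net saving shrinks as either the latency or the number of switches grows, which is why faster-switching hardware could capture more of the benefit.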

What This Means

If Spaan's method were adopted across the industry, it could save billions of watt-hours of energy when training frontier models. That would reduce both the AI industry's carbon footprint and its operating costs without requiring investment in new hardware.
