Hugging Face Compared Every LoRA Alternative: Who Wins at LLM Fine-Tuning

Hugging Face has published a large-scale comparison of PEFT methods for fine-tuning LLMs, and here is the spoiler: beating LoRA is possible, but every method comes at its own price. DoRA slightly…

AI-processed from Hugging Face Blog; edited by Hamidun News
Source: Hugging Face Blog. Collage: Hamidun News.

LoRA has become the de facto standard for fine-tuning large language models: cheap, fast, and works almost everywhere without surprises. Hugging Face decided to ask an honest question: can we do better — and if so, when exactly?

Why LoRA Holds Its Position

LoRA (Low-Rank Adaptation) works simply: instead of updating all of a model's billions of weights, the method freezes them and adds a pair of small low-rank matrices to key layers, training only those. The number of trainable parameters drops by a factor of 10 to 1000. This makes fine-tuning accessible even on consumer GPUs.
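To make the parameter arithmetic concrete, here is a minimal NumPy sketch of a LoRA-style update on a single weight matrix. The layer sizes, the rank, the `alpha` value, and the helper names `lora_param_counts` and `lora_forward` are illustrative assumptions, not figures or code from the Hugging Face comparison.

```python
import numpy as np

def lora_param_counts(d_out, d_in, rank):
    """Return (full fine-tuning params, LoRA trainable params) for one weight matrix."""
    full = d_out * d_in
    lora = rank * (d_in + d_out)  # A is (rank, d_in), B is (d_out, rank)
    return full, lora

def lora_forward(x, W, A, B, alpha):
    """Frozen base projection plus the scaled low-rank update B @ A."""
    rank = A.shape[0]
    return x @ W.T + (alpha / rank) * (x @ A.T @ B.T)

# Illustrative sizes only: one 4096 x 4096 projection adapted with rank 8.
full, lora = lora_param_counts(4096, 4096, rank=8)
print(f"full: {full:,}  lora: {lora:,}  reduction: {full // lora}x")

# Tiny forward pass. B starts at zero, so the adapter is a no-op before training.
rng = np.random.default_rng(0)
W = rng.normal(size=(16, 12))   # frozen pretrained weight
A = rng.normal(size=(4, 12))    # trainable, random init
B = np.zeros((16, 4))           # trainable, zero init
x = rng.normal(size=(3, 12))    # batch of 3 inputs

y_base = x @ W.T
y_adapted = lora_forward(x, W, A, B, alpha=8)
```

For these assumed sizes the trainable parameter count for the layer falls by a factor of 256, which sits inside the 10 to 1000 range quoted above.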

This is why LoRA became ubiquitous: it's used for further training Llama and Mistral, for creating custom styles in Stable Diffusion, for adapting corporate LLMs to domain-specific needs. The Hugging Face PEFT library sees hundreds of thousands of downloads per week.

But LoRA has weaknesses. At high matrix ranks (rank=64 and above), training becomes unstable. On tasks where accurate knowledge transfer matters, the method sometimes loses to full fine-tuning. And in scenarios with tight memory constraints, such as training on a single budget GPU, even LoRA can prove too resource-hungry.

What Hugging Face Tested

The team took the PEFT library and conducted a systematic comparison of LoRA with five alternatives on real downstream tasks:

  • DoRA: decomposes weights into direction and magnitude and updates them independently, approaching full fine-tuning behavior
  • LoRA+: a simple idea: matrices A and B are trained with different learning rates, and matrix B gets the higher one to accelerate convergence
  • rsLoRA: changes the normalization coefficient applied to the adapter so that gradients stay stable at high rank values
  • VeRA: uses random frozen matrices and trains only tiny scaling vectors; it has dozens of times fewer trainable parameters than LoRA
  • GaLore: projects the gradients themselves into a low-rank space, saving optimizer memory without changing the weight architecture
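The adapter-style variants in the list can be sketched in a few lines of NumPy. This is a toy illustration under stated assumptions, not PEFT's implementation: the helper names `dora_weight` and `vera_delta`, the tiny dimensions, the choice of a per-column norm for DoRA, and the learning-rate ratio of 16 for LoRA+ are all picked for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 6, 4, 2
W = rng.normal(size=(d_out, d_in))   # frozen pretrained weight

# LoRA-style factors: B starts at zero, so the update B @ A is zero at init.
A = rng.normal(size=(r, d_in))
B = np.zeros((d_out, r))

# LoRA+ (sketch): same factors, but B gets a larger learning rate than A.
base_lr = 1e-4
lr_ratio = 16                        # assumed illustrative ratio
learning_rates = {"A": base_lr, "B": base_lr * lr_ratio}

# DoRA (sketch): a trainable magnitude vector times a unit-norm direction.
def dora_weight(W, A, B, magnitude):
    direction = W + B @ A
    norms = np.linalg.norm(direction, axis=0, keepdims=True)  # per-column norm
    return magnitude * direction / norms

magnitude = np.linalg.norm(W, axis=0, keepdims=True)  # initialised from W itself
W_dora = dora_weight(W, A, B, magnitude)              # equals W before training

# VeRA (sketch): frozen random matrices, only two scaling vectors are trained.
def vera_delta(A_frozen, B_frozen, d_vec, b_vec):
    # Equivalent to diag(b_vec) @ B_frozen @ diag(d_vec) @ A_frozen.
    return (b_vec[:, None] * B_frozen) @ (d_vec[:, None] * A_frozen)

A_frozen = rng.normal(size=(r, d_in))
B_frozen = rng.normal(size=(d_out, r))
d_vec = np.full(r, 0.1)              # trainable, small constant init
b_vec = np.zeros(d_out)              # trainable, zero init
delta_vera = vera_delta(A_frozen, B_frozen, d_vec, b_vec)

lora_trainable = r * (d_in + d_out)  # both LoRA factors are trained
vera_trainable = r + d_out           # only the two vectors are trained
```

With zero-initialised `B` and `b_vec`, both the DoRA-style and VeRA-style layers start out identical to the pretrained layer, so training begins from the base model's behavior.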

Metrics: quality on benchmark tasks (NLU, instruction following, summarization), peak GPU memory consumption, and time per training epoch.

Who's Challenging the Leader

There is no clear winner: each method has its own profile.

DoRA consistently shows slightly better quality than LoRA at the same parameter count and memory footprint. The gain is especially noticeable on instruction-following and reasoning tasks. The cost is slightly longer training time due to the additional weight decomposition.

rsLoRA doesn't improve baseline quality but eliminates instability at high ranks. If you need rank=128 or higher, rsLoRA is practically mandatory: classic LoRA starts to "drift" there.

VeRA is interesting for scenarios with tight constraints on adapter size, for example when serving thousands of user adapters on a server, but it loses slightly in quality.

"LoRA remains the best default choice, but knowledge of alternatives allows maximizing performance in specific conditions," the study authors conclude.
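The difference between classic LoRA and rsLoRA at high rank comes down to how the adapter output is scaled. As described in the rsLoRA paper, classic LoRA multiplies the update by alpha / r, while rsLoRA uses alpha / sqrt(r). The sketch below only compares the two factors; `alpha = 16` and the list of ranks are assumed values for illustration.

```python
import math

def lora_scale(alpha, rank):
    """Classic LoRA scaling: shrinks linearly as the rank grows."""
    return alpha / rank

def rslora_scale(alpha, rank):
    """Rank-stabilised scaling: shrinks only with the square root of the rank."""
    return alpha / math.sqrt(rank)

alpha = 16  # assumed value, for illustration only
for rank in (8, 64, 128, 256):
    print(f"rank={rank:>3}  lora={lora_scale(alpha, rank):.4f}  "
          f"rslora={rslora_scale(alpha, rank):.4f}")
```

At rank 128 the classic factor is 0.125, while the rank-stabilised one stays above 1, so the adapter's contribution does not fade as the rank grows.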

GaLore makes it possible to train on GPUs with little VRAM without changing the weight architecture. It suits pre-training or continued pre-training, when you need to update all weights but memory is critically short. The trade-off is noticeably slower training.
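The idea of keeping optimizer state in a low-rank space can be sketched as follows. This is a simplified NumPy illustration, not the GaLore reference code: the helpers `galore_projector` and `galore_adam_step` are hypothetical names, the projector is built once from a single gradient, and details such as periodic projector refresh and update scaling are left out.

```python
import numpy as np

def galore_projector(grad, rank):
    """Top-`rank` left singular vectors of the gradient, used as the projection."""
    U, _, _ = np.linalg.svd(grad, full_matrices=False)
    return U[:, :rank]                       # shape (m, rank)

def galore_adam_step(W, grad, P, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam-style step with moments stored in the projected space."""
    R = P.T @ grad                           # projected gradient, shape (rank, n)
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * R
    state["v"] = beta2 * state["v"] + (1 - beta2) * R ** 2
    m_hat = state["m"] / (1 - beta1 ** state["t"])
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    update = P @ (m_hat / (np.sqrt(v_hat) + eps))   # back to full shape (m, n)
    return W - lr * update

rng = np.random.default_rng(0)
m, n, rank = 64, 32, 4                       # illustrative sizes
W = rng.normal(size=(m, n))                  # full weight matrix, all of it trained
grad = rng.normal(size=(m, n))               # stand-in for a real gradient

P = galore_projector(grad, rank)
state = {"t": 0, "m": np.zeros((rank, n)), "v": np.zeros((rank, n))}
W_new = galore_adam_step(W, grad, P, state)

full_state = 2 * m * n                       # Adam moments for the full matrix
low_rank_state = 2 * rank * n                # Adam moments in the projected space
```

The weights keep their full shape; only the two Adam moment buffers shrink, here by a factor of 16 for the assumed sizes. The extra SVD and projections are one plausible source of the slower training mentioned above.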

What This Means

The PEFT ecosystem is maturing: instead of one universal method for all cases, a matrix of tools is forming. For product teams, this means one thing: before choosing a fine-tuning method, it's worth spending an hour on a comparative benchmark on your own task rather than taking LoRA by default. The chances that an alternative will give a noticeable improvement on your specific scenario are now higher than ever.

