Training

LoRA (Low-Rank Adaptation)

LoRA is a parameter-efficient fine-tuning technique that adds trainable low-rank matrix pairs to frozen pre-trained model layers, enabling large-model adaptation with a fraction of the original parameter count.

LoRA (Low-Rank Adaptation) is a method for fine-tuning large pre-trained neural networks by inserting small trainable low-rank decompositions into each target weight matrix while keeping the original model weights frozen. It was introduced by Edward Hu and colleagues at Microsoft Research in a 2021 paper (published at ICLR 2022) and has since become the dominant parameter-efficient fine-tuning (PEFT) technique for large language models.

For a weight matrix W of shape d × k in a transformer layer, LoRA introduces two small matrices B (d × r) and A (r × k), where the rank r is much smaller than both d and k (typically 4, 8, or 16). During the forward pass, the effective weight becomes W + BA; only B and A are updated during training while W stays frozen. Each adapted matrix therefore contributes r(d + k) trainable parameters instead of d·k. Because r is orders of magnitude smaller than d, trainable parameter counts drop dramatically: fine-tuning a 7-billion-parameter model with LoRA at rank 8 typically requires updating fewer than 20 million parameters instead of all 7 billion (roughly 4 million when only the attention query and value projections are adapted), cutting optimizer memory requirements by a comparable factor.
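The forward pass described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the sizes are toy values, and the `alpha / r` scaling and the zero initialization of B follow the original paper's formulation rather than anything stated in this entry. The function name `lora_forward` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

d, k, r = 64, 48, 4   # toy sizes; real layers have d and k in the thousands
alpha = 8             # assumed scaling hyperparameter (update is scaled by alpha / r)

W = rng.normal(size=(d, k))              # frozen pre-trained weight
A = rng.normal(scale=0.01, size=(r, k))  # trainable, small random init
B = np.zeros((d, r))                     # trainable, zero init so B @ A = 0 at the start


def lora_forward(x, W, A, B, alpha, r):
    """Apply (W + (alpha / r) * B @ A) to x without materializing the d x k update."""
    return W @ x + (alpha / r) * (B @ (A @ x))


x = rng.normal(size=(k,))

# At initialization the adapter is a no-op: output equals the base model's output.
y0 = lora_forward(x, W, A, B, alpha, r)

# Stand-in for training: pretend B has been updated to non-zero values.
B_trained = rng.normal(size=(d, r))
y_adapted = lora_forward(x, W, A, B_trained, alpha, r)

# The adapter can be folded into W, so inference needs no extra matmuls.
W_merged = W + (alpha / r) * (B_trained @ A)

full_params = d * k        # parameters updated by full fine-tuning of this matrix
lora_params = r * (d + k)  # parameters updated by LoRA for the same matrix
print(full_params, lora_params)
```

Computing `B @ (A @ x)` rather than `(B @ A) @ x` keeps the extra cost proportional to r, which is the reason the low-rank form is cheap during training.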

LoRA's practical importance is threefold. First, it makes fine-tuning feasible on hardware that cannot hold a full optimizer state for a large model. Second, multiple LoRA adapters can be stored and hot-swapped on top of a single shared base model, enabling efficient multi-tenant serving of customized variants without duplicating base weights in memory. Third, adapter files are compact, often tens to hundreds of megabytes, which makes community sharing straightforward; the Hugging Face Hub hosts thousands of publicly released LoRA adapters for models across many domains.
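The hot-swapping idea can be illustrated with a toy single-layer "server" that keeps one copy of the base weight and selects an adapter per request. This is a sketch under simplifying assumptions: the adapter names, the `forward` helper, and the random adapter values are all hypothetical, and real serving systems batch requests and manage GPU memory in ways not shown here.

```python
import numpy as np

rng = np.random.default_rng(1)
d, k, r = 32, 32, 2   # toy sizes

W_base = rng.normal(size=(d, k))   # shared base weight, held in memory once


def make_adapter():
    """Return a random (A, B) pair standing in for a trained adapter."""
    return {"A": rng.normal(size=(r, k)), "B": rng.normal(size=(d, r))}


# Hypothetical tenants, each with its own small adapter.
adapters = {"support": make_adapter(), "legal": make_adapter()}


def forward(x, adapter_name=None):
    """Run the shared base layer, optionally adding one adapter's low-rank update."""
    y = W_base @ x
    if adapter_name is not None:
        ad = adapters[adapter_name]
        y = y + ad["B"] @ (ad["A"] @ x)
    return y


x = rng.normal(size=(k,))
y_support = forward(x, "support")
y_legal = forward(x, "legal")

base_params = W_base.size     # stored once, regardless of tenant count
per_adapter = r * (d + k)     # extra storage per additional tenant
print(base_params, per_adapter)
```

Adding a tenant costs only `per_adapter` additional parameters, while the base weight is never copied.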

By 2026, LoRA is implemented in the Hugging Face PEFT library and is supported, often as the default fine-tuning method, in most open-source LLM tooling, including Axolotl, LLaMA-Factory, and Unsloth. Variants address limitations of the original formulation: DoRA (weight-decomposed LoRA) splits each weight into magnitude and direction components to improve expressiveness, and rsLoRA (rank-stabilized LoRA) rescales the update to keep training stable at higher ranks. Commercial fine-tuning platforms from OpenAI, Together AI, and Fireworks AI offer LoRA-based customization as a managed service with no infrastructure setup required.
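For rsLoRA, the difference from the original formulation comes down to the factor applied to the low-rank update. The sketch below assumes the standard LoRA scaling of alpha / r and the rsLoRA scaling of alpha / sqrt(r); the function names and the choice of alpha = 16 are illustrative only.

```python
import math


def lora_scale(alpha: float, r: int) -> float:
    """Scaling applied to B @ A in the original LoRA formulation (assumed alpha / r)."""
    return alpha / r


def rslora_scale(alpha: float, r: int) -> float:
    """Rank-stabilized scaling (assumed alpha / sqrt(r))."""
    return alpha / math.sqrt(r)


# Compare how each factor shrinks as the rank grows, for a fixed alpha.
for r in (4, 16, 64, 256):
    print(f"r={r:>3}  lora={lora_scale(16, r):.4f}  rslora={rslora_scale(16, r):.4f}")
```

With alpha held fixed, the original factor falls off linearly in r while the rank-stabilized factor falls off only with its square root, so higher-rank adapters are damped less.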

Example

A company adapts Llama 3 8B to answer domain-specific customer support questions by training LoRA adapters at rank 16 on a dataset of 5,000 resolved tickets. Total trainable parameters are under 10 million, training completes on a single GPU in a few hours, and the resulting adapter file is a few tens of megabytes.
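The parameter count in this example can be checked with simple arithmetic. The sketch below assumes the publicly documented Llama 3 8B shapes (32 layers, hidden size 4096, grouped-query attention with 8 key/value heads of dimension 128) and assumes that only the attention query and value projections are adapted; the entry does not say which modules are targeted, so other choices give different totals.

```python
# Assumed Llama 3 8B shapes; not stated in the glossary entry itself.
NUM_LAYERS = 32
HIDDEN = 4096
KV_DIM = 8 * 128   # 8 key/value heads x head dimension 128
RANK = 16

# (in_features, out_features) for each adapted projection.
# Adapting only q_proj and v_proj is an assumption made for this illustration.
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "v_proj": (HIDDEN, KV_DIM),
}


def lora_params(in_features: int, out_features: int, r: int) -> int:
    """Trainable parameters for one adapted matrix: A is r x in, B is out x r."""
    return r * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in TARGETS.values())
total = NUM_LAYERS * per_layer

size_fp16_mb = total * 2 / 1e6   # 2 bytes per parameter
size_fp32_mb = total * 4 / 1e6   # 4 bytes per parameter

print(f"trainable parameters: {total:,}")
print(f"adapter size: {size_fp16_mb:.1f} MB (fp16), {size_fp32_mb:.1f} MB (fp32)")
```

Under these assumptions the adapter has about 6.8 million trainable parameters. Targeting more modules or storing at higher precision increases both the count and the file size proportionally.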
