Low-Rank Adaptation (LoRA) is a technique for fine-tuning large neural networks cheaply. Instead of updating every parameter of a massive model, LoRA freezes the pre-trained weights and learns the update to each targeted weight matrix as the product of two small trainable matrices: for a weight W, the update is factored as B·A with a shared rank r far smaller than W's dimensions. Because r is small, the factors contain far fewer parameters than the full matrix, and only they are updated during training. This cuts the number of trainable parameters from billions to a few million, reduces memory usage dramatically, and makes fine-tuning feasible on a single GPU. A company can take a pre-trained language model and adapt it to its own customer-support data without needing a server farm.

LoRA also enables rapid experimentation. Teams can train many domain-specific adapters, compare results, and discard failures while keeping the base model untouched. The same base model can serve multiple downstream tasks by swapping in different low-rank adapters. This modularity makes advanced AI capabilities accessible to smaller organizations that cannot afford full-model fine-tuning.
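The low-rank factorization can be sketched in a few lines. This is a minimal illustration in NumPy, not a production implementation; the class name `LoRALinear`, the rank and alpha values, and the initialization choices are illustrative assumptions (zero-initializing B so the adapter starts as a no-op follows common LoRA practice):

```python
import numpy as np

class LoRALinear:
    """Minimal sketch of a LoRA layer: frozen weight W plus a
    trainable low-rank update (x @ A) @ B, scaled by alpha / rank."""

    def __init__(self, d_in, d_out, rank=4, alpha=8, seed=0):
        rng = np.random.default_rng(seed)
        # Frozen pre-trained weight: never updated during fine-tuning.
        self.W = rng.standard_normal((d_in, d_out)) / np.sqrt(d_in)
        # Trainable low-rank factors. B starts at zero, so the adapted
        # layer initially behaves exactly like the frozen base layer.
        self.A = rng.standard_normal((d_in, rank)) * 0.01
        self.B = np.zeros((rank, d_out))
        self.scale = alpha / rank

    def forward(self, x):
        # Base output plus the low-rank correction; during training,
        # gradients flow only into A and B.
        return x @ self.W + self.scale * (x @ self.A) @ self.B

    def trainable_params(self):
        # Only the factors count: d_in*rank + rank*d_out,
        # versus d_in*d_out for full fine-tuning.
        return self.A.size + self.B.size
```

For a 1024×1024 weight matrix, full fine-tuning would update 1,048,576 parameters; with rank 4, the LoRA factors hold only 1024·4 + 4·1024 = 8,192. Swapping adapters amounts to swapping the (A, B) pair while W stays fixed.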