Low-Rank Adaptation (LoRA) is a technique for fine-tuning large neural networks cheaply. Instead of updating every parameter of a massive model, LoRA freezes the pre-trained weights and learns the update to each targeted weight matrix as the product of two small trainable matrices: for a weight W, the update is factored as B·A with a shared rank r far smaller than W's dimensions. Because r is small, the factors contain far fewer parameters than the full matrix, and only they are updated during training. This cuts the number of trainable parameters from billions to a few million, reduces memory usage dramatically, and makes fine-tuning feasible on a single GPU. A company can take a pre-trained language model and adapt it to its own customer-support data without needing a server farm.

LoRA also enables rapid experimentation. Teams can train many domain-specific adapters, compare results, and discard failures while keeping the base model untouched. The same base model can serve multiple downstream tasks by swapping in different low-rank adapters. This modularity makes advanced AI capabilities accessible to smaller organizations that cannot afford full-model fine-tuning.
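The low-rank factorization can be sketched in a few lines. This is a minimal illustration in NumPy, not a production implementation; the class name `LoRALinear`, the rank and alpha values, and the initialization choices are illustrative assumptions (zero-initializing B so the adapter starts as a no-op follows common LoRA practice):

```python
import numpy as np

class LoRALinear:
    """Minimal sketch of a LoRA layer: frozen weight W plus a
    trainable low-rank update (x @ A) @ B, scaled by alpha / rank."""

    def __init__(self, d_in, d_out, rank=4, alpha=8, seed=0):
        rng = np.random.default_rng(seed)
        # Frozen pre-trained weight: never updated during fine-tuning.
        self.W = rng.standard_normal((d_in, d_out)) / np.sqrt(d_in)
        # Trainable low-rank factors. B starts at zero, so the adapted
        # layer initially behaves exactly like the frozen base layer.
        self.A = rng.standard_normal((d_in, rank)) * 0.01
        self.B = np.zeros((rank, d_out))
        self.scale = alpha / rank

    def forward(self, x):
        # Base output plus the low-rank correction; during training,
        # gradients flow only into A and B.
        return x @ self.W + self.scale * (x @ self.A) @ self.B

    def trainable_params(self):
        # Only the factors count: d_in*rank + rank*d_out,
        # versus d_in*d_out for full fine-tuning.
        return self.A.size + self.B.size
```

For a 1024×1024 weight matrix, full fine-tuning would update 1,048,576 parameters; with rank 4, the LoRA factors hold only 1024·4 + 4·1024 = 8,192. Swapping adapters amounts to swapping the (A, B) pair while W stays fixed.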