LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning technique that adds small trainable low-rank matrices to frozen model weights, enabling task-specific adaptation at a fraction of the cost of full fine-tuning. The key insight is that weight updates during fine-tuning tend to have low intrinsic rank: the meaningful changes can be captured by much smaller matrices.

LoRA decomposes the weight update into two small matrices: a down-projection A (r x d) and an up-projection B (d x r), where r is the rank (typically 8-64) and d is the weight dimension. The effective update BA has the same shape as the original weight but far fewer parameters. During inference, the original weight W operates alongside the LoRA path: output = Wx + B(Ax). This adds minimal latency, and the adapter can be merged into the base weights (W + BA) for zero-overhead deployment or kept separate for easy switching between tasks.

The storage savings are dramatic. The weights of a 7B-parameter model occupy roughly 14GB in 16-bit precision, all of which full fine-tuning must update and store per task, while LoRA adapters for the same model typically total under 100MB. Multiple LoRA adapters can coexist for different tasks and be swapped at inference time without reloading the base model.

LoRA has become the dominant approach for fine-tuning large language models, enabling customization on consumer hardware. QLoRA combines LoRA with quantization of the frozen base weights for even greater memory efficiency.
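The forward pass and merge described above can be sketched in a few lines. This is a minimal illustration with numpy, not any library's actual API; the class name, scaling convention (alpha / r, as in the original LoRA paper), and initialization are assumptions chosen so the adapter starts as a no-op.

```python
import numpy as np

class LoRALinear:
    """Illustrative LoRA-augmented linear layer (hypothetical, not a real API)."""

    def __init__(self, d_in, d_out, r=8, alpha=16, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((d_out, d_in)) * 0.02  # frozen base weight
        self.A = rng.standard_normal((r, d_in)) * 0.01      # trainable down-projection
        self.B = np.zeros((d_out, r))                       # trainable up-projection
        self.scale = alpha / r  # common scaling; with B = 0, the update starts as a no-op

    def forward(self, x):
        # output = Wx + scale * B(Ax); only A and B receive gradient updates
        return self.W @ x + self.scale * (self.B @ (self.A @ x))

    def merge(self):
        # Fold the adapter into the base weight for zero-overhead inference
        return self.W + self.scale * (self.B @ self.A)

# Parameter count: for d_in = d_out = 4096 and r = 8, the adapter holds
# r * d_in + d_out * r = 65,536 values versus 16.7M in the full weight matrix.
layer = LoRALinear(4096, 4096, r=8)
adapter_params = layer.A.size + layer.B.size
print(adapter_params)  # 65536
```

Because B is initialized to zero, forward() reproduces the frozen model exactly at the start of training, and merge() shows why a merged adapter adds no inference cost: it is just an ordinary weight matrix again.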