An adapter is a small trainable module inserted into a frozen pretrained model, enabling task-specific adaptation while the vast majority of parameters stay fixed. The typical adapter architecture is a bottleneck: a down-projection to a lower dimension, a nonlinearity, and an up-projection back to the original dimension. The module is inserted after the attention or feed-forward sublayers, with a residual connection, so it learns only what to add to the frozen computation.

Adapters typically contain 0.5-5% of the parameters of the layers they modify, yet can match full fine-tuning performance on many tasks. This parameter efficiency brings several practical benefits: different tasks can use different adapters on the same shared base model; adapters can be trained on limited hardware; and adapters can be combined through interpolation or mixtures.

Training adapters is also faster and cheaper than full fine-tuning, because only the adapter parameters need gradients and optimizer states. The base model's forward pass is otherwise unchanged, which makes adapters complementary to quantization and other efficiency techniques.

Adapters sit within a broader family of parameter-efficient fine-tuning methods: LoRA modifies weight matrices with low-rank updates, prefix tuning prepends learnable vectors to each layer's attention inputs, and prompt tuning learns soft input embeddings. The common principle is the same: keep most parameters frozen and learn a small, targeted modification. This principle has democratized LLM customization.
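The bottleneck-plus-residual structure can be sketched in a few lines of NumPy. The hidden size, bottleneck size, ReLU nonlinearity, and zero initialization of the up-projection below are illustrative assumptions, not fixed by the definition above (zero-initializing the up-projection is a common trick so the adapter starts as an identity function):

```python
import numpy as np

def adapter(h, W_down, W_up):
    """Bottleneck adapter: down-project, nonlinearity, up-project,
    with a residual connection back to the frozen hidden state h."""
    z = np.maximum(0.0, h @ W_down)   # down-projection + ReLU
    return h + z @ W_up               # residual: learn only what to add

rng = np.random.default_rng(0)
d, r = 768, 64                        # hidden size, bottleneck size (assumed)
h = rng.standard_normal((4, d))      # a batch of 4 hidden states

W_down = rng.standard_normal((d, r)) * 0.02  # trainable
W_up = np.zeros((r, d))                      # trainable, zero-initialized

out = adapter(h, W_down, W_up)
# With W_up = 0 the adapter output equals its input, so inserting it
# does not perturb the pretrained model at the start of training.
assert np.allclose(out, h)
```

Only `W_down` and `W_up` would receive gradients during training; everything that produced `h` stays frozen.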