veda.ng

Prefix Tuning

Prefix tuning is a parameter-efficient fine-tuning method that adapts a frozen language model by learning a small set of continuous vectors prepended to the input at every layer, steering model behavior without updating any of the original model weights. Instead of fine-tuning billions of parameters, prefix tuning trains only the prefix embeddings, typically tens to a few hundred vectors per layer, amounting to well under 1% of the model's parameters.

These learned prefix vectors act as soft prompts that condition all subsequent computation: the model's attention mechanism attends to the prefix alongside the actual input, receiving task-specific context that guides its processing. During training, gradients flow back through the frozen model but update only the prefix parameters. This greatly reduces memory requirements, since no optimizer states are kept for the frozen weights, and enables multi-task serving: different prefixes for different tasks, one shared base model.

Prefix tuning was introduced by Li and Liang in 2021 as an alternative to full fine-tuning and prompt engineering, and it sits between them: more flexible than discrete prompts, more efficient than full fine-tuning. Related methods include prompt tuning (soft prompts at the input layer only), adapter layers (small trainable modules inserted throughout the network), and LoRA (low-rank updates to the weight matrices).

Prefix tuning works because transformer attention is highly sensitive to context, and learned context can be as effective as learned weights.
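The core mechanism, attention attending to learned prefix vectors alongside the real input, can be sketched in a few lines. The following is a minimal illustrative example, not a faithful reimplementation of the original method (which, among other details, reparameterizes the prefix through a small MLP during training). All names, dimensions, and weights here are made up for illustration: the projection matrices stand in for frozen pretrained parameters, and `prefix_k`/`prefix_v` are the only values a trainer would update.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 8           # model/embedding dimension (illustrative)
prefix_len = 4  # number of learned prefix vectors for this layer
seq_len = 6     # number of actual input tokens

# Frozen projection weights (stand-ins for pretrained parameters; never updated).
W_q = rng.normal(size=(d, d))
W_k = rng.normal(size=(d, d))
W_v = rng.normal(size=(d, d))

# The ONLY trainable parameters in prefix tuning: per-layer prefix key/value vectors.
prefix_k = rng.normal(size=(prefix_len, d)) * 0.02
prefix_v = rng.normal(size=(prefix_len, d)) * 0.02

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_with_prefix(x):
    """Single-head attention where learned prefix keys/values are prepended,
    so every query position also attends to the task-specific prefix."""
    q = x @ W_q
    k = np.concatenate([prefix_k, x @ W_k], axis=0)  # (prefix_len + seq_len, d)
    v = np.concatenate([prefix_v, x @ W_v], axis=0)
    scores = q @ k.T / np.sqrt(d)                    # (seq_len, prefix_len + seq_len)
    return softmax(scores) @ v                       # (seq_len, d)

x = rng.normal(size=(seq_len, d))  # stand-in for input token embeddings
out = attention_with_prefix(x)
print(out.shape)  # (6, 8): output length matches the input; the prefix only conditions it
```

Note that the prefix contributes extra keys and values but no extra queries, so the output sequence stays the same length as the input; the prefix shifts where attention mass goes rather than adding positions to generate. Swapping in a different `prefix_k`/`prefix_v` pair retargets the same frozen weights to a different task.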