Top-k Sampling

Top-k sampling is a text generation strategy that restricts the model's next token choice to the k most probable tokens, preventing selection of unlikely tokens that could derail coherent generation.

After the model computes logits for all vocabulary tokens, top-k keeps only the k highest-scoring tokens, sets all others to negative infinity, then applies softmax to get a probability distribution over the remaining k tokens. The next token is sampled from this truncated distribution.
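To make the mechanics concrete, here is a minimal sketch of one top-k sampling step in Python with NumPy. The function and variable names are illustrative and not taken from any particular library:

```python
import numpy as np

def top_k_sample(logits, k, rng=np.random.default_rng()):
    """Sample one token id from the k highest-scoring logits."""
    logits = np.asarray(logits, dtype=np.float64)
    # Keep the k highest-scoring tokens; set all others to -inf.
    keep = np.argsort(logits)[-k:]
    masked = np.full_like(logits, -np.inf)
    masked[keep] = logits[keep]
    # Softmax over what survives (the -inf entries get probability 0).
    probs = np.exp(masked - masked.max())
    probs /= probs.sum()
    # Sample the next token from the truncated distribution.
    return rng.choice(len(logits), p=probs)
```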

The hyperparameter k controls the tradeoff between diversity and quality: k=1 is greedy decoding (always pick the single most likely token; deterministic but often repetitive); k=50 allows moderate diversity; k=1000 permits far more creativity but risks incoherent output. Top-k's main limitation is that k is fixed regardless of the shape of the distribution.
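For example, using the sketch above with some toy logits (illustrative values only), k=1 collapses to greedy decoding, while larger k spreads probability over more candidates:

```python
logits = [4.2, 3.8, 3.5, 2.1]        # toy logits over a 4-token vocabulary
print(top_k_sample(logits, k=1))      # always 0: greedy decoding, deterministic
print(top_k_sample(logits, k=3))      # 0, 1, or 2, weighted by their probabilities
```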

When the model is confident (one token has 95% probability), k=50 includes 49 near-zero probability tokens that add noise. When the model is uncertain (many tokens have similar probability), k=50 might exclude plausible options. Top-p (nucleus) sampling addresses this by dynamically adjusting the cutoff based on cumulative probability.
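A nucleus (top-p) filter can be sketched in the same style. This is one common formulation rather than any specific library's implementation, and the function name is ours:

```python
import numpy as np

def top_p_sample(logits, p, rng=np.random.default_rng()):
    """Sample from the smallest set of tokens whose cumulative probability >= p."""
    logits = np.asarray(logits, dtype=np.float64)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # Sort tokens by probability (descending) and find where the cumulative mass reaches p.
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, p) + 1   # number of tokens in the nucleus
    nucleus = order[:cutoff]
    # Renormalize over the nucleus and sample from it.
    nucleus_probs = probs[nucleus] / probs[nucleus].sum()
    return rng.choice(nucleus, p=nucleus_probs)
```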

In practice, top-k and top-p are often combined, with top-k providing a hard cap and top-p providing adaptive truncation within that cap.
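In code, the combination is simply the two filters applied in sequence. A sketch, assuming the top_p_sample helper from above is in scope:

```python
def top_k_top_p_sample(logits, k, p, rng=np.random.default_rng()):
    """Hard top-k cap first, then nucleus (top-p) truncation within that cap."""
    logits = np.asarray(logits, dtype=np.float64)
    capped = np.full_like(logits, -np.inf)
    keep = np.argsort(logits)[-k:]
    capped[keep] = logits[keep]
    # The -inf entries get probability 0 inside top_p_sample, so the nucleus
    # is chosen only among the k surviving tokens.
    return top_p_sample(capped, p, rng)
```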

Interactive Visualizer

Interactive visualization of how AI models select the next token using top-k sampling

Token Probabilities

Token     Logit
the       4.2
cat       3.8
dog       3.5
house     2.1
tree      1.8
moon      1.2
purple    0.8
flying    0.3
Process:
1. Start with the raw logits from the model
2. Keep only the top-3 tokens; filter out the rest
3. Apply softmax to get a probability distribution
4. Sample from the filtered distribution
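Worked out roughly in Python with the visualizer's toy logits and k=3 (probabilities approximate):

```python
import numpy as np

logits = {"the": 4.2, "cat": 3.8, "dog": 3.5, "house": 2.1,
          "tree": 1.8, "moon": 1.2, "purple": 0.8, "flying": 0.3}

# Step 2: keep only the top-3 tokens.
top3 = dict(sorted(logits.items(), key=lambda kv: kv[1], reverse=True)[:3])

# Step 3: softmax over the surviving logits.
values = np.array(list(top3.values()))
probs = np.exp(values - values.max())
probs /= probs.sum()
for token, prob in zip(top3, probs):
    print(f"{token}: {prob:.2f}")       # the: 0.46, cat: 0.31, dog: 0.23

# Step 4: sample the next token from the truncated distribution.
next_token = np.random.default_rng().choice(list(top3), p=probs)
print(next_token)
```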