Top-k sampling is a text generation strategy that restricts the model's next-token choice to the k most probable tokens, preventing the selection of unlikely tokens that could derail coherent generation. After the model computes logits for all vocabulary tokens, top-k keeps only the k highest-scoring tokens, sets all others to negative infinity, and applies softmax to get a probability distribution over the remaining k tokens. The next token is sampled from this truncated distribution.

The hyperparameter k controls the tradeoff between diversity and quality: k=1 is greedy decoding (always pick the most likely token; deterministic but prone to repetition), k=50 allows moderate diversity, and k=1000 permits significant creativity but risks incoherent output.

Top-k's main limitation is that k is fixed regardless of the shape of the distribution. When the model is confident (one token holds 95% of the probability), k=50 includes 49 near-zero-probability tokens that only add noise. When the model is uncertain (many tokens have similar probability), k=50 might exclude plausible options. Top-p (nucleus) sampling addresses this by dynamically adjusting the cutoff based on cumulative probability. In practice, top-k and top-p are often combined, with top-k providing a hard cap on the candidate set and top-p applying adaptive truncation within that cap.
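The keep-k, mask-to-negative-infinity, then softmax procedure can be sketched in a few lines of NumPy. This is an illustrative sketch, not any particular library's implementation; the function name `top_k_sample` and the toy logits are made up for the example:

```python
import numpy as np

def top_k_sample(logits, k, rng=None):
    """Sample a token index from the k highest-scoring logits.

    Illustrative sketch of top-k sampling, not a production decoder.
    """
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64)
    # Find the k-th largest logit; mask everything below it to -inf
    # so that softmax assigns those tokens zero probability.
    # (Ties at the threshold may keep slightly more than k tokens.)
    kth = np.sort(logits)[-k]
    masked = np.where(logits >= kth, logits, -np.inf)
    # Softmax over the survivors; subtract the max for numerical stability.
    exp = np.exp(masked - masked.max())
    probs = exp / exp.sum()
    return rng.choice(len(logits), p=probs)

toy_logits = [2.0, 1.0, 0.5, -1.0]
top_k_sample(toy_logits, 1)  # k=1: always index 0 (greedy decoding)
top_k_sample(toy_logits, 2)  # k=2: index 0 or 1, never the low-probability tail
```

With k=1 this reduces to greedy decoding, as described above; combining it with top-p would mean applying a further cumulative-probability cutoff inside the surviving top-k set before sampling.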