
Top-p Sampling

Top-p sampling, also called nucleus sampling, dynamically selects the smallest set of tokens whose cumulative probability exceeds a threshold p, adapting the size of the candidate pool to the model's confidence. After computing probabilities from the logits, top-p sorts tokens by probability and includes them from highest to lowest until their cumulative probability reaches p. If the top token has 90% probability and p=0.9, only that token is considered; if probability is spread across many tokens, more are included to reach the threshold.

This adaptive behavior is the key advantage over top-k: the candidate set expands when the model is uncertain and contracts when it is confident. Setting p=0.9 typically works well across diverse tasks: it focuses on the probability mass that matters while still allowing occasional surprises. Lower p values (0.5-0.7) produce more focused, predictable text; higher values (0.95+) allow more creativity and variation.

Top-p was introduced in the 2019 paper "The Curious Case of Neural Text Degeneration", which showed that it produces more human-like text than top-k or pure temperature sampling. Most production systems combine top-p with temperature: temperature shapes the distribution, then top-p truncates it to the nucleus of likely tokens. This combination provides reliable generation across diverse prompts.
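The procedure described above can be sketched in a few lines of NumPy. This is an illustrative implementation, not the code from any particular library; the function name and interface are assumptions. It applies temperature first and then truncates to the nucleus, matching the order most systems use.

```python
import numpy as np

def top_p_sample(logits, p=0.9, temperature=1.0, rng=None):
    """Sample one token index from raw logits using top-p (nucleus) sampling."""
    rng = rng or np.random.default_rng()
    # Temperature shapes the distribution before truncation.
    scaled = logits / temperature
    probs = np.exp(scaled - np.max(scaled))  # numerically stable softmax
    probs /= probs.sum()
    # Sort tokens by probability, highest first.
    order = np.argsort(probs)[::-1]
    sorted_probs = probs[order]
    # Keep the smallest prefix whose cumulative probability reaches p,
    # including the token that crosses the threshold.
    cumulative = np.cumsum(sorted_probs)
    cutoff = int(np.searchsorted(cumulative, p)) + 1
    nucleus = order[:cutoff]
    # Renormalize over the nucleus and sample from it.
    nucleus_probs = sorted_probs[:cutoff] / sorted_probs[:cutoff].sum()
    return int(rng.choice(nucleus, p=nucleus_probs))
```

With probabilities [0.5, 0.3, 0.15, 0.05] and p=0.9, the cumulative sums are [0.5, 0.8, 0.95, 1.0], so the nucleus contains the first three tokens and the 5% tail token is never sampled. A confident distribution like [0.92, 0.04, ...] would shrink the nucleus to a single token.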