
Top-p Sampling

Top-p sampling, also called nucleus sampling, dynamically selects the smallest set of tokens whose cumulative probability exceeds a threshold p, adapting the size of the candidate pool to the model's confidence. After computing probabilities from the logits, top-p sorts tokens by probability and includes them from highest to lowest until their cumulative probability reaches p. If the top token has 90% probability and p=0.9, only that token is considered; if probability is spread across many tokens, more are included to reach the threshold.

This adaptive behavior is the key advantage over top-k: the candidate set expands when the model is uncertain and contracts when it is confident. Setting p=0.9 typically works well across diverse tasks: it focuses on the probability mass that matters while still allowing occasional surprises. Lower p values (0.5-0.7) produce more focused, predictable text; higher values (0.95+) allow more creativity and variation.

Top-p was introduced in the 2019 paper "The Curious Case of Neural Text Degeneration", which showed that it produces more human-like text than top-k or pure temperature sampling. Most production systems combine top-p with temperature: temperature shapes the distribution, then top-p truncates it to the nucleus of likely tokens. This combination provides reliable generation across diverse prompts.
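The procedure described above can be sketched in a few lines of NumPy. This is an illustrative implementation, not the code from any particular library; the function name and interface are assumptions. It applies temperature first and then truncates to the nucleus, matching the order most systems use.

```python
import numpy as np

def top_p_sample(logits, p=0.9, temperature=1.0, rng=None):
    """Sample one token index from raw logits using top-p (nucleus) sampling."""
    rng = rng or np.random.default_rng()
    # Temperature shapes the distribution before truncation.
    scaled = logits / temperature
    probs = np.exp(scaled - np.max(scaled))  # numerically stable softmax
    probs /= probs.sum()
    # Sort tokens by probability, highest first.
    order = np.argsort(probs)[::-1]
    sorted_probs = probs[order]
    # Keep the smallest prefix whose cumulative probability reaches p,
    # including the token that crosses the threshold.
    cumulative = np.cumsum(sorted_probs)
    cutoff = int(np.searchsorted(cumulative, p)) + 1
    nucleus = order[:cutoff]
    # Renormalize over the nucleus and sample from it.
    nucleus_probs = sorted_probs[:cutoff] / sorted_probs[:cutoff].sum()
    return int(rng.choice(nucleus, p=nucleus_probs))
```

With probabilities [0.5, 0.3, 0.15, 0.05] and p=0.9, the cumulative sums are [0.5, 0.8, 0.95, 1.0], so the nucleus contains the first three tokens and the 5% tail token is never sampled. A confident distribution like [0.92, 0.04, ...] would shrink the nucleus to a single token.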