Temperature is a sampling parameter that controls the randomness of language model output by scaling the logits (the pre-softmax scores) before the next token is sampled, directly shaping the creativity-accuracy tradeoff. Mathematically, each logit z_i is divided by the temperature T before the softmax, so p_i ∝ exp(z_i / T): lower temperatures sharpen the probability distribution (concentrating mass on likely tokens), while higher temperatures flatten it (spreading mass across more options).

At temperature 0, the model becomes deterministic, always selecting the highest-probability token (greedy decoding). At temperature 1, the model samples directly from its learned distribution. At temperatures above 1, unlikely tokens gain probability, producing more varied but potentially incoherent output.

Most applications use temperatures between 0 and 1. Low temperatures (0.1-0.3) suit tasks that require accuracy and consistency: code generation, data extraction, factual Q&A. Medium temperatures (0.5-0.8) balance creativity and coherence for general conversation and writing. High temperatures (0.9-1.2) generate diverse, creative content but risk incoherence.

Temperature interacts with other sampling parameters: top-k and top-p truncate the distribution, so sampling draws from a temperature-scaled, truncated set of candidates. The optimal temperature depends on the task, the model, and the desired output characteristics. Production systems often use different temperatures for different features: low for structured outputs, higher for creative suggestions. Understanding temperature is fundamental to prompt engineering and effective API usage.
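The scaling described above can be sketched in a few lines of Python. This is a minimal illustration, not any particular API's implementation: the helper name `apply_temperature` is hypothetical, and temperature 0 is treated as greedy (argmax) decoding, as many inference APIs do.

```python
import math

def apply_temperature(logits, temperature):
    """Turn raw logits into a next-token sampling distribution.

    Lower temperature sharpens the distribution; higher temperature
    flattens it; temperature 0 collapses to greedy decoding.
    """
    if temperature <= 0:
        # Greedy decoding: all probability on the highest logit.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max before exponentiating, for stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
for t in (0.2, 1.0, 2.0):
    print(t, [round(p, 3) for p in apply_temperature(logits, t)])
```

Running this on the same logits at several temperatures shows the effect directly: at 0.2 nearly all probability lands on the top token, at 1.0 the model's learned distribution is unchanged, and at 2.0 the three options are noticeably closer together.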