
Beam Search

Beam search is a decoding algorithm for language models that maintains multiple candidate sequences in parallel, exploring the k most promising options at each generation step rather than greedily committing to the single best token. At each step, the algorithm expands every current candidate with all possible next tokens, scores the results, and keeps only the top k candidates, where k is the beam width.

A beam width of 1 is equivalent to greedy decoding: it always selects the highest-probability token. Wider beams explore more of the output space and often find higher-probability complete sequences that greedy decoding misses because those sequences pass through lower-probability intermediate tokens.

Beam search was essential for neural machine translation, where greedy decoding often produced mediocre translations while beam search found much better ones. Its computational cost scales roughly linearly with beam width: a beam width of 10 requires about 10x the compute of greedy decoding.

Variants include length normalization (which counters the bias toward shorter sequences, since each added token multiplies in a probability below 1), diverse beam search (which encourages exploration of different output structures), and nucleus-constrained beam search.

For modern LLMs used in chat applications, beam search is less common than sampling methods because it tends to produce generic, repetitive text. For deterministic tasks like translation or structured generation, however, it remains valuable.
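The expand-score-prune loop above can be sketched in a few lines of Python. This is a minimal illustration, not a production decoder: the `next_logprobs` callable and the `toy_lm` model below are hypothetical stand-ins for a real language model's next-token distribution, and the final ranking applies simple length normalization (dividing the cumulative log-probability by sequence length).

```python
import math
from typing import Callable

def beam_search(
    start: list[int],
    next_logprobs: Callable[[list[int]], dict[int, float]],
    beam_width: int,
    max_steps: int,
    eos: int,
) -> list[int]:
    """Return the best sequence found with the given beam width.

    next_logprobs maps a token prefix to {next_token: log-probability};
    it is a stand-in for a language model, not a real API.
    """
    # Each beam entry is (cumulative log-probability, token sequence).
    beams = [(0.0, start)]
    finished = []
    for _ in range(max_steps):
        # Expand every current candidate with all possible next tokens.
        candidates = []
        for score, seq in beams:
            for tok, lp in next_logprobs(seq).items():
                candidates.append((score + lp, seq + [tok]))
        # Prune: keep only the top-k candidates (the beam width).
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, seq in candidates[:beam_width]:
            if seq[-1] == eos:
                finished.append((score, seq))
            else:
                beams.append((score, seq))
        if not beams:
            break
    finished.extend(beams)
    # Length normalization: rank by average log-probability per token
    # so longer sequences are not unfairly penalized.
    best = max(finished, key=lambda c: c[0] / len(c[1]))
    return best[1]

# Hypothetical toy "model": token 1 looks best greedily (p=0.6) but leads
# to a weak continuation; token 2 (p=0.4) leads to a strong one. Token 0
# is end-of-sequence.
def toy_lm(prefix: list[int]) -> dict[int, float]:
    table = {
        (): {1: math.log(0.6), 2: math.log(0.4)},
        (1,): {0: math.log(0.2)},  # greedy path: total probability 0.12
        (2,): {0: math.log(0.9)},  # beam path: total probability 0.36
    }
    return table[tuple(prefix)]
```

With `beam_width=1` this reduces to greedy decoding and commits to token 1, ending with total probability 0.12; with `beam_width=2` it also keeps the lower-probability first token and recovers the better overall sequence (probability 0.36), illustrating why wider beams can beat greedy decoding.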