veda.ng

Foundation Model

A foundation model is a large AI model trained on broad data at massive scale that serves as the base for many downstream applications through fine-tuning, prompting, or other adaptation techniques. GPT-4, Claude, Gemini, Llama, and BERT are all foundation models.

They share a common pattern: pretrain once on an enormous dataset using self-supervised objectives, then adapt the same model to diverse tasks. This amortizes the massive investment in pretraining (training GPT-4 reportedly cost over $100 million) across thousands of applications.

Foundation models exhibit emergent capabilities: abilities that appear at scale without being explicitly trained for. A model trained only to predict text learns to do arithmetic, write code, and reason about physics. The phenomenon remains poorly understood but has been consistently observed across model families and sizes.

The term was coined by Stanford researchers in 2021 to emphasize that these models serve as the foundation for many applications rather than as standalone products. This creates both opportunity and risk: opportunity because one model enables countless applications, risk because flaws in the foundation propagate everywhere. Foundation models have also homogenized AI development: most AI applications now start from the same handful of pretrained models, raising concerns about monoculture and single points of failure.
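The pretrain-once, adapt-many-times pattern can be sketched in a few lines of plain Python. This is a toy illustration, not a real library API: the backbone, its fake "embedding," and the task heads are all hypothetical stand-ins for an expensive pretrained model and the cheap adapters built on top of it.

```python
class PretrainedBackbone:
    """Stands in for a model pretrained once at great expense."""

    def embed(self, text: str) -> list[float]:
        # Toy "representation": crude character statistics in place of
        # the rich features a real foundation model would produce.
        codes = [ord(c) for c in text]
        return [sum(codes) / len(codes), float(len(codes))]


class TaskHead:
    """A cheap task-specific adapter layered on the shared backbone."""

    def __init__(self, backbone: PretrainedBackbone, label_fn):
        self.backbone = backbone  # shared, frozen base model
        self.label_fn = label_fn  # small task-specific logic

    def predict(self, text: str) -> str:
        return self.label_fn(self.backbone.embed(text))


backbone = PretrainedBackbone()  # "pretrain" once (the expensive step)

# Many downstream applications reuse the same backbone via small heads.
length_clf = TaskHead(backbone, lambda v: "long" if v[1] > 10 else "short")
print(length_clf.predict("hello world"))  # prints "long"
print(length_clf.predict("hi"))           # prints "short"
```

The key point the sketch captures is the cost structure: the backbone is built once, while each new application adds only a lightweight head, which is why one pretraining run can be amortized across thousands of uses.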