veda.ng

Foundation Model

A foundation model is a large AI model trained on broad data at massive scale that serves as the base for many downstream applications through fine-tuning, prompting, or other adaptation techniques. GPT-4, Claude, Gemini, Llama, and BERT are all foundation models.

They share a common pattern: pretrain once on an enormous dataset using self-supervised objectives, then adapt the same model to diverse tasks. This amortizes the massive investment in pretraining (training GPT-4 reportedly cost over $100 million) across thousands of applications.

Foundation models exhibit emergent capabilities: abilities that appear at scale without being explicitly trained for. A model trained only to predict text learns to do arithmetic, write code, and reason about physics. The phenomenon remains poorly understood but has been consistently observed across model families and sizes.

The term was coined by Stanford researchers in 2021 to emphasize that these models serve as the foundation for many applications rather than as standalone products. This creates both opportunity and risk: opportunity because one model enables countless applications, risk because flaws in the foundation propagate everywhere. Foundation models have also homogenized AI development: most AI applications now start from the same handful of pretrained models, raising concerns about monoculture and single points of failure.
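The pretrain-once, adapt-many-times pattern can be sketched in a few lines of plain Python. This is a toy illustration, not a real library API: the backbone, its fake "embedding," and the task heads are all hypothetical stand-ins for an expensive pretrained model and the cheap adapters built on top of it.

```python
class PretrainedBackbone:
    """Stands in for a model pretrained once at great expense."""

    def embed(self, text: str) -> list[float]:
        # Toy "representation": crude character statistics in place of
        # the rich features a real foundation model would produce.
        codes = [ord(c) for c in text]
        return [sum(codes) / len(codes), float(len(codes))]


class TaskHead:
    """A cheap task-specific adapter layered on the shared backbone."""

    def __init__(self, backbone: PretrainedBackbone, label_fn):
        self.backbone = backbone  # shared, frozen base model
        self.label_fn = label_fn  # small task-specific logic

    def predict(self, text: str) -> str:
        return self.label_fn(self.backbone.embed(text))


backbone = PretrainedBackbone()  # "pretrain" once (the expensive step)

# Many downstream applications reuse the same backbone via small heads.
length_clf = TaskHead(backbone, lambda v: "long" if v[1] > 10 else "short")
print(length_clf.predict("hello world"))  # prints "long"
print(length_clf.predict("hi"))           # prints "short"
```

The key point the sketch captures is the cost structure: the backbone is built once, while each new application adds only a lightweight head, which is why one pretraining run can be amortized across thousands of uses.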