
Scaling Law

Scaling laws are empirical relationships that predict how neural network performance improves as model parameters, training data, and compute budget increase. OpenAI's 2020 paper (Kaplan et al., "Scaling Laws for Neural Language Models") established that loss decreases as a power law in each scaling factor, meaning improvement from adding resources is predictable.

The Chinchilla scaling laws from DeepMind (Hoffmann et al., 2022) refined this: for compute-optimal training, model size and training tokens should scale roughly proportionally. A 10x larger model needs roughly 10x more training data to be trained optimally, not the same data for longer. This overturned the earlier practice of training ever-larger models on roughly fixed datasets.

These laws directly shape AI development planning. Given a compute budget, you can predict the optimal model size and data requirements for a performance target, and estimate how much larger a model must be to reach a specific capability. The regularity of scaling laws across architectures and domains surprised researchers: the same mathematical relationships hold from small experiments to the largest models ever trained.

Scaling laws guide billion-dollar infrastructure investments and explain why AI labs pursue ever-larger training runs. They also reveal diminishing returns: because loss follows a power law, each doubling of compute buys a roughly constant reduction in loss, not an accelerating one.
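The compute-optimal allocation described above can be sketched numerically. The snippet below is a minimal illustration, not either paper's fitted model: it assumes the standard approximation that training cost is about 6·N·D FLOPs, and the Chinchilla finding that optimal parameters N and tokens D each grow roughly as the square root of compute C. The calibration point (70B parameters, 1.4T tokens) is Chinchilla's published configuration; the function name and rounding are hypothetical.

```python
def compute_optimal(C_flops: float) -> tuple[float, float]:
    """Return an (approximate) compute-optimal (params, tokens) split.

    Assumptions (illustrative, not fitted constants from the papers):
      - Training cost C ~= 6 * N * D FLOPs.
      - N_opt and D_opt each scale as C ** 0.5, so they stay proportional.
    """
    # Calibration point: Chinchilla's 70B-parameter / 1.4T-token run,
    # which costs about 6 * 70e9 * 1.4e12 ~= 5.88e23 FLOPs.
    C_ref, N_ref, D_ref = 5.88e23, 70e9, 1.4e12
    scale = (C_flops / C_ref) ** 0.5
    return N_ref * scale, D_ref * scale

# 100x the reference compute -> 10x the parameters and 10x the tokens,
# keeping the tokens-per-parameter ratio fixed (about 20 here).
N, D = compute_optimal(5.88e25)
print(f"params ~ {N:.2e}, tokens ~ {D:.2e}, tokens/param ~ {D / N:.0f}")
```

Note how the proportional scaling keeps tokens-per-parameter constant: under this sketch, every budget increase is split evenly between a bigger model and more data, which is exactly the Chinchilla correction to the older train-a-huge-model-on-fixed-data approach.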