
Scaling Law

Scaling laws are empirical relationships that predict how neural network performance improves as model parameters, training data, and compute budget increase. OpenAI's 2020 paper (Kaplan et al., "Scaling Laws for Neural Language Models") established that loss decreases as a power law in each scaling factor, meaning improvement from adding resources is predictable.

The Chinchilla scaling laws from DeepMind (Hoffmann et al., 2022) refined this: for compute-optimal training, model size and training tokens should scale roughly proportionally. A 10x larger model needs roughly 10x more training data to be trained optimally, not the same data for longer. This overturned the earlier practice of training ever-larger models on roughly fixed datasets.

These laws directly shape AI development planning. Given a compute budget, you can predict the optimal model size and data requirements for a performance target, and estimate how much larger a model must be to reach a specific capability. The regularity of scaling laws across architectures and domains surprised researchers: the same mathematical relationships hold from small experiments to the largest models ever trained.

Scaling laws guide billion-dollar infrastructure investments and explain why AI labs pursue ever-larger training runs. They also reveal diminishing returns: because loss follows a power law, each doubling of compute buys a roughly constant reduction in loss, not an accelerating one.
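The compute-optimal allocation described above can be sketched numerically. The snippet below is a minimal illustration, not either paper's fitted model: it assumes the standard approximation that training cost is about 6·N·D FLOPs, and the Chinchilla finding that optimal parameters N and tokens D each grow roughly as the square root of compute C. The calibration point (70B parameters, 1.4T tokens) is Chinchilla's published configuration; the function name and rounding are hypothetical.

```python
def compute_optimal(C_flops: float) -> tuple[float, float]:
    """Return an (approximate) compute-optimal (params, tokens) split.

    Assumptions (illustrative, not fitted constants from the papers):
      - Training cost C ~= 6 * N * D FLOPs.
      - N_opt and D_opt each scale as C ** 0.5, so they stay proportional.
    """
    # Calibration point: Chinchilla's 70B-parameter / 1.4T-token run,
    # which costs about 6 * 70e9 * 1.4e12 ~= 5.88e23 FLOPs.
    C_ref, N_ref, D_ref = 5.88e23, 70e9, 1.4e12
    scale = (C_flops / C_ref) ** 0.5
    return N_ref * scale, D_ref * scale

# 100x the reference compute -> 10x the parameters and 10x the tokens,
# keeping the tokens-per-parameter ratio fixed (about 20 here).
N, D = compute_optimal(5.88e25)
print(f"params ~ {N:.2e}, tokens ~ {D:.2e}, tokens/param ~ {D / N:.0f}")
```

Note how the proportional scaling keeps tokens-per-parameter constant: under this sketch, every budget increase is split evenly between a bigger model and more data, which is exactly the Chinchilla correction to the older train-a-huge-model-on-fixed-data approach.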