Cross-entropy loss is the standard objective function for training classification and language models, measuring the discrepancy between a predicted probability distribution and the true labels. For a single prediction, cross-entropy equals the negative logarithm of the probability assigned to the correct class. High confidence in a correct answer yields low loss; low confidence yields high loss; a confident wrong answer yields very high loss. The logarithmic penalty creates strong gradients for incorrect predictions, accelerating learning.

For language models, cross-entropy is computed at each token position: how much probability did the model assign to the token that actually appeared? The total loss is averaged across all positions. Minimizing cross-entropy during training encourages the model to assign high probability to correct tokens.

In information-theoretic terms, cross-entropy measures the expected number of bits needed to encode data from the true distribution using a code optimized for the predicted distribution. When the predictions perfectly match reality, cross-entropy equals entropy, the theoretical minimum encoding length.

Cross-entropy is differentiable and pairs naturally with softmax outputs, making it computationally tractable for gradient descent. Nearly all modern language model training uses cross-entropy loss, often called 'language modeling loss' or 'next-token prediction loss' in that context.
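The definitions above can be sketched in a few lines of plain Python. This is an illustrative toy, not a production implementation (real frameworks compute the loss directly from logits for numerical stability); the function names here are our own:

```python
import math

def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    """Negative log of the probability assigned to the correct class."""
    probs = softmax(logits)
    return -math.log(probs[target])

def sequence_loss(logits_per_position, targets):
    """Average per-token cross-entropy, as in language-model training."""
    losses = [cross_entropy(l, t) for l, t in zip(logits_per_position, targets)]
    return sum(losses) / len(losses)

# Confident and correct: low loss.
print(cross_entropy([5.0, 0.0, 0.0], target=0))
# Confident and wrong: very high loss.
print(cross_entropy([5.0, 0.0, 0.0], target=1))
```

With logits [5.0, 0.0, 0.0], the model puts almost all probability on class 0, so the loss is near zero when class 0 is correct and large when it is not, matching the asymmetry described above.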