Backpropagation is the algorithm that makes deep learning possible by efficiently computing how each weight in a neural network should change to reduce prediction error. During a forward pass, input data flows through the network to produce an output, and the loss function measures how wrong that output is. Backpropagation then computes the gradient of this loss with respect to every weight in the network by propagating error signals backward from the output layer to the input layer using the chain rule of calculus. At each layer, the algorithm computes how much the layer's output contributed to the final error, then how much each weight contributed to that layer's output. These gradients tell you the direction to adjust each weight to reduce the loss.

The key insight is that all gradients can be computed in a single backward pass, reusing intermediate computations: the gradient at layer N depends on the gradient at layer N+1, which was just computed. This makes training networks with millions of parameters feasible. Before backpropagation (popularized by Rumelhart, Hinton, and Williams in 1986), training neural networks required computing gradients numerically, which was orders of magnitude slower. Backpropagation combined with gradient descent forms the foundation of all modern deep learning. Automatic differentiation frameworks like PyTorch and TensorFlow implement backpropagation automatically.
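The chain-rule mechanics above can be sketched on the smallest possible network: a single input passing through one sigmoid hidden unit and one linear output, with two scalar weights. All names here (`forward`, `backward`, `numerical_grad`, `w1`, `w2`) are illustrative, not from any particular library. The numerical check at the end mirrors the slow pre-backprop approach mentioned above: it needs extra forward passes per weight, while the backward pass gets every gradient from one sweep.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(w1, w2, x, t):
    # Forward pass: cache the intermediates the backward pass will reuse.
    a = w1 * x           # pre-activation of the hidden unit
    h = sigmoid(a)       # hidden activation
    y = w2 * h           # network output
    loss = (y - t) ** 2  # squared-error loss against target t
    return loss, (h, y)

def backward(w1, w2, x, t):
    # Backward pass: apply the chain rule from the loss back to each
    # weight, reusing the upstream gradient at every step.
    loss, (h, y) = forward(w1, w2, x, t)
    dL_dy = 2.0 * (y - t)           # gradient at the output layer
    dL_dw2 = dL_dy * h              # how much w2 contributed to y
    dL_dh = dL_dy * w2              # propagate the error to the hidden layer
    dL_da = dL_dh * h * (1.0 - h)   # through the sigmoid's derivative
    dL_dw1 = dL_da * x              # how much w1 contributed
    return dL_dw1, dL_dw2

def numerical_grad(w1, w2, x, t, eps=1e-6):
    # Central differences: the slow alternative backprop replaced.
    # Each weight costs two extra forward passes.
    g1 = (forward(w1 + eps, w2, x, t)[0] - forward(w1 - eps, w2, x, t)[0]) / (2 * eps)
    g2 = (forward(w1, w2 + eps, x, t)[0] - forward(w1, w2 - eps, x, t)[0]) / (2 * eps)
    return g1, g2
```

Running both on the same weights shows the analytic gradients from `backward` agreeing with the finite-difference estimates to several decimal places, which is the standard sanity check ("gradient checking") for hand-written backward passes.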