Dropout is a regularization technique that randomly deactivates a fraction of neurons during each training step, forcing the network to learn redundant representations that generalize better. During training, each neuron is temporarily removed, along with all its connections, with probability p (typically 0.1 to 0.5). This prevents co-adaptation, where neurons come to rely too heavily on specific other neurons that might not be present. Because each training iteration sees a different random subset of neurons, dropout effectively trains an ensemble of thinner networks that share weights.

At inference time, all neurons are active but their outputs are scaled by (1-p) to keep activation magnitudes consistent with training. (Most modern frameworks instead use "inverted" dropout, scaling surviving activations by 1/(1-p) during training so that inference needs no adjustment.) This scaling approximates averaging predictions across the ensemble of possible dropout configurations.

Dropout was introduced by Hinton et al. in 2012 and became a standard component of deep learning. It is particularly effective for fully connected layers, where co-adaptation is most problematic. Transformers apply dropout after attention layers and within feedforward blocks, and modern architectures sometimes use DropPath (dropping entire residual connections) or attention dropout.

The dropout rate is a hyperparameter: too low provides insufficient regularization; too high destroys too much signal for learning. Dropout also slows convergence, typically requiring noticeably more training iterations, because each step updates only a thinned subnetwork; the regularization benefit usually outweighs this cost.
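The masking and scaling described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a framework implementation; the function names (`dropout_train`, `dropout_eval`) are invented for this example. It uses the "inverted" variant, scaling kept activations by 1/(1-p) during training so the inference pass is the identity:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def dropout_train(x, p):
    """Training-time dropout: zero each unit with probability p.

    Inverted-dropout scaling: survivors are divided by (1-p) so the
    expected activation matches the no-dropout network, and inference
    needs no rescaling.
    """
    mask = rng.random(x.shape) >= p     # True (keep) with probability 1-p
    return x * mask / (1.0 - p)

def dropout_eval(x):
    """Inference-time dropout under inverted scaling: identity."""
    return x

# Expected activation is preserved: the mean stays near 1.0 even
# though half the units are zeroed on any given forward pass.
x = np.ones(100_000)
y = dropout_train(x, p=0.5)
print(y.mean())          # close to 1.0
print((y == 0).mean())   # close to 0.5: about half the units dropped
```

Each call draws a fresh mask, which is what makes every training step see a different thinned subnetwork.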