Long Short-Term Memory is a recurrent neural network architecture designed to capture long-range dependencies in sequential data by using gating mechanisms that control information flow through a persistent cell state. Standard RNNs struggle with sequences longer than a few dozen steps because gradients either vanish or explode during backpropagation through time. LSTMs address this with three gates: the forget gate decides what to discard from the cell state, the input gate decides what new information to store, and the output gate decides what to emit from the cell state.

The cell state acts as a highway for information, protected by the gates from the multiplicative gradient decay that plagues vanilla RNNs, so important information can persist across hundreds of time steps. LSTMs also maintain a hidden state that captures short-term context.

Introduced by Hochreiter and Schmidhuber in 1997, LSTMs enabled practical sequence modeling for machine translation, speech recognition, and text generation before transformers emerged. The Gated Recurrent Unit simplifies the LSTM to only two gates while achieving comparable performance. Although transformers have largely replaced LSTMs for natural language processing, LSTMs remain relevant for time series forecasting, online sequence modeling, and applications where the autoregressive structure of RNNs is advantageous. Understanding LSTMs provides insight into the vanishing gradient problem and the importance of information highways in deep networks.
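The gating equations above can be sketched as a single LSTM time step. This is a minimal NumPy illustration, not a production implementation: the fused weight matrix `W`, the bias `b`, the function name `lstm_step`, and the gate ordering (input, forget, candidate, output) are all illustrative conventions chosen here for compactness.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step.

    x: input vector (D,); h_prev, c_prev: previous hidden/cell states (H,)
    W: fused gate weights (4H, D+H); b: fused gate biases (4H,)
    Gate order assumed here: input, forget, candidate, output.
    """
    H = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    i = sigmoid(z[0:H])        # input gate: what new information to store
    f = sigmoid(z[H:2*H])      # forget gate: what to discard from the cell state
    g = np.tanh(z[2*H:3*H])    # candidate values to be written
    o = sigmoid(z[3*H:4*H])    # output gate: what to emit
    c = f * c_prev + i * g     # cell state update: an additive "highway"
    h = o * np.tanh(c)         # hidden state: gated short-term context
    return h, c

# Tiny demo: run a few steps with small random weights.
rng = np.random.default_rng(0)
D, H = 3, 4
W = rng.standard_normal((4 * H, D + H)) * 0.1
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for _ in range(5):
    h, c = lstm_step(rng.standard_normal(D), h, c, W, b)
```

Note how the cell state is updated additively (`f * c_prev + i * g`) rather than being repeatedly multiplied by a weight matrix: this is the structural reason gradients can flow back through many steps without vanishing.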