Long Short-Term Memory is a recurrent neural network architecture designed to capture long-range dependencies in sequential data by using gating mechanisms that control information flow through a persistent cell state. Standard RNNs struggle with sequences longer than a few dozen steps because gradients either vanish or explode during backpropagation through time. LSTMs address this with three gates: the forget gate decides what to discard from the cell state, the input gate decides what new information to store, and the output gate decides what to emit from the cell state.

The cell state acts as a highway for information, protected by the gates from the multiplicative gradient decay that plagues vanilla RNNs, so important information can persist across hundreds of time steps. LSTMs also maintain a hidden state that captures short-term context.

Introduced by Hochreiter and Schmidhuber in 1997, LSTMs enabled practical sequence modeling for machine translation, speech recognition, and text generation before transformers emerged. The Gated Recurrent Unit simplifies the LSTM to only two gates while achieving comparable performance. Although transformers have largely replaced LSTMs for natural language processing, LSTMs remain relevant for time series forecasting, online sequence modeling, and applications where the autoregressive structure of RNNs is advantageous. Understanding LSTMs provides insight into the vanishing gradient problem and the importance of information highways in deep networks.
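The gating equations above can be sketched as a single LSTM time step. This is a minimal NumPy illustration, not a production implementation: the fused weight matrix `W`, the bias `b`, the function name `lstm_step`, and the gate ordering (input, forget, candidate, output) are all illustrative conventions chosen here for compactness.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step.

    x: input vector (D,); h_prev, c_prev: previous hidden/cell states (H,)
    W: fused gate weights (4H, D+H); b: fused gate biases (4H,)
    Gate order assumed here: input, forget, candidate, output.
    """
    H = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    i = sigmoid(z[0:H])        # input gate: what new information to store
    f = sigmoid(z[H:2*H])      # forget gate: what to discard from the cell state
    g = np.tanh(z[2*H:3*H])    # candidate values to be written
    o = sigmoid(z[3*H:4*H])    # output gate: what to emit
    c = f * c_prev + i * g     # cell state update: an additive "highway"
    h = o * np.tanh(c)         # hidden state: gated short-term context
    return h, c

# Tiny demo: run a few steps with small random weights.
rng = np.random.default_rng(0)
D, H = 3, 4
W = rng.standard_normal((4 * H, D + H)) * 0.1
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for _ in range(5):
    h, c = lstm_step(rng.standard_normal(D), h, c, W, b)
```

Note how the cell state is updated additively (`f * c_prev + i * g`) rather than being repeatedly multiplied by a weight matrix: this is the structural reason gradients can flow back through many steps without vanishing.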