A recurrent neural network (RNN) processes sequential data by maintaining a hidden state that carries information across time steps, enabling the network to model temporal dependencies. At each time step, the RNN reads the current input and the previous hidden state, producing an output and an updated hidden state. This recurrent connection creates a form of memory: information from early in the sequence can influence the processing of later elements. The same weights are applied at every time step, making RNNs parameter-efficient for sequence modeling.

Basic RNNs suffer from vanishing and exploding gradients on long sequences: gradients multiplied across many time steps either shrink toward zero or grow without bound, making long-range dependencies very difficult to learn. LSTM and GRU architectures address this with gating mechanisms that control information flow, allowing important signals to persist across hundreds of time steps.

RNNs are also inherently sequential: step N cannot be computed until step N-1 finishes, which limits parallelization and training speed. Transformers replaced this sequential bottleneck with attention mechanisms that process all positions simultaneously. Although transformers have largely superseded RNNs for language tasks, RNNs remain useful for applications requiring online processing, a low memory footprint, or explicit sequential structure, such as certain time series and control systems.
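The core recurrence can be sketched in a few lines of plain Python. This is a minimal illustration of a vanilla RNN forward pass, not any particular library's API: the names (`rnn_step`, `rnn_forward`) and the tanh update rule h_t = tanh(W_xh·x_t + W_hh·h_{t-1} + b) are the textbook formulation, and the weights here are just small random values for demonstration.

```python
import math
import random

def rnn_step(x, h, W_xh, W_hh, b_h):
    """One step of a vanilla RNN: h_t = tanh(W_xh.x_t + W_hh.h_{t-1} + b)."""
    d_h = len(h)
    return [
        math.tanh(
            sum(W_xh[i][j] * x[i] for i in range(len(x)))   # input contribution
            + sum(W_hh[k][j] * h[k] for k in range(d_h))    # recurrent contribution
            + b_h[j]
        )
        for j in range(d_h)
    ]

def rnn_forward(inputs, h0, W_xh, W_hh, b_h):
    """Run the RNN over a sequence; the same weights are reused at every step."""
    h, states = h0, []
    for x in inputs:  # sequential bottleneck: step t needs h from step t-1
        h = rnn_step(x, h, W_xh, W_hh, b_h)
        states.append(h)
    return states

# Tiny demo: 3-dim inputs, 4-dim hidden state, 5 time steps.
rng = random.Random(0)
d_in, d_h, T = 3, 4, 5
W_xh = [[rng.uniform(-0.5, 0.5) for _ in range(d_h)] for _ in range(d_in)]
W_hh = [[rng.uniform(-0.5, 0.5) for _ in range(d_h)] for _ in range(d_h)]
b_h = [0.0] * d_h
inputs = [[rng.uniform(-1, 1) for _ in range(d_in)] for _ in range(T)]
states = rnn_forward(inputs, [0.0] * d_h, W_xh, W_hh, b_h)
```

Note how the loop in `rnn_forward` makes the sequential dependency explicit: each call to `rnn_step` consumes the hidden state produced by the previous one, which is exactly why this computation cannot be parallelized across time steps the way attention can.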