Recurrent Neural Network (RNN)
A Recurrent Neural Network (RNN) is a neural network that processes sequential data by maintaining a hidden state vector updated at each time step, allowing information from earlier inputs to influence later predictions.
In an RNN, the hidden state h_t is computed as h_t = f(W_h · h_{t-1} + W_x · x_t + b), where f is a nonlinear activation such as tanh, W_h and W_x are learned weight matrices, b is a learned bias vector, and x_t is the current input. Because each hidden state depends on the previous one, the network in principle has access to the full history of a sequence, making it a natural fit for language modeling, speech recognition, and time-series forecasting.
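The recurrence can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation: it assumes f = tanh, and the function names (rnn_step, rnn_forward) and the hand-picked weight values are hypothetical, not learned parameters.

```python
import math

def rnn_step(h_prev, x_t, W_h, W_x, b):
    """One recurrence step: h_t = tanh(W_h · h_{t-1} + W_x · x_t + b)."""
    h_t = []
    for i in range(len(b)):
        pre = b[i]
        pre += sum(W_h[i][j] * h_prev[j] for j in range(len(h_prev)))
        pre += sum(W_x[i][k] * x_t[k] for k in range(len(x_t)))
        h_t.append(math.tanh(pre))
    return h_t

def rnn_forward(xs, h0, W_h, W_x, b):
    """Run the recurrence over a sequence and return every hidden state."""
    h = h0
    states = []
    for x_t in xs:
        h = rnn_step(h, x_t, W_h, W_x, b)  # h carries the history forward
        states.append(h)
    return states

# Illustrative (not learned) parameters: hidden size 2, input size 1.
W_h = [[0.5, -0.3], [0.2, 0.4]]
W_x = [[1.0], [-1.0]]
b = [0.0, 0.1]
xs = [[1.0], [0.0], [-1.0]]
states = rnn_forward(xs, [0.0, 0.0], W_h, W_x, b)
```

Because the same h is threaded through every call to rnn_step, changing the first input changes the final hidden state even though later inputs stay the same.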
Vanilla RNNs suffer from vanishing and exploding gradients during backpropagation through time: the gradient is a product of one factor per time step, so it tends to shrink or grow exponentially as it propagates through many steps, making it hard to learn long-range dependencies. The Long Short-Term Memory (LSTM; Hochreiter & Schmidhuber, 1997) and Gated Recurrent Unit (GRU; Cho et al., 2014) architectures mitigate the vanishing-gradient problem with learned gating mechanisms (input, forget, and output gates in the standard LSTM; update and reset gates in the GRU) that control what information is retained or discarded, enabling relevant context to persist across hundreds of steps. Exploding gradients are typically handled separately, for example by gradient clipping.
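The exponential behaviour can be seen in a toy scalar recurrence. The sketch below is an assumption-laden illustration, not a training procedure: it uses a single recurrent weight w with no input, a hypothetical helper name (gradient_through_time), and an optional linear mode, since tanh saturation otherwise masks the exploding case.

```python
import math

def gradient_through_time(w, steps, h0=0.5, linear=False):
    """Return |d h_T / d h_0| for a scalar recurrence with weight w.

    linear=False: h_t = tanh(w * h_{t-1})
    linear=True:  h_t = w * h_{t-1}
    """
    h, grad = h0, 1.0
    for _ in range(steps):
        if linear:
            h = w * h
            grad *= w  # each step multiplies the gradient by w
        else:
            h = math.tanh(w * h)
            grad *= w * (1.0 - h * h)  # chain rule through tanh
    return abs(grad)

vanishing = gradient_through_time(0.9, 100, linear=True)   # equals 0.9 ** 100 up to rounding
exploding = gradient_through_time(1.1, 100, linear=True)   # equals 1.1 ** 100 up to rounding
vanishing_tanh = gradient_through_time(0.9, 100)           # tanh shrinks it further
```

A recurrent weight only slightly below or above 1 is enough to drive the gradient toward zero or toward very large values over 100 steps, which is the regime gating is designed to avoid.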
RNNs were the dominant architecture for sequence modeling for much of the 2010s, underpinning key systems such as Google Translate's neural machine translation system (LSTM-based at its 2016 launch), speech recognition pipelines, and handwriting recognition. Their recurrent nature means computation is inherently sequential: each step must wait for the previous hidden state, so neither training nor inference can be fully parallelized across time, which limits training throughput compared to transformers.
By 2026, RNNs and LSTMs have been largely displaced by transformers and state space models for most NLP and speech tasks. They retain a role in resource-constrained or latency-sensitive applications — embedded devices, real-time audio processing, and control systems — where their constant memory footprint per inference step is an advantage over attention-based models whose memory grows with context length.
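The memory contrast can be made concrete with a toy streaming loop. This is only a sketch under simplifying assumptions: StreamingRNN is a hypothetical scalar RNN with fixed weights, and CachedContext is a stand-in for an attention-style cache that uses a uniform average in place of learned attention weights.

```python
import math

class StreamingRNN:
    """Scalar RNN that keeps only its latest hidden state between steps."""

    def __init__(self, w_h=0.5, w_x=1.0, b=0.0):
        self.w_h, self.w_x, self.b = w_h, w_x, b
        self.h = 0.0  # the entire memory carried across time steps

    def step(self, x_t):
        self.h = math.tanh(self.w_h * self.h + self.w_x * x_t + self.b)
        return self.h

class CachedContext:
    """Stand-in for an attention-style cache: every past input is retained."""

    def __init__(self):
        self.cache = []

    def step(self, x_t):
        self.cache.append(x_t)
        # Uniform average over the cache; real attention uses learned weights.
        return sum(self.cache) / len(self.cache)

rnn, ctx = StreamingRNN(), CachedContext()
for t in range(1000):
    x_t = math.sin(0.01 * t)
    rnn.step(x_t)
    ctx.step(x_t)

stored_values_rnn = 1                # a single float, regardless of sequence length
stored_values_ctx = len(ctx.cache)   # one entry per step seen so far
```

After any number of steps the RNN still holds a single state value, while the cache holds one entry per input, which is why the recurrent form suits long-running streams on memory-limited hardware.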