Training

Backpropagation

Backpropagation is an algorithm for computing the gradient of a neural network's loss with respect to its weights by propagating error signals backward through the network layers, enabling gradient-based optimization.

Backpropagation (short for "backward propagation of errors") is the core algorithm used to train artificial neural networks. It computes how much each weight contributed to the output error by applying the chain rule of calculus iteratively from the output layer back to the input layer, producing a gradient vector that guides weight updates.
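The chain-rule recursion described above can be written out as a short worked derivation. The notation is an assumption introduced here for illustration, not taken from this entry: a fully connected network with $L$ layers, pre-activations $z^{(l)} = W^{(l)} a^{(l-1)} + b^{(l)}$, activations $a^{(l)} = \sigma(z^{(l)})$, loss $\mathcal{L}$, and error signal $\delta^{(l)} = \partial \mathcal{L} / \partial z^{(l)}$.

```latex
\begin{aligned}
\delta^{(L)} &= \nabla_{a^{(L)}} \mathcal{L} \odot \sigma'\!\left(z^{(L)}\right)
  && \text{(error at the output layer)} \\
\delta^{(l)} &= \left( \left(W^{(l+1)}\right)^{\top} \delta^{(l+1)} \right) \odot \sigma'\!\left(z^{(l)}\right)
  && \text{(propagate one layer back)} \\
\frac{\partial \mathcal{L}}{\partial W^{(l)}} &= \delta^{(l)} \left(a^{(l-1)}\right)^{\top},
\qquad
\frac{\partial \mathcal{L}}{\partial b^{(l)}} = \delta^{(l)}
  && \text{(gradients for layer } l\text{)}
\end{aligned}
```

Each $\delta^{(l)}$ reuses $\delta^{(l+1)}$, which is why a single backward sweep yields the gradient for every weight at roughly the cost of one extra forward pass.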

During a forward pass, input data flows through the network and produces a prediction. The loss function then measures the discrepancy between that prediction and the target. Backpropagation performs the reverse pass: starting at the output, it calculates the partial derivative of the loss with respect to each weight, layer by layer. An optimizer, such as stochastic gradient descent (SGD) or Adam, then uses these gradients to adjust the weights in the direction that reduces the loss.
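The forward pass, loss, backward pass, and optimizer step described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: the two-layer tanh network, the mean-squared-error loss, the toy regression data, and every function name (`forward`, `backward`, `sgd_step`, `train`) are hypothetical choices made for this example, and plain full-batch gradient descent stands in for SGD.

```python
import numpy as np

def forward(params, x):
    """Forward pass: return the prediction and the values backward() needs."""
    W1, b1, W2, b2 = params
    z1 = x @ W1 + b1              # hidden pre-activation
    h = np.tanh(z1)               # hidden activation
    y_hat = h @ W2 + b2           # linear output layer
    return y_hat, (x, h)

def mse_loss(y_hat, y):
    """Half mean squared error, averaged over the batch."""
    return 0.5 * np.mean(np.sum((y_hat - y) ** 2, axis=1))

def backward(params, cache, y_hat, y):
    """Backward pass: apply the chain rule from the output back to the input."""
    _, _, W2, _ = params
    x, h = cache
    n = x.shape[0]
    d_yhat = (y_hat - y) / n              # dLoss/dy_hat
    dW2 = h.T @ d_yhat                    # gradient for output weights
    db2 = d_yhat.sum(axis=0)
    d_h = d_yhat @ W2.T                   # error pushed back to the hidden layer
    d_z1 = d_h * (1.0 - h ** 2)           # tanh'(z) = 1 - tanh(z)^2
    dW1 = x.T @ d_z1                      # gradient for hidden weights
    db1 = d_z1.sum(axis=0)
    return [dW1, db1, dW2, db2]

def sgd_step(params, grads, lr):
    """Gradient descent update: move each parameter against its gradient."""
    return [p - lr * g for p, g in zip(params, grads)]

def train(steps=200, lr=0.05, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(32, 3))
    y = np.sin(x.sum(axis=1, keepdims=True))      # toy regression target
    params = [rng.normal(scale=0.5, size=(3, 8)), np.zeros(8),
              rng.normal(scale=0.5, size=(8, 1)), np.zeros(1)]
    losses = []
    for _ in range(steps):
        y_hat, cache = forward(params, x)          # forward pass
        losses.append(mse_loss(y_hat, y))          # measure the error
        grads = backward(params, cache, y_hat, y)  # backpropagation
        params = sgd_step(params, grads, lr)       # optimizer update
    return losses
```

Calling `train()` returns the loss recorded at each step, which should trend downward on this toy problem. A common sanity check for a hand-written `backward` is to compare one of its entries against a finite-difference estimate of the same derivative.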

The algorithm was popularized for neural networks by Rumelhart, Hinton, and Williams in their 1986 paper in Nature, though earlier independent derivations exist. It remains the foundational training mechanism for virtually all deep learning systems, from small classifiers to large language models with hundreds of billions of parameters.

As of 2026, backpropagation continues to underpin the training of frontier models such as GPT-4, Gemini, and Llama 3. Research into alternatives, including forward-mode differentiation, synthetic gradients, and biologically inspired local learning rules, remains active but has not displaced backpropagation in large-scale practical training.

Example

When training an image classifier, backpropagation computes how much each convolutional filter's weights contributed to misclassifying a cat as a dog, allowing those weights to be adjusted to reduce the error on the next training iteration.
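To make the cat-versus-dog case concrete, the sketch below computes only the error signal at the output layer, which backpropagation would then push back through the convolutional filters. It assumes a softmax output with cross-entropy loss, for which the gradient with respect to the logits is the predicted probabilities minus the one-hot target. The class list, the logit values, and the function names are hypothetical.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D array of class scores."""
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()

def cross_entropy_grad(logits, target_index):
    """Gradient of -log(softmax(logits)[target_index]) w.r.t. the logits.

    For softmax with cross-entropy this is: probabilities - one_hot(target).
    """
    grad = softmax(logits)
    grad[target_index] -= 1.0
    return grad

classes = ["cat", "dog"]
logits = np.array([0.2, 1.5])     # hypothetical scores: the network prefers "dog"
grad = cross_entropy_grad(logits, classes.index("cat"))
```

Here the "cat" entry of `grad` is negative and the "dog" entry is positive, so a gradient descent step raises the cat score and lowers the dog score. Backpropagation carries this same signal backward to determine how each filter weight should change.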
