Training

Epoch

An epoch is one complete pass of the entire training dataset through a machine learning model. Models are typically trained for multiple epochs, each pass refining the model's weights through backpropagation applied to every training sample.

In machine learning, an epoch is a single full iteration over the complete training dataset. During one epoch, every sample in the training set is presented to the model exactly once — typically in shuffled mini-batches — and the model's parameters are updated via backpropagation after each batch. Epoch count is a primary hyperparameter controlling total training duration.

Within each epoch, the dataset is commonly shuffled and divided into mini-batches (commonly 32–4,096 samples depending on the task and hardware). The model computes a forward pass to generate predictions, calculates the loss against ground truth, and uses backpropagation to compute gradients that an optimizer (such as Adam or SGD) applies to update the weights. After each epoch, practitioners typically evaluate the model on a held-out validation set to monitor generalization performance.

Epoch count directly affects the bias-variance tradeoff: too few epochs leave the model underfitted (persistently high loss on both training and validation data); too many can cause overfitting (validation loss rises while training loss continues to fall). Techniques such as early stopping — halting training when validation loss fails to improve for a set number of consecutive epochs — and learning rate scheduling (reducing the learning rate at epoch boundaries) are standard tools for managing this tradeoff.

For large language models pretrained on hundreds of billions to trillions of tokens, a single pass over the full corpus can take weeks to months on thousands of GPUs, and many training runs complete fewer than one full epoch over their dataset. Fine-tuning smaller models on task-specific datasets typically runs for 1–10 epochs. Frameworks such as PyTorch Lightning and the Hugging Face Trainer handle epoch bookkeeping automatically and log validation metrics after each epoch as a standard diagnostic.

Example

A sentiment classifier trained on 100,000 labeled product reviews for 10 epochs processes all 100,000 examples ten times in total; validation accuracy is checked after each epoch to determine whether further training improves or degrades generalization.

Related terms

Latest news on this topic

← Glossary