Continual Learning
Continual learning is a machine learning paradigm in which a model learns from a continuous stream of tasks or data over time while retaining performance on previously learned tasks, without full retraining from scratch.
Continual learning (also called lifelong learning or incremental learning) addresses one of the most significant gaps between artificial and biological intelligence: the ability to accumulate knowledge progressively without catastrophically forgetting what was learned before. In standard deep learning, models are trained once on a fixed dataset and then deployed statically. Continual learning replaces this with an ongoing process where new tasks, classes, or data distributions arrive sequentially and the model must integrate them without simultaneous access to all prior training data.
The field distinguishes several problem settings. Task-incremental learning assumes the model knows which task it is performing at inference time. Class-incremental learning requires classification among all previously seen classes without task identity cues—a substantially harder problem. Domain-incremental learning presents the same task type but with shifting input distributions, such as images captured under different conditions over time. Each setting imposes different constraints on how forgetting manifests and how it should be measured.
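The difference between the task-incremental and class-incremental settings, and one common way to quantify forgetting, can be illustrated with a minimal Python sketch. The function names, the class-to-task layout, and the accuracy matrix are hypothetical illustrations, not part of any benchmark's API. The forgetting measure shown (best earlier accuracy on a task minus its final accuracy, averaged over all tasks but the last) is one commonly used definition among several.

```python
def predict(logits, task_classes=None):
    """Return the index of the highest-scoring class.

    task_classes: optional list of class indices belonging to the known task.
    If given (task-incremental), only those classes compete; if None
    (class-incremental), every class seen so far competes.
    """
    candidates = range(len(logits)) if task_classes is None else task_classes
    return max(candidates, key=lambda c: logits[c])


def average_accuracy(acc):
    """Mean accuracy over all tasks after training on the final task.

    acc[i][j] is the accuracy on task j measured after training on task i
    (a lower-triangular layout: row i has i + 1 entries).
    """
    final = acc[-1]
    return sum(final) / len(final)


def average_forgetting(acc):
    """Mean drop from the best earlier accuracy to the final accuracy,
    averaged over every task except the last. Requires at least two tasks."""
    n = len(acc)
    drops = []
    for j in range(n - 1):
        best_earlier = max(acc[i][j] for i in range(j, n - 1))
        drops.append(best_earlier - acc[-1][j])
    return sum(drops) / len(drops)


# Hypothetical logits over four classes; task 0 owns classes {0, 1}, task 1 owns {2, 3}.
logits = [1.2, 0.3, 2.5, 0.1]
task_il_prediction = predict(logits, task_classes=[0, 1])  # restricted to task 0
class_il_prediction = predict(logits)                      # all classes compete

# Hypothetical accuracy matrix for three sequential tasks.
acc = [
    [0.90],
    [0.70, 0.85],
    [0.60, 0.75, 0.80],
]
final_accuracy = average_accuracy(acc)
forgetting = average_forgetting(acc)
```

With these made-up numbers, the same logits yield class 0 when the task is known and class 2 when it is not. That cross-task confusion is what makes the class-incremental setting harder.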
Core techniques fall into three families. Regularization-based approaches, such as elastic weight consolidation (EWC) and synaptic intelligence, estimate which weights matter for previous tasks and penalize changes to them during new training. Rehearsal-based methods maintain a small episodic memory of past examples, or use a generative model to synthesize pseudo-examples, and interleave them with new data. Architecture-based approaches allocate separate or expanding network capacity per task, protecting old knowledge through isolation. Large pretrained foundation models have shifted the practical landscape: fine-tuning them with parameter-efficient methods such as LoRA or prefix tuning, which leave the pretrained weights frozen and train only a small number of added parameters, tends to cause less forgetting than updating all weights, making these techniques directly relevant to continual learning practice.
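Two of these ideas are small enough to sketch in plain Python. The first function computes a quadratic penalty in the style of EWC, (lam / 2) * sum_i F_i * (theta_i - theta_old_i)^2, where F_i is a per-weight importance estimate (a diagonal Fisher information approximation in EWC). The class is a fixed-size rehearsal memory filled by reservoir sampling, which is one common buffer policy rather than the only one. All names (ewc_penalty, ReservoirBuffer, lam) are hypothetical, and the importance values are assumed to have been computed elsewhere.

```python
import random


def ewc_penalty(params, old_params, importance, lam=1.0):
    """Quadratic penalty that grows when important weights move away
    from the values they had after the previous task."""
    return 0.5 * lam * sum(
        f * (p - p_old) ** 2
        for p, p_old, f in zip(params, old_params, importance)
    )


def ewc_penalty_grad(params, old_params, importance, lam=1.0):
    """Gradient of ewc_penalty with respect to each current parameter."""
    return [
        lam * f * (p - p_old)
        for p, p_old, f in zip(params, old_params, importance)
    ]


class ReservoirBuffer:
    """Fixed-capacity memory in which every example seen so far has an
    equal chance of being stored (reservoir sampling)."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, example):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(example)
        else:
            # Keep the new example with probability capacity / seen.
            slot = self.rng.randrange(self.seen)
            if slot < self.capacity:
                self.items[slot] = example

    def sample(self, k):
        """Draw up to k stored examples to mix into a new-task batch."""
        return self.rng.sample(self.items, min(k, len(self.items)))


# Hypothetical values: the second weight is far more important to the old task.
old_params = [1.0, -2.0]
importance = [0.1, 10.0]
penalty_small = ewc_penalty([2.0, -2.0], old_params, importance)  # moves the unimportant weight
penalty_large = ewc_penalty([1.0, -1.0], old_params, importance)  # moves the important weight

buffer = ReservoirBuffer(capacity=5)
for x in range(100):
    buffer.add(x)
replay = buffer.sample(3)
```

In a training loop, the penalty would be added to the new-task loss, and the replayed examples would be appended to each new-task batch. Moving the important weight by the same distance costs far more than moving the unimportant one, which is the intended effect.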
As of 2026, continual learning is increasingly important for deploying language models that must adapt to new information—recent events, updated facts, user-specific knowledge—without full retraining cycles. Retrieval-augmented generation (RAG) offers a complementary strategy by externalizing new knowledge to a searchable database rather than encoding it in weights. Benchmark suites including Continual World and CLEVA measure progress, though the field still lacks a single agreed-upon evaluation standard.
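The idea of adding knowledge without touching model weights can be illustrated with a toy retrieval store. This is a minimal sketch under strong assumptions: word-overlap (Jaccard) scoring stands in for the learned embeddings and vector index a real RAG system would use, and the class, method, and function names are hypothetical.

```python
class ToyKnowledgeStore:
    """Keeps documents outside the model; adding one requires no training."""

    def __init__(self):
        self.documents = []

    def add(self, text):
        self.documents.append(text)

    @staticmethod
    def _tokens(text):
        return set(text.lower().split())

    def retrieve(self, query, k=1):
        """Return the k documents with the highest Jaccard word overlap."""
        q = self._tokens(query)

        def score(doc):
            d = self._tokens(doc)
            union = q | d
            return len(q & d) / len(union) if union else 0.0

        return sorted(self.documents, key=score, reverse=True)[:k]


def build_prompt(store, question):
    """Prepend retrieved context to the question before it reaches the model."""
    context = "\n".join(store.retrieve(question, k=1))
    return f"Context:\n{context}\n\nQuestion: {question}"


# Hypothetical facts added after the model was trained.
store = ToyKnowledgeStore()
store.add("the library opens at nine on weekdays")
store.add("the cafeteria closes at three on fridays")
question = "when does the library open on weekdays"
prompt = build_prompt(store, question)
```

New facts become available as soon as they are added to the store, so nothing stored in the model's weights is overwritten. The trade-off is that answer quality now depends on retrieval quality.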