28 March 2026
Large Language Models (LLMs) struggle to keep up with the fast-changing nature of real-world information, as their pre-trained knowledge quickly becomes outdated. In this study, EMERGE partners from the University of Pisa address the challenge of keeping LLMs up to date with factual knowledge (adaptation) while avoiding the forgetting of relevant existing knowledge. Leveraging temporally aligned Wikipedia and Wikidata dumps, the authors extract a continuous data stream and evaluate the performance of an incrementally trained GPT-2 across different time periods.
Additionally, they extend the analysis to real-world news data using the RealTimeData dataset, examining how LLMs respond to novel facts, such as those surrounding the COVID-19 pandemic. Their methodology includes synthetic data generation and SmartReview, a continual learning strategy that avoids forgetting by rehearsing on a carefully selected subset of the old data. Experimental results highlight the need for continual learning in pretrained models and demonstrate the effectiveness of replay-based approaches in mitigating forgetting. In particular, SmartReview provides a strong replay-based baseline that limits forgetting and enhances adaptation. This work advances the study of continual learning in LLMs, offering insights into the development of more temporally aware and reliable AI systems.
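To illustrate the idea behind replay-based rehearsal, the sketch below shows a generic replay buffer that keeps a bounded subset of past examples (via reservoir sampling) and mixes them into batches drawn from the incoming data stream. This is only a minimal illustration of the general technique: the paper's SmartReview strategy uses its own carefully designed selection of old data, which is not detailed here, and all names in the code (`ReplayBuffer`, `rehearsal_batches`) are hypothetical.

```python
import random


class ReplayBuffer:
    """Fixed-size store of past examples for rehearsal.

    Generic sketch: reservoir sampling keeps a uniform random subset
    of everything seen so far. SmartReview's actual selection rule
    (not described in this summary) would replace this policy.
    """

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.buffer = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, example):
        self.seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(example)
        else:
            # Replace a random slot so that every example seen so far
            # has equal probability capacity/seen of being retained.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.buffer[j] = example

    def sample(self, k):
        return self.rng.sample(self.buffer, min(k, len(self.buffer)))


def rehearsal_batches(new_stream, buffer, batch_size=4, replay_frac=0.5):
    """Yield training batches mixing new examples with replayed old ones."""
    n_replay = int(batch_size * replay_frac)
    batch = []
    for ex in new_stream:
        batch.append(ex)
        buffer.add(ex)  # new data also enters the buffer for later replay
        if len(batch) == batch_size - n_replay:
            yield batch + buffer.sample(n_replay)
            batch = []
```

In a continual pretraining loop, each yielded batch would be passed to the usual training step, so the model keeps seeing a fraction of older data alongside the newest time slice.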
Read the paper at the link below.

