01 July 2024

Publication: Continual pre-training mitigates forgetting in language and vision

Continual Learning (CL) focuses on the design of agents able to learn from a stream of non-stationary data while preserving previously acquired knowledge. The tendency of neural networks to catastrophically forget when confronted with new data has been the subject of many studies, most of which focus on designing new CL strategies that mitigate this problem.

Pre-trained models are commonly used in Continual Learning to initialize the model before training on the stream of non-stationary data. However, the pre-training phase itself is rarely performed continually.

In this work, EMERGE partners from the University of Pisa investigate the characteristics of the Continual Pre-Training scenario, in which a model is continually pre-trained on a stream of incoming data and only later fine-tuned on different downstream tasks. The authors introduce an evaluation protocol for Continual Pre-Training that monitors forgetting against a Forgetting Control dataset not present in the continual stream.
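To make the evaluated scenario concrete, the sketch below illustrates what a continual pre-training loop with a Forgetting Control evaluation could look like. It is a minimal, hypothetical illustration in PyTorch with a toy backbone, a denoising self-supervised objective and synthetic data; the function names (pretrain_on_experience, fine_tune_and_evaluate) are our own and do not reproduce the authors' code or the SEP method.

```python
# Hypothetical sketch of a continual pre-training protocol with a
# Forgetting Control evaluation (toy model and synthetic data only).
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Toy "backbone" standing in for a language or vision encoder.
backbone = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 16))

def self_supervised_loss(model, x):
    # Placeholder self-supervised objective: denoising reconstruction.
    noisy = x + 0.1 * torch.randn_like(x)
    return nn.functional.mse_loss(model(noisy), x)

def pretrain_on_experience(model, loader, epochs=1, lr=1e-3):
    # Continual pre-training step on one experience of the stream.
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for (x,) in loader:
            opt.zero_grad()
            self_supervised_loss(model, x).backward()
            opt.step()

def fine_tune_and_evaluate(backbone, x_tr, y_tr, x_te, y_te, epochs=20, lr=1e-2):
    # Fine-tune a fresh linear head on the downstream task; the backbone
    # is frozen so the score reflects the pre-trained representation.
    head = nn.Linear(16, 2)
    opt = torch.optim.Adam(head.parameters(), lr=lr)
    with torch.no_grad():
        f_tr, f_te = backbone(x_tr), backbone(x_te)
    for _ in range(epochs):
        opt.zero_grad()
        nn.functional.cross_entropy(head(f_tr), y_tr).backward()
        opt.step()
    return (head(f_te).argmax(dim=1) == y_te).float().mean().item()

# Non-stationary stream of pre-training "experiences" (synthetic drift).
stream = [TensorDataset(torch.randn(256, 16) + i) for i in range(3)]

# Forgetting Control task: downstream data NOT drawn from the continual stream.
fc_x, fc_y = torch.randn(320, 16), torch.randint(0, 2, (320,))
fc = (fc_x[:256], fc_y[:256], fc_x[256:], fc_y[256:])

for i, experience in enumerate(stream):
    pretrain_on_experience(backbone, DataLoader(experience, batch_size=32))
    acc = fine_tune_and_evaluate(backbone, *fc)
    # A drop in this accuracy across experiences would indicate forgetting.
    print(f"after experience {i}: forgetting-control accuracy = {acc:.3f}")
```

Freezing the backbone while fine-tuning a fresh head isolates the quality of the continually pre-trained representation; a decline in Forgetting Control accuracy across experiences would signal forgetting of knowledge not covered by the continual stream.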

Moreover, they propose a Sample-Efficient Pre-training method (SEP) that speeds up the pre-training phase. They show that the pre-training protocol is the most important factor accounting for forgetting. Surprisingly, they find that self-supervised continual pre-training, in both NLP and Vision, is sufficient to mitigate forgetting without the use of any Continual Learning strategy. Other factors, such as model depth, input modality and architecture type, are not as crucial.

Read the paper at the link below.