19 July 2025

Publication: Continually Learn to Map Visual Concepts to Large Language Models in Resource-constrained Environments

Continually learning from data that are not independent and identically distributed (non-i.i.d.) poses a significant challenge in deep learning, particularly in resource-constrained environments. Visual models trained via supervised learning often suffer from overfitting, catastrophic forgetting, and biased representations when faced with sequential tasks. In contrast, pre-trained language models handle task sequences more robustly thanks to their generalized knowledge representations, albeit at a high computational cost.

Leveraging this advantage, EMERGE partners from the University of Pisa propose a novel learning strategy in this work, Continual Visual Mapping (CVM), which continually maps visual representations into a fixed knowledge space derived from a language model. By anchoring learning to this fixed space, CVM makes it possible to train small, efficient visual models, which suits scenarios where adapting large pre-trained visual models would demand prohibitive amounts of computation or data. Empirical evaluations across five benchmarks demonstrate that CVM consistently outperforms state-of-the-art continual learning methods, showcasing its potential to enhance generalization and mitigate challenges in resource-constrained continual learning settings.
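To make the idea of mapping into a fixed knowledge space more concrete, here is a minimal toy sketch in Python. It is not the authors' implementation. The anchors are random unit vectors standing in for frozen language-model embeddings of class names, the "visual model" is a single linear map, and the data are synthetic. All names (`anchors`, `prototypes`, `make_task`, `train_task`, `predict`) are hypothetical. The sketch only illustrates the general pattern: targets stay fixed, only the small mapper is updated as tasks arrive one after another, and prediction picks the nearest anchor.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_CLASSES, FEAT_DIM, EMB_DIM = 4, 16, 8

# Hypothetical stand-in for frozen language-model embeddings of the class names.
# A real setup would take these from a pre-trained language model.
anchors = rng.normal(size=(NUM_CLASSES, EMB_DIM))
anchors /= np.linalg.norm(anchors, axis=1, keepdims=True)

# Toy "visual features": one unit-norm prototype per class plus small noise.
prototypes = rng.normal(size=(NUM_CLASSES, FEAT_DIM))
prototypes /= np.linalg.norm(prototypes, axis=1, keepdims=True)

def make_task(classes, n_per_class=50):
    """Build a synthetic task containing only the given classes."""
    y = np.repeat(np.array(classes), n_per_class)
    x = prototypes[y] + 0.05 * rng.normal(size=(len(y), FEAT_DIM))
    return x, y

def loss(W, x, y):
    """Half the mean squared distance between mapped features and class anchors."""
    return 0.5 * np.mean(np.sum((x @ W - anchors[y]) ** 2, axis=1))

def train_task(W, x, y, lr=0.5, steps=200):
    """Gradient descent on the mapper only; the anchors are never updated."""
    for _ in range(steps):
        err = x @ W - anchors[y]
        W = W - lr * (x.T @ err) / len(x)
    return W

def predict(W, x):
    """Assign each sample to the anchor with the highest cosine similarity."""
    z = x @ W
    z = z / (np.linalg.norm(z, axis=1, keepdims=True) + 1e-12)
    return np.argmax(z @ anchors.T, axis=1)

# Two tasks arrive in sequence, each introducing new classes.
W = np.zeros((FEAT_DIM, EMB_DIM))
tasks = [make_task([0, 1]), make_task([2, 3])]
history = []  # (loss before, loss after) on each task's own data
for x, y in tasks:
    before = loss(W, x, y)
    W = train_task(W, x, y)
    history.append((before, loss(W, x, y)))

# Evaluate on all classes seen so far, without task labels.
x_all = np.concatenate([x for x, _ in tasks])
y_all = np.concatenate([y for _, y in tasks])
accuracy = float(np.mean(predict(W, x_all) == y_all))
print("per-task loss (before, after):", history)
print("accuracy over all seen classes:", accuracy)
```

Because every class has a target that never moves, learning a new task does not change what earlier classes are supposed to map to. Any forgetting in this toy comes only from the mapper's weights being overwritten, and the numbers it prints say nothing about the results reported in the paper.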

Read the paper in the link below.