26 September 2025

Publication: MAIA: a Benchmark for Multimodal AI Assessment

In recent years, largely following the success of large language models (LLMs), there has been growing interest in large pre-trained models able to handle both text and images, so-called Vision and Language models (VLMs). As VLMs rapidly grow more capable, assessing their performance on standardized tasks and metrics is becoming increasingly challenging.

In this work, EMERGE partners from the University of Pisa introduce MAIA (Multimodal AI Assessment), a multimodal dataset developed as the core component of a competence-oriented benchmark designed for fine-grained investigation of the reasoning abilities of VLMs on videos.

Read the paper at the link below.