26 September 2025

Publication: MAIA: a Benchmark for Multimodal AI Assessment

In recent years, largely following the success of large language models (LLMs), there has been growing interest in large pre-trained models able to handle both text and images, so-called Vision and Language models (VLMs). As VLMs rapidly grow more capable, assessing their performance on standardized tasks and metrics is becoming increasingly challenging.

In this work, EMERGE partners from the University of Pisa introduce MAIA (Multimodal AI Assessment), a multimodal dataset developed as the core component of a competence-oriented benchmark designed for fine-grained investigation of the reasoning abilities of VLMs on videos.

Read the paper at the link below.