Overview - Video Question Answering (VQA) 2025

The Video Question Answering (VQA) Challenge aims to rigorously assess how well state-of-the-art multimodal models understand and reason about video content. Participants will develop and test models that answer a diverse set of questions about video segments, spanning levels of difficulty from factual retrieval to multi-step reasoning. The track serves as an evaluation framework for measuring progress in video understanding, helping identify strengths and weaknesses in current AI architectures. By fostering innovation in multimodal learning, it aims to advance AI's ability to process dynamic visual narratives, enabling more reliable and human-like interaction with video-based information.

Track coordinator(s):

  • George Awad, National Institute of Standards and Technology (NIST)
  • Sanjay Purushotham, UMBC

Tasks:

  • trec2025-vqa-gen: Answer Generation
  • trec2025-vqa-mc: Multiple Choice

Track Web Page: https://www-nlpir.nist.gov/projects/tv2026/vqa.html