Overview - Video Question Answering (VQA) 2025
The Video Question Answering (VQA) Challenge rigorously assesses how well state-of-the-art multimodal models understand and reason about video content. Participants develop and test models that answer a diverse set of questions about video segments, spanning levels of complexity from factual retrieval to complex reasoning. The track serves as a critical evaluation framework for measuring progress in video understanding, helping to identify the strengths and weaknesses of current AI architectures. By fostering innovation in multimodal learning, it contributes to advancing AI's ability to process dynamic visual narratives, enabling more reliable and human-like interaction with video-based information.
Track coordinator(s):
- George Awad, National Institute of Standards and Technology (NIST)
- Sanjay Purushotham, UMBC
Tasks:
- trec2025-vqa-gen: Answer Generation
- trec2025-vqa-mc: Multiple Choice
Track Web Page: https://www-nlpir.nist.gov/projects/tv2026/vqa.html