Overview - Medical Video Question Answering 2024
The recent surge in the availability of online videos has changed the way people acquire information and knowledge. Many people prefer instructional videos, which teach how to accomplish a particular task effectively and efficiently through a series of step-by-step procedures. Similarly, medical instructional videos are well suited to answering consumers' healthcare questions that call for instruction, delivering key information through both visual and verbal communication. We aim to extract visual information from a video corpus to answer consumers' first aid, medical emergency, and medical education questions. Extracting the relevant information from a video corpus requires relevant video retrieval, moment localization, video summarization, and captioning.

Toward this goal, the TREC Medical Video Question Answering track focuses on developing systems capable of understanding medical videos and providing visual answers (from single and multiple videos) and instructional step captions in response to natural language questions. Emphasizing the importance of multimodal capabilities, the track requires systems to generate instructional questions and captions based on medical video content. Following MedVidQA 2023, TREC 2024 expanded the tasks to cover both language-video understanding and generation. The track comprises two main tasks: Video Corpus Visual Answer Localization (VCVAL) and Query-Focused Instructional Step Captioning (QFISC).
Track coordinator(s):
- Deepak Gupta, National Library of Medicine, NIH
- Dina Demner-Fushman, National Library of Medicine, NIH
Tasks:
- vcval: Video Corpus Visual Answer Localization
- qfisc: Query-Focused Instructional Step Captioning
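To make the two task definitions concrete, the minimal Python sketch below models what a system's outputs for both tasks might look like. All names and fields here (VisualAnswer, StepCaption, Prediction, timestamps in seconds) are illustrative assumptions for exposition, not the track's actual submission format.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class VisualAnswer:
        """One localized visual answer for VCVAL: a temporal segment in a retrieved video."""
        video_id: str       # identifier of the retrieved video (hypothetical field)
        start_sec: float    # answer segment start, in seconds
        end_sec: float      # answer segment end, in seconds
        score: float = 0.0  # system confidence, used to rank answers

    @dataclass
    class StepCaption:
        """One instructional step for QFISC, grounded in a video segment."""
        start_sec: float
        end_sec: float
        caption: str        # natural-language description of the step

    @dataclass
    class Prediction:
        """A system's response to one natural language medical question."""
        question_id: str
        answers: List[VisualAnswer] = field(default_factory=list)  # VCVAL output
        steps: List[StepCaption] = field(default_factory=list)     # QFISC output

    # Hypothetical example: an answer to a first aid question, localized in one
    # video and broken into captioned instructional steps.
    pred = Prediction(
        question_id="Q1",
        answers=[VisualAnswer("video_0042", 35.0, 118.5, score=0.91)],
        steps=[
            StepCaption(35.0, 60.0, "Check the scene and call emergency services."),
            StepCaption(60.0, 118.5, "Apply firm pressure to the wound with a clean cloth."),
        ],
    )
    print(pred)

Under these assumptions, a VCVAL system would populate the ranked answers list by combining video retrieval with moment localization, while a QFISC system would segment the localized answer and caption each instructional step.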