Text REtrieval Conference (TREC) 2025

Ad-hoc Video Search (AVS)

The goal of the Ad-hoc Video Search task is to model the end-user search use case: a user searching (with textual sentence queries) for segments of video containing persons, objects, activities, locations, etc., and combinations of these. The Internet Archive (IACC.3) dataset was used from 2016 to 2018; from 2019 to 2021 the task adopted a new collection (V3C1), drawn from the Vimeo Creative Commons Collection (V3C) dataset. Starting in 2022, the task added the V3C2 sub-collection to test systems on a new set of queries each year, in addition to a common (fixed) progress query set used to measure system progress from 2022 to 2025.
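The retrieval setup can be pictured as ranking precomputed video-segment embeddings against an embedded text query. The sketch below is a minimal illustration only; the vectors, segment identifiers, and cosine scoring are invented assumptions, not a TRECVID baseline.

    # Illustrative ad-hoc video search: rank precomputed segment embeddings
    # against a text-query embedding. Real systems use joint text-video
    # encoders; these vectors are stand-ins.
    import numpy as np

    segment_embeddings = {  # segment_id -> feature vector (fabricated)
        "vid42_shot3": np.array([0.9, 0.1, 0.4]),
        "vid17_shot8": np.array([0.2, 0.8, 0.5]),
    }

    def rank_segments(query_vec, segments, k=2):
        q = query_vec / np.linalg.norm(query_vec)
        scores = {
            sid: float(q @ (v / np.linalg.norm(v)))  # cosine similarity
            for sid, v in segments.items()
        }
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]

    query_vec = np.array([1.0, 0.0, 0.3])  # stand-in for an encoded text query
    print(rank_segments(query_vec, segment_embeddings))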

Track coordinator(s):

  • George Awad, National Institute of Standards and Technology (NIST)

Track Web Page: https://www-nlpir.nist.gov/projects/tv2025/avs.html


BioGen

With the advancement of large language models (LLMs), the biomedical domain has seen significant progress in multiple tasks such as biomedical question answering, lay-language summarization of the biomedical literature, and clinical note summarization. However, hallucinations or confabulations remain one of the key challenges when using LLMs in the biomedical domain. Inaccuracies may be particularly harmful in high-risk situations, such as making clinical decisions or appraising biomedical research. Toward this end, in our pilot task organized at TREC 2024, we introduced the task of reference attribution as a means to mitigate the generation of false statements by LLMs when answering biomedical questions. We continue this task in the BioGen track at TREC 2025, with an additional task of grounding the answer. The goal of the TREC 2025 BioGen task is to cite references that support each sentence, and the overall answer, of the LLM output for each topic.
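To make the attribution goal concrete, the toy sketch below attaches to each answer sentence the reference whose text overlaps it most, and declines to cite when evidence is weak. The overlap scorer and reference snippets are invented for illustration and are not the track's method.

    # Toy sentence-level reference attribution (illustrative only).
    references = {
        "PMID:111": "Metformin lowers blood glucose in patients with type 2 diabetes.",
        "PMID:222": "Statins reduce LDL cholesterol and cardiovascular risk.",
    }

    def overlap(a, b):
        return len(set(a.lower().split()) & set(b.lower().split()))

    def attribute(sentences, refs, min_overlap=2):
        cited = []
        for s in sentences:
            best = max(refs, key=lambda r: overlap(s, refs[r]))
            # Refuse to cite rather than fabricate support for a weak match.
            cited.append((s, best if overlap(s, refs[best]) >= min_overlap else None))
        return cited

    answer = ["Metformin lowers blood glucose.", "It is also used for weight management."]
    for sentence, ref in attribute(answer, references):
        print(sentence, "->", ref)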

Track coordinator(s):

  • Deepak Gupta - National Library of Medicine, NIH
  • Dina Demner-Fushman - National Library of Medicine, NIH
  • Bill Hersh - Oregon Health & Science University
  • Steven Bedrick - Oregon Health & Science University
  • Kirk Roberts - UTHealth Houston

Track Web Page: https://trec-biogen.github.io/


Detection, Retrieval, and Augmented Generation for Understanding News (DRAGUN)

The TREC DRAGUN Track, the continuation of the Lateral Reading track, is for researchers interested in addressing the problems of misinformation and trust in search and online content. The current web landscape requires the ability to judge the trustworthiness of information, which is a difficult task for most people. Meanwhile, automated detection of misinformation is likely to remain confined to well-defined domains or to simple fact-checking.

Track coordinator(s):

  • Dake Zhang, University of Waterloo
  • Mark Smucker, University of Waterloo
  • Charles L. A. Clarke, University of Waterloo

Track Web Page: https://trec-dragun.github.io/


Interactive Knowledge Assistance Track (iKAT)

iKAT is the successor to the Conversational Assistance Track (CAsT). The fourth year of CAsT aimed to add more conversational elements to the interaction streams by introducing mixed initiative (clarifications and suggestions) to create multi-path, multi-turn conversations for each topic. TREC iKAT evolves CAsT into a new track to signal this new trajectory. iKAT focuses on supporting multi-path, multi-turn, multi-perspective conversations: for a given topic, the direction in which the conversation evolves depends not only on the prior responses but also on the user.

Track coordinator(s):

  • Mohammad Aliannejadi, University of Amsterdam
  • Simon Lupart, University of Amsterdam
  • Marcel Gohsen, Bauhaus-Universität Weimar
  • Zahra Abbasiantaeb, University of Amsterdam
  • Nailia Mirzakhmedova, Bauhaus-Universität Weimar
  • Johannes Kiesel, GESIS - Leibniz Institute for the Social Sciences
  • Jeff Dalton, University of Edinburgh

Track Web Page: https://www.trecikat.com


Million LLMs Track (MLLM)

The Million LLMs Track introduces a novel challenge: ranking large language models (LLMs) based on their expected ability to answer specific user queries. As organizations deploy ensembles of LLMs, ranging from general-purpose to domain-specific, it becomes crucial to determine which models to consult for a given task. This track focuses on evaluating systems that can effectively identify the most capable LLM(s) for a query, without issuing new queries to the models.
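One hedged reading of ranking "without issuing new queries to the models" is to score each LLM from a precomputed profile, such as an embedding of queries it has answered well in the past. Everything below (the model names, the profiles, the bucket-hash embedding) is invented for illustration.

    # Hypothetical sketch: rank LLMs for a query using only precomputed
    # performance profiles; no model is called at ranking time.
    import numpy as np

    def embed(text, dim=16):
        # Stand-in embedding: bucket tokens into a fixed-size vector
        # (a real system would use a learned text encoder).
        v = np.zeros(dim)
        for tok in text.lower().split():
            v[sum(ord(c) for c in tok) % dim] += 1.0
        n = np.linalg.norm(v)
        return v / n if n else v

    # Profile per model: mean embedding of queries it historically handled well.
    profiles = {
        "general-llm": embed("summarize news write email explain concept"),
        "bio-llm": embed("gene protein clinical trial dosage symptom"),
        "code-llm": embed("python function bug compile stack trace"),
    }

    def rank_models(query):
        q = embed(query)
        scores = {m: float(q @ p) for m, p in profiles.items()}
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

    print(rank_models("What dosage is used in the clinical trial?"))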

Track coordinator(s):

  • Evangelos Kanoulas, University of Amsterdam
  • Panagiotis Eustratiadis, University of Amsterdam
  • Mark Sanderson, RMIT University
  • Jamie Callan, Carnegie Mellon University

Track Web Page: https://trec-mllm.github.io/


Product Search and Recommendation

The TREC Product Search and Recommendations Track aims to advance research in product search and recommendation systems by creating robust, high-quality datasets that enable the evaluation of end-to-end multimodal retrieval and nuanced product recommendation algorithms.

Track coordinator(s):

  • Surya Kallumadi, Coursera
  • ChengXiang Zhai, University of Illinois Urbana-Champaign
  • Michael Ekstrand, Drexel University
  • Rikiya Takehi, Waseda University
  • Dean Alvarez, University of Illinois Urbana-Champaign
  • Daniel Campos, Snowflake
  • Alessandro Magnini, WalmartLabs

Track Web Page: https://trec-product-search.github.io/


Retrieval Augmented Generation (RAG)

The TREC Retrieval-Augmented Generation Track is intended to foster innovation and research in retrieval-augmented generation systems. This area of research focuses on combining retrieval methods (techniques for finding relevant information within large corpora) with Large Language Models (LLMs) to enhance the ability of systems to produce relevant, accurate, up-to-date, and contextually appropriate content.
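As an illustration of the retrieve-then-generate pattern the track evaluates, here is a minimal sketch: score documents against the query with a toy lexical scorer, then assemble a grounded prompt for an LLM. The corpus and scorer are placeholders, not the track's baseline.

    # Minimal retrieval-augmented generation sketch (illustrative only).
    import math
    from collections import Counter

    corpus = {
        "d1": "TREC evaluates retrieval systems on shared test collections.",
        "d2": "Retrieval augmented generation grounds model output in retrieved text.",
    }

    def tf_idf(query, doc, docs):
        # Toy lexical scorer; a real system would use BM25 or a dense retriever.
        tf = Counter(doc.lower().split())
        score = 0.0
        for t in query.lower().split():
            df = sum(1 for d in docs.values() if t in d.lower().split())
            if df:
                score += tf[t] * math.log(len(docs) / df)
        return score

    def retrieve(query, docs, k=1):
        ranked = sorted(docs.items(), key=lambda kv: tf_idf(query, kv[1], docs),
                        reverse=True)
        return ranked[:k]

    def build_prompt(query, passages):
        context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in passages)
        return f"Answer using only the passages below and cite them.\n{context}\nQuestion: {query}"

    q = "How does retrieval augmented generation stay accurate?"
    print(build_prompt(q, retrieve(q, corpus)))  # this prompt would go to an LLM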

Track coordinator(s):

  • Shivani Upadhyay, University of Waterloo
  • Ronak Pradeep, University of Waterloo
  • Nandan Thakur, University of Waterloo
  • Jimmy Lin, University of Waterloo
  • Nick Craswell, Microsoft

Track Web Page: https://trec-rag.github.io/


RAG TREC Instrument for Multilingual Evaluation (RAGTIME)

RAGTIME is a TREC shared task to study and benchmark report generation from news, in both English and multilingual settings. Key features of the track are its focus on multi-faceted reports (going beyond factoid QA) and a citation-based evaluation (providing supporting evidence for claims made in the report). It also benchmarks cross-language (CLIR) and multilingual (MLIR) retrieval as supporting subtasks.
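As a rough illustration of citation-based evaluation (the scoring below is an assumption, not the track's official protocol), one can compare the documents each report claim cites against documents judged to support that claim:

    # Hypothetical citation scoring: per-claim cited vs. judged-supporting docs.
    def citation_precision_recall(claims):
        tp = cited = supported = 0
        for cited_docs, gold_docs in claims:
            tp += len(cited_docs & gold_docs)
            cited += len(cited_docs)
            supported += len(gold_docs)
        precision = tp / cited if cited else 0.0
        recall = tp / supported if supported else 0.0
        return precision, recall

    claims = [
        ({"doc3", "doc7"}, {"doc3"}),  # one correct citation, one unsupported
        ({"doc1"}, {"doc1", "doc9"}),  # missed one supporting document
    ]
    print(citation_precision_recall(claims))  # (0.666..., 0.666...)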

Track coordinator(s):

  • Dawn Lawrie, Johns Hopkins University, HLTCOE
  • Sean MacAvaney, University of Glasgow
  • James Mayfield, Johns Hopkins University, HLTCOE
  • Luca Soldaini, Allen Institute for AI
  • Eugene Yang, Johns Hopkins University, HLTCOE
  • Andrew Yates, Johns Hopkins University, HLTCOE

Track Web Page: https://trec-ragtime.github.io/


Tip of the Tongue (TOT)

Tip-of-the-tongue (ToT) known-item retrieval is defined as “an item identification task in which the searcher has previously experienced an item but cannot recall a reliable identifier” [1] (i.e., “It’s on the tip of my tongue…”). Current IR systems are not well-equipped to address ToT information needs. As evidence, a wide range of community Q&A sites have emerged to help people resolve their ToT information needs with the help of other people.

Track coordinator(s):

  • Jaime Arguello, University of North Carolina
  • Fernando Diaz, Carnegie Mellon University and Google
  • Maik Fröbe, Friedrich-Schiller-Universität Jena
  • To Eun Kim, Carnegie Mellon University
  • Bhaskar Mitra

Track Web Page: https://trec-tot.github.io/


Video Question Answering (VQA)

The Video Question Answering (VQA) Challenge aims to rigorously assess the capabilities of state-of-the-art multimodal models in understanding and reasoning about video content. Participants in this challenge will develop and test models that answer a diverse set of questions based on video segments, covering various levels of complexity, from factual retrieval to complex reasoning. The challenge track will serve as a critical evaluation framework to measure progress in video understanding, helping identify strengths and weaknesses in current AI architectures. By fostering innovation in multimodal learning, this track will contribute to advancing AI’s ability to process dynamic visual narratives, enabling more reliable and human-like interaction with video-based information.

Track coordinator(s):

  • George Awad, National Institute of Standards and Technology (NIST)
  • Sanjay Purushotham, UMBC

Track Web Page: https://www-nlpir.nist.gov/projects/tv2026/vqa.html