Text REtrieval Conference (TREC) 2025

Ad-hoc Video Search (AVS)

The goal of the Ad-hoc Video Search task is to model the end-user search use case: a user searching (with textual sentence queries) for segments of video containing persons, objects, activities, locations, etc., and combinations of these. The Internet Archive (IACC.3) dataset was used from 2016 to 2018; from 2019 to 2021 the task adopted a new collection (V3C1), drawn from the Vimeo Creative Commons Collection (V3C) dataset. Starting in 2022, the task added the V3C2 sub-collection to test systems on a new set of queries each year, in addition to a common (fixed) progress query set used to measure system progress from 2022 to 2025.
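The retrieval setup can be pictured as ranking precomputed video-segment embeddings against an embedded text query. The sketch below is a minimal illustration only; the vectors, segment identifiers, and cosine scoring are invented assumptions, not a TRECVID baseline.

    # Illustrative ad-hoc video search: rank precomputed segment embeddings
    # against a text-query embedding. Real systems use joint text-video
    # encoders; these vectors are stand-ins.
    import numpy as np

    segment_embeddings = {  # segment_id -> feature vector (fabricated)
        "vid42_shot3": np.array([0.9, 0.1, 0.4]),
        "vid17_shot8": np.array([0.2, 0.8, 0.5]),
    }

    def rank_segments(query_vec, segments, k=2):
        q = query_vec / np.linalg.norm(query_vec)
        scores = {
            sid: float(q @ (v / np.linalg.norm(v)))  # cosine similarity
            for sid, v in segments.items()
        }
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]

    query_vec = np.array([1.0, 0.0, 0.3])  # stand-in for an encoded text query
    print(rank_segments(query_vec, segment_embeddings))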

Track coordinator(s):

  • George Awad, National Institute of Standards and Technology (NIST)

Track Web Page: https://www-nlpir.nist.gov/projects/tv2025/avs.html


BioGen

With the advancement of large language models (LLMs), the biomedical domain has seen significant progress in multiple tasks such as biomedical question answering, lay-language summarization of the biomedical literature, and clinical note summarization. However, hallucinations or confabulations remain one of the key challenges when using LLMs in the biomedical domain. Inaccuracies may be particularly harmful in high-risk situations, such as making clinical decisions or appraising biomedical research. Toward this end, in our pilot task organized at TREC 2024, we introduced the task of reference attribution as a means to mitigate the generation of false statements by LLMs when answering biomedical questions. We continue this task in the BioGen track at TREC 2025, with an additional task of grounding the answer. The goal of the TREC 2025 BioGen task is to cite references that support each sentence, and the overall answer, of the LLM output for each topic.
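To make the attribution goal concrete, the toy sketch below attaches to each answer sentence the reference whose text overlaps it most, and declines to cite when evidence is weak. The overlap scorer and reference snippets are invented for illustration and are not the track's method.

    # Toy sentence-level reference attribution (illustrative only).
    references = {
        "PMID:111": "Metformin lowers blood glucose in patients with type 2 diabetes.",
        "PMID:222": "Statins reduce LDL cholesterol and cardiovascular risk.",
    }

    def overlap(a, b):
        return len(set(a.lower().split()) & set(b.lower().split()))

    def attribute(sentences, refs, min_overlap=2):
        cited = []
        for s in sentences:
            best = max(refs, key=lambda r: overlap(s, refs[r]))
            # Refuse to cite rather than fabricate support for a weak match.
            cited.append((s, best if overlap(s, refs[best]) >= min_overlap else None))
        return cited

    answer = ["Metformin lowers blood glucose.", "It is also used for weight management."]
    for sentence, ref in attribute(answer, references):
        print(sentence, "->", ref)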

Track coordinator(s):

  • Deepak Gupta - National Library of Medicine, NIH
  • Dina Demner-Fushman - National Library of Medicine, NIH
  • Bill Hersh - Oregon Health & Science University
  • Steven Bedrick - Oregon Health & Science University
  • Kirk Roberts - UTHealth Houston

Track Web Page: https://trec-biogen.github.io/


Detection, Retrieval, and Augmented Generation for Understanding News (DRAGUN)

The TREC DRAGUN Track, the continuation of the Lateral Reading track, is for researchers interested in addressing the problems of misinformation and trust in search and online content. The current web landscape requires the ability to judge the trustworthiness of information, which is a difficult task for most people. Meanwhile, automated detection of misinformation is likely to remain confined to well-defined domains or to simple fact-checking.

Track coordinator(s):

  • Dake Zhang, University of Waterloo
  • Mark Smucker, University of Waterloo
  • Charles L. A. Clarke, University of Waterloo

Track Web Page: https://trec-dragun.github.io/


Interactive Knowledge Assistance Track (iKAT)

iKAT is the successor to the Conversational Assistance Track (CAsT). The fourth year of CAsT aimed to add more conversational elements to the interaction streams by introducing mixed initiative (clarifications and suggestions) to create multi-path, multi-turn conversations for each topic. TREC iKAT evolves CAsT into a new track to signal this new trajectory. iKAT focuses on supporting multi-path, multi-turn, multi-perspective conversations: for a given topic, the direction in which the conversation evolves depends not only on the prior responses but also on the user.

Track coordinator(s):

  • Mohammad Aliannejadi, University of Amsterdam
  • Simon Lupart, University of Amsterdam
  • Marcel Gohsen, Bauhaus-Universität Weimar
  • Zahra Abbasiantaeb, University of Amsterdam
  • Nailia Mirzakhmedova, Bauhaus-Universität Weimar
  • Johannes Kiesel, GESIS - Leibniz Institute for the Social Sciences
  • Jeff Dalton, University of Edinburgh

Track Web Page: https://www.trecikat.com


Million LLMs Track (MLLM)

The Million LLMs Track introduces a novel challenge: ranking large language models (LLMs) based on their expected ability to answer specific user queries. As organizations deploy ensembles of LLMs, ranging from general-purpose to domain-specific, it becomes crucial to determine which models to consult for a given task. This track focuses on evaluating systems that can effectively identify the most capable LLM(s) for a query, without issuing new queries to the models.
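One hedged reading of ranking "without issuing new queries to the models" is to score each LLM from a precomputed profile, such as an embedding of queries it has answered well in the past. Everything below (the model names, the profiles, the bucket-hash embedding) is invented for illustration.

    # Hypothetical sketch: rank LLMs for a query using only precomputed
    # performance profiles; no model is called at ranking time.
    import numpy as np

    def embed(text, dim=16):
        # Stand-in embedding: bucket tokens into a fixed-size vector
        # (a real system would use a learned text encoder).
        v = np.zeros(dim)
        for tok in text.lower().split():
            v[sum(ord(c) for c in tok) % dim] += 1.0
        n = np.linalg.norm(v)
        return v / n if n else v

    # Profile per model: mean embedding of queries it historically handled well.
    profiles = {
        "general-llm": embed("summarize news write email explain concept"),
        "bio-llm": embed("gene protein clinical trial dosage symptom"),
        "code-llm": embed("python function bug compile stack trace"),
    }

    def rank_models(query):
        q = embed(query)
        scores = {m: float(q @ p) for m, p in profiles.items()}
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

    print(rank_models("What dosage is used in the clinical trial?"))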

Track coordinator(s):

  • Evangelos Kanoulas, University of Amsterdam
  • Panagiotis Eustratiadis, University of Amsterdam
  • Mark Sanderson, RMIT University
  • Jamie Callan, Carnegie Mellon University

Track Web Page: https://trec-mllm.github.io/


Product Search and Recommendation

The TREC Product Search and Recommendations Track aims to advance research in product search and recommendation systems by creating robust, high-quality datasets that enable the evaluation of end-to-end multimodal retrieval and nuanced product recommendation algorithms.

Track coordinator(s):

  • Surya Kallumadi, Coursera
  • ChengXiang Zhai, University of Illinois Urbana-Champaign
  • Michael Ekstrand, Drexel University
  • Rikiya Takehi, Waseda University
  • Dean Alvarez, University of Illinois Urbana-Champaign
  • Daniel Campos, Snowflake
  • Alessandro Magnini, WalmartLabs

Track Web Page: https://trec-product-search.github.io/


Retrieval Augmented Generation (RAG)

The TREC Retrieval-Augmented Generation Track is intended to foster innovation and research in retrieval-augmented generation systems. This area of research focuses on combining retrieval methods (techniques for finding relevant information within large corpora) with Large Language Models (LLMs) to enhance the ability of systems to produce relevant, accurate, up-to-date, and contextually appropriate content.
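As an illustration of the retrieve-then-generate pattern the track evaluates, here is a minimal sketch: score documents against the query with a toy lexical scorer, then assemble a grounded prompt for an LLM. The corpus and scorer are placeholders, not the track's baseline.

    # Minimal retrieval-augmented generation sketch (illustrative only).
    import math
    from collections import Counter

    corpus = {
        "d1": "TREC evaluates retrieval systems on shared test collections.",
        "d2": "Retrieval augmented generation grounds model output in retrieved text.",
    }

    def tf_idf(query, doc, docs):
        # Toy lexical scorer; a real system would use BM25 or a dense retriever.
        tf = Counter(doc.lower().split())
        score = 0.0
        for t in query.lower().split():
            df = sum(1 for d in docs.values() if t in d.lower().split())
            if df:
                score += tf[t] * math.log(len(docs) / df)
        return score

    def retrieve(query, docs, k=1):
        ranked = sorted(docs.items(), key=lambda kv: tf_idf(query, kv[1], docs),
                        reverse=True)
        return ranked[:k]

    def build_prompt(query, passages):
        context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in passages)
        return f"Answer using only the passages below and cite them.\n{context}\nQuestion: {query}"

    q = "How does retrieval augmented generation stay accurate?"
    print(build_prompt(q, retrieve(q, corpus)))  # this prompt would go to an LLM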

Track coordinator(s):

  • Shivani Upadhyay, University of Waterloo
  • Ronak Pradeep, University of Waterloo
  • Nandan Thakur, University of Waterloo
  • Jimmy Lin, University of Waterloo
  • Nick Craswell, Microsoft

Track Web Page: https://trec-rag.github.io/


RAG TREC Instrument for Multilingual Evaluation (RAGTIME)

RAGTIME is a TREC shared task to study and benchmark report generation from news, in both English and multilingual settings. Key features of the track are its focus on multi-faceted reports (going beyond factoid QA) and a citation-based evaluation (providing supporting evidence for claims made in the report). It also benchmarks cross-language (CLIR) and multilingual (MLIR) retrieval as supporting subtasks.
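As a rough illustration of citation-based evaluation (the scoring below is an assumption, not the track's official protocol), one can compare the documents each report claim cites against documents judged to support that claim:

    # Hypothetical citation scoring: per-claim cited vs. judged-supporting docs.
    def citation_precision_recall(claims):
        tp = cited = supported = 0
        for cited_docs, gold_docs in claims:
            tp += len(cited_docs & gold_docs)
            cited += len(cited_docs)
            supported += len(gold_docs)
        precision = tp / cited if cited else 0.0
        recall = tp / supported if supported else 0.0
        return precision, recall

    claims = [
        ({"doc3", "doc7"}, {"doc3"}),  # one correct citation, one unsupported
        ({"doc1"}, {"doc1", "doc9"}),  # missed one supporting document
    ]
    print(citation_precision_recall(claims))  # (0.666..., 0.666...)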

Track coordinator(s):

  • Dawn Lawrie, Johns Hopkins University, HLTCOE
  • Sean MacAvaney, University of Glasgow
  • James Mayfield, Johns Hopkins University, HLTCOE
  • Luca Soldaini, Allen Institute for AI
  • Eugene Yang, Johns Hopkins University, HLTCOE
  • Andrew Yates, Johns Hopkins University, HLTCOE

Track Web Page: https://trec-ragtime.github.io/


Tip of the Tongue (TOT)

Tip-of-the-tongue (ToT) known-item retrieval is defined as “an item identification task in which the searcher has previously experienced an item but cannot recall a reliable identifier” [1] (i.e., “It’s on the tip of my tongue…”). Current IR systems are not well-equipped to address ToT information needs. As evidence, a wide range of community Q&A sites have emerged to help people resolve their ToT information needs with the help of other people.

Track coordinator(s):

  • Jaime Arguello, University of North Carolina
  • Fernando Diaz, Carnegie Mellon University and Google
  • Maik Fröbe, Friedrich-Schiller-Universität Jena
  • To Eun Kim, Carnegie Mellon University
  • Bhaskar Mitra

Track Web Page: https://trec-tot.github.io/


Video Question Answering (VQA)

The Video Question Answering (VQA) Challenge aims to rigorously assess the capabilities of state-of-the-art multimodal models in understanding and reasoning about video content. Participants in this challenge will develop and test models that answer a diverse set of questions based on video segments, covering various levels of complexity, from factual retrieval to complex reasoning. The challenge track will serve as a critical evaluation framework to measure progress in video understanding, helping identify strengths and weaknesses in current AI architectures. By fostering innovation in multimodal learning, this track will contribute to advancing AI’s ability to process dynamic visual narratives, enabling more reliable and human-like interaction with video-based information.

Track coordinator(s):

  • George Awad, National Institute of Standards and Technology (NIST)
  • Sanjay Purushotham, UMBC

Track Web Page: https://www-nlpir.nist.gov/projects/tv2026/vqa.html