Proceedings - Detection, Retrieval, and Augmented Generation for Understanding News (DRAGUN) 2025
Overview of the TREC 2025 DRAGUN Track: Detection, Retrieval, and Augmented Generation for Understanding News
Dake Zhang, Mark D. Smucker, Charles L. A. Clarke
Abstract
Many internet users struggle to assess whether online information is trustworthy, a critical skill in today’s digital environment where accurate content coexists with false or misleading material. As the successor to the previous TREC 2024 Lateral Reading Track, the TREC 2025 DRAGUN (Detection, Retrieval, and Augmented Generation for Understanding News) Track aims to advance research on supporting readers in assessing the trustworthiness of online news by providing reader-oriented, well-attributed reports. The track had two tasks: (1) Question Generation, which asked participants to propose critical, ranked questions a reader might investigate for a given article; and (2) Report Generation, which asked participants to produce a short (up to 250 words) background report grounded in the MS MARCO V2.1 Segmented Corpus. Using assessor-built rubrics with importance-weighted questions and short answers, we evaluated question coverage and report support/contradiction. We release topics, rubrics, annotations, runs, and evaluation results to support research on developing systems to help people assess the trustworthiness of news.
Bibtex
@inproceedings{coordinators-trec2025-papers-proc-6,
title = {Overview of the TREC 2025 DRAGUN Track: Detection, Retrieval, and Augmented Generation for Understanding News},
author = {Dake Zhang and Mark D. Smucker and Charles L. A. Clarke},
booktitle = {Proceedings of the 34th Text {REtrieval} Conference (TREC 2025)},
year = {2025},
address = {Gaithersburg, Maryland},
series = {NIST SP xxxx}
}
An Iterative Multi-agent RAG System for the TREC 2025 DRAGUN Track
Dake Zhang
- Participant: UWaterlooMDS
- Paper: https://trec.nist.gov/pubs/trec34/papers/UWaterlooMDS.dragun.pdf
Abstract
The main goal of the TREC 2025 DRAGUN Track is to advance research on helping people assess the trustworthiness of online news articles via two tasks: question generation (producing critical questions that readers should consider when evaluating a news article’s trustworthiness) and report generation (creating well-sourced reports that provide readers with useful background and context for more informed trustworthiness evaluation). In this paper, we describe our organizer baselines for both tasks, including a starter kit made available to participants at the track’s launch. This multi-agent system uses an iterative retrieval-augmented generation pipeline consisting of a query generator, segment retriever, information evaluator, question generator, and report generator. The system is available at: https://github.com/trec-dragun/2025-starter-kit.
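The iterative pipeline the abstract names (query generator, segment retriever, information evaluator, question generator, report generator) can be sketched as a simple loop. Every function name below is an illustrative placeholder, not the starter kit's actual API; the real implementation is in the linked repository.

```python
# Illustrative sketch of an iterative retrieval-augmented generation loop
# of the kind described above. All callables are hypothetical placeholders.

def iterative_rag(article, query_gen, retrieve, evaluate,
                  gen_questions, gen_report, max_rounds=3):
    """Alternate query generation and retrieval until the evaluator
    judges the collected evidence sufficient, then generate outputs."""
    evidence = []
    for _ in range(max_rounds):
        for query in query_gen(article, evidence):   # query generator
            evidence.extend(retrieve(query))         # segment retriever
        if evaluate(article, evidence):              # information evaluator
            break                                    # evidence judged sufficient
    return gen_questions(article, evidence), gen_report(article, evidence)
```

The loop bound guards against an evaluator that never declares the evidence sufficient, so the pipeline always terminates with whatever evidence it has gathered.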
Bibtex
@inproceedings{UWaterlooMDS-trec2025-papers-proc-1,
title = {An Iterative Multi-agent RAG System for the TREC 2025 DRAGUN Track},
author = {Dake Zhang},
booktitle = {Proceedings of the 34th Text {REtrieval} Conference (TREC 2025)},
year = {2025},
address = {Gaithersburg, Maryland},
series = {NIST SP xxxx}
}
TREMA-UNH at TREC 2025 DRAGUN Track: Iterative Multi-Agent Pipeline for News Verification via Adversarial Credibility Analysis with Local LLMs
Naghmeh Farzi, Laura Dietz
- Participant: TREMA-UNH
- Paper: https://trec.nist.gov/pubs/trec34/papers/TREMA-UNH.dragun.pdf
- Runs: SK_MI_1 | SK_MI_2 | SK_Critique_MI_5 | SK_Critique_MI_5_RG | SK_MI_2_RG | SK_ConvinceF_MI_2 | SK_ConvinceF_MI_2_RG | ConvF_all-t12_5 | ConvF_all-t12_5_RG | ConvF_all_MI_5 | ConvF_all_MI_5_RG
Abstract
This notebook describes our submission to the TREC 2025 DRAGUN (Detection, Retrieval, and Augmented Generation for Understanding News) Track. We adapt the official starter kit to use local large language models via Ollama and implement an adversarial module that produces both balanced and aggressive critiques of news articles, highlighting potential weaknesses, unsupported claims, contradictions, and source biases to inform and guide subsequent query generation and evidence retrieval. Our system generates investigative questions (Task 1) and, building on them, a report (Task 2) through an iterative retrieval-augmented generation approach.
Bibtex
@inproceedings{TREMA-UNH-trec2025-papers-proc-1,
title = {TREMA-UNH at TREC 2025 DRAGUN Track: Iterative Multi-Agent Pipeline for News Verification via Adversarial Credibility Analysis with Local LLMs},
author = {Naghmeh Farzi and Laura Dietz},
booktitle = {Proceedings of the 34th Text {REtrieval} Conference (TREC 2025)},
year = {2025},
address = {Gaithersburg, Maryland},
series = {NIST SP xxxx}
}
HLTCOE Evaluation Team at TREC 2025: RAG, RAGTIME, DRAGUN, and BioGen
Laura Dietz, Bryan Li, James Mayfield, Dawn Lawrie, Eugene Yang, William Walden
- Participant: HLTCOE
- Paper: https://trec.nist.gov/pubs/trec34/papers/HLTCOE.dragun.rag.ragtime.pdf
- Runs: cru-claude-chatty | cru-most_common | cru-claude | cru-ablR_ | cru-ablR-conf_ | cru-confirm-ansR_ | cru-clod-ablR-conf_ | cru-cloch-ablR-conf_
Abstract
The HLTCOE Evaluation team participated in several tracks focused on Retrieval-Augmented Generation (RAG), including RAG, RAGTIME, DRAGUN, and BioGen. Drawing inspiration from recent work on nugget-based evaluations, we introduce the Crucible system, which scrambles the traditional retrieval → generation → evaluation workflow of a RAG task by automatically curating a set of high-quality question-answer pairs (nuggets) from retrieved documents and then conditioning generation on this set. This not only enables us to study how effectively we can recover the set of gold nuggets for each request, but also how nugget-set quality impacts final performance.
Bibtex
@inproceedings{HLTCOE-trec2025-papers-proc-2,
title = {HLTCOE Evaluation Team at TREC 2025: RAG, RAGTIME, DRAGUN, and BioGen},
author = {Laura Dietz and Bryan Li and James Mayfield and Dawn Lawrie and Eugene Yang and William Walden},
booktitle = {Proceedings of the 34th Text {REtrieval} Conference (TREC 2025)},
year = {2025},
address = {Gaithersburg, Maryland},
series = {NIST SP xxxx}
}
Intelligent News Comprehension through Query Expansion and LLM-Augmented Generation
Jack Cheverton, Oliwia Majtyka, Ting Liu
- Participant: SCIAI
- Paper: https://trec.nist.gov/pubs/trec34/papers/SCIAI.dragun.pdf
- Runs: Team02_Run01_1000SegmentsExpansion | Team02_Run02_100SegmentsExpansion | Team02_Run03_100SegmentsNoExpansion | Team01_Run01_Winner | Team02_Task1 | 03_01_Baseline | SCIAI_03_02_Three | SCIAI_03_03_Five | SCIAI_03_04_Eight
Abstract
This paper discusses our work and participation in the Text Retrieval Conference (TREC) Detection, Retrieval, and Augmented Generation for Understanding News (DRAGUN) track of 2025. In our current digital landscape, it can be difficult to determine the accuracy of what we see online. This is especially true with the rise in fake news. The DRAGUN track challenges participants to develop a system that analyzes an article and generates a list of questions a thoughtful reader should ask when trying to determine trustworthiness, as well as a report that answers many of these questions. This paper discusses our team’s use of query expansion techniques and Large Language Models to approach this task.
Bibtex
@inproceedings{SCIAI-trec2025-papers-proc-2,
title = {Intelligent News Comprehension through Query Expansion and LLM-Augmented Generation},
author = {Jack Cheverton and Oliwia Majtyka and Ting Liu},
booktitle = {Proceedings of the 34th Text {REtrieval} Conference (TREC 2025)},
year = {2025},
address = {Gaithersburg, Maryland},
series = {NIST SP xxxx}
}
From Questions to Trust Reports: A LLM-IR Framework for the TREC 2025 DRAGUN Track
Ignacy Alwasiak, Kene Nnolim, Jaclyn Thi, Samy Ateia, Markus Bink, Gregor Donabauer, David Elsweiler, Udo Kruschwitz
- Participant: UR_trecking
- Paper: https://trec.nist.gov/pubs/trec34/papers/UR_trecking.dragun.pdf
- Runs: UR_IW_run_1 | UR_IW_run_1_task2
Abstract
The DRAGUN Track at TREC 2025 targets the growing need for effective support tools that help users evaluate the trustworthiness of online news. We describe the UR_Trecking system submitted for both Task 1 (critical question generation) and Task 2 (retrieval-augmented trustworthiness reporting). Our approach combines LLM-based question generation with semantic filtering, diversity enforcement using clustering, and several query expansion strategies (including reasoning-based Chain-of-Thought expansion) to retrieve relevant evidence from the MS MARCO V2.1 segmented corpus. Retrieved documents are re-ranked using a monoT5 model and filtered using an LLM relevance judge together with a domain-level trustworthiness dataset. For Task 2, selected evidence is synthesized by an LLM into concise trustworthiness reports with citations. Results from the official evaluation indicate that Chain-of-Thought query expansion and re-ranking substantially improve both relevance and domain trust compared to baseline retrieval, while question-generation performance shows moderate quality with room for improvement. We conclude by outlining key challenges encountered and suggesting directions for enhancing robustness and trustworthiness assessment in future iterations of the system.
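The diversity-enforcement idea (keeping only questions that are not near-duplicates of ones already selected) can be sketched with a greedy cosine-similarity filter. Note this is a simpler stand-in for the clustering the authors actually used; the threshold value and the embedding function are illustrative assumptions.

```python
# Hypothetical sketch: drop a candidate question when it is too similar
# (cosine) to one already kept. `embed` maps a question to a vector; any
# sentence-embedding model could supply it.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def diversity_filter(questions, embed, threshold=0.85):
    """Greedily keep questions whose embedding is below `threshold`
    cosine similarity to every question kept so far."""
    kept, kept_vecs = [], []
    for q in questions:
        v = embed(q)
        if all(cosine(v, kv) < threshold for kv in kept_vecs):
            kept.append(q)
            kept_vecs.append(v)
    return kept
```

Because the filter is greedy and order-dependent, it is typically applied to questions already ranked by the generator, so the highest-ranked phrasing of each near-duplicate group survives.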
Bibtex
@inproceedings{UR_trecking-trec2025-papers-proc-1,
title = {From Questions to Trust Reports: A LLM-IR Framework for the TREC 2025 DRAGUN Track},
author = {Ignacy Alwasiak and Kene Nnolim and Jaclyn Thi and Samy Ateia and Markus Bink and Gregor Donabauer and David Elsweiler and Udo Kruschwitz},
booktitle = {Proceedings of the 34th Text {REtrieval} Conference (TREC 2025)},
year = {2025},
address = {Gaithersburg, Maryland},
series = {NIST SP xxxx}
}
CITADEL — Citation-Driven Draft-Evaluate Loop
Daniel Seredensky, Dylan Iddings, Sharon G. Small
- Participant: SCIAI
- Paper: https://trec.nist.gov/pubs/trec34/papers/SCIAI.dragun.pdf
- Runs: Team02_Run01_1000SegmentsExpansion | Team02_Run02_100SegmentsExpansion | Team02_Run03_100SegmentsNoExpansion | Team01_Run01_Winner | Team02_Task1 | 03_01_Baseline | SCIAI_03_02_Three | SCIAI_03_03_Five | SCIAI_03_04_Eight
Abstract
This paper describes our team submission from the Siena University Institute for Artificial Intelligence for the Text Retrieval Conference (TREC) 2025 Detection, Retrieval, and Augmented Generation for Understanding News Track (DRAGUN). Our approach combines classical retrieval methods with multiple large language model (LLM) agents to generate concise, evidence-based reports. First, a hybrid retrieval pipeline integrates BM25, synonym-based query expansion, and cross-encoder reranking to maximize recall and precision across the corpus. Then, retrieved passages are processed through a generation-evaluation loop, where different LLM agents separately generate and critique reports according to rubric-based criteria for coverage, accuracy, and citation quality. This design emphasizes factual accuracy and access to citations to align with DRAGUN’s goal of supporting critical engagement with news.
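As a rough illustration of the first stage of such a hybrid pipeline, here is a self-contained Okapi BM25 scorer and ranker. The parameter values and whitespace tokenization are assumptions, and the query-expansion and cross-encoder reranking stages described above are omitted; in the full pipeline a reranker would reorder the BM25 top-k.

```python
# Minimal Okapi BM25 first-stage retrieval sketch (illustrative, not the
# team's implementation). Documents are pre-tokenized lists of terms.
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """BM25 score of each tokenized document against the query terms."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(t for d in docs for t in set(d))  # document frequencies
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in query_terms:
            if tf[t] == 0:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[t] * (k1 + 1) / norm
        scores.append(score)
    return scores

def bm25_rank(query_terms, docs):
    """Indices of docs sorted by descending BM25 score; the top-k of this
    list is what a cross-encoder reranker would then reorder."""
    scores = bm25_scores(query_terms, docs)
    return sorted(range(len(docs)), key=scores.__getitem__, reverse=True)
```

The length normalization (`b`) and term-frequency saturation (`k1`) are the standard BM25 knobs; production systems usually delegate this stage to an engine such as Pyserini or Elasticsearch rather than scoring in pure Python.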
Bibtex
@inproceedings{SCIAI-trec2025-papers-proc-1,
title = {CITADEL — Citation-Driven Draft-Evaluate Loop},
author = {Daniel Seredensky and Dylan Iddings and Sharon G. Small},
booktitle = {Proceedings of the 34th Text {REtrieval} Conference (TREC 2025)},
year = {2025},
address = {Gaithersburg, Maryland},
series = {NIST SP xxxx}
}
A LangChain-Based Framework for Investigative Question Generation Using Large Language Models
Adnan Faisal, Shiti Chowdhury
- Participant: CUET
- Paper: https://trec.nist.gov/pubs/trec34/papers/CUET.dragun.pdf
- Runs: CUET-DeepSeek-R1-Qwen-32B | CUET-qwen4B-v2 | CUET-unsloth-Mistral-Small | CUET-qwen4B-v3 | CUET-qwen14B-v1 | CUET-qwen14B-v2 | CUET-qwen14B-v3 | CUET-qwen14B-v5 | CUET-Mistral-Small-24B | CUET-QwQ-32B
Abstract
The increasing prevalence of online misinformation has amplified the demand for automated approaches that assist readers in assessing the credibility of news articles. The TREC 2025 DRAGUN (Detection, Retrieval and Augmented Generation for Understanding News) Track addresses this need through its Question Generation task, which requires systems to formulate ranked investigative questions that support reader-oriented credibility assessment. This study presents a LangChain-based pipeline for generating focused and investigative questions from news articles in the MS MARCO V2.1 segmented corpus. The proposed framework combines structured prompt design, controlled decoding and semantic reranking to improve question relevance, coherence and interpretability. We have evaluated several experimental configurations covering Qwen-based, Mistral-based and reasoning-oriented large language models using the rectified DRAGUN evaluation protocol, where compound questions are removed prior to scoring. Our experimental results indicate that reasoning-aligned models exhibit stronger and more consistent performance under strict evaluation constraints, with the CUET-QwQ-32B configuration achieving the highest average score among our submissions. At the same time, Qwen-14B variants demonstrate stable and competitive performance across diverse topics, showing substantial agreement with assessor-defined evaluation rubrics. Overall, our findings demonstrate that a structured and modular question-generation pipeline can effectively translate large language model reasoning into practical support for reader-centric news trustworthiness assessment, while also providing insights for extending such systems toward multi-source report generation.
Bibtex
@inproceedings{CUET-trec2025-papers-proc-1,
title = {A LangChain-Based Framework for Investigative Question Generation Using Large Language Models},
author = {Adnan Faisal and Shiti Chowdhury},
booktitle = {Proceedings of the 34th Text {REtrieval} Conference (TREC 2025)},
year = {2025},
address = {Gaithersburg, Maryland},
series = {NIST SP xxxx}
}
LLM-Based Question Generation and Retrieval-Augmented Reporting for News Credibility
Georgios Arampatzis, Ioannis Maslaris, Avi Arampatzis
- Participant: DUTH
- Paper: https://trec.nist.gov/pubs/trec34/papers/DUTH.dragun.pdf
- Runs: garamp_qwen25_7b_imp | garamp_qwen25_14b | garamp_yi15_9b | garamp_qwen25_72b | garamp_mistral_7b | garamp_qwen25_14b_r4 | garamp_dragun_t2_q7b | garamp_yi9b_t2_v1 | garamp_qwen25_3b_t2 | garamp_zephyr7b_t2
Abstract
This paper presents the participation of the DUTH team in both tasks of the TREC 2025 DRAGUN (Detection, Retrieval, and Augmented Generation for Understanding News) Track. The track addresses the challenge of misinformation and biased narratives in digital news through two complementary tasks: Question Generation (Task 1) and Report Generation (Task 2). Task 1 focuses on generating investigative questions that assist readers in assessing news credibility, while Task 2 evaluates systems’ ability to retrieve evidence and generate grounded, well-attributed reports. Our approach employs recent open-weight, instruction-tuned large language models (LLMs), including Qwen2.5, Yi-1.5, and Mistral-7B, combined with prompt engineering, semantic filtering, and retrieval-grounded generation pipelines. All systems were implemented locally using the transformers and accelerate libraries, without external fine-tuning or API access, ensuring full reproducibility and controlled model comparison. Experimental results show that mid-sized instruction-tuned models, most notably Mistral-7B-Instruct, achieve the strongest rubric coverage among the DUTH submissions in the Question Generation task. In the Report Generation task, all evaluated systems exhibit very low contradiction rates, indicating robust factual grounding, but achieve limited rubric coverage under strict retrieval and attribution constraints. Overall, these findings suggest that prompt design, question specificity, and retrieval quality play a more decisive role than raw model scale in supporting explainable and evidence-based news trustworthiness assessment.
Bibtex
@inproceedings{DUTH-trec2025-papers-proc-1,
title = {LLM-Based Question Generation and Retrieval-Augmented Reporting for News Credibility},
author = {Georgios Arampatzis and Ioannis Maslaris and Avi Arampatzis},
booktitle = {Proceedings of the 34th Text {REtrieval} Conference (TREC 2025)},
year = {2025},
address = {Gaithersburg, Maryland},
series = {NIST SP xxxx}
}
WaterlooClarke at TREC 2025
Siqing Huo, Charles L. A. Clarke
- Participant: WaterlooClarke
- Paper: https://trec.nist.gov/pubs/trec34/papers/WaterlooClarke.dragun.rag.pdf
- Runs: feedbackintheloop | garag_rubric
Abstract
Participating as the WaterlooClarke group, we focused on the RAG track; we also submitted runs for the DRAGUN Track. For the full retrieval augmented generation (RAG) task, we explored five pipelines: (1) Nuggetizer; (2) Generate an Answer and support with Retrieved Evidence (GARE); (3) Automatic Retrieval and Generation Plan; (4) Combined; (5) Automatically Selected Best Response. For the DRAGUN task, we explored one-shot prompting with feedback in the loop.
Bibtex
@inproceedings{WaterlooClarke-trec2025-papers-proc-1,
title = {WaterlooClarke at TREC 2025},
author = {Siqing Huo and Charles L. A. Clarke},
booktitle = {Proceedings of the 34th Text {REtrieval} Conference (TREC 2025)},
year = {2025},
address = {Gaithersburg, Maryland},
series = {NIST SP xxxx}
}