Runs - Detection, Retrieval, and Augmented Generation for Understanding News (DRAGUN) 2025

03_01_Baseline

Participants | Proceedings | Input | contradictory.dragun-repgen | repgen_results | supportive.dragun-repgen | Appendix

  • Run ID: 03_01_Baseline
  • Participant: SCIAI
  • Track: Detection, Retrieval, and Augmented Generation for Understanding News (DRAGUN)
  • Year: 2025
  • Submission: 2025-08-19
  • Task: trec2025-dragun-repgen
  • MD5: 76ac6a4f0c9494662969833c6b066fd1
  • Run description: This run uses a single round for the IR agents and report generation step to provide a baseline.

ConvF_all-t12_5

Participants | Proceedings | Input | dragun-qgen | Appendix

  • Run ID: ConvF_all-t12_5
  • Participant: TREMA-UNH
  • Track: Detection, Retrieval, and Augmented Generation for Understanding News (DRAGUN)
  • Year: 2025
  • Submission: 2025-08-23
  • Task: trec2025-dragun-qgen
  • MD5: b31942c848ff5f7683c6450b1dc75dfb
  • Run description: Run 7 incorporates the "convince false" article in all phases except question generation and report generation, which use only the original article. Query generation uses a maximum of 5 iterations and the Qwen2.5 7B model.

ConvF_all-t12_5_RG

Participants | Proceedings | Input | contradictory.dragun-repgen | repgen_results | supportive.dragun-repgen | Appendix

  • Run ID: ConvF_all-t12_5_RG
  • Participant: TREMA-UNH
  • Track: Detection, Retrieval, and Augmented Generation for Understanding News (DRAGUN)
  • Year: 2025
  • Submission: 2025-08-23
  • Task: trec2025-dragun-repgen
  • MD5: c648e9d7695c22d627d46d5b87e4c34c
  • Run description: Run 7 incorporates the "convince false" article in all phases except question generation and report generation, which use only the original article. Query generation uses a maximum of 5 iterations and the Qwen2.5 7B model.

ConvF_all_MI_5

Participants | Proceedings | Input | dragun-qgen | Appendix

  • Run ID: ConvF_all_MI_5
  • Participant: TREMA-UNH
  • Track: Detection, Retrieval, and Augmented Generation for Understanding News (DRAGUN)
  • Year: 2025
  • Submission: 2025-08-23
  • Task: trec2025-dragun-qgen
  • MD5: 0acb1ffcb0797b2630f1b6ee43b251e3
  • Run description: Run 7 incorporates the "convince false" article in all phases. Query generation uses a maximum of 5 iterations and the Qwen2.5 7B model.

ConvF_all_MI_5_RG

Participants | Proceedings | Input | contradictory.dragun-repgen | repgen_results | supportive.dragun-repgen | Appendix

  • Run ID: ConvF_all_MI_5_RG
  • Participant: TREMA-UNH
  • Track: Detection, Retrieval, and Augmented Generation for Understanding News (DRAGUN)
  • Year: 2025
  • Submission: 2025-08-23
  • Task: trec2025-dragun-repgen
  • MD5: 6a53084c7d20e79c35d0ad48f72ad235
  • Run description: Run 7 incorporates the "convince false" article in all phases. Query generation uses a maximum of 5 iterations and the Qwen2.5 7B model.

cru-ablR-conf_

Participants | Proceedings | Input | contradictory.dragun-repgen | repgen_results | supportive.dragun-repgen | Appendix

  • Run ID: cru-ablR-conf_
  • Participant: HLTCOE
  • Track: Detection, Retrieval, and Augmented Generation for Understanding News (DRAGUN)
  • Year: 2025
  • Submission: 2025-08-20
  • Task: trec2025-dragun-repgen
  • MD5: e6d503b3e2eb20ddf7192deb105293e3
  • Run description: Crucible@dragun

Original run tag: strict-filtered-crucible-retrieved_docs-most_common-retrieved-reranker.retrieved_docs.jsonl-SupportedAnswerabilityExtractorRequest. Answerability prompt: just check citation support, then rely on extraction confidence.

Crucible report generation. Guiding nuggets: most_common. Document source: nugget citations. Nugget extraction prompt 'SupportedAnswerExtractorAll' on collection "ragtime-mt". LLM: llama3.3-70b-instruct. Sentences are retained when their citations are supported according to argue_eval. Uses abstractive summarization. Only sentences that have an extraction confidence value >= 0.5, are not already selected (according to a stopped-and-stemmed match), and do not contain the expression 'source document' are retained. For each nugget, among the remaining candidate sentences, the one with the highest extraction confidence is selected. The report is chopped to 250 words. Created on 2025-08-20.
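The sentence-selection logic described above could be sketched roughly as follows. This is a minimal illustration, not Crucible's actual code: `Candidate`, its field names, and the crude stop/stem normalization are hypothetical stand-ins.

```python
import re
from dataclasses import dataclass

STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "are"}

@dataclass
class Candidate:
    nugget_id: str
    text: str
    confidence: float  # extraction confidence reported by the LLM

def normalize(text: str) -> frozenset:
    # Crude stand-in for "stopped and stemmed" matching: lowercase,
    # drop stopwords, chop a trailing 's' as a toy stemmer.
    words = re.findall(r"[a-z]+", text.lower())
    return frozenset(w.rstrip("s") for w in words if w not in STOPWORDS)

def select_sentences(candidates, max_words=250):
    report, seen = [], set()
    by_nugget = {}
    for c in candidates:
        by_nugget.setdefault(c.nugget_id, []).append(c)
    for nugget_id, cands in by_nugget.items():
        # Retain only confident, novel sentences without the forbidden phrase.
        kept = [c for c in cands
                if c.confidence >= 0.5
                and "source document" not in c.text.lower()
                and normalize(c.text) not in seen]
        if not kept:
            continue
        # Among the survivors, take the highest-confidence sentence.
        best = max(kept, key=lambda c: c.confidence)
        seen.add(normalize(best.text))
        report.append(best.text)
    # Chop the assembled report to the word budget.
    return " ".join(report).split()[:max_words]
```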


cru-ablR_

Participants | Proceedings | Input | contradictory.dragun-repgen | repgen_results | supportive.dragun-repgen | Appendix

  • Run ID: cru-ablR_
  • Participant: HLTCOE
  • Track: Detection, Retrieval, and Augmented Generation for Understanding News (DRAGUN)
  • Year: 2025
  • Submission: 2025-08-20
  • Task: trec2025-dragun-repgen
  • MD5: 22b3090fde6dec9ff58659225829e155
  • Run description: Crucible@dragun

Original run tag: strict-filtered-covered-covextr-crucible-retrieved_docs-most_common-retrieved-reranker.retrieved_docs.jsonl-SupportedAnswerabilityExtractorRequest. Answerability prompt. Will LLM judges generalize across tracks?

Crucible report generation. Guiding nuggets: most_common. Document source: nugget citations. Nugget extraction prompt 'SupportedAnswerExtractorAll' on collection "ragtime-mt". LLM: llama3.3-70b-instruct. Sentences are retained when their citations are supported, at least one nugget covers the summary sentence, and at least one nugget covers the extractive document segment, according to argue_eval. Uses abstractive summarization. Only sentences that have an extraction confidence value >= 0.5, are not already selected (according to a stopped-and-stemmed match), and do not contain the expression 'source document' are retained. For each nugget, among the remaining candidate sentences, the one with the highest extraction confidence is selected. The report is chopped to 250 words. Created on 2025-08-20.


cru-claude

Participants | Proceedings | Input | dragun-qgen | Appendix

  • Run ID: cru-claude
  • Participant: HLTCOE
  • Track: Detection, Retrieval, and Augmented Generation for Understanding News (DRAGUN)
  • Year: 2025
  • Submission: 2025-08-20
  • Task: trec2025-dragun-qgen
  • MD5: d35a0715c9562fc872e2ef3cc3af81c7
  • Run description: Prompt-based extraction of nuggets from source article.

This prompt results in nuggets with shorter gold answers, which we will use in our crucible report generation methods.


cru-claude-chatty

Participants | Proceedings | Input | dragun-qgen | Appendix

  • Run ID: cru-claude-chatty
  • Participant: HLTCOE
  • Track: Detection, Retrieval, and Augmented Generation for Understanding News (DRAGUN)
  • Year: 2025
  • Submission: 2025-08-20
  • Task: trec2025-dragun-qgen
  • MD5: 29063f69f717b36c80d681a777d53d90
  • Run description: Prompt-based extraction of nuggets from source article.

This prompt results in gold answers that are long-winded (hence "chatty"); we usually don't like these for our Crucible report generation method, but they seem more appropriate for this task.


cru-cloch-ablR-conf_

Participants | Proceedings | Input | contradictory.dragun-repgen | repgen_results | supportive.dragun-repgen | Appendix

  • Run ID: cru-cloch-ablR-conf_
  • Participant: HLTCOE
  • Track: Detection, Retrieval, and Augmented Generation for Understanding News (DRAGUN)
  • Year: 2025
  • Submission: 2025-08-20
  • Task: trec2025-dragun-repgen
  • MD5: 6680eff9576da8cd439326fe8d48a3d9
  • Run description: Crucible@dragun

Original run tag: strict-filtered-crucible-retrieved_docs-claudechatty-retrieved-reranker.retrieved_docs.jsonl-SupportedAnswerabilityExtractorRequest. Answerability prompt on ClaudeChatty nuggets: just check citation support, then rely on extraction confidence.

Crucible report generation. Guiding nuggets: claudechatty. Document source: nugget citations. Nugget extraction prompt 'SupportedAnswerExtractorAll' on collection "ragtime-mt". LLM: llama3.3-70b-instruct. Sentences are retained when their citations are supported according to argue_eval. Uses abstractive summarization. Only sentences that have an extraction confidence value >= 0.5, are not already selected (according to a stopped-and-stemmed match), and do not contain the expression 'source document' are retained. For each nugget, among the remaining candidate sentences, the one with the highest extraction confidence is selected. The report is chopped to 250 words. Created on 2025-08-20.


cru-clod-ablR-conf_

Participants | Proceedings | Input | contradictory.dragun-repgen | repgen_results | supportive.dragun-repgen | Appendix

  • Run ID: cru-clod-ablR-conf_
  • Participant: HLTCOE
  • Track: Detection, Retrieval, and Augmented Generation for Understanding News (DRAGUN)
  • Year: 2025
  • Submission: 2025-08-20
  • Task: trec2025-dragun-repgen
  • MD5: d620e59953fbe44641f1c5c6dfd87920
  • Run description: Crucible@dragun

Original run tag: strict-filtered-crucible-retrieved_docs-claude-retrieved-reranker.retrieved_docs.jsonl-SupportedAnswerabilityExtractorRequest. Answerability prompt on Claude nuggets: just check citation support, then rely on extraction confidence.

Crucible report generation. Guiding nuggets: claude. Document source: nugget citations. Nugget extraction prompt 'SupportedAnswerExtractorAll' on collection "ragtime-mt". LLM: llama3.3-70b-instruct. Sentences are retained when their citations are supported according to argue_eval. Uses abstractive summarization. Only sentences that have an extraction confidence value >= 0.5, are not already selected (according to a stopped-and-stemmed match), and do not contain the expression 'source document' are retained. For each nugget, among the remaining candidate sentences, the one with the highest extraction confidence is selected. The report is chopped to 250 words. Created on 2025-08-20.


cru-confirm-ansR_

Participants | Proceedings | Input | contradictory.dragun-repgen | repgen_results | supportive.dragun-repgen | Appendix

  • Run ID: cru-confirm-ansR_
  • Participant: HLTCOE
  • Track: Detection, Retrieval, and Augmented Generation for Understanding News (DRAGUN)
  • Year: 2025
  • Submission: 2025-08-20
  • Task: trec2025-dragun-repgen
  • MD5: 52a4ee44a286ee7f6c6c96bc11feedd1
  • Run description: Crucible@dragun

Original run tag: strict-filtered-covered-covextr-crucible-retrieved_docs-most_common-retrieved-reranker.retrieved_docs.jsonl-SupportedAnswerExtractorRequest. Question-answering prompt with answers from the request article. We are only looking for confirmation.

Crucible report generation. Guiding nuggets: most_common. Document source: nugget citations. Nugget extraction prompt 'SupportedAnswerExtractorAll' on collection "ragtime-mt". LLM: llama3.3-70b-instruct. Sentences are retained when their citations are supported, at least one nugget covers the summary sentence, and at least one nugget covers the extractive document segment, according to argue_eval. Uses abstractive summarization. Only sentences that have an extraction confidence value >= 0.5, are not already selected (according to a stopped-and-stemmed match), and do not contain the expression 'source document' are retained. For each nugget, among the remaining candidate sentences, the one with the highest extraction confidence is selected. The report is chopped to 250 words. Created on 2025-08-20.


cru-most_common

Participants | Proceedings | Input | dragun-qgen | Appendix

  • Run ID: cru-most_common
  • Participant: HLTCOE
  • Track: Detection, Retrieval, and Augmented Generation for Understanding News (DRAGUN)
  • Year: 2025
  • Submission: 2025-08-20
  • Task: trec2025-dragun-qgen
  • MD5: 1edde5a7e1040bfbfa14a144f23b36a9
  • Run description: Prompt-based extraction of nuggets from source article.

We use Crucible's standard nugget extractor "most_common". The questions are probably more boring, but the gold answers can be matched to other source documents for report generation.


CUET-DeepSeek-R1-Qwen-32B

Participants | Proceedings | Input | dragun-qgen | Appendix

  • Run ID: CUET-DeepSeek-R1-Qwen-32B
  • Participant: CUET
  • Track: Detection, Retrieval, and Augmented Generation for Understanding News (DRAGUN)
  • Year: 2025
  • Submission: 2025-08-08
  • Task: trec2025-dragun-qgen
  • MD5: 12649a47bb39289f37511ff4f15ebd9c
  • Run description: This run processes news topics from the TREC 2025 dataset to generate exactly 10 ranked investigative questions for each topic, emphasizing trustworthiness, bias, motivation, and factual accuracy. It uses a few-shot prompt template with specific examples, sends the topic title and body to a quantized LLM through LangChain, extracts clean numbered questions via regex, retries up to three times if fewer than 10 are generated, fills missing ones with “N/A,” and outputs the results in a TSV submission file (CUET_run9.tsv) formatted with topic ID, team ID, run ID, rank, and question. Duplicate questions are also identified for review.
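The parse/retry/pad loop common to these CUET runs might look like this minimal sketch. The `generate` callable standing in for the LangChain/LLM invocation, the regex, and the length cap are illustrative assumptions, not the team's code.

```python
import re

# Matches lines like "3. Who funded the study?" or "3) Who funded the study?"
NUM_Q = re.compile(r"^\s*\d+[.)]\s*(.+?\?)\s*$", re.MULTILINE)

def parse_questions(raw: str, max_len: int = 300):
    # Extract numbered questions, dropping duplicates and over-long lines.
    out, seen = [], set()
    for q in NUM_Q.findall(raw):
        q = q.strip()
        if len(q) <= max_len and q.lower() not in seen:
            seen.add(q.lower())
            out.append(q)
    return out

def ten_questions(generate, prompt, retries=3):
    # Retry the model up to `retries` times, then pad with "N/A"
    # so every topic yields exactly 10 ranked questions.
    questions = []
    for _ in range(retries):
        questions = parse_questions(generate(prompt))
        if len(questions) >= 10:
            break
    return (questions + ["N/A"] * 10)[:10]
```

Each resulting question would then be written as one TSV row (topic ID, team ID, run ID, rank, question).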

CUET-Mistral-Small-24B

Participants | Proceedings | Input | dragun-qgen | Appendix

  • Run ID: CUET-Mistral-Small-24B
  • Participant: CUET
  • Track: Detection, Retrieval, and Augmented Generation for Understanding News (DRAGUN)
  • Year: 2025
  • Submission: 2025-08-08
  • Task: trec2025-dragun-qgen
  • MD5: 0276e6cbf468e64fc2048ffbec63ed04
  • Run description: This run processes the official TREC 2025 topic file (trec-2025-dragun-topics.jsonl) to generate exactly 10 ranked investigative questions for each news article. A custom prompt template with few-shot examples is used to guide the model toward producing concise, non-redundant questions focused on evaluating trustworthiness, including aspects like bias, motivation, diversity of viewpoints, and factual accuracy. The code uses regex-based parsing to extract and deduplicate questions, retrying up to three times if fewer than 10 valid outputs are produced. Results are stored in a TSV file for submission, and duplicate detection is performed as a quality check.

CUET-qwen14B-v1

Participants | Proceedings | Input | dragun-qgen | Appendix

  • Run ID: CUET-qwen14B-v1
  • Participant: CUET
  • Track: Detection, Retrieval, and Augmented Generation for Understanding News (DRAGUN)
  • Year: 2025
  • Submission: 2025-08-08
  • Task: trec2025-dragun-qgen
  • MD5: 3132d76ea6d5a597faaba0f8a2b4d05c
  • Run description: This run uses the unsloth/Qwen3-14B-unsloth-bnb-4bit large language model to generate 10 concise, ranked, and critical questions for each topic from the TREC 2025 dataset. The prompt is richly enhanced with two few-shot examples—one inspired by PolitiFact and the other by MBFC-style analysis—which train the model to emulate high-quality fact-checking strategies. The questions aim to assess news credibility, focusing on source bias, factual accuracy, omissions, and framing. LangChain's LLMChain handles inference through a HuggingFace pipeline with sampling enabled. Each article’s title and truncated body are passed through this chain, and output is cleaned using regex. A retry mechanism ensures quality (≥10 questions) with deduplication and padding if needed. Results are saved in a TREC-compatible TSV file CUET_run6.tsv.

CUET-qwen14B-v2

Participants | Proceedings | Input | dragun-qgen | Appendix

  • Run ID: CUET-qwen14B-v2
  • Participant: CUET
  • Track: Detection, Retrieval, and Augmented Generation for Understanding News (DRAGUN)
  • Year: 2025
  • Submission: 2025-08-08
  • Task: trec2025-dragun-qgen
  • MD5: 3e7f5ce1be88003a60acb792d2a4fb98
  • Run description: This run uses the unsloth/Qwen3-14B-unsloth-bnb-4bit model to generate 10 concise and investigative questions per topic from the TREC 2025 dataset. The prompt is enhanced with two few-shot examples to guide the model in producing fact-check-style questions in the spirit of PolitiFact or MBFC. The questions aim to probe the trustworthiness and factual quality of a news article based on its title and truncated body (first 2000 characters). The model is invoked using LangChain’s LLMChain and a HuggingFace pipeline with sampling (temperature=0.7, do_sample=True) for diversity. A regex filter ensures only properly numbered and unique questions of up to 300 characters are accepted. A retry loop allows up to 3 attempts for quality control. If fewer than 10 valid questions are returned, the output is padded with "N/A". The final structured submission is saved in a TSV file named CUET_run7.tsv with fields: topic ID, team ID, run ID, question rank, and cleaned question text.

CUET-qwen14B-v3

Participants | Proceedings | Input | dragun-qgen | Appendix

  • Run ID: CUET-qwen14B-v3
  • Participant: CUET
  • Track: Detection, Retrieval, and Augmented Generation for Understanding News (DRAGUN)
  • Year: 2025
  • Submission: 2025-08-08
  • Task: trec2025-dragun-qgen
  • MD5: c00f78a1e51030c1abadbb75e6ef36a0
  • Run description: This run utilizes the unsloth/Qwen3-14B-unsloth-bnb-4bit model to generate 10 investigative and critical questions per topic from the TREC 2025 dataset. The questions are designed to help readers assess the credibility and bias of each article. The prompt includes two detailed few-shot examples modeled after PolitiFact and MBFC, guiding the model to focus on:

  • Evidence and factual integrity
  • Bias and one-sided reporting
  • Missing viewpoints or counterarguments
  • Language framing and sensationalism
  • Conflicts of interest or affiliations

LangChain’s LLMChain is used to wrap a HuggingFace text generation pipeline with settings that enable diverse outputs (temperature=0.6, top_p=0.9, do_sample=True, max_new_tokens=600). Each article’s body is truncated to the first 2000 characters to fit within the model’s 2048-token context window. A regex is used to extract properly formatted numbered questions up to 300 characters long. The model attempts up to 3 retries per topic to get at least 10 valid questions, padding with "N/A" if not enough are generated. The final output is saved in a tab-separated file named CUET_run8.tsv, with columns: topic ID, team ID, run ID, question rank, and cleaned question.


CUET-qwen14B-v5

Participants | Proceedings | Input | dragun-qgen | Appendix

  • Run ID: CUET-qwen14B-v5
  • Participant: CUET
  • Track: Detection, Retrieval, and Augmented Generation for Understanding News (DRAGUN)
  • Year: 2025
  • Submission: 2025-08-08
  • Task: trec2025-dragun-qgen
  • MD5: 8d0452112c8e166f54ee953f0eaa06ce
  • Run description: This run is designed to generate 10 investigative questions per news article to assess its trustworthiness for the TREC 2025 shared task. The code loads article topics from a JSONL file (trec-2025-dragun-topics.jsonl), and for each article, it uses a Qwen3-14B language model (through the Unsloth implementation) to generate questions that follow strict guidelines focusing on source credibility, evidence quality, origin tracing, and balance. The questions are generated using a LangChain LLMChain and a structured PromptTemplate. A retry loop ensures at least 10 valid and unique questions are produced per topic. The final output is saved in a TSV file for submission.

CUET-qwen4B-v2

Participants | Proceedings | Input | dragun-qgen | Appendix

  • Run ID: CUET-qwen4B-v2
  • Participant: CUET
  • Track: Detection, Retrieval, and Augmented Generation for Understanding News (DRAGUN)
  • Year: 2025
  • Submission: 2025-08-08
  • Task: trec2025-dragun-qgen
  • MD5: 3ff2641eb3c3ae50120b20ec96623305
  • Run description: This run performs automated question generation for the TREC 2025 dataset using the unsloth/Qwen3-4B model, enhanced with few-shot prompting. It begins by loading the dataset of news articles and sets up a detailed prompt template containing two examples of ideal outputs to guide the LLM toward generating high-quality questions. The LangChain pipeline is used with HuggingFace's pipeline integration for efficient inference. Each topic (title and body) is passed through the LLMChain up to 3 times if needed, attempting to generate at least 10 valid, critical, investigative questions. A regex is used to extract and validate the questions. If fewer than 10 questions are generated after retries, the list is padded with "N/A" placeholders. Finally, all questions are cleaned and saved in TSV format for submission as CUET_run3.tsv.

CUET-qwen4B-v3

Participants | Proceedings | Input | dragun-qgen | Appendix

  • Run ID: CUET-qwen4B-v3
  • Participant: CUET
  • Track: Detection, Retrieval, and Augmented Generation for Understanding News (DRAGUN)
  • Year: 2025
  • Submission: 2025-08-08
  • Task: trec2025-dragun-qgen
  • MD5: 41d0178cd0c93da1d038ec0a88879770
  • Run description: This run generates 10 ranked investigative questions for each topic in the TREC 2025 dataset using the unsloth/Qwen3-4B model. The prompt is enhanced with few-shot examples and explicitly instructs the model to rank questions based on importance, emphasizing critical thinking on bias, motivation, factual accuracy, and viewpoint diversity, including right-wing and centrist perspectives. The LangChain LLMChain is built around a HuggingFace pipeline with sampling enabled for generation. Each topic (title + truncated body) is passed to the model, and output is parsed using a regex to extract uniquely numbered questions up to 300 characters. The process includes a retry mechanism (up to 3 attempts) to ensure at least 10 valid questions, with padding as needed. The cleaned and deduplicated questions are saved in CUET_run4.tsv for TREC submission.

CUET-QwQ-32B

Participants | Proceedings | Input | dragun-qgen | Appendix

  • Run ID: CUET-QwQ-32B
  • Participant: CUET
  • Track: Detection, Retrieval, and Augmented Generation for Understanding News (DRAGUN)
  • Year: 2025
  • Submission: 2025-08-08
  • Task: trec2025-dragun-qgen
  • MD5: 3520231dc040f97db5dbd33d9f34575f
  • Run description: This run loads the TREC 2025 topics dataset and applies a 4-bit quantized version of the Unsloth QwQ-32B language model to generate ten critical investigative questions per news article. The questions aim to evaluate the trustworthiness of the articles by focusing on source bias, motivation, diversity of viewpoints, and factual accuracy. A carefully crafted prompt with few-shot examples guides the model. The output is parsed to extract unique questions, with multiple attempts per topic to ensure completeness. Finally, the results are formatted into a submission file for further use.

CUET-unsloth-Mistral-Small

Participants | Proceedings | Input | dragun-qgen | Appendix

  • Run ID: CUET-unsloth-Mistral-Small
  • Participant: CUET
  • Track: Detection, Retrieval, and Augmented Generation for Understanding News (DRAGUN)
  • Year: 2025
  • Submission: 2025-08-08
  • Task: trec2025-dragun-qgen
  • MD5: 25135ea796d4c68f6a445cd298a1ffd5
  • Run description: This run processes the TREC 2025 topic file (trec-2025-dragun-topics.jsonl) containing news article titles and bodies. A custom PromptTemplate is used to instruct the LLM to generate 10 concise and critical investigative questions for each article, focusing on source bias, intent, diversity of viewpoints, and factual accuracy. The model’s output is parsed using a regex pattern to extract exactly 10 unique questions per topic, which are then saved in TSV format for submission.

cursor-enhanced

Participants | Input | dragun-qgen | Appendix

  • Run ID: cursor-enhanced
  • Participant: cycraft
  • Track: Detection, Retrieval, and Augmented Generation for Understanding News (DRAGUN)
  • Year: 2025
  • Submission: 2025-08-21
  • Task: trec2025-dragun-qgen
  • MD5: d151426ca13e4dd8fd2799b41002f5be
  • Run description: Automatically enhance the starter kit by providing a list of contrastive examples.

cursor-report

Participants | Input | contradictory.dragun-repgen | repgen_results | supportive.dragun-repgen | Appendix

  • Run ID: cursor-report
  • Participant: cycraft
  • Track: Detection, Retrieval, and Augmented Generation for Understanding News (DRAGUN)
  • Year: 2025
  • Submission: 2025-08-21
  • Task: trec2025-dragun-repgen
  • MD5: 9eeb340939e66bd7d70b3cf398e9ef63
  • Run description: Automatically enhance the starter kit by providing a list of contrastive examples.

dragun-organizers-starter-kit-task-1

Participants | Input | dragun-qgen | Appendix

  • Run ID: dragun-organizers-starter-kit-task-1
  • Participant: coordinators
  • Track: Detection, Retrieval, and Augmented Generation for Understanding News (DRAGUN)
  • Year: 2025
  • Submission: 2025-08-05
  • Task: trec2025-dragun-qgen
  • MD5: a179afed11faafd58a3c79aff9c587cd
  • Run description: https://github.com/trec-dragun/2025-starter-kit

dragun-organizers-starter-kit-task-2

Participants | Input | contradictory.dragun-repgen | repgen_results | supportive.dragun-repgen | Appendix

  • Run ID: dragun-organizers-starter-kit-task-2
  • Participant: coordinators
  • Track: Detection, Retrieval, and Augmented Generation for Understanding News (DRAGUN)
  • Year: 2025
  • Submission: 2025-08-05
  • Task: trec2025-dragun-repgen
  • MD5: a29554a7fe952cb4aabdd328dece6059
  • Run description: https://github.com/trec-dragun/2025-starter-kit

feedback-rerank

Participants | Input | dragun-qgen | Appendix

  • Run ID: feedback-rerank
  • Participant: cycraft
  • Track: Detection, Retrieval, and Augmented Generation for Understanding News (DRAGUN)
  • Year: 2025
  • Submission: 2025-08-21
  • Task: trec2025-dragun-qgen
  • MD5: 1bfbb9906a9b3f1f41c55a243e0f4cb7
  • Run description: Use an LLM to rerank the questions based on LLM-generated feedback.

feedbackintheloop

Participants | Proceedings | Input | dragun-qgen | Appendix

  • Run ID: feedbackintheloop
  • Participant: WaterlooClarke
  • Track: Detection, Retrieval, and Augmented Generation for Understanding News (DRAGUN)
  • Year: 2025
  • Submission: 2025-08-18
  • Task: trec2025-dragun-qgen
  • MD5: f3db57a837dd8558019b596f58246386
  • Run description: With automatically generated feedback in the loop

garag_rubric

Participants | Proceedings | Input | contradictory.dragun-repgen | repgen_results | supportive.dragun-repgen | Appendix

  • Run ID: garag_rubric
  • Participant: WaterlooClarke
  • Track: Detection, Retrieval, and Augmented Generation for Understanding News (DRAGUN)
  • Year: 2025
  • Submission: 2025-08-19
  • Task: trec2025-dragun-repgen
  • MD5: 2c5896bd0caca0f8ba18c32e8b969f4d
  • Run description: Generate first (with open web search)

garamp_dragun_t2_q7b

Participants | Proceedings | Input | contradictory.dragun-repgen | repgen_results | supportive.dragun-repgen | Appendix

  • Run ID: garamp_dragun_t2_q7b
  • Participant: DUTH
  • Track: Detection, Retrieval, and Augmented Generation for Understanding News (DRAGUN)
  • Year: 2025
  • Submission: 2025-08-23
  • Task: trec2025-dragun-repgen
  • MD5: e85458996a32062f25606eb359433439
  • Run description: BM25 retrieval with Pyserini over the MS MARCO V2.1 (Segmented) Lucene index. For each topic we retrieve k=40 segments and keep up to 8 evidence passages after de-dup/length filtering. A single LLM pass (Qwen2.5-7B-Instruct) generates a <=250-word report in 4 sentences; each sentence cites up to 3 MS MARCO segment docids. Post-processing validates JSON, clips citations to <=3, and aligns outputs 1-to-1 with the official topics list.
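The de-dup/length-filtering step in these DUTH repgen runs could be sketched as below. The Pyserini retrieval itself is omitted; `pick_evidence`, its thresholds, and the `(docid, text)` input shape are illustrative assumptions, not DUTH's actual code.

```python
def pick_evidence(hits, max_keep=8, min_chars=20, max_chars=2000):
    # `hits` is assumed to be (docid, text) pairs already ranked by BM25
    # (e.g. the top-40 segments returned by a Pyserini searcher).
    kept, seen = [], set()
    for docid, text in hits:
        text = text.strip()
        if not (min_chars <= len(text) <= max_chars):
            continue  # length filter: drop fragments and oversized segments
        key = text.lower()
        if key in seen:
            continue  # de-duplication of repeated segment text
        seen.add(key)
        kept.append((docid, text))
        if len(kept) == max_keep:
            break  # keep at most `max_keep` evidence passages
    return kept
```

The kept passages would then be packed into the single LLM prompt that generates the cited report.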

garamp_mistral_7b

Participants | Proceedings | Input | dragun-qgen | Appendix

  • Run ID: garamp_mistral_7b
  • Participant: DUTH
  • Track: Detection, Retrieval, and Augmented Generation for Understanding News (DRAGUN)
  • Year: 2025
  • Submission: 2025-08-21
  • Task: trec2025-dragun-qgen
  • MD5: 5c5491cd89cb92eac18f92118ffd21b7
  • Run description: Zero-shot with Mistral-7B-Instruct-v0.3. We create ~30 candidates/topic under strict formatting/length rules and remove compound questions, then pick the final 10 via TF-IDF MMR (α=0.7). Seed=42.
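The TF-IDF MMR selection used across these DUTH qgen runs could be sketched as follows. This is a hand-rolled illustration with a toy TF-IDF in place of a library vectorizer; the function names and whitespace tokenization are assumptions.

```python
import math
from collections import Counter

def tfidf_vectors(texts):
    # Minimal TF-IDF: term frequency weighted by inverse document frequency.
    docs = [Counter(t.lower().split()) for t in texts]
    n = len(docs)
    df = Counter(w for d in docs for w in d)
    idf = {w: math.log(n / df[w]) + 1.0 for w in df}
    return [{w: tf * idf[w] for w, tf in d.items()} for d in docs]

def cosine(u, v):
    dot = sum(u[w] * v[w] for w in u if w in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def mmr_select(topic, candidates, k=10, alpha=0.7):
    # Greedy Maximal Marginal Relevance: each step picks the candidate
    # maximizing alpha * sim(topic) - (1 - alpha) * max sim(already picked).
    vecs = tfidf_vectors([topic] + candidates)
    tvec, cvecs = vecs[0], vecs[1:]
    picked = []
    while len(picked) < min(k, len(candidates)):
        best_i, best_score = None, float("-inf")
        for i, v in enumerate(cvecs):
            if i in picked:
                continue
            redundancy = max((cosine(v, cvecs[j]) for j in picked), default=0.0)
            score = alpha * cosine(v, tvec) - (1 - alpha) * redundancy
            if score > best_score:
                best_i, best_score = i, score
        picked.append(best_i)
    return [candidates[i] for i in picked]
```

With α=0.7 the score leans toward topical relevance while still penalizing near-duplicate questions, which is why duplicated candidates fall to the bottom of the selection.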

garamp_qwen25_14b

Participants | Proceedings | Input | dragun-qgen | Appendix

  • Run ID: garamp_qwen25_14b
  • Participant: DUTH
  • Track: Detection, Retrieval, and Augmented Generation for Understanding News (DRAGUN)
  • Year: 2025
  • Submission: 2025-08-21
  • Task: trec2025-dragun-qgen
  • MD5: 7156e7676439bd1bb11abae1ce434b1a
  • Run description: Same pipeline as the 7B run but with Qwen2.5-14B-Instruct. ~30 candidates/topic → cleaning (≤300 chars, single question, no compound connectors) → MMR selection of the final 10. Seed=42.

garamp_qwen25_14b_r4

Participants | Proceedings | Input | contradictory.dragun-repgen | repgen_results | supportive.dragun-repgen | Appendix

  • Run ID: garamp_qwen25_14b_r4
  • Participant: DUTH
  • Track: Detection, Retrieval, and Augmented Generation for Understanding News (DRAGUN)
  • Year: 2025
  • Submission: 2025-08-23
  • Task: trec2025-dragun-repgen
  • MD5: 3c9cbb1e1b2d1622ad488d367b4b7693
  • Run description: BM25 retrieval with Pyserini over the MS MARCO V2.1 (Segmented) Lucene index (msmarco-v2.1-doc-segmented.20240418.4f9675). For each topic we retrieve up to k=40 candidate segments and keep at most 18 evidence passages (dedup/length filtering) to fit the context window. A single LLM pass generates a ≤250-word report in 3–5 sentences; each sentence cites up to 3 segment docids. Post-processing clips citations to ≤3, validates JSON, and aligns outputs to the official topic list (one line per topic).
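The post-processing step (JSON validation plus citation clipping) might look like this minimal sketch; the field names `sentences` and `citations` are hypothetical, chosen only to illustrate the shape of the check.

```python
import json

def postprocess_line(raw_line: str, max_cites: int = 3):
    # Validate that the model output is well-formed JSON; json.loads raises
    # on malformed output, which would trigger a regeneration or repair.
    report = json.loads(raw_line)
    # Clip each sentence's citation list to the allowed maximum.
    for sentence in report.get("sentences", []):
        sentence["citations"] = sentence.get("citations", [])[:max_cites]
    return report
```

Running this once per topic, in topic order, gives the required one-line-per-topic alignment with the official list.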

garamp_qwen25_3b_t2

Participants | Proceedings | Input | contradictory.dragun-repgen | repgen_results | supportive.dragun-repgen | Appendix

  • Run ID: garamp_qwen25_3b_t2
  • Participant: DUTH
  • Track: Detection, Retrieval, and Augmented Generation for Understanding News (DRAGUN)
  • Year: 2025
  • Submission: 2025-08-23
  • Task: trec2025-dragun-repgen
  • MD5: 787753e9408cf78da3f3a98ba4cac7af
  • Run description: BM25 retrieval with Pyserini over the MS MARCO V2.1 (Segmented) Lucene index. For each topic we retrieve k=40 segments and keep up to 12 evidence passages after de-dup/length filtering. A single LLM pass (Qwen2.5-3B-Instruct) produces a ≤250-word report in ~4 sentences; each sentence cites up to 3 MS MARCO segment docids. Post-processing validates JSON, clips citations to ≤3, and aligns outputs 1:1 with the official topic list.

garamp_qwen25_72b

Participants | Proceedings | Input | dragun-qgen | Appendix

  • Run ID: garamp_qwen25_72b
  • Participant: DUTH
  • Track: Detection, Retrieval, and Augmented Generation for Understanding News (DRAGUN)
  • Year: 2025
  • Submission: 2025-08-21
  • Task: trec2025-dragun-qgen
  • MD5: 087b07963489a36ac1ac1dece8a53872
  • Run description: Same structured pipeline with Qwen2.5-72B-Instruct: generate ~30 candidates/topic, apply cleaning (≤300 chars, single question, English, no compound), then choose 10 using TF-IDF MMR (α=0.7). Seed=42.

garamp_qwen25_7b_imp

Participants | Proceedings | Input | dragun-qgen | Appendix

  • Run ID: garamp_qwen25_7b_imp
  • Participant: DUTH
  • Track: Detection, Retrieval, and Augmented Generation for Understanding News (DRAGUN)
  • Year: 2025
  • Submission: 2025-08-21
  • Task: trec2025-dragun-qgen
  • MD5: 471a9a032529259f84c484ebba8cc7b3
  • Run description: Zero-shot question generation for 30 topics using a fixed system prompt enforcing: ≤300 characters, one question per line, English, ends with “?”, and no compound questions (no “and/and-or”). We produce ~30 candidates per topic, clean/filter them, then select 10 via TF-IDF MMR (α=0.7). Seed=42.

garamp_yi15_9b

Participants | Proceedings | Input | dragun-qgen | Appendix

  • Run ID: garamp_yi15_9b
  • Participant: DUTH
  • Track: Detection, Retrieval, and Augmented Generation for Understanding News (DRAGUN)
  • Year: 2025
  • Submission: 2025-08-21
  • Task: trec2025-dragun-qgen
  • MD5: 2133d39592c3fc1feb9eb64d7a24bec7
  • Run description: Zero-shot generation with Yi-1.5-9B-Chat. ~30 candidates per topic; enforce ≤300 chars, one question, ends with “?”, and no compound questions; select 10 using TF-IDF MMR (α=0.7). Seed=42.

garamp_yi9b_t2_v1

Participants | Proceedings | Input | contradictory.dragun-repgen | repgen_results | supportive.dragun-repgen | Appendix

  • Run ID: garamp_yi9b_t2_v1
  • Participant: DUTH
  • Track: Detection, Retrieval, and Augmented Generation for Understanding News (DRAGUN)
  • Year: 2025
  • Submission: 2025-08-23
  • Task: trec2025-dragun-repgen
  • MD5: cd349a1b2897261e9f35b6cc6a5c94de
  • Run description: BM25 retrieval with Pyserini over the MS MARCO V2.1 (Segmented) Lucene index. For each topic we retrieve k=40 segments and keep up to 8 evidence passages after de-dup/length filtering. A single LLM pass (Yi-1.5-9B-Chat) produces a ≤250-word report in ~4 sentences; each sentence cites up to 3 MS MARCO segment docids. Post-processing validates JSON, clips citations to ≤3, and aligns outputs 1:1 with the official topic list.

garamp_zephyr7b_t2

Participants | Proceedings | Input | contradictory.dragun-repgen | repgen_results | supportive.dragun-repgen | Appendix

  • Run ID: garamp_zephyr7b_t2
  • Participant: DUTH
  • Track: Detection, Retrieval, and Augmented Generation for Understanding News (DRAGUN)
  • Year: 2025
  • Submission: 2025-08-23
  • Task: trec2025-dragun-repgen
  • MD5: c15697e360c39d21a0e6b1996bddd2a8
  • Run description: BM25 retrieval with Pyserini over the MS MARCO V2.1 (Segmented) Lucene index. For each topic we retrieve k=40 segments and keep up to 10 evidence passages after de-dup/length filtering. A single LLM pass (Zephyr-7B-Beta) produces a ≤250-word report in ~4 sentences; each sentence cites up to 3 MS MARCO segment docids. Post-processing validates JSON, clips citations to ≤3, and aligns outputs 1:1 with the official topic list.

h2oloo_gpt5_orig

Participants | Input | dragun-qgen | Appendix

  • Run ID: h2oloo_gpt5_orig
  • Participant: h2oloo
  • Track: Detection, Retrieval, and Augmented Generation for Understanding News (DRAGUN)
  • Year: 2025
  • Submission: 2025-08-25
  • Task: trec2025-dragun-qgen
  • MD5: 0b637c707d3239dfc27b9d2e835dd2e1
  • Run description: h2oloo's original prompt, modified from last year's.

h2oloo_gpt5_step

Participants | Input | dragun-qgen | Appendix

  • Run ID: h2oloo_gpt5_step
  • Participant: h2oloo
  • Track: Detection, Retrieval, and Augmented Generation for Understanding News (DRAGUN)
  • Year: 2025
  • Submission: 2025-08-25
  • Task: trec2025-dragun-qgen
  • MD5: 670f3f8114da678025afe8b9aab31d7a
  • Run description: Stepwise prompt, modified from last year's.

h2oloo_qw3-30b_orig

Participants | Input | dragun-qgen | Appendix

  • Run ID: h2oloo_qw3-30b_orig
  • Participant: h2oloo
  • Track: Detection, Retrieval, and Augmented Generation for Understanding News (DRAGUN)
  • Year: 2025
  • Submission: 2025-08-25
  • Task: trec2025-dragun-qgen
  • MD5: 764cc7dbb0d8a201712c4157a02c77f7
  • Run description: h2oloo's original prompt, modified from last year's.

h2oloo_qw3-30b_step

Participants | Input | dragun-qgen | Appendix

  • Run ID: h2oloo_qw3-30b_step
  • Participant: h2oloo
  • Track: Detection, Retrieval, and Augmented Generation for Understanding News (DRAGUN)
  • Year: 2025
  • Submission: 2025-08-25
  • Task: trec2025-dragun-qgen
  • MD5: e6dda6527d1ddd2b038e89130e780c19
  • Run description: Stepwise prompt, modified from last year's.

organizer-gpt-oss-t1

Participants | Input | dragun-qgen | Appendix

  • Run ID: organizer-gpt-oss-t1
  • Participant: coordinators
  • Track: Detection, Retrieval, and Augmented Generation for Understanding News (DRAGUN)
  • Year: 2025
  • Submission: 2025-08-25
  • Task: trec2025-dragun-qgen
  • MD5: b8671eff3e7e19f59ea3d7f8c1bed6ff
  • Run description: Used gpt-oss-120b as the LLM backend.

organizer-gpt-oss-t2

Participants | Input | contradictory.dragun-repgen | repgen_results | supportive.dragun-repgen | Appendix

  • Run ID: organizer-gpt-oss-t2
  • Participant: coordinators
  • Track: Detection, Retrieval, and Augmented Generation for Understanding News (DRAGUN)
  • Year: 2025
  • Submission: 2025-08-25
  • Task: trec2025-dragun-repgen
  • MD5: 3cf163825b89fcf7bcbb2bda5989fbb0
  • Run description: Used gpt-oss-120b as the LLM backend.

organizer-t1-chatgpt

Participants | Input | dragun-qgen | Appendix

  • Run ID: organizer-t1-chatgpt
  • Participant: coordinators
  • Track: Detection, Retrieval, and Augmented Generation for Understanding News (DRAGUN)
  • Year: 2025
  • Submission: 2025-08-25
  • Task: trec2025-dragun-qgen
  • MD5: c20dd64cc09f9f1b92cb4cf6e37422da
  • Run description: ChatGPT 5 Pro with Deep Research via its web interface.

organizer-t1-perplex

Participants | Input | dragun-qgen | Appendix

  • Run ID: organizer-t1-perplex
  • Participant: coordinators
  • Track: Detection, Retrieval, and Augmented Generation for Understanding News (DRAGUN)
  • Year: 2025
  • Submission: 2025-08-25
  • Task: trec2025-dragun-qgen
  • MD5: 5749d566442f558d90a0744b28985457
  • Run description: Perplexity with Deep Research via its web interface.

SCIAI_03_02_Three

Participants | Proceedings | Input | contradictory.dragun-repgen | repgen_results | supportive.dragun-repgen | Appendix

  • Run ID: SCIAI_03_02_Three
  • Participant: SCIAI
  • Track: Detection, Retrieval, and Augmented Generation for Understanding News (DRAGUN)
  • Year: 2025
  • Submission: 2025-08-22
  • Task: trec2025-dragun-repgen
  • MD5: 582fcfe0480c3e20a960396a92326d20
  • Run description: Three rounds

SCIAI_03_03_Five

Participants | Proceedings | Input | contradictory.dragun-repgen | repgen_results | supportive.dragun-repgen | Appendix

  • Run ID: SCIAI_03_03_Five
  • Participant: SCIAI
  • Track: Detection, Retrieval, and Augmented Generation for Understanding News (DRAGUN)
  • Year: 2025
  • Submission: 2025-08-22
  • Task: trec2025-dragun-repgen
  • MD5: a21a7e1659c608f00b9fc94206f955e0
  • Run description: Five rounds

SCIAI_03_04_Eight

Participants | Proceedings | Input | contradictory.dragun-repgen | repgen_results | supportive.dragun-repgen | Appendix

  • Run ID: SCIAI_03_04_Eight
  • Participant: SCIAI
  • Track: Detection, Retrieval, and Augmented Generation for Understanding News (DRAGUN)
  • Year: 2025
  • Submission: 2025-08-22
  • Task: trec2025-dragun-repgen
  • MD5: a38db40436f21bf95113224326cf1ed5
  • Run description: Eight rounds

SK_ConvinceF_MI_2

Participants | Proceedings | Input | dragun-qgen | Appendix

  • Run ID: SK_ConvinceF_MI_2
  • Participant: TREMA-UNH
  • Track: Detection, Retrieval, and Augmented Generation for Understanding News (DRAGUN)
  • Year: 2025
  • Submission: 2025-08-22
  • Task: trec2025-dragun-qgen
  • MD5: a1789bb6e9cd57468179802b36425c10
  • Run description: An original article is used to generate a "convince false" article that refutes its claims. From these, in the first iteration, 10 queries are created: 5 derived from the original article and 5 from the "convince false" article. The remaining process continues as originally defined.

SK_ConvinceF_MI_2_RG

Participants | Proceedings | Input | contradictory.dragun-repgen | repgen_results | supportive.dragun-repgen | Appendix

  • Run ID: SK_ConvinceF_MI_2_RG
  • Participant: TREMA-UNH
  • Track: Detection, Retrieval, and Augmented Generation for Understanding News (DRAGUN)
  • Year: 2025
  • Submission: 2025-08-22
  • Task: trec2025-dragun-repgen
  • MD5: f3b3a6eb8ccb407e735958ec029dd521
  • Run description: An original article is used to generate a "convince false" article that refutes its claims. From these, in the first iteration, 10 queries are created: 5 derived from the original article and 5 from the "convince false" article. The remaining process continues as originally defined.

SK_Critique_MI_5

Participants | Proceedings | Input | dragun-qgen | Appendix

  • Run ID: SK_Critique_MI_5
  • Participant: TREMA-UNH
  • Track: Detection, Retrieval, and Augmented Generation for Understanding News (DRAGUN)
  • Year: 2025
  • Submission: 2025-08-22
  • Task: trec2025-dragun-qgen
  • MD5: 2bc63a4f0e1403a9cab4d6afa7e93afd
  • Run description: Based on the starter kit. The first 10 queries were derived from the original article (5) and from a critique generated for it (5). The rest of the process is the same.

SK_Critique_MI_5_RG

Participants | Proceedings | Input | contradictory.dragun-repgen | repgen_results | supportive.dragun-repgen | Appendix

  • Run ID: SK_Critique_MI_5_RG
  • Participant: TREMA-UNH
  • Track: Detection, Retrieval, and Augmented Generation for Understanding News (DRAGUN)
  • Year: 2025
  • Submission: 2025-08-22
  • Task: trec2025-dragun-repgen
  • MD5: 6f43be05a9c4de40f0357d86722c528b
  • Run description: Same as run 5 of task 1.

SK_MI_1

Participants | Proceedings | Input | dragun-qgen | Appendix

  • Run ID: SK_MI_1
  • Participant: TREMA-UNH
  • Track: Detection, Retrieval, and Augmented Generation for Understanding News (DRAGUN)
  • Year: 2025
  • Submission: 2025-08-22
  • Task: trec2025-dragun-qgen
  • MD5: 94de962753f7b1520074329193c93b8e
  • Run description: Baseline. This run is based on the starter kit, with the prompts tweaked to suit the small Qwen 7B model and a maximum of 1 iteration.

SK_MI_2

Participants | Proceedings | Input | dragun-qgen | Appendix

  • Run ID: SK_MI_2
  • Participant: TREMA-UNH
  • Track: Detection, Retrieval, and Augmented Generation for Understanding News (DRAGUN)
  • Year: 2025
  • Submission: 2025-08-22
  • Task: trec2025-dragun-qgen
  • MD5: 3258a24ac48a081881b614cd4cf2b7ee
  • Run description: Baseline. Based on the starter kit, with tweaks in prompts and a max iteration of 2.

SK_MI_2_RG

Participants | Proceedings | Input | contradictory.dragun-repgen | repgen_results | supportive.dragun-repgen | Appendix

  • Run ID: SK_MI_2_RG
  • Participant: TREMA-UNH
  • Track: Detection, Retrieval, and Augmented Generation for Understanding News (DRAGUN)
  • Year: 2025
  • Submission: 2025-08-22
  • Task: trec2025-dragun-repgen
  • MD5: 772333e32e4585daae69b8b58ac71736
  • Run description: Same as task 1.

Team01_Run01_Winner

Participants | Proceedings | Input | contradictory.dragun-repgen | repgen_results | supportive.dragun-repgen | Appendix

  • Run ID: Team01_Run01_Winner
  • Participant: SCIAI
  • Track: Detection, Retrieval, and Augmented Generation for Understanding News (DRAGUN)
  • Year: 2025
  • Submission: 2025-08-17
  • Task: trec2025-dragun-repgen
  • MD5: d9573fec920f7a29ceb95e1f481a15c8
  • Run description: Our best attempt with our finalized pipeline, which automatically generates reports using only the MS MARCO data.

Team02_Run01_1000SegmentsExpansion

Participants | Proceedings | Input | contradictory.dragun-repgen | repgen_results | supportive.dragun-repgen | Appendix

  • Run ID: Team02_Run01_1000SegmentsExpansion
  • Participant: SCIAI
  • Track: Detection, Retrieval, and Augmented Generation for Understanding News (DRAGUN)
  • Year: 2025
  • Submission: 2025-08-15
  • Task: trec2025-dragun-repgen
  • MD5: c3ecc285056b9756ae41f77a1218723f
  • Run description: A set of 60 questions is generated from the article contents via three LLM calls. These are narrowed down to 10 using a pre-trained model that ranks the questions and by removing questions too similar to one another. The questions are then used to generate additional queries. Each query retrieves the top 1000 segments from MS MARCO V2.1 (Segmented), followed by reranking and an LLM selecting the most relevant segments for each question. Finally, an LLM answers as many questions as possible using the retrieved segments before hitting the 250-word limit of the final report.

Team02_Run02_100SegmentsExpansion

Participants | Proceedings | Input | contradictory.dragun-repgen | repgen_results | supportive.dragun-repgen | Appendix

  • Run ID: Team02_Run02_100SegmentsExpansion
  • Participant: SCIAI
  • Track: Detection, Retrieval, and Augmented Generation for Understanding News (DRAGUN)
  • Year: 2025
  • Submission: 2025-08-15
  • Task: trec2025-dragun-repgen
  • MD5: 9f5a770b48645611b49a66c240b73c70
  • Run description: A set of 60 questions is generated from the article contents via three LLM calls. These are narrowed down to 10 using a pre-trained model that ranks the questions and by removing questions too similar to one another. The questions are then used to generate additional queries. Each query retrieves the top 100 segments from MS MARCO V2.1 (Segmented), followed by reranking and an LLM selecting the most relevant segments for each question. Finally, an LLM answers as many questions as possible using the retrieved segments before hitting the 250-word limit of the final report.

Team02_Run03_100SegmentsNoExpansion

Participants | Proceedings | Input | contradictory.dragun-repgen | repgen_results | supportive.dragun-repgen | Appendix

  • Run ID: Team02_Run03_100SegmentsNoExpansion
  • Participant: SCIAI
  • Track: Detection, Retrieval, and Augmented Generation for Understanding News (DRAGUN)
  • Year: 2025
  • Submission: 2025-08-15
  • Task: trec2025-dragun-repgen
  • MD5: cb0e321a12f7f1b7eb4ecd23fd58bd6d
  • Run description: A set of 60 questions is generated from the article contents via three LLM calls. These are narrowed down to 10 using a pre-trained model that ranks the questions and by removing questions too similar to one another. The questions are then used directly to retrieve the top 100 segments from MS MARCO V2.1 (Segmented), followed by reranking and an LLM selecting the most relevant segments for each question. Finally, an LLM answers as many questions as possible using the retrieved segments before hitting the 250-word limit of the final report.

Team02_Task1

Participants | Proceedings | Input | dragun-qgen | Appendix

  • Run ID: Team02_Task1
  • Participant: SCIAI
  • Track: Detection, Retrieval, and Augmented Generation for Understanding News (DRAGUN)
  • Year: 2025
  • Submission: 2025-08-18
  • Task: trec2025-dragun-qgen
  • MD5: 02d48540199b8147969557beebf3a266
  • Run description: For each article, 60 questions were generated. Using a model pre-trained on the rankings of questions generated in 2024, the 60 questions were sorted from best to worst, and any questions deemed too similar to others were removed. Finally, the top 10 questions are used in the final report.
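
The similarity-based pruning of a best-first ranked question list can be sketched as follows. This is an illustrative version only: token-set Jaccard similarity and the 0.6 cutoff are assumptions, since the run does not specify the similarity measure or threshold.

```python
def jaccard(a, b):
    """Token-set Jaccard similarity between two questions."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def select_top_k(ranked_questions, k=10, threshold=0.6):
    """ranked_questions is best-first; a question is dropped when it is
    too similar to one already kept (threshold is hypothetical)."""
    kept = []
    for q in ranked_questions:
        if all(jaccard(q, p) < threshold for p in kept):
            kept.append(q)
        if len(kept) == k:
            break
    return kept
```

Because the input is already sorted best-to-worst, greedily keeping the first non-redundant questions preserves the ranking model's preferences while enforcing diversity.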

UR_IW_run_1

Participants | Proceedings | Input | dragun-qgen | Appendix

  • Run ID: UR_IW_run_1
  • Participant: UR_trecking
  • Track: Detection, Retrieval, and Augmented Generation for Understanding News (DRAGUN)
  • Year: 2025
  • Submission: 2025-08-22
  • Task: trec2025-dragun-qgen
  • MD5: 8be66013872e5e531618543254a4de17
  • Run description: Thirty questions per article were generated using GPT-5 nano and then filtered to remove compound questions.
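
A compound-question filter of this kind could be approximated as below. The heuristics shown (multiple question marks, or a comma followed by a coordinating conjunction) are hypothetical; the run does not document how compound questions were detected.

```python
def is_compound(question):
    """Hypothetical heuristics: more than one '?', or a comma followed
    by a coordinating conjunction, usually signal two fused questions."""
    q = question.lower()
    return (
        q.count('?') > 1
        or ', and ' in q
        or ', or ' in q
        or ' and/or ' in q
    )

def filter_questions(questions):
    """Keep only questions that do not look compound."""
    return [q for q in questions if not is_compound(q)]
```

A bare "and" inside a noun phrase ("cause and effect") is deliberately not flagged, since conjunctions alone do not make a question compound.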

UR_IW_run_1_task2

Participants | Proceedings | Input | contradictory.dragun-repgen | repgen_results | supportive.dragun-repgen | Appendix

  • Run ID: UR_IW_run_1_task2
  • Participant: UR_trecking
  • Track: Detection, Retrieval, and Augmented Generation for Understanding News (DRAGUN)
  • Year: 2025
  • Submission: 2025-08-22
  • Task: trec2025-dragun-repgen
  • MD5: 95ed0bd3ed4fe298a5cfc1e70e4f29c1
  • Run description: We used CoT query expansion (Jagerman et al., 2023) to transform the questions from task 1 into queries. We searched an Elasticsearch index of MS MARCO V2.1 (Segmented) with a multi-match query using the standard English analyzer, then reranked the top 1000 retrieval results with the monoT5 reranker. For up to the top 100 reranked documents we judged relevance against two conditions: whether the source of the retrieved document is trustworthy (dataset of Lin et al., 2023; PC trustworthiness score > 0.7) and whether the document is relevant (using an LLM). We used the remaining documents to generate the report: we prompted an LLM to generate answers for the questions we were able to retrieve segments for, and if more than 10 questions were answerable we used k-means to select the 10 most diverse questions based on their embeddings. A shortener was employed to reduce the report to a maximum of 250 words.
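
The k-means step for picking the 10 most diverse questions can be sketched as follows. This is a toy reconstruction under stated assumptions: a tiny hand-rolled k-means with deterministic initialization and Euclidean distance, with "one question per cluster" chosen as the question whose embedding lies closest to each centroid. The team's actual embedding model and k-means implementation are not specified.

```python
import math

def dist(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, iters=20):
    """Minimal Lloyd's k-means; first-k initialization for determinism."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda i: dist(p, centroids[i]))].append(p)
        centroids = [
            [sum(col) / len(cl) for col in zip(*cl)] if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids

def select_diverse(questions, embeddings, k=10):
    """Pick one question per cluster: the one whose embedding is closest
    to each centroid (may return fewer than k if picks coincide)."""
    if len(questions) <= k:
        return list(questions)
    centroids = kmeans(embeddings, k)
    chosen = []
    for c in centroids:
        idx = min(range(len(questions)), key=lambda i: dist(embeddings[i], c))
        if questions[idx] not in chosen:
            chosen.append(questions[idx])
    return chosen
```

Clustering the answerable questions by embedding and sampling one per cluster spreads the 10 report questions across distinct topical regions instead of letting near-duplicates crowd the word budget.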