Runs - Retrieval Augmented Generation (RAG) 2025¶
4method_merge¶
Participants | Proceedings | Input | trec_eval | Appendix
- Run ID: 4method_merge
- Participant: UTokyo
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-16
- Task: trec2025-rag-retrieval
- MD5: d1439f4e4e92c4bfb292c7e5d4b3e794
- Run description: Comprehensive 4-method hybrid retrieval system combining two dense retrievers (Qwen3-0.6B and BGE-small-en-v1.5, both HyDE-enhanced with query:HyDE 0.3:0.7 weighting) and two sparse methods (SPLADE learned sparse representations and BM25 with GPT-4.1-generated keyword expansion). All four retrieval streams produce top-1000 results that are fused using Reciprocal Rank Fusion (RRF, k=60) to leverage diverse relevance signals. Final ranking performed by GPT-4.1-mini using sliding window reranking (window=10, stride=5, 3 passes) with enriched document context including title, URL, and segment content for improved relevance assessment.
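The Reciprocal Rank Fusion step this run describes (RRF, k=60, over four top-1000 lists) can be sketched as follows. This is a minimal illustration of the standard RRF formula, not the participant's code:

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids with RRF.

    rankings: list of lists of doc ids, each ordered best-first.
    Each occurrence of a doc contributes 1 / (k + rank); docs are
    returned sorted by their summed score, best first.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

With k=60, a document ranked highly by several retrievers outscores one ranked first by only a single retriever, which is why RRF is a common choice for fusing heterogeneous dense and sparse streams.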
ag-run-1-JH¶
Participants | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: ag-run-1-JH
- Participant: NITATREC
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-generation
- MD5: 8cd67655c083cda87bdc498537aa11aa
- Run description: For each query from trec_rag_2025_queries.jsonl, we loaded its precomputed candidate list from retrieve_results_rankqwen3_32b.rag25_top100.jsonl and selected the top-20 passages in rank order. We then constructed a citation-aware prompt that injected the top-k context passages (k=8), each annotated with its docid, and instructed the model to answer strictly from the provided context, require every sentence to end with [CITATION: docid], and remain under 400 words. Answer generation used Falcon-7B-Instruct (tiiuae/falcon-7b-instruct) via the Hugging Face text-generation pipeline with deterministic decoding (do_sample=False, temperature=0.0, max_new_tokens=400). Finally, for each query, we emitted a TREC Format-2 record containing metadata (including the exact prompt), the list of 20 reference docids, and the sentence-level answer with citations.
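The per-sentence citation requirement above ("every sentence must end with [CITATION: docid]") lends itself to a simple post-hoc check. The following is a hypothetical validator sketch, not part of the submitted pipeline; the regex and function name are assumptions:

```python
import re

# Matches a trailing "[CITATION: docid]" marker at the end of a sentence.
CITE_RE = re.compile(r"\[CITATION:\s*(\S+?)\]$")

def check_citations(sentences, allowed_docids):
    """Return indexes of sentences that either lack a trailing citation
    marker or cite a docid outside the provided context passages."""
    bad = []
    for i, sent in enumerate(sentences):
        m = CITE_RE.search(sent.strip())
        if m is None or m.group(1) not in allowed_docids:
            bad.append(i)
    return bad
```

A check like this can flag uncited or hallucinated-citation sentences before the TREC Format-2 record is emitted.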
ag-v1-gpt¶
Participants | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: ag-v1-gpt
- Participant: cfdalab
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-16
- Task: trec2025-rag-auggen
- MD5: 885d02a797eebaf7cdea36680faa8807
- Run description: The execution process automatically integrates retrieval, reranking, summary generation, citation validation, paragraph reorganization, and linguistic refinement. For each main query, decomposed sub-queries are individually retrieved for the top-10 document segments. Concise factual answers are generated based on the retrieved content, with LLM analysis on citation support. All sub-answers are then reordered and refined by the LLM (gpt-4.1-mini), and finally output as a structured, multi-paragraph answer in JSON format with source citations, ensuring clear logic, traceable provenance, precise content, and rigorous structure.
ag-v2-gpt¶
Participants | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: ag-v2-gpt
- Participant: cfdalab
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-16
- Task: trec2025-rag-auggen
- MD5: 9f8f0932ad1b7c37f492d5d6b417a8ec
- Run description: This process combines sub-query answer generation and final article integration. For each query ID, it first uses a cross-encoder to retrieve the top-10 documents per sub-query, then generates concise sub-query answers. Finally, the LLM (gpt-4.1-mini) integrates answers into a complete article with citation markers, splitting the result into sentence-level outputs. The final results are output in JSON format, including query information, total word count, citation list, and answers with citation annotations, fully presenting the entire retrieval-to-generation pipeline.
ag-v2-llama¶
Participants | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: ag-v2-llama
- Participant: cfdalab
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-16
- Task: trec2025-rag-auggen
- MD5: f96ae0890586a3fb3922787716fb2ab9
- Run description: This process combines sub-query answer generation and final article integration. For each query ID, the cross-encoder retrieves the top-10 documents per sub-query, concise sub-query answers are generated, and the LLM (Llama 3.1 8B) merges answers into a complete article with citation markers, splitting the output into sentence-level answers. The final results are output in JSON format, containing query details, total word count, citation list, and citation-annotated answers, comprehensively presenting the entire process from retrieval to answer generation.
Anserini_bm25_only¶
Participants | Input | trec_eval | Appendix
- Run ID: Anserini_bm25_only
- Participant: cfdalab
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-16
- Task: trec2025-rag-retrieval
- MD5: 787ed782433ec3ed69943f6703e46293
- Run description: A standard BM25 retrieval baseline run using Anserini on the MS MARCO v2.1 segmented document collection.
anserini_bm25_top100¶
Participants | Input | trec_eval | Appendix
- Run ID: anserini_bm25_top100
- Participant: coordinators
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-retrieval
- MD5: f55e28ed7f78b33ef806a9016e89d02a
- Run description: Anserini BM25
auto_plan¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | full_manual | Appendix
- Run ID: auto_plan
- Participant: WaterlooClarke
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-generation
- MD5: 3b135860890f8a45af1540f9391915f3
- Run description: Automatic planning
auto_selected¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | full_manual | Appendix
- Run ID: auto_selected
- Participant: WaterlooClarke
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-generation
- MD5: 15a50672b8dfa9fba10e6686ea02d51f
- Run description: Automatically selects from multiple generations
bm25-rz7b-2025a¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: bm25-rz7b-2025a
- Participant: ii_research
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-generation
- MD5: ca4efd3ec0416a4daf40ce5a3ac05457
- Run description: BM25 on MS MARCO v2.1 retrieves up to 1000 candidates per query. We then apply windowed listwise reranking with the open-weight 7B model castorini/rank_zephyr_7b_v1_full to obtain the Top-100. The Top-K segments together with the narrative are fed to a ReClaim-style generator built on meta-llama/Llama-3.1-8B-Instruct with open-weight fine-tuned heads to produce claim-reference pairs. Finally, we post-process with sentence scoring, deduplication, and a word budget filter.
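Windowed listwise reranking, as used here and in several other runs on this page, slides a fixed-size window over the candidate list and lets the reranker reorder each window. A minimal single-pass sketch (window and stride values are illustrative; the bottom-up sweep mirrors the common RankZephyr-style setup, and `rerank_window` stands in for the LLM call):

```python
def sliding_window_rerank(docs, rerank_window, window=20, stride=10):
    """One bottom-up sliding-window pass over a ranked list.

    `rerank_window` is any callable that returns the given window's
    items in improved order (here it stands in for a listwise LLM
    reranker). Overlapping windows let strong documents bubble upward.
    """
    docs = list(docs)
    start = max(len(docs) - window, 0)
    while True:
        docs[start:start + window] = rerank_window(docs[start:start + window])
        if start == 0:
            break
        start = max(start - stride, 0)
    return docs
```

Because consecutive windows overlap by `window - stride` positions, a document near the bottom can move up by roughly one window per pass, which is why multi-pass variants (e.g. 3 passes) are sometimes used.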
bm25_NITA_JH¶
Participants | Input | trec_eval | Appendix
- Run ID: bm25_NITA_JH
- Participant: NITATREC
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-retrieval
- MD5: db8e9989f77294a30de47a8374a279d0
- Run description: This run applies a BM25 retrieval pipeline using Pyserini over the MS MARCO v2.1 segmented corpus. A Lucene index was constructed with positional information, document vectors, and raw text storage enabled, and queries were preprocessed into TSV format for compatibility. Retrieval was performed with BM25 (k1=1.2, b=0.75), returning the top-100 ranked segments per query, and outputs were generated in standard TREC run file format.
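The "standard TREC run file format" mentioned above is the six-column `qid Q0 docid rank score run_id` layout expected by trec_eval. A small formatting helper, shown as an illustration (the function name and score precision are assumptions):

```python
def trec_run_lines(qid, ranked_docids_scores, run_id="bm25_NITA_JH"):
    """Format one query's ranking as six-column TREC run lines:
    qid Q0 docid rank score run_id, with ranks starting at 1."""
    return [
        f"{qid} Q0 {docid} {rank} {score:.4f} {run_id}"
        for rank, (docid, score) in enumerate(ranked_docids_scores, start=1)
    ]
```

One such line per retrieved segment, top-100 per query, yields a file that trec_eval can score directly against the track qrels.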
bm25_rocchio_top100¶
Participants | Input | trec_eval | Appendix
- Run ID: bm25_rocchio_top100
- Participant: coordinators
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-retrieval
- MD5: 3b5aba968699f34e9c93179564caed5a
- Run description: Anserini BM25 + Rocchio
citation_cnt¶
Participants | Proceedings | Input | qrel_eval | Appendix
- Run ID: citation_cnt
- Participant: GenAIus
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-25
- Task: trec2025-rag-qrels
- MD5: 58125c710120e908826e7a99cc23516c
- Run description: Count of citations
cluster-generation¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: cluster-generation
- Participant: GenAIus
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-auggen
- MD5: a9e461d94b248e83c30d480da5782267
- Run description: Nugget generation from the top 20 passages, clustering of the generated nuggets with an LLM, and then response generation from these nuggets. Uses GPT-4o.
cluster_cnt¶
Participants | Proceedings | Input | qrel_eval | Appendix
- Run ID: cluster_cnt
- Participant: GenAIus
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-25
- Task: trec2025-rag-qrels
- MD5: af3fb3096f97cdb78ffdb5274e8ae7ea
- Run description: Count of clusters
combined¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | full_manual | Appendix
- Run ID: combined
- Participant: WaterlooClarke
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-generation
- MD5: ee3a8ed30ad177c052823b4235e03c5a
- Run description: Combining retrieved passages from multiple sources + Nuggetizer
cru-ablR¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: cru-ablR
- Participant: HLTCOE
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-auggen
- MD5: b17948c3465fa47041dfc0489f9a3b02
- Run description: Crucible@rag25
Original run tag: strict-filtered-covered-covextr-crucible-retrieved_docs-rag25_qwen3_merged_questions-retrieved-qwen3_32b.retrieved_docs.jsonl-SupportedAnswerabilityExtractorRequest. Answerability prompt. Will LLM judges generalize across tracks?
Crucible report generation. Guiding nuggets: most_common. Document source: nugget citations. Nugget extraction prompt 'SupportedAnswerExtractorAll' on collection "ragtime-mt". LLM: llama3.3-70b-instruct. Sentences are retained when, according to argue_eval, their citations are supported, at least one nugget covers the summary sentence, and at least one nugget covers an extractive document segment. Uses abstractive summarization. Only sentences with an extraction confidence value >= 0.5 that are not already selected (according to a stopped and stemmed match) and do not contain the expression 'source document' are retained. For each nugget, among the remaining sentence candidates, the sentence with the highest extraction confidence is selected. Chopped to 400 words. Created on 2025-08-18.
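The Crucible sentence-selection step shared by the cru-* runs (confidence floor, duplicate suppression, banned phrase, 400-word chop) can be sketched as follows. This is an illustrative simplification: the real pipeline deduplicates by stopped-and-stemmed match, whereas this sketch uses a lowercased exact match.

```python
def select_sentences(candidates, budget=400, min_conf=0.5):
    """Keep sentences with extraction confidence >= min_conf, skip
    duplicates and any sentence mentioning 'source document', and stop
    once the word budget would be exceeded.

    candidates: iterable of (sentence, confidence) pairs, best-first.
    """
    kept, seen, words = [], set(), 0
    for sent, conf in candidates:
        key = sent.lower().strip()  # simplification of stopped/stemmed matching
        if conf < min_conf or key in seen or "source document" in key:
            continue
        n = len(sent.split())
        if words + n > budget:
            break
        kept.append(sent)
        seen.add(key)
        words += n
    return kept
```

Under these rules a report never exceeds 400 words and never repeats a sentence, at the cost of dropping late low-confidence material.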
cru-ablR-conf¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: cru-ablR-conf
- Participant: HLTCOE
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-auggen
- MD5: b7361ce87f4c5e5da5fa139201ba13aa
- Run description: Crucible@rag25
Original run tag: strict-filtered-crucible-retrieved_docs-rag25_qwen3_merged_questions-retrieved-qwen3_32b.retrieved_docs.jsonl-SupportedAnswerabilityExtractorRequest. Answerability prompt. Just check citation support, rely on extraction confidence.
Crucible report generation. Guiding nuggets: most_common. Document source: nugget citations. Nugget extraction prompt 'SupportedAnswerExtractorAll' on collection "ragtime-mt". LLM: llama3.3-70b-instruct. Sentences are retained when their citations are supported according to argue_eval. Uses abstractive summarization. Only sentences with an extraction confidence value >= 0.5 that are not already selected (according to a stopped and stemmed match) and do not contain the expression 'source document' are retained. For each nugget, among the remaining sentence candidates, the sentence with the highest extraction confidence is selected. Chopped to 400 words. Created on 2025-08-18.
cru-ansR¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: cru-ansR
- Participant: HLTCOE
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-auggen
- MD5: 83128bf22b6e41f4b332ad5863443373
- Run description: Crucible@rag25
Original run tag: strict-filtered-covered-covextr-crucible-retrieved_docs-rag25_qwen3_merged_questions-retrieved-qwen3_32b.retrieved_docs.jsonl-SupportedAnswerExtractorRequest. Question-answering prompt. Will LLM judges generalize across tracks?
Crucible report generation. Guiding nuggets: most_common. Document source: nugget citations. Nugget extraction prompt 'SupportedAnswerExtractorAll' on collection "ragtime-mt". LLM: llama3.3-70b-instruct. Sentences are retained when, according to argue_eval, their citations are supported, at least one nugget covers the summary sentence, and at least one nugget covers an extractive document segment. Uses abstractive summarization. Only sentences with an extraction confidence value >= 0.5 that are not already selected (according to a stopped and stemmed match) and do not contain the expression 'source document' are retained. For each nugget, among the remaining sentence candidates, the sentence with the highest extraction confidence is selected. Chopped to 400 words. Created on 2025-08-18.
cru-ansR-bareconf¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: cru-ansR-bareconf
- Participant: HLTCOE
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-auggen
- MD5: 508155b32b20c94d85453d9008ebca52
- Run description: Crucible@rag25
Original run tag: strict-crucible-retrieved_docs-rag25_qwen3_merged_questions-retrieved-qwen3_32b.retrieved_docs.jsonl-SupportedAnswerExtractorRequest. Question-answering prompt. Just rely on extraction confidence.
Crucible report generation. Guiding nuggets: most_common. Document source: nugget citations. Nugget extraction prompt 'SupportedAnswerExtractorAll' on collection "ragtime-mt". LLM: llama3.3-70b-instruct. No sentence filtering with argue_eval. Uses abstractive summarization. Only sentences with an extraction confidence value >= 0.5 that are not already selected (according to a stopped and stemmed match) and do not contain the expression 'source document' are retained. For each nugget, among the remaining sentence candidates, the sentence with the highest extraction confidence is selected. Chopped to 400 words. Created on 2025-08-18.
cru-ansR-conf¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: cru-ansR-conf
- Participant: HLTCOE
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-auggen
- MD5: 31ceee7cc81a059f2e006d11948f4aec
- Run description: Crucible@rag25
Original run tag: strict-filtered-crucible-retrieved_docs-rag25_qwen3_merged_questions-retrieved-qwen3_32b.retrieved_docs.jsonl-SupportedAnswerExtractorRequest. Question-answering prompt. Just check citation support, rely on extraction confidence.
Crucible report generation. Guiding nuggets: most_common. Document source: nugget citations. Nugget extraction prompt 'SupportedAnswerExtractorAll' on collection "ragtime-mt". LLM: llama3.3-70b-instruct. Sentences are retained when their citations are supported according to argue_eval. Uses abstractive summarization. Only sentences with an extraction confidence value >= 0.5 that are not already selected (according to a stopped and stemmed match) and do not contain the expression 'source document' are retained. For each nugget, among the remaining sentence candidates, the sentence with the highest extraction confidence is selected. Chopped to 400 words. Created on 2025-08-18.
duth.hybrid.qwen.cal¶
Participants | Proceedings | Input | qrel_eval | Appendix
- Run ID: duth.hybrid.qwen.cal
- Participant: DUTH
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-31
- Task: trec2025-rag-qrels
- MD5: 8de7ba7990e16104857a971abb81aebd
- Run description: Automatic RJ run. Hybrid judge blending Qwen2.5-3B output, Jaccard overlap (narrative↔segment), and normalized baseline (top-20) scores into a confidence; per-topic caps/floors; final calibration (th1=0.40, th2=0.52, th3=0.66, th4=0.78; cap4=2, cap34=5). Focus: strong 3/4 with healthy 2s.
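The calibration step of the DUTH hybrid judges maps a blended confidence score onto the 0-4 relevance rubric via fixed thresholds. A minimal sketch using the thresholds quoted for this run (the blending weights and per-topic caps/floors are not specified, so they are omitted here):

```python
def calibrate_label(confidence, th1=0.40, th2=0.52, th3=0.66, th4=0.78):
    """Map a blended judge confidence in [0, 1] to a relevance label
    0-4 by comparing against ascending thresholds."""
    if confidence >= th4:
        return 4
    if confidence >= th3:
        return 3
    if confidence >= th2:
        return 2
    if confidence >= th1:
        return 1
    return 0
```

The sibling runs below (duth.hybrid.qwencon, duth.hybrid.stableri) differ mainly in these threshold values and caps, trading precision on labels 3/4 against recall at label 2.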
duth.hybrid.qwencon¶
Participants | Proceedings | Input | qrel_eval | Appendix
- Run ID: duth.hybrid.qwencon
- Participant: DUTH
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-31
- Task: trec2025-rag-qrels
- MD5: 29dae43350f454b5a24be09308f26798
- Run description: Conservative variant of the Qwen hybrid. Same pipeline; tighter calibration for high labels (th1=0.40, th2=0.54, th3=0.68, th4=0.80; cap4=1, cap34=4; topk4=1, topk3=4, topk2=8). Focus: higher precision for 3/4.
duth.hybrid.stableri¶
Participants | Proceedings | Input | qrel_eval | Appendix
- Run ID: duth.hybrid.stableri
- Participant: DUTH
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-31
- Task: trec2025-rag-qrels
- MD5: a7a41b474895e132fa7fef102f9f2acc
- Run description: Automatic RJ run with StableLM-2-1.6B. Same hybrid confidence (LLM + Jaccard + baseline). Calibrated to emphasize recall at label=2 (floor-2=4; th1=0.30, th2=0.38, th3=0.56, th4=0.70; cap4=2, cap34=6). Focus: many trustworthy 2s plus some 3/4.
duth_stablelm2_rj_v1¶
Participants | Proceedings | Input | qrel_eval | Appendix
- Run ID: duth_stablelm2_rj_v1
- Participant: DUTH
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-28
- Task: trec2025-rag-qrels
- MD5: f4943c8bbd35ea9949faee730ca46dee
- Run description: Automatic run for the Relevance Judgment subtask. We use an open-weight LLM (stabilityai/stablelm-2-1_6b-chat) as an automatic assessor at the segment level. The prompt encodes the TREC rubric (0–4); decoding is deterministic (do_sample=False, max_new_tokens=16). The model outputs "LABEL, CONFIDENCE", which we parse to produce lines: qid Q0 docid label confidence run_id. We submit exactly the top-k segments.
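The parsing step described above — turning the model's "LABEL, CONFIDENCE" output into qrels-style lines — can be sketched as follows. The clamping to the 0-4 rubric and the two-decimal formatting are assumptions for illustration:

```python
def parse_assessment(raw, qid, docid, run_id="duth_stablelm2_rj_v1"):
    """Parse a 'LABEL, CONFIDENCE' model output and emit one
    'qid Q0 docid label confidence run_id' line."""
    label_str, conf_str = (part.strip() for part in raw.split(",", 1))
    label = max(0, min(4, int(label_str)))            # clamp to the 0-4 rubric
    confidence = max(0.0, min(1.0, float(conf_str)))  # clamp to [0, 1]
    return f"{qid} Q0 {docid} {label} {confidence:.2f} {run_id}"
```

Clamping guards against the occasional out-of-range number a small instruction-tuned model may emit under deterministic decoding.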
e5_monot5_searchR1¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | full_manual | Appendix
- Run ID: e5_monot5_searchR1
- Participant: uogTr
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-20
- Task: trec2025-rag-generation
- MD5: aa6d999a1c7946fde4c79b3e902dd475
- Run description: We input the queries into an Agentic RAG model (Search-R1), which can interact with the retrieval pipeline automatically within its reasoning process until it reaches the final answer.
ensemble_umbrela1¶
Participants | Input | qrel_eval | Appendix
- Run ID: ensemble_umbrela1
- Participant: coordinators
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-09-11
- Task: trec2025-rag-qrels
- MD5: 69b23d7f565ba2e1098d5f1b3b6b7c44
- Run description: Majority voting over various qrels using UMBRELA variant
extractive_rag¶
Participants | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: extractive_rag
- Participant: hltcoe-multiagt
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-generation
- MD5: e517476a24ee9c1cf17422a7029e37fc
- Run description: This approach generates 10 queries and includes the top 3 documents from each search. Facts are extracted from the documents and grouped to reduce duplicate information. The answer is intended to be a bulleted list of independent facts. SPLADE v3 is used for search; gpt-oss-20b is used to identify facts and group them.
full¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: full
- Participant: MITLL
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-generation
- MD5: 32a6fdc8a0949ff567c09567fb0feece
- Run description: Splade -> query decomposition (gemma-3-27b-it) -> Qwen3-Reranker-8B -> reciprocal rank fusion -> setwise retrieval (gemma-3-27b-it) -> generation (gpt-5)
full-ret¶
Participants | Proceedings | Input | trec_eval | Appendix
- Run ID: full-ret
- Participant: MITLL
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-retrieval
- MD5: c51eb9736ee80570e960a1020bff78be
- Run description: Splade -> query decomposition (gemma-3-27b-it) -> Qwen3-Reranker-8B -> reciprocal rank fusion
garag¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: garag
- Participant: WaterlooClarke
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-generation
- MD5: 718c51860cb68ea924f583bb55ab3461
- Run description: Generate first
gemini_2_5_pro¶
Participants | Input | qrel_eval | Appendix
- Run ID: gemini_2_5_pro
- Participant: coordinators
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-09-10
- Task: trec2025-rag-qrels
- MD5: 9bb9a4269a872b8b4d08d389d4a756ec
- Run description: Uses UMBRELA with Gemini 2.5 Pro
genSubQ_merge¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | full_manual | Appendix
- Run ID: genSubQ_merge
- Participant: uogTr
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-20
- Task: trec2025-rag-generation
- MD5: 623ac659acd6b1dcbec59b25af6a2fda
- Run description: Initially divides each question into sub-questions using GPT4.0mini; answers are then generated for each sub-query using Llama 3, based on the top 3 retrieved documents. Llama 3 is then used to merge the sub-answers into the final response.
gpt-oss-120b-high¶
Participants | Input | qrel_eval | Appendix
- Run ID: gpt-oss-120b-high
- Participant: coordinators
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-09-11
- Task: trec2025-rag-qrels
- MD5: 82d3098bd02ab052b530a60662bbc0a4
- Run description: Uses UMBRELA variant with gpt-oss-120b high reasoning
gpt-oss-120b-low¶
Participants | Input | qrel_eval | Appendix
- Run ID: gpt-oss-120b-low
- Participant: coordinators
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-09-11
- Task: trec2025-rag-qrels
- MD5: e0eb0164749fd82a1ef9c71357081265
- Run description: Uses UMBRELA variant with gpt-oss-120b low reasoning
gpt-oss-120b-med¶
Participants | Input | qrel_eval | Appendix
- Run ID: gpt-oss-120b-med
- Participant: coordinators
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-09-11
- Task: trec2025-rag-qrels
- MD5: 869accf253d694f14fabd18247f78482
- Run description: Uses UMBRELA variant with gpt-oss-120b medium reasoning
gpt-oss-120b-sn-high¶
Participants | Input | qrel_eval | Appendix
- Run ID: gpt-oss-120b-sn-high
- Participant: coordinators
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-09-11
- Task: trec2025-rag-qrels
- MD5: 12c384d54bf24a5ea4c043dc49d39b28
- Run description: Creates sub-narratives from the narratives (gpt-oss-120b high reasoning) -> Uses the list of sub-narratives to allocate relevance labels (gpt-oss-120b high reasoning)
gpt-oss-120b-sn-low¶
Participants | Input | qrel_eval | Appendix
- Run ID: gpt-oss-120b-sn-low
- Participant: coordinators
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-09-11
- Task: trec2025-rag-qrels
- MD5: 735e8eb0afc672b4a14ea2d120f02556
- Run description: Creates sub-narratives from the narratives (gpt-oss-120b low reasoning) -> Uses the list of sub-narratives to allocate relevance labels (gpt-oss-120b low reasoning)
gpt-oss-120b-sn-med¶
Participants | Input | qrel_eval | Appendix
- Run ID: gpt-oss-120b-sn-med
- Participant: coordinators
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-09-11
- Task: trec2025-rag-qrels
- MD5: b5a6450321c3520de09389dc594db403
- Run description: Creates sub-narratives from the narratives (gpt-oss-120b medium reasoning) -> Uses the list of sub-narratives to allocate relevance labels (gpt-oss-120b medium reasoning)
gpt-oss-20b-high¶
Participants | Input | qrel_eval | Appendix
- Run ID: gpt-oss-20b-high
- Participant: coordinators
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-09-11
- Task: trec2025-rag-qrels
- MD5: f0ca50b8eb77c57537e76f096dd28909
- Run description: Uses UMBRELA variant with gpt-oss-20b high reasoning
gpt-oss-20b-low¶
Participants | Input | qrel_eval | Appendix
- Run ID: gpt-oss-20b-low
- Participant: coordinators
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-09-11
- Task: trec2025-rag-qrels
- MD5: 1c8d1f48b785bc5f34dde18fdf709cf9
- Run description: Uses UMBRELA variant with gpt-oss-20b low reasoning
gpt-oss-20b-medium¶
Participants | Input | qrel_eval | Appendix
- Run ID: gpt-oss-20b-medium
- Participant: coordinators
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-09-11
- Task: trec2025-rag-qrels
- MD5: 8633f0e61e6e315bf774a62997decbc6
- Run description: Uses UMBRELA variant with gpt-oss-20b medium reasoning
gpt-oss-20b-sn-high¶
Participants | Input | qrel_eval | Appendix
- Run ID: gpt-oss-20b-sn-high
- Participant: coordinators
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-09-11
- Task: trec2025-rag-qrels
- MD5: b4f154a286cc9b087a1e305d5342c0b0
- Run description: Creates sub-narratives from the narratives (gpt-oss-20b high reasoning) -> Uses the list of sub-narratives to allocate relevance labels (gpt-oss-20b high reasoning)
gpt-oss-20b-sn-low¶
Participants | Input | qrel_eval | Appendix
- Run ID: gpt-oss-20b-sn-low
- Participant: coordinators
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-09-11
- Task: trec2025-rag-qrels
- MD5: c877daa891d265cbee9be5e2c9579045
- Run description: Creates sub-narratives from the narratives (gpt-oss-20b low reasoning) -> Uses the list of sub-narratives to allocate relevance labels (gpt-oss-20b low reasoning)
gpt-oss-20b-sn-med¶
Participants | Input | qrel_eval | Appendix
- Run ID: gpt-oss-20b-sn-med
- Participant: coordinators
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-09-11
- Task: trec2025-rag-qrels
- MD5: 3e616216958e163af9558e954801026b
- Run description: Creates sub-narratives from the narratives (gpt-oss-20b medium reasoning) -> Uses the list of sub-narratives to allocate relevance labels (gpt-oss-20b medium reasoning)
gpt41¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: gpt41
- Participant: UTokyo
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-16
- Task: trec2025-rag-auggen
- MD5: 796507842a74452c0f52576da5c46ef4
- Run description: Answer generation using GPT-4.1 with the Ragnarök framework's baseline prompt configuration. Processes retrieved passages to generate comprehensive answers, leveraging GPT-4.1's advanced reasoning capabilities without additional prompt engineering or retrieval-aware modifications. Uses standard Ragnarök format for passage presentation and answer structuring.
gpt_4-1¶
Participants | Input | qrel_eval | Appendix
- Run ID: gpt_4-1
- Participant: coordinators
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-09-10
- Task: trec2025-rag-qrels
- MD5: 23f1c42688af195e1b7d513d452b57fd
- Run description: Using UMBRELA variant with GPT-4.1
gpt_4-1-sub-narr¶
Participants | Input | qrel_eval | Appendix
- Run ID: gpt_4-1-sub-narr
- Participant: coordinators
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-09-10
- Task: trec2025-rag-qrels
- MD5: 04fd2d4f67bbe4b7cf0fec4676fb45fc
- Run description: Creates sub-narratives from the narratives (GPT-4.1) -> Uses the list of sub-narratives to allocate relevance labels (GPT-4.1)
gpt_4_1-sub-narr-2¶
Participants | Input | qrel_eval | Appendix
- Run ID: gpt_4_1-sub-narr-2
- Participant: coordinators
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-09-10
- Task: trec2025-rag-qrels
- MD5: b02fdccd1aeb1680719d77872e7db146
- Run description: Creates sub-narratives from the narratives (GPT-4.1-nano) -> Re-writes them (GPT-4.1) -> Uses the list of sub-narratives to allocate relevance labels (GPT-4.1)
gpt_5-sub-narr¶
Participants | Input | qrel_eval | Appendix
- Run ID: gpt_5-sub-narr
- Participant: coordinators
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-09-10
- Task: trec2025-rag-qrels
- MD5: ad17637781529c0b03334a5c92d6044d
- Run description: Creates sub-narratives from the narratives (GPT-5) -> Uses the list of sub-narratives to allocate relevance labels (GPT-5)
gptr.nt_q4d4¶
Participants | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: gptr.nt_q4d4
- Participant: hltcoe-multiagt
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-generation
- MD5: 487bc7b03b4399a019d452c44dc2c515
- Run description: This run leverages the gpt-researcher framework. It uses notetaking to identify the most informative parts of each document and then generates the answer from the notes. The system generates 3 queries and uses the initial title to retrieve source documents. The top four documents from each retrieval via SPLADE v3 are used as source material for the generation.
gptr_e2_q3d3¶
Participants | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: gptr_e2_q3d3
- Participant: hltcoe-multiagt
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-generation
- MD5: 1d68eadd302bed7ce312cf541e6250a9
- Run description: This run leverages the gpt-researcher framework. It uses a filtering approach to rank snippets in a document by usefulness to the query. The top 4 snippets are selected and answers are generated from them. The system generates 2 queries and uses the initial title to retrieve source documents. The top three documents from each retrieval via SPLADE v3 are used as source material for the generation. All LLM calls are served by llama-3.3-70B-instruct.
gptr_e2_q4d4¶
Participants | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: gptr_e2_q4d4
- Participant: hltcoe-multiagt
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-generation
- MD5: d4701778f9b7f77c87164bfa80579380
- Run description: This run leverages the gpt-researcher framework. It uses a filtering approach to rank snippets in a document by usefulness to the query. The top 4 snippets are selected and answers are generated from them. The system generates 3 queries and uses the initial title to retrieve source documents. The top four documents from each retrieval via SPLADE v3 are used as source material for the generation. All LLM calls are served by llama-3.3-70B-instruct.
gptr_nt_q3d3¶
Participants | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: gptr_nt_q3d3
- Participant: hltcoe-multiagt
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-generation
- MD5: b04de1e5e182096fb134e3897e7006ca
- Run description: This run leverages the gpt-researcher framework. It uses notetaking to identify the most informative parts of each document and then generates the answer from the notes. The system generates 2 queries and uses the initial title to retrieve source documents. The top three documents from each retrieval via SPLADE v3 are used as source material for the generation. All LLM calls are served by llama-3.3-70B-instruct.
grilllab-agent-gpt45¶
Participants | Proceedings | Input | trec_eval | Appendix
- Run ID: grilllab-agent-gpt45
- Participant: grilllab
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-retrieval
- MD5:
c2f20473f6997226a808b4da4a5fe01c - Run description: Uses GPT4.1 to decompose query into sub questions, which are individually ran through a BM25+Doc2Query+DocumentExpansion pipeline. The results of all queries are combined using RRF to form a single ranked list.
grilllab-agentic-gpt4¶
Participants | Proceedings | Input | trec_eval | Appendix
- Run ID: grilllab-agentic-gpt4
- Participant: grilllab
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-14
- Task: trec2025-rag-retrieval
- MD5:
bd226465eda58793e088380a5261cdcd - Run description: Uses GPT-4.1 to decompose the query into sub-questions, which are individually run through a BM25+Doc2Query+DocumentExpansion pipeline. The results of all queries are combined using RRF to form a single ranked list.
grilllab-agentic-gpt4-generation¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: grilllab-agentic-gpt4-generation
- Participant: grilllab
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-14
- Task: trec2025-rag-generation
- MD5:
b7ea1a5b8b8debb98f22e4b281c4b075 - Run description: Uses GPT-4.1 to decompose the query into sub-questions, which are individually run through a BM25+Doc2Query+DocumentExpansion pipeline. The results of all queries are combined using RRF to form a single ranked list.
grilllab-gpt45-gen¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: grilllab-gpt45-gen
- Participant: grilllab
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-generation
- MD5:
91bc14a2e136701c27f2bb22330cc40b - Run description: Uses GPT-4.1 and GPT-5 to decompose the query into sub-questions, which are individually run through a BM25+Doc2Query+DocumentExpansion pipeline. The results of all queries are combined using RRF to form a single ranked list.
hltcoe-fsrrf¶
Participants | Input | trec_eval | Appendix
- Run ID: hltcoe-fsrrf
- Participant: hltcoe-rerank
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-retrieval
- MD5:
6ab7f8d2ac42744623005f87473affae - Run description: RRF of PLAID-X, Qwen-Embedding-8B, and SPLADE-v3
hltcoe-gpt5.searcher¶
Participants | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: hltcoe-gpt5.searcher
- Participant: hltcoe-rerank
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-generation
- MD5:
84a93c7dea89f337d92cd7b154c423ca - Run description: LangGraph generator (reflection, note taking, query generation, etc) with retrieval results from Searcher II pointwise reranking RRF of PLAID-X, Qwen-Embedding-8B, and SPLADE-v3. LangGraph uses Llama 3.3 70B for most steps. GPT-5 is used for final answer generation (drafting) and answer shortening (revising report). LangGraph generates 4 initial queries, retrieves 12 results per query, and runs up to 5 research loops.
hltcoe-jina¶
Participants | Input | trec_eval | Appendix
- Run ID: hltcoe-jina
- Participant: hltcoe-rerank
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-retrieval
- MD5:
f1abb78342f2d7ee00fde5a34b68cd0a - Run description: jinaai/jina-reranker-m0 (2.4B) reranking RRF of PLAID-X, Qwen-Embedding-8B, and SPLADE-v3
hltcoe-lg.fsrrf¶
Participants | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: hltcoe-lg.fsrrf
- Participant: hltcoe-rerank
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-generation
- MD5:
43c6509d6d6d39f49b023fcc26660a6a - Run description: LangGraph generator (reflection, note taking, query generation, etc) with retrieval results from RRF of PLAID-X, Qwen-Embedding-8B, and SPLADE-v3. LangGraph uses Llama 3.3 70B for all steps. LangGraph generates 4 initial queries, retrieves 12 results per query, and runs up to 5 research loops.
hltcoe-lg.jina¶
Participants | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: hltcoe-lg.jina
- Participant: hltcoe-rerank
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-generation
- MD5:
eb2b5dee3977cf8cd41dadf33b63d091 - Run description: LangGraph generator (reflection, note taking, query generation, etc) with retrieval results from jinaai/jina-reranker-m0 (2.4B) reranking RRF of PLAID-X, Qwen-Embedding-8B, and SPLADE-v3. LangGraph uses Llama 3.3 70B for all steps. LangGraph generates 4 initial queries, retrieves 12 results per query, and runs up to 5 research loops.
hltcoe-lg.qwen¶
Participants | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: hltcoe-lg.qwen
- Participant: hltcoe-rerank
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-generation
- MD5:
a3e7f48c341214060b0a2e3fd7565a97 - Run description: LangGraph generator (reflection, note taking, query generation, etc) with retrieval results from Qwen/Qwen3-Reranker-8B reranking RRF of PLAID-X, Qwen-Embedding-8B, and SPLADE-v3. LangGraph uses Llama 3.3 70B for all steps. LangGraph generates 4 initial queries, retrieves 12 results per query, and runs up to 5 research loops.
hltcoe-lg.searcher¶
Participants | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: hltcoe-lg.searcher
- Participant: hltcoe-rerank
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-generation
- MD5:
68d180accca721e7bc675e3ec716ac1b - Run description: LangGraph generator (reflection, note taking, query generation, etc) with retrieval results from Searcher II pointwise reranking RRF of PLAID-X, Qwen-Embedding-8B, and SPLADE-v3. LangGraph uses Llama 3.3 70B for all steps. LangGraph generates 4 initial queries, retrieves 12 results per query, and runs up to 5 research loops.
hltcoe-qwen¶
Participants | Input | trec_eval | Appendix
- Run ID: hltcoe-qwen
- Participant: hltcoe-rerank
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-retrieval
- MD5:
75d56806985f7c668115a8a59b28ca83 - Run description: Qwen/Qwen3-Reranker-8B reranking RRF of PLAID-X, Qwen-Embedding-8B, and SPLADE-v3
hltcoe-qwen-jina¶
Participants | Input | trec_eval | Appendix
- Run ID: hltcoe-qwen-jina
- Participant: hltcoe-rerank
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-retrieval
- MD5:
a00f7c9512dcd3cf1b83d0b5b5918ab3 - Run description: Fusion of jina-reranker-m0 and Qwen-Embedding-8B reranking RRF of PLAID-X, Qwen-Embedding-8B, and SPLADE-v3
hltcoe-searcher¶
Participants | Input | trec_eval | Appendix
- Run ID: hltcoe-searcher
- Participant: hltcoe-rerank
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-retrieval
- MD5:
86fb809534dc3cbe474deaac38635bd7 - Run description: Searcher II pointwise reranking RRF of PLAID-X, Qwen-Embedding-8B, and SPLADE-v3
hybrid-rerank¶
Participants | Input | trec_eval | Appendix
- Run ID: hybrid-rerank
- Participant: clip2025
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-retrieval
- MD5:
918bc936c263c02618f317eacd1282a4 - Run description: hybrid (bm25, embedding) + rerank
hybrid.stable.loose2¶
Participants | Proceedings | Input | qrel_eval | Appendix
- Run ID: hybrid.stable.loose2
- Participant: DUTH
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-31
- Task: trec2025-rag-qrels
- MD5:
ea086f09a8936fd2ed05e975f6401ddd - Run description: Looser StableLM hybrid. Same blended confidence; calibration tuned for broader non-zero coverage (th1=0.30, th2=0.40, th3=0.56, th4=0.70; cap4=2, cap34=7; floor-2=3). Focus: high pool coverage with balanced 2/3/4.
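The threshold step described above can be sketched as simple binning of the blended confidence score against th1..th4; the cap/floor logic (cap4=2, cap34=7, floor-2=3) is omitted here, and the function below is an illustration under those assumptions, not the submitted code:

```python
def calibrate(scores, thresholds=(0.30, 0.40, 0.56, 0.70)):
    """Map blended confidence scores in [0, 1] to graded labels 0-4.

    A score crossing n of the (ascending) thresholds gets label n,
    so th1..th4 carve the unit interval into five relevance grades.
    """
    return {doc: sum(s >= t for t in thresholds) for doc, s in scores.items()}
```

Loosening the thresholds (as this run does relative to its sibling) shifts mass out of label 0, giving the "broader non-zero coverage" the description mentions.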
IDACCS-hybrid-gpt4-1¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | full_manual | Appendix
- Run ID: IDACCS-hybrid-gpt4-1
- Participant: IDACCS
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-auggen
- MD5:
41eeff2b5d22ae79a4c42042f8cdb3b3 - Run description: Reranked the top 100 provided by the organizers with mixedbread-ai/mxbai-rerank-large-v1, using the query. Used the OCCAMS extractive summarizer to generate an 800-word summary. Used GPT-4.1 to paraphrase. Used t5-base to attribute each sentence.
IDACCS-hybrid-gpt4o¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: IDACCS-hybrid-gpt4o
- Participant: IDACCS
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-auggen
- MD5:
53e37227d588946b9b2c481b598788c2 - Run description: Reranked the top 100 provided by organizers via mixedbread-ai/mxbai-rerank-large-v1 and using the query. Use occams extractive summarizer to generate 800 words Used GPT-4o to paraphrase Use t5-base to attribute each sentence
IDACCS-nugg-gpt-4-1¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | full_manual | Appendix
- Run ID: IDACCS-nugg-gpt-4-1
- Participant: IDACCS
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-auggen
- MD5:
7b5b6e2c9a51feb6505305cadc863ae8 - Run description: Reranked the top 100 provided by the organizers with mixedbread-ai/mxbai-rerank-large-v1, using the query. Used the OCCAMS extractive summarizer to generate an 800-word summary. Used GPT-4.1 to generate a nugget from each extracted sentence. Used t5-base to attribute each sentence.
IDACCS-nugg-gpt-4o¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: IDACCS-nugg-gpt-4o
- Participant: IDACCS
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-auggen
- MD5:
639808c64fc9a2d0e9f02f48d769fa24 - Run description: Reranked the top 100 provided by the organizers with mixedbread-ai/mxbai-rerank-large-v1, using the query. Used the OCCAMS extractive summarizer to generate an 800-word summary. Used GPT-4o to paraphrase. Used t5-base to attribute each sentence.
IDACCSabstrct-gpt4-1¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | full_manual | Appendix
- Run ID: IDACCSabstrct-gpt4-1
- Participant: IDACCS
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-auggen
- MD5:
1aa0f9d01de1b02dbdfac530910f8414 - Run description: Reranked the top 100 provided by the organizers with mixedbread-ai/mxbai-rerank-large-v1, using the query. Used GPT-4.1 to write an abstractive summary. Used t5-base to attribute each sentence.
jcru-ablR¶
Participants | Proceedings | Input | qrel_eval | Appendix
- Run ID: jcru-ablR
- Participant: HLTCOE
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-qrels
- MD5:
a0aa4a4ee8a47d2ef3b6baba6cc90d93 - Run description: Crucible@rag25
Original run tag: ffiltered-covered-covextr-crucible-retrieved_docs-rag25_qwen3_merged_questions-retrieved-qwen3_32b.retrieved_docs.jsonl-SupportedAnswerabilityExtractorRequest Answerability prompt. Filtering with argue_eval.
Crucible report generation. Guiding nuggets: most_common. Document source: nugget citations. Nugget extraction prompt 'SupportedAnswerabilityExtractorAll' on collection "ragtime-mt"; LLM: llama3.3-70b-instruct. Sentences are retained when their citations are supported, at least one nugget covers the summary sentence, and at least one nugget covers an extractive document segment according to argue_eval. The frequency with which a cited document is used for sentences is used as its relevance score.
jcru-ablR-all¶
Participants | Proceedings | Input | qrel_eval | Appendix
- Run ID: jcru-ablR-all
- Participant: HLTCOE
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-qrels
- MD5:
b49ea2817a74e877a0a9cc0c5c8e2e68 - Run description: Crucible@rag25
Original run tag: crucible-retrieved_docs-rag25_qwen3_merged_questions-retrieved-qwen3_32b.retrieved_docs.jsonl-SupportedAnswerabilityExtractorRequest Answerability prompt. No filtering with argue_eval.
Crucible report generation. Guiding nuggets: most_common. Document source: nugget citations. Nugget extraction prompt 'SupportedAnswerabilityExtractorAll' on collection "ragtime-mt"; LLM: llama3.3-70b-instruct. The frequency with which a cited document is used for sentences is used as its relevance score.
jcru-ansR¶
Participants | Proceedings | Input | qrel_eval | Appendix
- Run ID: jcru-ansR
- Participant: HLTCOE
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-qrels
- MD5:
385dff14abf1b03d2d804902b0be7543 - Run description: Crucible@rag25
Original run tag: filtered-covered-covextr-crucible-retrieved_docs-rag25_qwen3_merged_questions-retrieved-qwen3_32b.retrieved_docs.jsonl-SupportedAnswerExtractorRequest Question-answering prompt. Filtering with argue_eval.
Crucible report generation. Guiding nuggets: most_common. Document source: nugget citations. Nugget extraction prompt 'SupportedAnswerExtractorAll' on collection "ragtime-mt"; LLM: llama3.3-70b-instruct. Sentences are retained when their citations are supported, at least one nugget covers the summary sentence, and at least one nugget covers an extractive document segment according to argue_eval. The frequency with which a cited document is used for sentences is used as its relevance score.
jcru-ansR-all¶
Participants | Proceedings | Input | qrel_eval | Appendix
- Run ID: jcru-ansR-all
- Participant: HLTCOE
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-qrels
- MD5:
cc310d2aa1f38bd93d33a25b71c447fb - Run description: Crucible@rag25
Original run tag: crucible-retrieved_docs-rag25_qwen3_merged_questions-retrieved-qwen3_32b.retrieved_docs.jsonl-SupportedAnswerExtractorRequest Question-answering prompt. No filtering with argue_eval.
Crucible report generation. Guiding nuggets: most_common. Document source: nugget citations. Nugget extraction prompt 'SupportedAnswerExtractorAll' on collection "ragtime-mt"; LLM: llama3.3-70b-instruct. The frequency with which a cited document is used for sentences is used as its relevance score.
KG-AG-1¶
Participants | Input | nist-post-edit | nuggetizer_eval | full_manual | Appendix
- Run ID: KG-AG-1
- Participant: clip2025
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-generation
- MD5:
443aa290bb605246088ca9aae7a8448c - Run description: Retrieval:
- Use RRF on the results of BM25 (k=10000) and dense embedding (k=10000) to get a hybrid result (k=1000).
- Use a cross-encoder to rerank the retrieval result to k=300.
Generation:
- Construct a KG.
- Run triple-based ToG to get reasoning paths on the KG, and use the collected results to generate the ver. 1 answer.
- Generate the ver. 2 answer by using selected segments from the above reasoning paths to refine the answer.
Kun-Third¶
Participants | Input | nist-post-edit | nuggetizer_eval | full_manual | Appendix
- Run ID: Kun-Third
- Participant: RMIT-IR
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-generation
- MD5:
84bd82122b799e8eddbec2fc7bdafca7 - Run description: For this run, we used query decomposition + per-query reranking + a 30,000-word context limit. The reranking uses Falcon-10b. The decomposition and generation use Claude Sonnet 4.
LAS-agentic-RAG-agent¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | full_manual | Appendix
- Run ID: LAS-agentic-RAG-agent
- Participant: ncsu-las
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-15
- Task: trec2025-rag-generation
- MD5:
0b2db2477846a385c03173f13adf4f1d - Run description: This pipeline uses a single RAG agent that is instructed to decompose the question into separate queries, retrieve relevant information from a msmarco segment index, and create a report answering the question. An additional process is used to create citations using similarity matching between sentences in the answer and sentences in the retrieved context.
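The citation step described here, matching each answer sentence to its most similar sentence in the retrieved context, can be sketched as a thresholded nearest-neighbor lookup over sentence vectors. The embedding model and cutoff below are placeholders; the run's actual choices are not specified:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den if den else 0.0

def cite(answer_vecs, context_vecs, threshold=0.5):
    """For each answer-sentence vector, return the doc id of the most
    similar context vector, or None when nothing clears the threshold."""
    citations = []
    for av in answer_vecs:
        best_id, best = None, threshold
        for doc_id, cv in context_vecs.items():
            s = cosine(av, cv)
            if s > best:
                best_id, best = doc_id, s
        citations.append(best_id)
    return citations
```

Keeping a threshold matters: without one, every sentence gets a citation even when it is unsupported by any retrieved segment.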
LAS-agentic-RAG-selector¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | full_manual | Appendix
- Run ID: LAS-agentic-RAG-selector
- Participant: ncsu-las
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-15
- Task: trec2025-rag-generation
- MD5:
fd102c64048ec379245e3ce859a591a1 - Run description: This pipeline uses multiple LLM-enabled agents including a selector, a planner, a research assistant, a writer, and a reviewer. The selector chooses the next agent, always choosing the planner first. The planner creates a research plan and a report plan. The research assistant uses a retrieval tool to query the msmarco segment index and compose a list of relevant information. The research assistant can perform multiple queries simultaneously and decides itself how to phrase a query. The writer takes the list of relevant information and the instructions from the planner and creates a draft report. The reviewer analyzes the report to ensure consistency with the source information and that it fully answers the user's questions. It is instructed to ask for changes/updates after the first draft, after which the process repeats until the reviewer and the planner have decided the submission is complete.
LAS_con-que¶
Participants | Proceedings | Input | trec_eval | Appendix
- Run ID: LAS_con-que
- Participant: ncsu-las
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-retrieval
- MD5:
752b2fb3aa3e52203782aa0d689e6707 - Run description: Use gpt-4o to rephrase the original narrative into a set of questions. Concatenate these questions into a big query. Search SPLADE-v3 segment index using the big query.
LAS_con-que-con-nug¶
Participants | Proceedings | Input | trec_eval | Appendix
- Run ID: LAS_con-que-con-nug
- Participant: ncsu-las
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-retrieval
- MD5:
1259f67ba4dad08d535d99bfbe493220 - Run description: Use gpt-4o to rephrase the original narrative into a set of questions. Concatenate these questions into a big query. Search the SPLADE-v3 segment index using the big query. Use castorini/nuggetizer to generate nuggets based on the top 20 results. The nugget creation prompt was modified to generate self-contained 10-20-word nuggets. Concatenate the "vital" nuggets into a big query. Search the SPLADE-v3 segment index using the big query.
LAS_con-que-sep-nug¶
Participants | Proceedings | Input | trec_eval | Appendix
- Run ID: LAS_con-que-sep-nug
- Participant: ncsu-las
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-retrieval
- MD5:
5cb876f4b831788bffc023a1f264a012 - Run description: Use gpt-4o to rephrase the original narrative into a set of questions. Concatenate these questions into a big query. Search the SPLADE-v3 segment index using the big query. Use castorini/nuggetizer to generate nuggets based on the top 20 results. The nugget creation prompt was modified to generate self-contained 10-20-word nuggets. Search for each "vital" nugget in the SPLADE-v3 segment index. Merge the result lists using RRF (k = 10).
LAS_sep-que¶
Participants | Proceedings | Input | trec_eval | Appendix
- Run ID: LAS_sep-que
- Participant: ncsu-las
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-retrieval
- MD5:
377ad3d6aa8020894ca3c8fce93cb499 - Run description: Use gpt-4o to rephrase the original narrative into a set of questions. Search for each of these questions in the SPLADE-v3 segment index. Merge the result lists using RRF (k = 10).
lg_nt_q4d12l3¶
Participants | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: lg_nt_q4d12l3
- Participant: hltcoe-multiagt
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-generation
- MD5:
345e5ea0a643b8745c251059d248f90c - Run description: This run leverages the LangGraph framework. In a round, the approach produces a set of 4 queries. Notetaking is done on the top 12 documents for each query. The notes from the documents retrieved with a single query are used to generate a partial answer. Partial answers from all queries are examined for completeness. If the answer is deemed incomplete, up to 4 new queries are proposed to fill knowledge gaps. After at most 3 rounds, an answer is drafted and then shortened to fit the length limit. All documents are retrieved using SPLADE v3. All LLM calls are served by llama-3.3-70B-instruct.
lg_nt_q4d12l3_c¶
Participants | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: lg_nt_q4d12l3_c
- Participant: hltcoe-multiagt
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-generation
- MD5:
c23af45912900b5335f21bd8e5cfe91a - Run description: This run leverages the LangGraph framework. In a round, the approach produces a set of 4 queries. Notetaking is done on the top 12 documents for each query. The notes from the documents retrieved with a single query are used to generate a partial answer. Partial answers from all queries are examined for completeness. If the answer is deemed incomplete, up to 4 new queries are proposed to fill knowledge gaps. After at most 3 rounds, an answer is drafted and then shortened to fit the length limit. Each citation is then checked to verify that it supports its sentence. Unfaithful citations are removed; if a substitute can be found, another document is used instead, otherwise the sentence is removed. All documents are retrieved using SPLADE v3. All LLM calls are served by llama-3.3-70B-instruct.
lucerank¶
Participants | Input | trec_eval | Appendix
- Run ID: lucerank
- Participant: digsci
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-retrieval
- MD5:
9ce1dee3b819b81f8cf7e00ffec0f84c - Run description: Lucerank is a reranking strategy that leverages highly parallelized LLM calls (gpt-4.1-mini in this case) on small random subsets of candidates, and then aggregates them via Luce Spectral Ranking to produce calibrated relevance scores.
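Luce Spectral Ranking (rank centrality) turns the pairwise win statistics implied by the per-subset LLM rankings into global scores: build a Markov chain whose transitions favor the winner of each comparison, and read scores off its stationary distribution. A dense-matrix sketch under the Bradley-Terry/Luce model (an illustration of the aggregation idea, not the team's implementation):

```python
import numpy as np

def luce_spectral_scores(partial_rankings, n_items):
    """Aggregate partial rankings (each a list of item ids, best first,
    from one LLM call on a random subset) into global Luce scores."""
    # Pairwise win counts: wins[i, j] = times i was ranked above j.
    wins = np.zeros((n_items, n_items))
    for ranking in partial_rankings:
        for hi, winner in enumerate(ranking):
            for loser in ranking[hi + 1:]:
                wins[winner, loser] += 1
    comps = wins + wins.T  # total comparisons per pair
    # Transition matrix: from i, jump to j with prob. proportional to
    # j's empirical win rate over i; leftover mass stays at i.
    P = np.zeros((n_items, n_items))
    d = max(comps.sum(axis=1).max(), 1.0)  # uniform normalizer
    for i in range(n_items):
        for j in range(n_items):
            if i != j and comps[i, j] > 0:
                P[i, j] = wins[j, i] / comps[i, j] / d
        P[i, i] = 1.0 - P[i].sum()
    # Stationary distribution of the chain = the score vector.
    evals, evecs = np.linalg.eig(P.T)
    v = np.abs(np.real(evecs[:, np.argmax(np.real(evals))]))
    return v / v.sum()
```

The appeal for the parallel-subset setting is that each LLM call only has to rank a handful of candidates, yet the spectral aggregation yields a single consistent score per candidate.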
NITA-Qrels¶
Participants | Proceedings | Input | qrel_eval | Appendix
- Run ID: NITA-Qrels
- Participant: NIT Agartala
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-09-10
- Task: trec2025-rag-qrels
- MD5:
7eaafe85a5925da0e2716d0ee9544b9e - Run description: This run uses a multi-stage retrieval pipeline. Candidate documents are first taken from a baseline dense retrieval run (top-100), then restricted to the top-20 per query. These candidates are reranked using the BAAI/bge-reranker-large cross-encoder model, which outputs confidence scores mapped into 5-level relevance judgments (0–4). Final results are written in TREC QREL TSV format with per-query top-20 judgments.
NITA_AG_JH¶
Participants | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: NITA_AG_JH
- Participant: NITATREC
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-auggen
- MD5:
8cd67655c083cda87bdc498537aa11aa - Run description: This run employed Falcon-7B-Instruct (tiiuae/falcon-7b-instruct) for answer generation. For each query, the top 20 passages retrieved by the baseline system were provided to the model, and outputs were generated using deterministic decoding (do_sample=False, temperature=0.0, max_new_tokens=400) to ensure factual precision and reduce hallucination. The generated responses were post-processed through sentence segmentation, citation extraction, and formatting into TREC Format-2 JSON with metadata, references, and citation-linked answers.
NITA_R_DPR¶
Participants | Input | trec_eval | Appendix
- Run ID: NITA_R_DPR
- Participant: NITATREC
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-retrieval
- MD5:
ccc04756c4dd5653238a81a52c3d6b97 - Run description: This submission implements a three-stage hybrid retrieval pipeline for TREC RAG 2025. Stage 1 utilizes pre-computed BM25 results to select the top 500 lexical candidates per query. Stage 2 applies DPR semantic filtering using Facebook's dpr-question_encoder-single-nq-base and dpr-ctx_encoder-single-nq-base models to reduce candidates to the top 200 based on cosine similarity of dense embeddings. Stage 3 performs neural reranking with cross-encoder/ms-marco-MiniLM-L-12-v2 for final relevance scoring. The system processes 105 queries with GPU batch inference (batch size 256), delivering top-100 results that combine lexical matching with semantic understanding through transformer-based architectures.
NITA_R_JH_HY¶
Participants | Input | trec_eval | Appendix
- Run ID: NITA_R_JH_HY
- Participant: NITATREC
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-retrieval
- MD5:
06f5a6f550fe876ed20740b6bf5889e6 - Run description: The system implements a hybrid retrieval and reranking pipeline for TREC 2025 RAG. In the first stage, candidate documents are retrieved using both sparse and dense retrieval methods: BM25 (via Pyserini) retrieves the top 1000 documents per query, while Dense Passage Retrieval (DPR) leverages facebook/dpr-question_encoder-single-nq-base for queries and facebook/dpr-ctx_encoder-single-nq-base for documents to retrieve the top 500 candidates. The resulting sets are fused, ensuring unique documents across both retrievers. In the second stage, a cross-encoder model (cross-encoder/ms-marco-MiniLM-L-12-v2) scores each query-document pair to produce fine-grained relevance rankings. Documents are then sorted by cross-encoder scores, with the top 100 per query output in the TREC run file format. The pipeline utilizes GPU acceleration for DPR embedding generation and cross-encoder inference, optimizing retrieval efficiency and enabling scalable reranking across large candidate sets.
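The fuse-then-rerank pattern used here, union the sparse and dense candidate sets, deduplicate, then score every query-document pair with a cross-encoder, can be sketched as follows. `score_pair` stands in for the cross-encoder call (e.g. ms-marco-MiniLM-L-12-v2 inference), which is not reproduced here:

```python
def fuse_and_rerank(bm25_ids, dpr_ids, score_pair, query, top_k=100):
    """Union candidates from sparse and dense retrieval (deduplicated,
    order-preserving), score each with a cross-encoder, keep top_k."""
    candidates = list(dict.fromkeys(bm25_ids + dpr_ids))
    scored = [(doc, score_pair(query, doc)) for doc in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [doc for doc, _ in scored[:top_k]]
```

`dict.fromkeys` preserves first-seen order while removing duplicates, so a document retrieved by both systems is scored only once.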
no-decomp¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: no-decomp
- Participant: MITLL
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-generation
- MD5:
988c991e74720be034c70161240e6b24 - Run description: Splade -> Qwen3-Reranker-8B -> setwise retrieval (gemma-3-27b-it) -> generation (gpt-5)
no-decomp-reranker¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: no-decomp-reranker
- Participant: MITLL
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-generation
- MD5:
8b5f85ae73937f608f3d5a9db488a07d - Run description: Splade -> set-wise passage selection (gemma-3-27b-it) -> generation (gpt-5)
We consider set-wise passage selection to be a part of the generation step.
no-llm¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: no-llm
- Participant: WING-II
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-auggen
- MD5:
ab582af222cb06e728a58ae2ff276730 - Run description: greedy submodular segment selection → rule-based evidence-card compression → heuristic claim assembly → lexical-IDF self-check & citation repair → hard post-fix (length/indices)
no-llm-refined¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | full_manual | Appendix
- Run ID: no-llm-refined
- Participant: WING-II
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-auggen
- MD5:
55f7d2b3e3c4257a8f165aa8b879b129 - Run description: greedy submodular segment selection → rule-based evidence-card compression → heuristic claim assembly → lexical-IDF self-check & citation repair → hard post-fix (length/indices) → LLM refinement
no-reranker¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: no-reranker
- Participant: MITLL
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-generation
- MD5:
a9926f8d0b52e892f34e5d9ecb91e24a - Run description: Splade -> query decomposition (gemma-3-27b-it) -> reciprocal rank fusion -> setwise retrieval (gemma-3-27b-it) -> generation (gpt-5)
norm_nugget_cnt¶
Participants | Proceedings | Input | qrel_eval | Appendix
- Run ID: norm_nugget_cnt
- Participant: GenAIus
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-25
- Task: trec2025-rag-qrels
- MD5:
2aa20d80af72bd51bee89b85028bebce - Run description: Length-normalized count of nuggets
nugget-generation¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: nugget-generation
- Participant: GenAIus
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-auggen
- MD5:
046c0f55a01ab9c78d03cc4ab0763db7 - Run description: Nugget generation from top 20 passages, and then response generation from these nuggets. Using gpt4o.
nugget_cnt¶
Participants | Proceedings | Input | qrel_eval | Appendix
- Run ID: nugget_cnt
- Participant: GenAIus
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-25
- Task: trec2025-rag-qrels
- MD5:
bcc93d0687e51c7b9306307a40f79b07 - Run description: Count of nuggets
nuggetizer¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: nuggetizer
- Participant: WaterlooClarke
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-generation
- MD5:
ec28606d64e1a5bd5c7023ebc9dc160c - Run description: Nuggetizer
ori_query_entities¶
Participants | Input | nist-post-edit | nuggetizer_eval | full_manual | Appendix
- Run ID: ori_query_entities
- Participant: clip2025
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-generation
- MD5:
1d24b88f389e03050d2880b7908c66e6 - Run description: Preprocessing: I first use an LLM to extract entities from each sentence in the segments. Then, for every entity, I use an embedding model to retrieve the top 10 most relevant sentences from segments as its description.
Retrieval Task: For each query, I compute the relevance between the query and every entity description using an embedding model. If the same segment ID appears under multiple entities, I keep the highest score among them as the score for that segment ID. This score is used to rank the segments for retrieval.
AG Task: For each query, I take the retrieved entities and send their descriptions to an LLM to generate an answer. The answer is split into sentences. For each sentence, I again use the embedding model to retrieve the most relevant entity, and the segment IDs linked to that entity are used as the citation for that sentence.
Qwen3-30B-Instruct¶
Participants | Input | qrel_eval | Appendix
- Run ID: Qwen3-30B-Instruct
- Participant: coordinators
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-09-11
- Task: trec2025-rag-qrels
- MD5:
ddcd80c89ed42a9ac2d1e099ecde312d - Run description: Using UMBRELA variant with Qwen3-30B-A3B-Instruct-2507
Qwen3-30B-Think¶
Participants | Input | qrel_eval | Appendix
- Run ID: Qwen3-30B-Think
- Participant: coordinators
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-09-11
- Task: trec2025-rag-qrels
- MD5:
93ce8f1b76b99b2a81e11d730f7bd979 - Run description: Using UMBRELA variant with Qwen3-30B-A3B-Thinking-2507
Qwen3-30BInstruct-sn¶
Participants | Input | qrel_eval | Appendix
- Run ID: Qwen3-30BInstruct-sn
- Participant: coordinators
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-09-11
- Task: trec2025-rag-qrels
- MD5:
d47142d1d1ce84bbd679c65a8f465e32 - Run description: Creates sub-narratives from the narratives (Qwen3-30B-A3B-Instruct-2507) -> Uses the list of sub-narratives to allocate relevance labels (Qwen3-30B-A3B-Instruct-2507)
Qwen3-30BThink-sn¶
Participants | Input | qrel_eval | Appendix
- Run ID: Qwen3-30BThink-sn
- Participant: coordinators
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-09-11
- Task: trec2025-rag-qrels
- MD5:
cbf3fc5449f7626528c106ba8834d0fb - Run description: Creates sub-narratives from the narratives (Qwen3-30B-A3B-Thinking-2507) -> Uses the list of sub-narratives to allocate relevance labels (Qwen3-30B-A3B-Thinking-2507)
qwen_splade¶
Participants | Proceedings | Input | trec_eval | Appendix
- Run ID: qwen_splade
- Participant: UTokyo
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-16
- Task: trec2025-rag-retrieval
- MD5:
704911d252b742856c1980552e4b65f7 - Run description: Hybrid retrieval pipeline leveraging HyDE (Hypothetical Document Embeddings) with Qwen3-Embedding-0.6B for dense retrieval (query:HyDE 0.3:0.7 weighted combination) and SPLADE sparse retrieval. Results are fused using Reciprocal Rank Fusion (RRF, k=60) to combine complementary signals from dense and sparse methods. Final ranking performed by GPT-4.1-mini with sliding window reranking (window=10, stride=5, 3 passes) incorporating document title, URL, and segment content for enhanced contextual understanding.
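The RRF fusion step named here (k=60) can be sketched as a plain reciprocal-rank score accumulation over the input rankings; the function name and toy document IDs below are illustrative:

```python
from collections import defaultdict

def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: each document scores sum(1 / (k + rank))
    over every ranked list it appears in (rank starting at 1)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

# Toy fusion of a dense and a sparse ranking.
fused = rrf_fuse([["d1", "d2", "d3"], ["d3", "d1", "d4"]], k=60)
```

With k=60 the contribution of each list decays slowly with rank, so documents that appear in both lists ("d1", "d3") rise above documents that appear in only one.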
r_2method_ag_gpt41¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: r_2method_ag_gpt41
- Participant: UTokyo
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-16
- Task: trec2025-rag-generation
- MD5:
25678ddbacd6bd95f92a45625b8d7961 - Run description: Complete RAG pipeline integrating hybrid retrieval (Qwen3-0.6B HyDE dense + SPLADE sparse with RRF fusion, GPT-4.1-mini reranking) with GPT-4.1 generation. The retrieval stage employs HyDE-enhanced dense search with query:HyDE 0.3:0.7 weighting, complemented by SPLADE's learned sparse representations. After RRF fusion (k=60), GPT-4.1-mini performs listwise reranking using sliding windows (w=10, s=5) with enriched context (title+URL+segment). The generation stage uses GPT-4.1 with Ragnarök baseline prompting to synthesize coherent answers from top-ranked passages.
r_4method_ag_gpt41¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: r_4method_ag_gpt41
- Participant: UTokyo
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-16
- Task: trec2025-rag-generation
- MD5:
ead3a28a44e2cdbfafeb9eab605f864d - Run description: Full-scale RAG system integrating a sophisticated 4-method retrieval ensemble with GPT-4.1 generation. Retrieval combines dual dense approaches (Qwen3-0.6B and BGE-small-en-v1.5, both using HyDE with GPT-4.1-generated hypothetical answers, query:HyDE 0.3:0.7 weighted) and dual sparse methods (SPLADE's contextualized sparse representations and BM25 enhanced with GPT-4.1 keyword expansion producing 10 relevant terms per query). The 4000 total candidates (1000 per method) undergo RRF fusion (k=60) to create a unified ranking, followed by GPT-4.1-mini listwise reranking with sliding windows (w=10, s=5, 3 passes) using enriched context. Generation employs GPT-4.1 with Ragnarök baseline prompting, processing the top-reranked passages to produce comprehensive, grounded answers.
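The sliding-window reranking schedule these runs describe (window=10, stride=5, several passes) can be sketched independently of the LLM; `rerank_window` below is a hypothetical stand-in for the GPT-4.1-mini listwise call:

```python
def sliding_window_rerank(docs, rerank_window, window=10, stride=5, passes=3):
    """Rerank a list by sliding an overlapping window from back to front.

    rerank_window: callable that reorders a short list in place of the
    listwise LLM reranker used by the actual runs.
    """
    docs = list(docs)
    for _ in range(passes):
        start = max(len(docs) - window, 0)
        while True:
            # Rerank the current window and write it back in place.
            docs[start:start + window] = rerank_window(docs[start:start + window])
            if start == 0:
                break
            start = max(start - stride, 0)
    return docs

# Toy check: with an oracle that sorts its window descending, repeated
# back-to-front passes let strong items bubble toward the top.
ranked = sliding_window_rerank(list(range(15)), lambda w: sorted(w, reverse=True))
```

The overlap (stride < window) is what lets a document promoted at the tail of one window keep climbing in the next; multiple passes extend that climb beyond a single sweep.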
rag-v1-gpt¶
Participants | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: rag-v1-gpt
- Participant: cfdalab
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-16
- Task: trec2025-rag-generation
- MD5:
fb1fd9638bd669d708ae40d47ae975c5 - Run description: The workflow combines retrieval, reranking, sub-query answer generation, and final article integration. It first acquires the top 100 documents via a multi-stage retrieval pipeline. For each main query, decomposed sub-queries are retrieved for the top-10 segments, concise factual answers are generated, and citation support is analyzed with the LLM (gpt-4.1-mini). All sub-answers are ordered and refined by the LLM, and output as a structured, multi-paragraph answer in JSON format with citations, ensuring clear logic, traceable sources, precision, and structure.
rag-v2-gpt¶
Participants | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: rag-v2-gpt
- Participant: cfdalab
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-16
- Task: trec2025-rag-generation
- MD5:
de4b196db4f37dc19e8b01388a47e56d - Run description: This workflow integrates retrieval, reranking, sub-query answer generation, and final article assembly. It first obtains the top 100 documents via a multi-stage retrieval pipeline. For each query ID, the cross-encoder retrieves the top-10 documents per sub-query, generates concise sub-query answers, and finally the LLM (gpt-4.1-mini) integrates them into a complete article with citation markers, splitting the output into sentence-level answers. The final output is in JSON format, including query information, total word count, citation list, and citation-annotated answers, fully presenting the retrieval-to-answer process.
rag-v2-llama¶
Participants | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: rag-v2-llama
- Participant: cfdalab
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-16
- Task: trec2025-rag-generation
- MD5:
cacb1ed600648c827e02138ae675c188 - Run description: This workflow combines retrieval, reranking, sub-query answer generation, and final article assembly. First, it acquires the top 100 documents through a multi-stage retrieval pipeline. For each query ID, the cross-encoder retrieves the top-10 documents per sub-query, concise sub-query answers are generated, and the LLM (Llama 3.1 8b) merges them into a complete article with citation markers, splitting the output into sentence-level answers. The final result is output in JSON format, including query information, total word count, citation list, and citation-annotated answers—fully presenting the workflow from retrieval to answer generation.
rag25_qwen3_20_ag¶
Participants | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: rag25_qwen3_20_ag
- Participant: coordinators
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-auggen
- MD5:
f96387a177e0f89646c80d591ce7be72 - Run description: Stage 1 (top-1K): RRF(Splade v3, Snowflake's Arctic embed l) Stage 2 (top-100): RankQwen Generation (top-20): Qwen3
rag25_qwen3_50_ag¶
Participants | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: rag25_qwen3_50_ag
- Participant: coordinators
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-auggen
- MD5:
6bfa165adf69fbd066db545dc975eaf7 - Run description: Stage 1 (top-1K): RRF(Splade v3, Snowflake's Arctic embed l) Stage 2 (top-100): RankQwen Generation (top-50): Qwen3
rag25_test_arctic-l¶
Participants | Input | trec_eval | Appendix
- Run ID: rag25_test_arctic-l
- Participant: coordinators
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-retrieval
- MD5:
5c1da23ebd3a67b2af93c77ec3ba27b7 - Run description: Uses Snowflake's Arctic embed l
rag25_test_arctic-m¶
Participants | Input | trec_eval | Appendix
- Run ID: rag25_test_arctic-m
- Participant: coordinators
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-retrieval
- MD5:
2bf6ca5361029893ea836bad041fb09c - Run description: Uses Snowflake's Arctic embed m
rag25_test_qwen3_20¶
Participants | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: rag25_test_qwen3_20
- Participant: coordinators
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-generation
- MD5:
f96387a177e0f89646c80d591ce7be72 - Run description: Stage 1 (top-1K): RRF(Splade v3, Snowflake's Arctic embed l) Stage 2 (top-100): RankQwen Generation (top-20): Qwen3
rag25_test_qwen3_50¶
Participants | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: rag25_test_qwen3_50
- Participant: coordinators
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-generation
- MD5:
6bfa165adf69fbd066db545dc975eaf7 - Run description: Stage 1 (top-1K): RRF(Splade v3, Snowflake's Arctic embed l) Stage 2 (top-100): RankQwen Generation (top-50): Qwen3
rag25_test_rankqwen3¶
Participants | Input | trec_eval | Appendix
- Run ID: rag25_test_rankqwen3
- Participant: coordinators
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-retrieval
- MD5:
9f33d21b14911991ff8a801f40b4d5fb - Run description: Stage 1 (top-1K): RRF(Splade v3, Snowflake's Arctic embed l) Stage 2 (top-100): RankQwen
rag25_test_splade-v3¶
Participants | Input | trec_eval | Appendix
- Run ID: rag25_test_splade-v3
- Participant: coordinators
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-retrieval
- MD5:
76452dc5a3457d0b62a157faffeb9dae - Run description: Anserini Splade v3
rag_v4¶
Participants | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: rag_v4
- Participant: cfdalab
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-generation
- MD5:
b40ca1291d8f3ad5f2728462ae815d9d - Run description: We submit a retrieval-augmented generation run using precomputed retrieval (TREC runfile) over the MSMARCO v2.1 segmented corpus. For each decomposed subquery, we feed the top-k passages to OpenAI gpt-4.1-mini with strict, evidence-only prompts that return JSON {answer, citations} with exact # passage IDs; we then do a short coherence rewrite (fixed-length array) and reattach filtered citations. No open-weight models are used; low temperature and light post-processing (dedupe, min/max citations) ensure stable, submission-ready outputs.
Rerank-Top50_v2¶
Participants | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: Rerank-Top50_v2
- Participant: clip2025
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-auggen
- MD5:
3c269f2ce9ca4bb9319349e3577234c3 - Run description: Uses an LLM to rerank the top-50 segments and generates the answer from them.
Rerank-Top50_v3¶
Participants | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: Rerank-Top50_v3
- Participant: clip2025
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-auggen
- MD5:
a4b41843a612e26a9aed8a4e0a0388be - Run description: v3
ret-gemma¶
Participants | Proceedings | Input | trec_eval | Appendix
- Run ID: ret-gemma
- Participant: MITLL
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-retrieval
- MD5:
cec1aeca5ca65c7c527771d915a46e12 - Run description: Splade -> query decomposition (gemma-3-27b-it) -> reranker (gemma-3-27b-it) -> reciprocal rank fusion
ret-no-decomp¶
Participants | Proceedings | Input | trec_eval | Appendix
- Run ID: ret-no-decomp
- Participant: MITLL
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-retrieval
- MD5:
f67ae4ceef9ffb7d1fdae49a5707e5f7 - Run description: Splade -> Qwen3-Reranker-8B
ret-no-reranker¶
Participants | Proceedings | Input | trec_eval | Appendix
- Run ID: ret-no-reranker
- Participant: MITLL
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-retrieval
- MD5:
638cf3ec7378bfdec87aedff7c107736 - Run description: Splade -> query decomposition (gemma-3-27b-it) -> reciprocal rank fusion
ret-splade-only¶
Participants | Proceedings | Input | trec_eval | Appendix
- Run ID: ret-splade-only
- Participant: MITLL
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-retrieval
- MD5:
ee266323e5527bc2090d5c26014d60a9 - Run description: Splade only.
ronly_auto_plan¶
Participants | Proceedings | Input | trec_eval | Appendix
- Run ID: ronly_auto_plan
- Participant: WaterlooClarke
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-retrieval
- MD5:
392dcada37cae6ae1b5e8d35f3a196f4 - Run description: Automatic planning
ronly_auto_selected¶
Participants | Proceedings | Input | trec_eval | Appendix
- Run ID: ronly_auto_selected
- Participant: WaterlooClarke
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-retrieval
- MD5:
b534614227e0581b3014aa80ea33fb8c - Run description: Backward from answer to retrieval
ronly_combined¶
Participants | Proceedings | Input | trec_eval | Appendix
- Run ID: ronly_combined
- Participant: WaterlooClarke
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-retrieval
- MD5:
63b3e7fdd6a4a31b130ab333a876afed - Run description: Combining retrieved passages from multiple techniques + reranking
ronly_garag¶
Participants | Proceedings | Input | trec_eval | Appendix
- Run ID: ronly_garag
- Participant: WaterlooClarke
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-retrieval
- MD5:
72fbbaa42d64459ecafb1de50f90fb34 - Run description: Generate first
ronly_nuggetizer¶
Participants | Proceedings | Input | trec_eval | Appendix
- Run ID: ronly_nuggetizer
- Participant: WaterlooClarke
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-retrieval
- MD5:
879d76a29c54d70c7e28eb4c6d9251cc - Run description: Mono + Duo
RRF_all¶
Participants | Input | trec_eval | Appendix
- Run ID: RRF_all
- Participant: cfdalab
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-16
- Task: trec2025-rag-retrieval
- MD5:
9daf535a729df579814cbc8708a9d334 - Run description: A hybrid retrieval pipeline combining BM25 with query expansion, dense embedding search, and neural rerankers (ColBERTv2, MiniLM). Results are fused via RRF, with the top-100 outputs submitted.
RRF_colbert_minlm¶
Participants | Input | trec_eval | Appendix
- Run ID: RRF_colbert_minlm
- Participant: cfdalab
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-retrieval
- MD5:
760a2c60620cdef5c12ae35ca4427035 - Run description: This run combines ColBERTv2 results with MiniLM cross-encoder reranking using Reciprocal Rank Fusion (RRF). Final ranking selects the top 100 documents per query.
RRF_colert_bm25¶
Participants | Input | trec_eval | Appendix
- Run ID: RRF_colert_bm25
- Participant: cfdalab
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-16
- Task: trec2025-rag-retrieval
- MD5:
77f1d6d3176ad88c774280a2820fb252 - Run description: A multi-stage automatic retrieval run combining BM25 with ColBERT reranking, fused via RRF, and outputting the top-100 results.
RRF_minilm_bm25¶
Participants | Input | trec_eval | Appendix
- Run ID: RRF_minilm_bm25
- Participant: cfdalab
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-16
- Task: trec2025-rag-retrieval
- MD5:
33fc6e836bc70a27b816a528276b7217 - Run description: A multi-stage retrieval run combining BM25 with MiniLM pointwise reranking, fused via RRF, with top-100 results submitted.
selector-agent-trim¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: selector-agent-trim
- Participant: ncsu-las
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-16
- Task: trec2025-rag-generation
- MD5:
888b35018bdd101e3834756e2854ed9b - Run description: This run uses a selector group chat. The selector is gpt-4.1-mini. It assigns each message turn by choosing among a planner, a research assistant, a writer, and a reviewer. The planner and reviewer use gpt-4.1, while the research assistant and writer use gpt-4.1-mini. The output is trimmed using the organizers' script and reformatted to format 1.
sentence-transformers-all-MiniLM-L6-v2¶
Participants | Input | trec_eval | Appendix
- Run ID: sentence-transformers-all-MiniLM-L6-v2
- Participant: clip2025
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-15
- Task: trec2025-rag-retrieval
- MD5:
a9e385d6990cb1b68f78a6eb97ac5d87 - Run description: sentence-transformers/all-MiniLM-L6-v2 retrieval for the original queries.
single-agent-trim¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: single-agent-trim
- Participant: ncsu-las
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-16
- Task: trec2025-rag-generation
- MD5:
fb58466227615672ecdf86b3a4670ca1 - Run description: This run uses a single agent to decompose the user input into multiple queries, call the search_splade tool, and write a report. The output is trimmed using the organizers' script and reformatted to format 1.
splade-v3-arctic-l¶
Participants | Input | trec_eval | Appendix
- Run ID: splade-v3-arctic-l
- Participant: coordinators
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-retrieval
- MD5:
385a0ba9bc099487742d0587d8d97517 - Run description: Stage 1: RRF(Splade v3, Snowflake's Arctic embed l)
splade-v3-arctic-m¶
Participants | Input | trec_eval | Appendix
- Run ID: splade-v3-arctic-m
- Participant: coordinators
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-retrieval
- MD5:
69dccf5941af099abf559bde3638c188 - Run description: Stage 1: RRF(Splade v3, Snowflake's Arctic embed m 1.5)
standard_roll¶
Participants | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: standard_roll
- Participant: IRIT-ISIR-EV
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-20
- Task: trec2025-rag-generation
- MD5:
92f0bf95bf0d003d05205490f7f81110 - Run description: This run uses the full document as input instead of the segment
strd_roll_segment¶
Participants | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: strd_roll_segment
- Participant: IRIT-ISIR-EV
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-generation
- MD5:
3dbb7b4b8894cfb045327708531795b1 - Run description: This run leverages segments from input documents instead of the full document
sub_query_entities¶
Participants | Input | nist-post-edit | nuggetizer_eval | full_manual | Appendix
- Run ID: sub_query_entities
- Participant: clip2025
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-generation
- MD5:
eb8344cfce7dc3e3545bb3471cad05ef - Run description: Preprocessing: I first use an LLM to extract entities from each sentence in the segments. Then, for every entity, I use an embedding model to retrieve the top 10 most relevant sentences from the segments as its description.
Retrieval Task: For each query, I compute the relevance between the query (together with its sub-queries) and every entity description using an embedding model. If the same segment ID appears under multiple entities, I keep the highest score among them as the score for that segment ID. This score is used to rank the segments for retrieval.
AG Task: For each query, I take the retrieved entities and send their descriptions to an LLM to generate an answer. The answer is split into sentences. For each sentence, I again use the embedding model to retrieve the most relevant entity, and the segment IDs linked to that entity are used as the citation for that sentence.
swarm¶
Participants | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: swarm
- Participant: hltcoe-multiagt
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-generation
- MD5:
f86a5f6c46598e836d11f82928eb2b71 - Run description: This run leverages the AutoGen framework. Multiple agents work on the answer generation: a query decomposition agent, a document retrieval agent, a report writing agent, a report editing agent, and a report publishing agent. The query decomposer creates 3 queries at a time. Retrieval is done with SPLADE v3 and the top 6 documents are included. The document retrieval agent uses a fact extractor agent to identify key parts of documents. The report writing agent removes redundant facts. The report editing agent compiles the facts into an answer and determines whether more information is needed or the answer is ready to be published. After publishing, an independent editor is used to fit the answer into the length limit. All LLM calls are serviced with llama-3.3-70B-instruct.
swarm_c¶
Participants | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: swarm_c
- Participant: hltcoe-multiagt
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-generation
- MD5:
071a22280dcc7a4be911174b0d8be951 - Run description: This run leverages the AutoGen framework. Multiple agents work on the answer generation: a query decomposition agent, a document retrieval agent, a report writing agent, a report editing agent, and a report publishing agent. The query decomposer creates 3 queries at a time. Retrieval is done with SPLADE v3 and the top 6 documents are included. The document retrieval agent uses a fact extractor agent to identify key parts of documents. The report writing agent removes redundant facts. The report editing agent compiles the facts into an answer and determines whether more information is needed or the answer is ready to be published. After publishing, an independent editor is used to ensure citation accuracy and fit the answer into the length limit. All LLM calls are serviced with llama-3.3-70B-instruct.
uema2lab_B4¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | full_manual | Appendix
- Run ID: uema2lab_B4
- Participant: tus
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-generation
- MD5:
1e3cba12b9251d0b944b7435f5a1470d - Run description: This run decomposes the narrative query into sub-queries. These sub-queries are then used in a hybrid retrieval system, combining dense and sparse methods, to gather relevant documents for the LLM. Subsequently, we leverage the LLM to score each document's relevance against the sub-queries. The documents are then ranked and provided to the LLM in descending order of their scores, and the prompt explicitly instructs the model that the documents are sorted by relevance to inform the final answer generation.
uema2lab_base¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | full_manual | Appendix
- Run ID: uema2lab_base
- Participant: tus
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-generation
- MD5:
8a02790c4007ca840c10ba387954ffd5 - Run description: Our system for the TREC 2025 RAG Track is designed to effectively handle long-context retrieval-augmented generation (RAG) tasks. We adopt a hybrid document-level retrieval pipeline that combines BM25-based sparse retrieval with dense vector search using OpenAI embeddings. To improve the relevance and diversity of retrieved results, we decompose each original query into several sub-queries and retrieve the top-K candidate documents for each. The selected top-k documents are then used as input to a Gemini 1.5 Pro LLM to generate answers. Segmentation IDs are assigned after the answer is produced.
uema2lab_narrative¶
Participants | Proceedings | Input | trec_eval | Appendix
- Run ID: uema2lab_narrative
- Participant: tus
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-retrieval
- MD5:
9f2b77fa772b1a7e21927d3b90ca242a - Run description: We applied a hybrid search combining BM25 and dense retrieval (OpenAI text-embedding-3-small, 1536-dim) to retrieve the top-20 documents per narrative. From each document, the segment most similar in embedding space to the narrative text was selected, and the ranked document lists were thus converted into segment-level lists for evaluation. This run serves as a comparison baseline to the subquery-expansion approach. The results are compared against other runs (runid: uema2lab_rrf, uema2lab_rrf_k10, uema2lab_segment).
uema2lab_rag_fewdoc¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | full_manual | Appendix
- Run ID: uema2lab_rag_fewdoc
- Participant: tus
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-generation
- MD5:
6db979087469a2f2aadfdfc53e594e38 - Run description: Our system for the TREC 2025 RAG Track is designed to effectively handle long-context retrieval-augmented generation (RAG) tasks. Here we plug the organizers' baseline retrieval into our system to compare against the effectiveness of sub-query decomposition. The answer-generation part is otherwise the same as in our baseline system, described in a different run.
uema2lab_rag_org¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: uema2lab_rag_org
- Participant: tus
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-generation
- MD5:
80eeb2f05334fdbd10fe7fb304ffa3d5 - Run description: Our system for the TREC 2025 RAG Track is designed to effectively handle long-context retrieval-augmented generation (RAG) tasks. Here we plug the organizers' baseline retrieval into our system to compare against the effectiveness of sub-query decomposition. The answer-generation part is otherwise the same as in our baseline system, described in a different run.
uema2lab_rrf¶
Participants | Proceedings | Input | trec_eval | Appendix
- Run ID: uema2lab_rrf
- Participant: tus
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-retrieval
- MD5:
2ce3ffd6cfd2d2f833d91f5eccb287ec - Run description: In this system, each narrative query was first decomposed into multiple sub-queries. For each sub-query, both sparse (BM25) and dense (vector-based) retrieval were performed on a document-level index, and the top 100 documents were retrieved. These ranked lists for each sub-query were then merged using Reciprocal Rank Fusion (RRF) with rrf_K=60 to produce a final ranked list of documents at the narrative-query level. Each document in the ranked list was segmented into smaller textual segments, and each segment was considered a candidate. We computed embedding-based similarity between each segment and the original narrative query, and selected the most relevant segment for each document. This final ranked list of segments was submitted as the run.
uema2lab_rrf_k10¶
Participants | Proceedings | Input | trec_eval | Appendix
- Run ID: uema2lab_rrf_k10
- Participant: tus
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-retrieval
- MD5:
3916923664d2611e7a47c1746f954b95 - Run description: In this system, each narrative query was first decomposed into multiple sub-queries. For each sub-query, both sparse (BM25) and dense (vector-based) retrieval were performed on a document-level index, and the top 100 documents were retrieved. These ranked lists for each sub-query were then merged using Reciprocal Rank Fusion (RRF) with rrf_K=10 to produce a final ranked list of documents at the narrative-query level. Each document in the ranked list was segmented into smaller textual segments, and each segment was considered a candidate. We computed embedding-based similarity between each segment and the original narrative query, and selected the most relevant segment for each document. This final ranked list of segments was submitted as the run.
uema2lab_segment¶
Participants | Proceedings | Input | trec_eval | Appendix
- Run ID: uema2lab_segment
- Participant: tus
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-retrieval
- MD5:
b41156614eed6c7d28458d3c518cc59b - Run description: In this system, each narrative query was first decomposed into multiple sub-queries. For each sub-query, both sparse (BM25) and dense (vector-based) retrieval were performed on a document-level index. The retrieval results were then fused using Reciprocal Rank Fusion (RRF) to produce a ranked list of 100 documents per sub-query. These sub-query-level ranked lists were further merged using RRF at the narrative-query level to produce a final ranked list of documents. Next, each document in the ranked list was segmented into smaller textual segments. For each segment, we computed the embedding-based similarity to the original narrative query, and selected the most relevant segment per document. The final ranked list of segments was submitted as the run.
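The per-document segment selection used by these uema2lab runs (picking, for each document, the segment whose embedding is most similar to the narrative query) can be sketched as follows; the cosine helper and toy vectors are illustrative, not the actual embeddings:

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def best_segment(query_vec, segment_vecs):
    """Index of the segment embedding most similar to the query embedding."""
    return max(range(len(segment_vecs)),
               key=lambda i: cosine(query_vec, segment_vecs[i]))

# The third segment points almost exactly along the query direction.
idx = best_segment([1.0, 0.0], [[0.0, 1.0], [1.0, 1.0], [1.0, 0.1]])
```

Applying this per document converts a document-level ranking into the segment-level run format the track evaluates.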
unique_cluster_cnt¶
Participants | Proceedings | Input | qrel_eval | Appendix
- Run ID: unique_cluster_cnt
- Participant: GenAIus
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-25
- Task: trec2025-rag-qrels
- MD5:
1898d306ee979ef708a9ab1f77c17681 - Run description: Count of unique clusters
wingii-3-rl-refined¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | full_manual | Appendix
- Run ID: wingii-3-rl-refined
- Participant: WING-II
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-auggen
- MD5:
3d843c728caf14fccc60724f91fb5e54 - Run description: Submodular evidence selection (K=24) + small-model evidence-card compression + citation-first claims with strict JSON output, followed by a lexical self-check/post-fixer (citation strengthening, length control) + Refinement.
wingii-v3-gpt¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | full_manual | Appendix
- Run ID: wingii-v3-gpt
- Participant: WING-II
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-auggen
- MD5:
c3ad8f01fa37643990a9e515966b4e5a - Run description: Submodular evidence selection (K=24) + small-model evidence-card compression + citation-first claims with strict JSON output, followed by a lexical self-check/post-fixer (citation strengthening, length control).