Runs - Retrieval Augmented Generation (RAG) 2025¶
4method_merge¶
Participants | Proceedings | Input | trec_eval | Appendix
- Run ID: 4method_merge
- Participant: UTokyo
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-16
- Task: trec2025-rag-retrieval
- MD5: d1439f4e4e92c4bfb292c7e5d4b3e794
- Run description: Comprehensive 4-method hybrid retrieval system combining two dense retrievers (Qwen3-0.6B and BGE-small-en-v1.5, both HyDE-enhanced with query:HyDE 0.3:0.7 weighting) and two sparse methods (SPLADE learned sparse representations and BM25 with GPT-4.1-generated keyword expansion). All four retrieval streams produce top-1000 results that are fused using Reciprocal Rank Fusion (RRF, k=60) to leverage diverse relevance signals. Final ranking performed by GPT-4.1-mini using sliding window reranking (window=10, stride=5, 3 passes) with enriched document context including title, URL, and segment content for improved relevance assessment.
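The Reciprocal Rank Fusion step this run describes (RRF, k=60, over four top-1000 lists) can be sketched as follows. This is a minimal illustration of the standard RRF formula, not the participant's code:

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids with RRF.

    rankings: list of lists of doc ids, each ordered best-first.
    Each occurrence of a doc contributes 1 / (k + rank); docs are
    returned sorted by their summed score, best first.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

With k=60, a document ranked highly by several retrievers outscores one ranked first by only a single retriever, which is why RRF is a common choice for fusing heterogeneous dense and sparse streams.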
ag-run-1-JH¶
Participants | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: ag-run-1-JH
- Participant: NITATREC
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-generation
- MD5: 8cd67655c083cda87bdc498537aa11aa
- Run description: For each query from trec_rag_2025_queries.jsonl, we loaded its precomputed candidate list from retrieve_results_rankqwen3_32b.rag25_top100.jsonl and selected the top-20 passages in rank order. We then constructed a citation-aware prompt that injected the top-k context passages (k=8), each annotated with its docid, and instructed the model to answer strictly from the provided context, require every sentence to end with [CITATION: docid], and remain under 400 words. Answer generation used Falcon-7B-Instruct (tiiuae/falcon-7b-instruct) via the Hugging Face text-generation pipeline with deterministic decoding (do_sample=False, temperature=0.0, max_new_tokens=400). Finally, for each query, we emitted a TREC Format-2 record containing metadata (including the exact prompt), the list of 20 reference docids, and the sentence-level answer with citations.
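The per-sentence citation requirement above ("every sentence must end with [CITATION: docid]") lends itself to a simple post-hoc check. The following is a hypothetical validator sketch, not part of the submitted pipeline; the regex and function name are assumptions:

```python
import re

# Matches a trailing "[CITATION: docid]" marker at the end of a sentence.
CITE_RE = re.compile(r"\[CITATION:\s*(\S+?)\]$")

def check_citations(sentences, allowed_docids):
    """Return indexes of sentences that either lack a trailing citation
    marker or cite a docid outside the provided context passages."""
    bad = []
    for i, sent in enumerate(sentences):
        m = CITE_RE.search(sent.strip())
        if m is None or m.group(1) not in allowed_docids:
            bad.append(i)
    return bad
```

A check like this can flag uncited or hallucinated-citation sentences before the TREC Format-2 record is emitted.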
ag-v1-gpt¶
Participants | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: ag-v1-gpt
- Participant: cfdalab
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-16
- Task: trec2025-rag-auggen
- MD5: 885d02a797eebaf7cdea36680faa8807
- Run description: The execution process automatically integrates retrieval, reranking, summary generation, citation validation, paragraph reorganization, and linguistic refinement. For each main query, decomposed sub-queries are individually retrieved for the top-10 document segments. Concise factual answers are generated based on the retrieved content, with LLM analysis on citation support. All sub-answers are then reordered and refined by the LLM (gpt-4.1-mini), and finally output as a structured, multi-paragraph answer in JSON format with source citations, ensuring clear logic, traceable provenance, precise content, and rigorous structure.
ag-v2-gpt¶
Participants | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: ag-v2-gpt
- Participant: cfdalab
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-16
- Task: trec2025-rag-auggen
- MD5: 9f8f0932ad1b7c37f492d5d6b417a8ec
- Run description: This process combines sub-query answer generation and final article integration. For each query ID, it first uses a cross-encoder to retrieve the top-10 documents per sub-query, then generates concise sub-query answers. Finally, the LLM (gpt-4.1-mini) integrates answers into a complete article with citation markers, splitting the result into sentence-level outputs. The final results are output in JSON format, including query information, total word count, citation list, and answers with citation annotations, fully presenting the entire retrieval-to-generation pipeline.
ag-v2-llama¶
Participants | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: ag-v2-llama
- Participant: cfdalab
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-16
- Task: trec2025-rag-auggen
- MD5: f96ae0890586a3fb3922787716fb2ab9
- Run description: This process combines sub-query answer generation and final article integration. For each query ID, the cross-encoder retrieves the top-10 documents per sub-query, concise sub-query answers are generated, and the LLM (Llama 3.1 8B) merges answers into a complete article with citation markers, splitting the output into sentence-level answers. The final results are output in JSON format, containing query details, total word count, citation list, and citation-annotated answers, comprehensively presenting the entire process from retrieval to answer generation.
Anserini_bm25_only¶
Participants | Input | trec_eval | Appendix
- Run ID: Anserini_bm25_only
- Participant: cfdalab
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-16
- Task: trec2025-rag-retrieval
- MD5: 787ed782433ec3ed69943f6703e46293
- Run description: A standard BM25 retrieval baseline run using Anserini on the MS MARCO v2.1 segmented document collection.
anserini_bm25_top100¶
Participants | Input | trec_eval | Appendix
- Run ID: anserini_bm25_top100
- Participant: coordinators
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-retrieval
- MD5: f55e28ed7f78b33ef806a9016e89d02a
- Run description: Anserini BM25
auto_plan¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | full_manual | Appendix
- Run ID: auto_plan
- Participant: WaterlooClarke
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-generation
- MD5: 3b135860890f8a45af1540f9391915f3
- Run description: Automatic planning
auto_selected¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | full_manual | Appendix
- Run ID: auto_selected
- Participant: WaterlooClarke
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-generation
- MD5: 15a50672b8dfa9fba10e6686ea02d51f
- Run description: Automatically selects from multiple generations
bm25-rz7b-2025a¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: bm25-rz7b-2025a
- Participant: ii_research
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-generation
- MD5: ca4efd3ec0416a4daf40ce5a3ac05457
- Run description: BM25 on MS MARCO v2.1 retrieves up to 1000 candidates per query. We then apply windowed listwise reranking with the open-weight 7B model castorini/rank_zephyr_7b_v1_full to obtain the Top-100. The Top-K segments together with the narrative are fed to a ReClaim-style generator built on meta-llama/Llama-3.1-8B-Instruct with open-weight fine-tuned heads to produce claim-reference pairs. Finally, we post-process with sentence scoring, deduplication, and a word budget filter.
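Windowed listwise reranking, as used here and in several other runs on this page, slides a fixed-size window over the candidate list and lets the reranker reorder each window. A minimal single-pass sketch (window and stride values are illustrative; the bottom-up sweep mirrors the common RankZephyr-style setup, and `rerank_window` stands in for the LLM call):

```python
def sliding_window_rerank(docs, rerank_window, window=20, stride=10):
    """One bottom-up sliding-window pass over a ranked list.

    `rerank_window` is any callable that returns the given window's
    items in improved order (here it stands in for a listwise LLM
    reranker). Overlapping windows let strong documents bubble upward.
    """
    docs = list(docs)
    start = max(len(docs) - window, 0)
    while True:
        docs[start:start + window] = rerank_window(docs[start:start + window])
        if start == 0:
            break
        start = max(start - stride, 0)
    return docs
```

Because consecutive windows overlap by `window - stride` positions, a document near the bottom can move up by roughly one window per pass, which is why multi-pass variants (e.g. 3 passes) are sometimes used.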
bm25_NITA_JH¶
Participants | Input | trec_eval | Appendix
- Run ID: bm25_NITA_JH
- Participant: NITATREC
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-retrieval
- MD5: db8e9989f77294a30de47a8374a279d0
- Run description: This run applies a BM25 retrieval pipeline using Pyserini over the MS MARCO v2.1 segmented corpus. A Lucene index was constructed with positional information, document vectors, and raw text storage enabled, and queries were preprocessed into TSV format for compatibility. Retrieval was performed with BM25 (k1=1.2, b=0.75), returning the top-100 ranked segments per query, and outputs were generated in standard TREC run file format.
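The "standard TREC run file format" mentioned above is the six-column `qid Q0 docid rank score run_id` layout expected by trec_eval. A small formatting helper, shown as an illustration (the function name and score precision are assumptions):

```python
def trec_run_lines(qid, ranked_docids_scores, run_id="bm25_NITA_JH"):
    """Format one query's ranking as six-column TREC run lines:
    qid Q0 docid rank score run_id, with ranks starting at 1."""
    return [
        f"{qid} Q0 {docid} {rank} {score:.4f} {run_id}"
        for rank, (docid, score) in enumerate(ranked_docids_scores, start=1)
    ]
```

One such line per retrieved segment, top-100 per query, yields a file that trec_eval can score directly against the track qrels.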
bm25_rocchio_top100¶
Participants | Input | trec_eval | Appendix
- Run ID: bm25_rocchio_top100
- Participant: coordinators
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-retrieval
- MD5: 3b5aba968699f34e9c93179564caed5a
- Run description: Anserini BM25 + Rocchio
citation_cnt¶
Participants | Proceedings | Input | qrel_eval | Appendix
- Run ID: citation_cnt
- Participant: GenAIus
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-25
- Task: trec2025-rag-qrels
- MD5: 58125c710120e908826e7a99cc23516c
- Run description: Count of citations
cluster-generation¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: cluster-generation
- Participant: GenAIus
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-auggen
- MD5: a9e461d94b248e83c30d480da5782267
- Run description: Nugget generation from the top 20 passages, clustering of the generated nuggets with an LLM, and then response generation from these nuggets. Uses GPT-4o.
cluster_cnt¶
Participants | Proceedings | Input | qrel_eval | Appendix
- Run ID: cluster_cnt
- Participant: GenAIus
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-25
- Task: trec2025-rag-qrels
- MD5: af3fb3096f97cdb78ffdb5274e8ae7ea
- Run description: Count of clusters
combined¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | full_manual | Appendix
- Run ID: combined
- Participant: WaterlooClarke
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-generation
- MD5: ee3a8ed30ad177c052823b4235e03c5a
- Run description: Combining retrieved passages from multiple sources + Nuggetizer
cru-ablR¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: cru-ablR
- Participant: HLTCOE
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-auggen
- MD5: b17948c3465fa47041dfc0489f9a3b02
- Run description: Crucible@rag25
Original run tag: strict-filtered-covered-covextr-crucible-retrieved_docs-rag25_qwen3_merged_questions-retrieved-qwen3_32b.retrieved_docs.jsonl-SupportedAnswerabilityExtractorRequest. Answerability prompt. Will LLM judges generalize across tracks?
Crucible report generation. Guiding nuggets: most_common. Document source: nugget citations. Nugget extraction prompt 'SupportedAnswerExtractorAll' on collection "ragtime-mt". LLM: llama3.3-70b-instruct. Sentences are retained when, according to argue_eval, their citations are supported, at least one nugget covers the summary sentence, and at least one nugget covers an extractive document segment. Uses abstractive summarization. Only sentences with an extraction confidence value >= 0.5 that are not already selected (according to a stopped and stemmed match) and do not contain the expression 'source document' are retained. For each nugget, among the remaining sentence candidates, the sentence with the highest extraction confidence is selected. Chopped to 400 words. Created on 2025-08-18.
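The Crucible sentence-selection step shared by the cru-* runs (confidence floor, duplicate suppression, banned phrase, 400-word chop) can be sketched as follows. This is an illustrative simplification: the real pipeline deduplicates by stopped-and-stemmed match, whereas this sketch uses a lowercased exact match.

```python
def select_sentences(candidates, budget=400, min_conf=0.5):
    """Keep sentences with extraction confidence >= min_conf, skip
    duplicates and any sentence mentioning 'source document', and stop
    once the word budget would be exceeded.

    candidates: iterable of (sentence, confidence) pairs, best-first.
    """
    kept, seen, words = [], set(), 0
    for sent, conf in candidates:
        key = sent.lower().strip()  # simplification of stopped/stemmed matching
        if conf < min_conf or key in seen or "source document" in key:
            continue
        n = len(sent.split())
        if words + n > budget:
            break
        kept.append(sent)
        seen.add(key)
        words += n
    return kept
```

Under these rules a report never exceeds 400 words and never repeats a sentence, at the cost of dropping late low-confidence material.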
cru-ablR-conf¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: cru-ablR-conf
- Participant: HLTCOE
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-auggen
- MD5: b7361ce87f4c5e5da5fa139201ba13aa
- Run description: Crucible@rag25
Original run tag: strict-filtered-crucible-retrieved_docs-rag25_qwen3_merged_questions-retrieved-qwen3_32b.retrieved_docs.jsonl-SupportedAnswerabilityExtractorRequest. Answerability prompt. Just check citation support, rely on extraction confidence.
Crucible report generation. Guiding nuggets: most_common. Document source: nugget citations. Nugget extraction prompt 'SupportedAnswerExtractorAll' on collection "ragtime-mt". LLM: llama3.3-70b-instruct. Sentences are retained when their citations are supported according to argue_eval. Uses abstractive summarization. Only sentences with an extraction confidence value >= 0.5 that are not already selected (according to a stopped and stemmed match) and do not contain the expression 'source document' are retained. For each nugget, among the remaining sentence candidates, the sentence with the highest extraction confidence is selected. Chopped to 400 words. Created on 2025-08-18.
cru-ansR¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: cru-ansR
- Participant: HLTCOE
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-auggen
- MD5: 83128bf22b6e41f4b332ad5863443373
- Run description: Crucible@rag25
Original run tag: strict-filtered-covered-covextr-crucible-retrieved_docs-rag25_qwen3_merged_questions-retrieved-qwen3_32b.retrieved_docs.jsonl-SupportedAnswerExtractorRequest. Question-answering prompt. Will LLM judges generalize across tracks?
Crucible report generation. Guiding nuggets: most_common. Document source: nugget citations. Nugget extraction prompt 'SupportedAnswerExtractorAll' on collection "ragtime-mt". LLM: llama3.3-70b-instruct. Sentences are retained when, according to argue_eval, their citations are supported, at least one nugget covers the summary sentence, and at least one nugget covers an extractive document segment. Uses abstractive summarization. Only sentences with an extraction confidence value >= 0.5 that are not already selected (according to a stopped and stemmed match) and do not contain the expression 'source document' are retained. For each nugget, among the remaining sentence candidates, the sentence with the highest extraction confidence is selected. Chopped to 400 words. Created on 2025-08-18.
cru-ansR-bareconf¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: cru-ansR-bareconf
- Participant: HLTCOE
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-auggen
- MD5: 508155b32b20c94d85453d9008ebca52
- Run description: Crucible@rag25
Original run tag: strict-crucible-retrieved_docs-rag25_qwen3_merged_questions-retrieved-qwen3_32b.retrieved_docs.jsonl-SupportedAnswerExtractorRequest. Question-answering prompt. Just rely on extraction confidence.
Crucible report generation. Guiding nuggets: most_common. Document source: nugget citations. Nugget extraction prompt 'SupportedAnswerExtractorAll' on collection "ragtime-mt". LLM: llama3.3-70b-instruct. No sentence filtering with argue_eval. Uses abstractive summarization. Only sentences with an extraction confidence value >= 0.5 that are not already selected (according to a stopped and stemmed match) and do not contain the expression 'source document' are retained. For each nugget, among the remaining sentence candidates, the sentence with the highest extraction confidence is selected. Chopped to 400 words. Created on 2025-08-18.
cru-ansR-conf¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: cru-ansR-conf
- Participant: HLTCOE
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-auggen
- MD5: 31ceee7cc81a059f2e006d11948f4aec
- Run description: Crucible@rag25
Original run tag: strict-filtered-crucible-retrieved_docs-rag25_qwen3_merged_questions-retrieved-qwen3_32b.retrieved_docs.jsonl-SupportedAnswerExtractorRequest. Question-answering prompt. Just check citation support, rely on extraction confidence.
Crucible report generation. Guiding nuggets: most_common. Document source: nugget citations. Nugget extraction prompt 'SupportedAnswerExtractorAll' on collection "ragtime-mt". LLM: llama3.3-70b-instruct. Sentences are retained when their citations are supported according to argue_eval. Uses abstractive summarization. Only sentences with an extraction confidence value >= 0.5 that are not already selected (according to a stopped and stemmed match) and do not contain the expression 'source document' are retained. For each nugget, among the remaining sentence candidates, the sentence with the highest extraction confidence is selected. Chopped to 400 words. Created on 2025-08-18.
duth.hybrid.qwen.cal¶
Participants | Proceedings | Input | qrel_eval | Appendix
- Run ID: duth.hybrid.qwen.cal
- Participant: DUTH
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-31
- Task: trec2025-rag-qrels
- MD5: 8de7ba7990e16104857a971abb81aebd
- Run description: Automatic RJ run. Hybrid judge blending Qwen2.5-3B output, Jaccard overlap (narrative↔segment), and normalized baseline (top-20) scores into a confidence; per-topic caps/floors; final calibration (th1=0.40, th2=0.52, th3=0.66, th4=0.78; cap4=2, cap34=5). Focus: strong 3/4 with healthy 2s.
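The calibration step of the DUTH hybrid judges maps a blended confidence score onto the 0-4 relevance rubric via fixed thresholds. A minimal sketch using the thresholds quoted for this run (the blending weights and per-topic caps/floors are not specified, so they are omitted here):

```python
def calibrate_label(confidence, th1=0.40, th2=0.52, th3=0.66, th4=0.78):
    """Map a blended judge confidence in [0, 1] to a relevance label
    0-4 by comparing against ascending thresholds."""
    if confidence >= th4:
        return 4
    if confidence >= th3:
        return 3
    if confidence >= th2:
        return 2
    if confidence >= th1:
        return 1
    return 0
```

The sibling runs below (duth.hybrid.qwencon, duth.hybrid.stableri) differ mainly in these threshold values and caps, trading precision on labels 3/4 against recall at label 2.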
duth.hybrid.qwencon¶
Participants | Proceedings | Input | qrel_eval | Appendix
- Run ID: duth.hybrid.qwencon
- Participant: DUTH
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-31
- Task: trec2025-rag-qrels
- MD5: 29dae43350f454b5a24be09308f26798
- Run description: Conservative variant of the Qwen hybrid. Same pipeline; tighter calibration for high labels (th1=0.40, th2=0.54, th3=0.68, th4=0.80; cap4=1, cap34=4; topk4=1, topk3=4, topk2=8). Focus: higher precision for 3/4.
duth.hybrid.stableri¶
Participants | Proceedings | Input | qrel_eval | Appendix
- Run ID: duth.hybrid.stableri
- Participant: DUTH
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-31
- Task: trec2025-rag-qrels
- MD5: a7a41b474895e132fa7fef102f9f2acc
- Run description: Automatic RJ run with StableLM-2-1.6B. Same hybrid confidence (LLM + Jaccard + baseline). Calibrated to emphasize recall at label=2 (floor-2=4; th1=0.30, th2=0.38, th3=0.56, th4=0.70; cap4=2, cap34=6). Focus: many trustworthy 2s plus some 3/4.
duth_stablelm2_rj_v1¶
Participants | Proceedings | Input | qrel_eval | Appendix
- Run ID: duth_stablelm2_rj_v1
- Participant: DUTH
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-28
- Task: trec2025-rag-qrels
- MD5: f4943c8bbd35ea9949faee730ca46dee
- Run description: Automatic run for the Relevance Judgment subtask. We use an open-weight LLM (stabilityai/stablelm-2-1_6b-chat) as an automatic assessor at the segment level. The prompt encodes the TREC rubric (0–4); decoding is deterministic (do_sample=False, max_new_tokens=16). The model outputs "LABEL, CONFIDENCE", which we parse to produce lines: qid Q0 docid label confidence run_id. We submit exactly the top-k segments.
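The parsing step described above — turning the model's "LABEL, CONFIDENCE" output into qrels-style lines — can be sketched as follows. The clamping to the 0-4 rubric and the two-decimal formatting are assumptions for illustration:

```python
def parse_assessment(raw, qid, docid, run_id="duth_stablelm2_rj_v1"):
    """Parse a 'LABEL, CONFIDENCE' model output and emit one
    'qid Q0 docid label confidence run_id' line."""
    label_str, conf_str = (part.strip() for part in raw.split(",", 1))
    label = max(0, min(4, int(label_str)))            # clamp to the 0-4 rubric
    confidence = max(0.0, min(1.0, float(conf_str)))  # clamp to [0, 1]
    return f"{qid} Q0 {docid} {label} {confidence:.2f} {run_id}"
```

Clamping guards against the occasional out-of-range number a small instruction-tuned model may emit under deterministic decoding.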
e5_monot5_searchR1¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | full_manual | Appendix
- Run ID: e5_monot5_searchR1
- Participant: uogTr
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-20
- Task: trec2025-rag-generation
- MD5: aa6d999a1c7946fde4c79b3e902dd475
- Run description: We input the queries into an Agentic RAG model (Search-R1), which can interact with the retrieval pipeline automatically within its reasoning process until it reaches the final answer.
ensemble_umbrela1¶
Participants | Input | qrel_eval | Appendix
- Run ID: ensemble_umbrela1
- Participant: coordinators
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-09-11
- Task: trec2025-rag-qrels
- MD5: 69b23d7f565ba2e1098d5f1b3b6b7c44
- Run description: Majority voting over various qrels using UMBRELA variant
extractive_rag¶
Participants | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: extractive_rag
- Participant: hltcoe-multiagt
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-generation
- MD5: e517476a24ee9c1cf17422a7029e37fc
- Run description: This approach generates 10 queries and includes the top 3 documents from each search. Facts are extracted from the documents and grouped to reduce duplicate information. The answer is intended to be a bulleted list of independent facts. SPLADE v3 is used for search; gpt-oss-20b is used to identify facts and group them.
full¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: full
- Participant: MITLL
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-generation
- MD5: 32a6fdc8a0949ff567c09567fb0feece
- Run description: Splade -> query decomposition (gemma-3-27b-it) -> Qwen3-Reranker-8B -> reciprocal rank fusion -> setwise retrieval (gemma-3-27b-it) -> generation (gpt-5)
full-ret¶
Participants | Proceedings | Input | trec_eval | Appendix
- Run ID: full-ret
- Participant: MITLL
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-retrieval
- MD5: c51eb9736ee80570e960a1020bff78be
- Run description: Splade -> query decomposition (gemma-3-27b-it) -> Qwen3-Reranker-8B -> reciprocal rank fusion
garag¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: garag
- Participant: WaterlooClarke
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-generation
- MD5: 718c51860cb68ea924f583bb55ab3461
- Run description: Generate first
gemini_2_5_pro¶
Participants | Input | qrel_eval | Appendix
- Run ID: gemini_2_5_pro
- Participant: coordinators
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-09-10
- Task: trec2025-rag-qrels
- MD5: 9bb9a4269a872b8b4d08d389d4a756ec
- Run description: Uses UMBRELA with Gemini 2.5 Pro
genSubQ_merge¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | full_manual | Appendix
- Run ID: genSubQ_merge
- Participant: uogTr
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-20
- Task: trec2025-rag-generation
- MD5: 623ac659acd6b1dcbec59b25af6a2fda
- Run description: Initially divides each question into sub-questions using GPT4.0mini; answers are then generated for each sub-query using Llama 3, based on the top 3 retrieved documents. Llama 3 is then used to merge the sub-answers into the final response.
gpt-oss-120b-high¶
Participants | Input | qrel_eval | Appendix
- Run ID: gpt-oss-120b-high
- Participant: coordinators
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-09-11
- Task: trec2025-rag-qrels
- MD5: 82d3098bd02ab052b530a60662bbc0a4
- Run description: Uses UMBRELA variant with gpt-oss-120b high reasoning
gpt-oss-120b-low¶
Participants | Input | qrel_eval | Appendix
- Run ID: gpt-oss-120b-low
- Participant: coordinators
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-09-11
- Task: trec2025-rag-qrels
- MD5: e0eb0164749fd82a1ef9c71357081265
- Run description: Uses UMBRELA variant with gpt-oss-120b low reasoning
gpt-oss-120b-med¶
Participants | Input | qrel_eval | Appendix
- Run ID: gpt-oss-120b-med
- Participant: coordinators
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-09-11
- Task: trec2025-rag-qrels
- MD5: 869accf253d694f14fabd18247f78482
- Run description: Uses UMBRELA variant with gpt-oss-120b medium reasoning
gpt-oss-120b-sn-high¶
Participants | Input | qrel_eval | Appendix
- Run ID: gpt-oss-120b-sn-high
- Participant: coordinators
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-09-11
- Task: trec2025-rag-qrels
- MD5: 12c384d54bf24a5ea4c043dc49d39b28
- Run description: Creates sub-narratives from the narratives (gpt-oss-120b high reasoning) -> Uses the list of sub-narratives to allocate relevance labels (gpt-oss-120b high reasoning)
gpt-oss-120b-sn-low¶
Participants | Input | qrel_eval | Appendix
- Run ID: gpt-oss-120b-sn-low
- Participant: coordinators
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-09-11
- Task: trec2025-rag-qrels
- MD5: 735e8eb0afc672b4a14ea2d120f02556
- Run description: Creates sub-narratives from the narratives (gpt-oss-120b low reasoning) -> Uses the list of sub-narratives to allocate relevance labels (gpt-oss-120b low reasoning)
gpt-oss-120b-sn-med¶
Participants | Input | qrel_eval | Appendix
- Run ID: gpt-oss-120b-sn-med
- Participant: coordinators
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-09-11
- Task: trec2025-rag-qrels
- MD5: b5a6450321c3520de09389dc594db403
- Run description: Creates sub-narratives from the narratives (gpt-oss-120b medium reasoning) -> Uses the list of sub-narratives to allocate relevance labels (gpt-oss-120b medium reasoning)
gpt-oss-20b-high¶
Participants | Input | qrel_eval | Appendix
- Run ID: gpt-oss-20b-high
- Participant: coordinators
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-09-11
- Task: trec2025-rag-qrels
- MD5: f0ca50b8eb77c57537e76f096dd28909
- Run description: Uses UMBRELA variant with gpt-oss-20b high reasoning
gpt-oss-20b-low¶
Participants | Input | qrel_eval | Appendix
- Run ID: gpt-oss-20b-low
- Participant: coordinators
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-09-11
- Task: trec2025-rag-qrels
- MD5: 1c8d1f48b785bc5f34dde18fdf709cf9
- Run description: Uses UMBRELA variant with gpt-oss-20b low reasoning
gpt-oss-20b-medium¶
Participants | Input | qrel_eval | Appendix
- Run ID: gpt-oss-20b-medium
- Participant: coordinators
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-09-11
- Task: trec2025-rag-qrels
- MD5: 8633f0e61e6e315bf774a62997decbc6
- Run description: Uses UMBRELA variant with gpt-oss-20b medium reasoning
gpt-oss-20b-sn-high¶
Participants | Input | qrel_eval | Appendix
- Run ID: gpt-oss-20b-sn-high
- Participant: coordinators
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-09-11
- Task: trec2025-rag-qrels
- MD5: b4f154a286cc9b087a1e305d5342c0b0
- Run description: Creates sub-narratives from the narratives (gpt-oss-20b high reasoning) -> Uses the list of sub-narratives to allocate relevance labels (gpt-oss-20b high reasoning)
gpt-oss-20b-sn-low¶
Participants | Input | qrel_eval | Appendix
- Run ID: gpt-oss-20b-sn-low
- Participant: coordinators
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-09-11
- Task: trec2025-rag-qrels
- MD5: c877daa891d265cbee9be5e2c9579045
- Run description: Creates sub-narratives from the narratives (gpt-oss-20b low reasoning) -> Uses the list of sub-narratives to allocate relevance labels (gpt-oss-20b low reasoning)
gpt-oss-20b-sn-med¶
Participants | Input | qrel_eval | Appendix
- Run ID: gpt-oss-20b-sn-med
- Participant: coordinators
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-09-11
- Task: trec2025-rag-qrels
- MD5: 3e616216958e163af9558e954801026b
- Run description: Creates sub-narratives from the narratives (gpt-oss-20b medium reasoning) -> Uses the list of sub-narratives to allocate relevance labels (gpt-oss-20b medium reasoning)
gpt41¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: gpt41
- Participant: UTokyo
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-16
- Task: trec2025-rag-auggen
- MD5: 796507842a74452c0f52576da5c46ef4
- Run description: Answer generation using GPT-4.1 with the Ragnarök framework's baseline prompt configuration. Processes retrieved passages to generate comprehensive answers, leveraging GPT-4.1's advanced reasoning capabilities without additional prompt engineering or retrieval-aware modifications. Uses standard Ragnarök format for passage presentation and answer structuring.
gpt_4-1¶
Participants | Input | qrel_eval | Appendix
- Run ID: gpt_4-1
- Participant: coordinators
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-09-10
- Task: trec2025-rag-qrels
- MD5: 23f1c42688af195e1b7d513d452b57fd
- Run description: Using UMBRELA variant with GPT-4.1
gpt_4-1-sub-narr¶
Participants | Input | qrel_eval | Appendix
- Run ID: gpt_4-1-sub-narr
- Participant: coordinators
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-09-10
- Task: trec2025-rag-qrels
- MD5: 04fd2d4f67bbe4b7cf0fec4676fb45fc
- Run description: Creates sub-narratives from the narratives (GPT-4.1) -> Uses the list of sub-narratives to allocate relevance labels (GPT-4.1)
gpt_4_1-sub-narr-2¶
Participants | Input | qrel_eval | Appendix
- Run ID: gpt_4_1-sub-narr-2
- Participant: coordinators
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-09-10
- Task: trec2025-rag-qrels
- MD5: b02fdccd1aeb1680719d77872e7db146
- Run description: Creates sub-narratives from the narratives (GPT-4.1-nano) -> Re-writes them (GPT-4.1) -> Uses the list of sub-narratives to allocate relevance labels (GPT-4.1)
gpt_5-sub-narr¶
Participants | Input | qrel_eval | Appendix
- Run ID: gpt_5-sub-narr
- Participant: coordinators
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-09-10
- Task: trec2025-rag-qrels
- MD5: ad17637781529c0b03334a5c92d6044d
- Run description: Creates sub-narratives from the narratives (GPT-5) -> Uses the list of sub-narratives to allocate relevance labels (GPT-5)
gptr.nt_q4d4¶
Participants | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: gptr.nt_q4d4
- Participant: hltcoe-multiagt
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-generation
- MD5: 487bc7b03b4399a019d452c44dc2c515
- Run description: This run leverages the gpt-researcher framework. It uses notetaking to identify the most informative parts of each document and then generates the answer from the notes. The system generates 3 queries and uses the initial title to retrieve source documents. The top four documents from each retrieval via SPLADE v3 are used as source material for the generation.
gptr_e2_q3d3¶
Participants | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: gptr_e2_q3d3
- Participant: hltcoe-multiagt
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-generation
- MD5: 1d68eadd302bed7ce312cf541e6250a9
- Run description: This run leverages the gpt-researcher framework. It uses a filtering approach to rank snippets in a document by usefulness to the query. The top 4 snippets are selected and answers are generated from them. The system generates 2 queries and uses the initial title to retrieve source documents. The top three documents from each retrieval via SPLADE v3 are used as source material for the generation. All LLM calls are served by llama-3.3-70B-instruct.
gptr_e2_q4d4¶
Participants | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: gptr_e2_q4d4
- Participant: hltcoe-multiagt
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-generation
- MD5: d4701778f9b7f77c87164bfa80579380
- Run description: This run leverages the gpt-researcher framework. It uses a filtering approach to rank snippets in a document by usefulness to the query. The top 4 snippets are selected and answers are generated from them. The system generates 3 queries and uses the initial title to retrieve source documents. The top four documents from each retrieval via SPLADE v3 are used as source material for the generation. All LLM calls are served by llama-3.3-70B-instruct.
gptr_nt_q3d3¶
Participants | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: gptr_nt_q3d3
- Participant: hltcoe-multiagt
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-generation
- MD5: b04de1e5e182096fb134e3897e7006ca
- Run description: This run leverages the gpt-researcher framework. It uses notetaking to identify the most informative parts of each document and then generates the answer from the notes. The system generates 2 queries and uses the initial title to retrieve source documents. The top three documents from each retrieval via SPLADE v3 are used as source material for the generation. All LLM calls are served by llama-3.3-70B-instruct.
grilllab-agent-gpt45¶
Participants | Proceedings | Input | trec_eval | Appendix
- Run ID: grilllab-agent-gpt45
- Participant: grilllab
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-retrieval
- MD5:
c2f20473f6997226a808b4da4a5fe01c - Run description: Uses GPT4.1 to decompose query into sub questions, which are individually ran through a BM25+Doc2Query+DocumentExpansion pipeline. The results of all queries are combined using RRF to form a single ranked list.
grilllab-agentic-gpt4¶
Participants | Proceedings | Input | trec_eval | Appendix
- Run ID: grilllab-agentic-gpt4
- Participant: grilllab
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-14
- Task: trec2025-rag-retrieval
- MD5:
bd226465eda58793e088380a5261cdcd - Run description: Uses GPT-4.1 to decompose the query into sub-questions, which are individually run through a BM25+Doc2Query+DocumentExpansion pipeline. The results of all queries are combined using RRF to form a single ranked list.
grilllab-agentic-gpt4-generation¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: grilllab-agentic-gpt4-generation
- Participant: grilllab
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-14
- Task: trec2025-rag-generation
- MD5:
b7ea1a5b8b8debb98f22e4b281c4b075 - Run description: Uses GPT-4.1 to decompose the query into sub-questions, which are individually run through a BM25+Doc2Query+DocumentExpansion pipeline. The results of all queries are combined using RRF to form a single ranked list.
grilllab-gpt45-gen¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: grilllab-gpt45-gen
- Participant: grilllab
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-generation
- MD5:
91bc14a2e136701c27f2bb22330cc40b - Run description: Uses GPT-4.1 and GPT-5 to decompose the query into sub-questions, which are individually run through a BM25+Doc2Query+DocumentExpansion pipeline. The results of all queries are combined using RRF to form a single ranked list.
hltcoe-fsrrf¶
Participants | Input | trec_eval | Appendix
- Run ID: hltcoe-fsrrf
- Participant: hltcoe-rerank
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-retrieval
- MD5:
6ab7f8d2ac42744623005f87473affae - Run description: RRF of PLAID-X, Qwen-Embedding-8B, and SPLADE-v3
hltcoe-gpt5.searcher¶
Participants | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: hltcoe-gpt5.searcher
- Participant: hltcoe-rerank
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-generation
- MD5:
84a93c7dea89f337d92cd7b154c423ca - Run description: LangGraph generator (reflection, note taking, query generation, etc) with retrieval results from Searcher II pointwise reranking RRF of PLAID-X, Qwen-Embedding-8B, and SPLADE-v3. LangGraph uses Llama 3.3 70B for most steps. GPT-5 is used for final answer generation (drafting) and answer shortening (revising report). LangGraph generates 4 initial queries, retrieves 12 results per query, and runs up to 5 research loops.
hltcoe-jina¶
Participants | Input | trec_eval | Appendix
- Run ID: hltcoe-jina
- Participant: hltcoe-rerank
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-retrieval
- MD5:
f1abb78342f2d7ee00fde5a34b68cd0a - Run description: jinaai/jina-reranker-m0 (2.4B) reranking RRF of PLAID-X, Qwen-Embedding-8B, and SPLADE-v3
hltcoe-lg.fsrrf¶
Participants | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: hltcoe-lg.fsrrf
- Participant: hltcoe-rerank
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-generation
- MD5:
43c6509d6d6d39f49b023fcc26660a6a - Run description: LangGraph generator (reflection, note taking, query generation, etc) with retrieval results from RRF of PLAID-X, Qwen-Embedding-8B, and SPLADE-v3. LangGraph uses Llama 3.3 70B for all steps. LangGraph generates 4 initial queries, retrieves 12 results per query, and runs up to 5 research loops.
hltcoe-lg.jina¶
Participants | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: hltcoe-lg.jina
- Participant: hltcoe-rerank
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-generation
- MD5:
eb2b5dee3977cf8cd41dadf33b63d091 - Run description: LangGraph generator (reflection, note taking, query generation, etc) with retrieval results from jinaai/jina-reranker-m0 (2.4B) reranking RRF of PLAID-X, Qwen-Embedding-8B, and SPLADE-v3. LangGraph uses Llama 3.3 70B for all steps. LangGraph generates 4 initial queries, retrieves 12 results per query, and runs up to 5 research loops.
hltcoe-lg.qwen¶
Participants | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: hltcoe-lg.qwen
- Participant: hltcoe-rerank
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-generation
- MD5:
a3e7f48c341214060b0a2e3fd7565a97 - Run description: LangGraph generator (reflection, note taking, query generation, etc) with retrieval results from Qwen/Qwen3-Reranker-8B reranking RRF of PLAID-X, Qwen-Embedding-8B, and SPLADE-v3. LangGraph uses Llama 3.3 70B for all steps. LangGraph generates 4 initial queries, retrieves 12 results per query, and runs up to 5 research loops.
hltcoe-lg.searcher¶
Participants | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: hltcoe-lg.searcher
- Participant: hltcoe-rerank
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-generation
- MD5:
68d180accca721e7bc675e3ec716ac1b - Run description: LangGraph generator (reflection, note taking, query generation, etc) with retrieval results from Searcher II pointwise reranking RRF of PLAID-X, Qwen-Embedding-8B, and SPLADE-v3. LangGraph uses Llama 3.3 70B for all steps. LangGraph generates 4 initial queries, retrieves 12 results per query, and runs up to 5 research loops.
hltcoe-qwen¶
Participants | Input | trec_eval | Appendix
- Run ID: hltcoe-qwen
- Participant: hltcoe-rerank
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-retrieval
- MD5:
75d56806985f7c668115a8a59b28ca83 - Run description: Qwen/Qwen3-Reranker-8B reranking RRF of PLAID-X, Qwen-Embedding-8B, and SPLADE-v3
hltcoe-qwen-jina¶
Participants | Input | trec_eval | Appendix
- Run ID: hltcoe-qwen-jina
- Participant: hltcoe-rerank
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-retrieval
- MD5:
a00f7c9512dcd3cf1b83d0b5b5918ab3 - Run description: Fusion of jina-reranker-m0 and Qwen-Embedding-8B reranking RRF of PLAID-X, Qwen-Embedding-8B, and SPLADE-v3
hltcoe-searcher¶
Participants | Input | trec_eval | Appendix
- Run ID: hltcoe-searcher
- Participant: hltcoe-rerank
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-retrieval
- MD5:
86fb809534dc3cbe474deaac38635bd7 - Run description: Searcher II pointwise reranking RRF of PLAID-X, Qwen-Embedding-8B, and SPLADE-v3
hybrid-rerank¶
Participants | Input | trec_eval | Appendix
- Run ID: hybrid-rerank
- Participant: clip2025
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-retrieval
- MD5:
918bc936c263c02618f317eacd1282a4 - Run description: hybrid (bm25, embedding) + rerank
hybrid.stable.loose2¶
Participants | Proceedings | Input | qrel_eval | Appendix
- Run ID: hybrid.stable.loose2
- Participant: DUTH
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-31
- Task: trec2025-rag-qrels
- MD5:
ea086f09a8936fd2ed05e975f6401ddd - Run description: Looser StableLM hybrid. Same blended confidence; calibration tuned for broader non-zero coverage (th1=0.30, th2=0.40, th3=0.56, th4=0.70; cap4=2, cap34=7; floor-2=3). Focus: high pool coverage with balanced 2/3/4.
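The threshold step described above can be sketched as simple binning of the blended confidence score against th1..th4; the cap/floor logic (cap4=2, cap34=7, floor-2=3) is omitted here, and the function below is an illustration under those assumptions, not the submitted code:

```python
def calibrate(scores, thresholds=(0.30, 0.40, 0.56, 0.70)):
    """Map blended confidence scores in [0, 1] to graded labels 0-4.

    A score crossing n of the (ascending) thresholds gets label n,
    so th1..th4 carve the unit interval into five relevance grades.
    """
    return {doc: sum(s >= t for t in thresholds) for doc, s in scores.items()}
```

Loosening the thresholds (as this run does relative to its sibling) shifts mass out of label 0, giving the "broader non-zero coverage" the description mentions.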
IDACCS-hybrid-gpt4-1¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | full_manual | Appendix
- Run ID: IDACCS-hybrid-gpt4-1
- Participant: IDACCS
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-auggen
- MD5:
41eeff2b5d22ae79a4c42042f8cdb3b3 - Run description: Reranked the top 100 provided by the organizers with mixedbread-ai/mxbai-rerank-large-v1, using the query. Used the OCCAMS extractive summarizer to generate an 800-word summary. Used GPT-4.1 to paraphrase. Used t5-base to attribute each sentence.
IDACCS-hybrid-gpt4o¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: IDACCS-hybrid-gpt4o
- Participant: IDACCS
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-auggen
- MD5:
53e37227d588946b9b2c481b598788c2 - Run description: Reranked the top 100 provided by organizers via mixedbread-ai/mxbai-rerank-large-v1 and using the query. Use occams extractive summarizer to generate 800 words Used GPT-4o to paraphrase Use t5-base to attribute each sentence
IDACCS-nugg-gpt-4-1¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | full_manual | Appendix
- Run ID: IDACCS-nugg-gpt-4-1
- Participant: IDACCS
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-auggen
- MD5:
7b5b6e2c9a51feb6505305cadc863ae8 - Run description: Reranked the top 100 provided by the organizers with mixedbread-ai/mxbai-rerank-large-v1, using the query. Used the OCCAMS extractive summarizer to generate an 800-word summary. Used GPT-4.1 to generate a nugget from each extracted sentence. Used t5-base to attribute each sentence.
IDACCS-nugg-gpt-4o¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: IDACCS-nugg-gpt-4o
- Participant: IDACCS
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-auggen
- MD5:
639808c64fc9a2d0e9f02f48d769fa24 - Run description: Reranked the top 100 provided by the organizers with mixedbread-ai/mxbai-rerank-large-v1, using the query. Used the OCCAMS extractive summarizer to generate an 800-word summary. Used GPT-4o to paraphrase. Used t5-base to attribute each sentence.
IDACCSabstrct-gpt4-1¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | full_manual | Appendix
- Run ID: IDACCSabstrct-gpt4-1
- Participant: IDACCS
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-auggen
- MD5:
1aa0f9d01de1b02dbdfac530910f8414 - Run description: Reranked the top 100 provided by the organizers with mixedbread-ai/mxbai-rerank-large-v1, using the query. Used GPT-4.1 to write an abstractive summary. Used t5-base to attribute each sentence.
jcru-ablR¶
Participants | Proceedings | Input | qrel_eval | Appendix
- Run ID: jcru-ablR
- Participant: HLTCOE
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-qrels
- MD5:
a0aa4a4ee8a47d2ef3b6baba6cc90d93 - Run description: Crucible@rag25
Original run tag: ffiltered-covered-covextr-crucible-retrieved_docs-rag25_qwen3_merged_questions-retrieved-qwen3_32b.retrieved_docs.jsonl-SupportedAnswerabilityExtractorRequest Answerability prompt. Filtering with argue_eval.
Crucible report generation. Guiding nuggets: most_common. Document source: nugget citations. Nugget extraction prompt 'SupportedAnswerabilityExtractorAll' on collection "ragtime-mt"; LLM: llama3.3-70b-instruct. Sentences are retained when their citations are supported, at least one nugget covers the summary sentence, and at least one nugget covers an extractive document segment according to argue_eval. The frequency with which a cited document is used for sentences is used as its relevance score.
jcru-ablR-all¶
Participants | Proceedings | Input | qrel_eval | Appendix
- Run ID: jcru-ablR-all
- Participant: HLTCOE
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-qrels
- MD5:
b49ea2817a74e877a0a9cc0c5c8e2e68 - Run description: Crucible@rag25
Original run tag: crucible-retrieved_docs-rag25_qwen3_merged_questions-retrieved-qwen3_32b.retrieved_docs.jsonl-SupportedAnswerabilityExtractorRequest Answerability prompt. No filtering with argue_eval.
Crucible report generation. Guiding nuggets: most_common. Document source: nugget citations. Nugget extraction prompt 'SupportedAnswerabilityExtractorAll' on collection "ragtime-mt"; LLM: llama3.3-70b-instruct. The frequency with which a cited document is used for sentences is used as its relevance score.
jcru-ansR¶
Participants | Proceedings | Input | qrel_eval | Appendix
- Run ID: jcru-ansR
- Participant: HLTCOE
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-qrels
- MD5:
385dff14abf1b03d2d804902b0be7543 - Run description: Crucible@rag25
Original run tag: filtered-covered-covextr-crucible-retrieved_docs-rag25_qwen3_merged_questions-retrieved-qwen3_32b.retrieved_docs.jsonl-SupportedAnswerExtractorRequest Question-answering prompt. Filtering with argue_eval.
Crucible report generation. Guiding nuggets: most_common. Document source: nugget citations. Nugget extraction prompt 'SupportedAnswerExtractorAll' on collection "ragtime-mt"; LLM: llama3.3-70b-instruct. Sentences are retained when their citations are supported, at least one nugget covers the summary sentence, and at least one nugget covers an extractive document segment according to argue_eval. The frequency with which a cited document is used for sentences is used as its relevance score.
jcru-ansR-all¶
Participants | Proceedings | Input | qrel_eval | Appendix
- Run ID: jcru-ansR-all
- Participant: HLTCOE
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-qrels
- MD5:
cc310d2aa1f38bd93d33a25b71c447fb - Run description: Crucible@rag25
Original run tag: crucible-retrieved_docs-rag25_qwen3_merged_questions-retrieved-qwen3_32b.retrieved_docs.jsonl-SupportedAnswerExtractorRequest Question-answering prompt. No filtering with argue_eval.
Crucible report generation. Guiding nuggets: most_common. Document source: nugget citations. Nugget extraction prompt 'SupportedAnswerExtractorAll' on collection "ragtime-mt"; LLM: llama3.3-70b-instruct. The frequency with which a cited document is used for sentences is used as its relevance score.
KG-AG-1¶
Participants | Input | nist-post-edit | nuggetizer_eval | full_manual | Appendix
- Run ID: KG-AG-1
- Participant: clip2025
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-generation
- MD5:
443aa290bb605246088ca9aae7a8448c - Run description: Retrieval:
- Use RRF on the results of BM25 (k=10000) and dense embedding (k=10000) to get a hybrid result (k=1000).
- Use a cross-encoder to rerank the retrieval result to k=300.
Generation:
- Construct a KG.
- Run triple-based ToG to get reasoning paths on the KG, and use the collected results to generate the ver. 1 answer.
- Generate the ver. 2 answer by using selected segments from the above reasoning paths to refine the answer.
Kun-Third¶
Participants | Input | nist-post-edit | nuggetizer_eval | full_manual | Appendix
- Run ID: Kun-Third
- Participant: RMIT-IR
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-generation
- MD5:
84bd82122b799e8eddbec2fc7bdafca7 - Run description: For this run, we used query decomposition + per-query reranking + a 30,000-word context limit. The reranking uses Falcon-10b. The decomposition and generation use Claude Sonnet 4.
LAS-agentic-RAG-agent¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | full_manual | Appendix
- Run ID: LAS-agentic-RAG-agent
- Participant: ncsu-las
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-15
- Task: trec2025-rag-generation
- MD5:
0b2db2477846a385c03173f13adf4f1d - Run description: This pipeline uses a single RAG agent that is instructed to decompose the question into separate queries, retrieve relevant information from a msmarco segment index, and create a report answering the question. An additional process is used to create citations using similarity matching between sentences in the answer and sentences in the retrieved context.
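The citation step described here, matching each answer sentence to its most similar sentence in the retrieved context, can be sketched as a thresholded nearest-neighbor lookup over sentence vectors. The embedding model and cutoff below are placeholders; the run's actual choices are not specified:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den if den else 0.0

def cite(answer_vecs, context_vecs, threshold=0.5):
    """For each answer-sentence vector, return the doc id of the most
    similar context vector, or None when nothing clears the threshold."""
    citations = []
    for av in answer_vecs:
        best_id, best = None, threshold
        for doc_id, cv in context_vecs.items():
            s = cosine(av, cv)
            if s > best:
                best_id, best = doc_id, s
        citations.append(best_id)
    return citations
```

Keeping a threshold matters: without one, every sentence gets a citation even when it is unsupported by any retrieved segment.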
LAS-agentic-RAG-selector¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | full_manual | Appendix
- Run ID: LAS-agentic-RAG-selector
- Participant: ncsu-las
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-15
- Task: trec2025-rag-generation
- MD5:
fd102c64048ec379245e3ce859a591a1 - Run description: This pipeline uses multiple LLM-enabled agents including a selector, a planner, a research assistant, a writer, and a reviewer. The selector chooses the next agent, always choosing the planner first. The planner creates a research plan and a report plan. The research assistant uses a retrieval tool to query the msmarco segment index and compose a list of relevant information. The research assistant can perform multiple queries simultaneously and decides itself how to phrase a query. The writer takes the list of relevant information and the instructions from the planner and creates a draft report. The reviewer analyzes the report to ensure consistency with the source information and that it fully answers the user's questions. It is instructed to ask for changes/updates after the first draft, after which the process repeats until the reviewer and the planner have decided the submission is complete.
LAS_con-que¶
Participants | Proceedings | Input | trec_eval | Appendix
- Run ID: LAS_con-que
- Participant: ncsu-las
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-retrieval
- MD5:
752b2fb3aa3e52203782aa0d689e6707 - Run description: Use gpt-4o to rephrase the original narrative into a set of questions. Concatenate these questions into a big query. Search SPLADE-v3 segment index using the big query.
LAS_con-que-con-nug¶
Participants | Proceedings | Input | trec_eval | Appendix
- Run ID: LAS_con-que-con-nug
- Participant: ncsu-las
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-retrieval
- MD5:
1259f67ba4dad08d535d99bfbe493220 - Run description: Use gpt-4o to rephrase the original narrative into a set of questions. Concatenate these questions into a big query. Search the SPLADE-v3 segment index using the big query. Use castorini/nuggetizer to generate nuggets based on the top 20 results. The nugget creation prompt was modified to generate self-contained 10-20-word nuggets. Concatenate the "vital" nuggets into a big query. Search the SPLADE-v3 segment index using the big query.
LAS_con-que-sep-nug¶
Participants | Proceedings | Input | trec_eval | Appendix
- Run ID: LAS_con-que-sep-nug
- Participant: ncsu-las
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-retrieval
- MD5:
5cb876f4b831788bffc023a1f264a012 - Run description: Use gpt-4o to rephrase the original narrative into a set of questions. Concatenate these questions into a big query. Search the SPLADE-v3 segment index using the big query. Use castorini/nuggetizer to generate nuggets based on the top 20 results. The nugget creation prompt was modified to generate self-contained 10-20-word nuggets. Search for each "vital" nugget in the SPLADE-v3 segment index. Merge the result lists using RRF (k = 10).
LAS_sep-que¶
Participants | Proceedings | Input | trec_eval | Appendix
- Run ID: LAS_sep-que
- Participant: ncsu-las
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-retrieval
- MD5:
377ad3d6aa8020894ca3c8fce93cb499 - Run description: Use gpt-4o to rephrase the original narrative into a set of questions. Search for each of these questions in the SPLADE-v3 segment index. Merge the result lists using RRF (k = 10).
lg_nt_q4d12l3¶
Participants | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: lg_nt_q4d12l3
- Participant: hltcoe-multiagt
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-generation
- MD5:
345e5ea0a643b8745c251059d248f90c - Run description: This run leverages the LangGraph framework. In a round, the approach produces a set of 4 queries. Notetaking is done on the top 12 documents for each query. The notes from the documents retrieved with a single query are used to generate a partial answer. Partial answers from all queries are examined for completeness. If the answer is deemed incomplete, up to 4 new queries are proposed to fill knowledge gaps. After at most 3 rounds, an answer is drafted and then shortened to fit the length limit. All documents are retrieved using SPLADE v3. All LLM calls are served by llama-3.3-70B-instruct.
lg_nt_q4d12l3_c¶
Participants | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: lg_nt_q4d12l3_c
- Participant: hltcoe-multiagt
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-generation
- MD5:
c23af45912900b5335f21bd8e5cfe91a - Run description: This run leverages the LangGraph framework. In a round, the approach produces a set of 4 queries. Notetaking is done on the top 12 documents for each query. The notes from the documents retrieved with a single query are used to generate a partial answer. Partial answers from all queries are examined for completeness. If the answer is deemed incomplete, up to 4 new queries are proposed to fill knowledge gaps. After at most 3 rounds, an answer is drafted and then shortened to fit the length limit. Each citation is then checked to verify that it supports its sentence. Unfaithful citations are removed; if a substitute can be found, another document is used instead, otherwise the sentence is removed. All documents are retrieved using SPLADE v3. All LLM calls are served by llama-3.3-70B-instruct.
lucerank¶
Participants | Input | trec_eval | Appendix
- Run ID: lucerank
- Participant: digsci
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-retrieval
- MD5:
9ce1dee3b819b81f8cf7e00ffec0f84c - Run description: Lucerank is a reranking strategy that leverages highly parallelized LLM calls (gpt-4.1-mini in this case) on small random subsets of candidates, and then aggregates them via Luce Spectral Ranking to produce calibrated relevance scores.
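Luce Spectral Ranking (rank centrality) turns the pairwise win statistics implied by the per-subset LLM rankings into global scores: build a Markov chain whose transitions favor the winner of each comparison, and read scores off its stationary distribution. A dense-matrix sketch under the Bradley-Terry/Luce model (an illustration of the aggregation idea, not the team's implementation):

```python
import numpy as np

def luce_spectral_scores(partial_rankings, n_items):
    """Aggregate partial rankings (each a list of item ids, best first,
    from one LLM call on a random subset) into global Luce scores."""
    # Pairwise win counts: wins[i, j] = times i was ranked above j.
    wins = np.zeros((n_items, n_items))
    for ranking in partial_rankings:
        for hi, winner in enumerate(ranking):
            for loser in ranking[hi + 1:]:
                wins[winner, loser] += 1
    comps = wins + wins.T  # total comparisons per pair
    # Transition matrix: from i, jump to j with prob. proportional to
    # j's empirical win rate over i; leftover mass stays at i.
    P = np.zeros((n_items, n_items))
    d = max(comps.sum(axis=1).max(), 1.0)  # uniform normalizer
    for i in range(n_items):
        for j in range(n_items):
            if i != j and comps[i, j] > 0:
                P[i, j] = wins[j, i] / comps[i, j] / d
        P[i, i] = 1.0 - P[i].sum()
    # Stationary distribution of the chain = the score vector.
    evals, evecs = np.linalg.eig(P.T)
    v = np.abs(np.real(evecs[:, np.argmax(np.real(evals))]))
    return v / v.sum()
```

The appeal for the parallel-subset setting is that each LLM call only has to rank a handful of candidates, yet the spectral aggregation yields a single consistent score per candidate.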
NITA-Qrels¶
Participants | Proceedings | Input | qrel_eval | Appendix
- Run ID: NITA-Qrels
- Participant: NIT Agartala
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-09-10
- Task: trec2025-rag-qrels
- MD5:
7eaafe85a5925da0e2716d0ee9544b9e - Run description: This run uses a multi-stage retrieval pipeline. Candidate documents are first taken from a baseline dense retrieval run (top-100), then restricted to the top-20 per query. These candidates are reranked using the BAAI/bge-reranker-large cross-encoder model, which outputs confidence scores mapped into 5-level relevance judgments (0–4). Final results are written in TREC QREL TSV format with per-query top-20 judgments.
NITA_AG_JH¶
Participants | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: NITA_AG_JH
- Participant: NITATREC
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-auggen
- MD5:
8cd67655c083cda87bdc498537aa11aa - Run description: This run employed Falcon-7B-Instruct (tiiuae/falcon-7b-instruct) for answer generation. For each query, the top 20 passages retrieved by the baseline system were provided to the model, and outputs were generated using deterministic decoding (do_sample=False, temperature=0.0, max_new_tokens=400) to ensure factual precision and reduce hallucination. The generated responses were post-processed through sentence segmentation, citation extraction, and formatting into TREC Format-2 JSON with metadata, references, and citation-linked answers.
NITA_R_DPR¶
Participants | Input | trec_eval | Appendix
- Run ID: NITA_R_DPR
- Participant: NITATREC
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-retrieval
- MD5:
ccc04756c4dd5653238a81a52c3d6b97 - Run description: This submission implements a three-stage hybrid retrieval pipeline for TREC RAG 2025. Stage 1 utilizes pre-computed BM25 results to select the top 500 lexical candidates per query. Stage 2 applies DPR semantic filtering using Facebook's dpr-question_encoder-single-nq-base and dpr-ctx_encoder-single-nq-base models to reduce candidates to the top 200 based on cosine similarity of dense embeddings. Stage 3 performs neural reranking with cross-encoder/ms-marco-MiniLM-L-12-v2 for final relevance scoring. The system processes 105 queries with GPU batch inference (batch size 256), delivering top-100 results that combine lexical matching with semantic understanding through transformer-based architectures.
NITA_R_JH_HY¶
Participants | Input | trec_eval | Appendix
- Run ID: NITA_R_JH_HY
- Participant: NITATREC
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-retrieval
- MD5:
06f5a6f550fe876ed20740b6bf5889e6 - Run description: The system implements a hybrid retrieval and reranking pipeline for TREC 2025 RAG. In the first stage, candidate documents are retrieved using both sparse and dense retrieval methods: BM25 (via Pyserini) retrieves the top 1000 documents per query, while Dense Passage Retrieval (DPR) leverages facebook/dpr-question_encoder-single-nq-base for queries and facebook/dpr-ctx_encoder-single-nq-base for documents to retrieve the top 500 candidates. The resulting sets are fused, ensuring unique documents across both retrievers. In the second stage, a cross-encoder model (cross-encoder/ms-marco-MiniLM-L-12-v2) scores each query-document pair to produce fine-grained relevance rankings. Documents are then sorted by cross-encoder scores, with the top 100 per query output in the TREC run file format. The pipeline utilizes GPU acceleration for DPR embedding generation and cross-encoder inference, optimizing retrieval efficiency and enabling scalable reranking across large candidate sets.
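The fuse-then-rerank pattern used here, union the sparse and dense candidate sets, deduplicate, then score every query-document pair with a cross-encoder, can be sketched as follows. `score_pair` stands in for the cross-encoder call (e.g. ms-marco-MiniLM-L-12-v2 inference), which is not reproduced here:

```python
def fuse_and_rerank(bm25_ids, dpr_ids, score_pair, query, top_k=100):
    """Union candidates from sparse and dense retrieval (deduplicated,
    order-preserving), score each with a cross-encoder, keep top_k."""
    candidates = list(dict.fromkeys(bm25_ids + dpr_ids))
    scored = [(doc, score_pair(query, doc)) for doc in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [doc for doc, _ in scored[:top_k]]
```

`dict.fromkeys` preserves first-seen order while removing duplicates, so a document retrieved by both systems is scored only once.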
no-decomp¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: no-decomp
- Participant: MITLL
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-generation
- MD5:
988c991e74720be034c70161240e6b24 - Run description: Splade -> Qwen3-Reranker-8B -> setwise retrieval (gemma-3-27b-it) -> generation (gpt-5)
no-decomp-reranker¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: no-decomp-reranker
- Participant: MITLL
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-generation
- MD5:
8b5f85ae73937f608f3d5a9db488a07d - Run description: Splade -> set-wise passage selection (gemma-3-27b-it) -> generation (gpt-5)
We consider set-wise passage selection to be a part of the generation step.
no-llm¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: no-llm
- Participant: WING-II
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-auggen
- MD5:
ab582af222cb06e728a58ae2ff276730 - Run description: greedy submodular segment selection → rule-based evidence-card compression → heuristic claim assembly → lexical-IDF self-check & citation repair → hard post-fix (length/indices)
no-llm-refined¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | full_manual | Appendix
- Run ID: no-llm-refined
- Participant: WING-II
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-auggen
- MD5:
55f7d2b3e3c4257a8f165aa8b879b129 - Run description: greedy submodular segment selection → rule-based evidence-card compression → heuristic claim assembly → lexical-IDF self-check & citation repair → hard post-fix (length/indices) → LLM refinement
no-reranker¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: no-reranker
- Participant: MITLL
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-generation
- MD5:
a9926f8d0b52e892f34e5d9ecb91e24a - Run description: Splade -> query decomposition (gemma-3-27b-it) -> reciprocal rank fusion -> setwise retrieval (gemma-3-27b-it) -> generation (gpt-5)
norm_nugget_cnt¶
Participants | Proceedings | Input | qrel_eval | Appendix
- Run ID: norm_nugget_cnt
- Participant: GenAIus
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-25
- Task: trec2025-rag-qrels
- MD5:
2aa20d80af72bd51bee89b85028bebce - Run description: Length-normalized count of nuggets
nugget-generation¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: nugget-generation
- Participant: GenAIus
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-auggen
- MD5:
046c0f55a01ab9c78d03cc4ab0763db7 - Run description: Nugget generation from top 20 passages, and then response generation from these nuggets. Using gpt4o.
nugget_cnt¶
Participants | Proceedings | Input | qrel_eval | Appendix
- Run ID: nugget_cnt
- Participant: GenAIus
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-25
- Task: trec2025-rag-qrels
- MD5:
bcc93d0687e51c7b9306307a40f79b07 - Run description: Count of nuggets
nuggetizer¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: nuggetizer
- Participant: WaterlooClarke
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-generation
- MD5:
ec28606d64e1a5bd5c7023ebc9dc160c - Run description: Nuggetizer
ori_query_entities¶
Participants | Input | nist-post-edit | nuggetizer_eval | full_manual | Appendix
- Run ID: ori_query_entities
- Participant: clip2025
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-generation
- MD5:
1d24b88f389e03050d2880b7908c66e6 - Run description: Preprocessing: I first use an LLM to extract entities from each sentence in the segments. Then, for every entity, I use an embedding model to retrieve the top 10 most relevant sentences from segments as its description.
Retrieval Task: For each query, I compute the relevance between the query and every entity description using an embedding model. If the same segment ID appears under multiple entities, I keep the highest score among them as the score for that segment ID. This score is used to rank the segments for retrieval.
AG Task: For each query, I take the retrieved entities and send their descriptions to an LLM to generate an answer. The answer is split into sentences. For each sentence, I again use the embedding model to retrieve the most relevant entity, and the segment IDs linked to that entity are used as the citation for that sentence.
Qwen3-30B-Instruct¶
Participants | Input | qrel_eval | Appendix
- Run ID: Qwen3-30B-Instruct
- Participant: coordinators
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-09-11
- Task: trec2025-rag-qrels
- MD5:
ddcd80c89ed42a9ac2d1e099ecde312d - Run description: Using UMBRELA variant with Qwen3-30B-A3B-Instruct-2507
Qwen3-30B-Think¶
Participants | Input | qrel_eval | Appendix
- Run ID: Qwen3-30B-Think
- Participant: coordinators
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-09-11
- Task: trec2025-rag-qrels
- MD5:
93ce8f1b76b99b2a81e11d730f7bd979 - Run description: Using UMBRELA variant with Qwen3-30B-A3B-Thinking-2507
Qwen3-30BInstruct-sn¶
Participants | Input | qrel_eval | Appendix
- Run ID: Qwen3-30BInstruct-sn
- Participant: coordinators
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-09-11
- Task: trec2025-rag-qrels
- MD5:
d47142d1d1ce84bbd679c65a8f465e32 - Run description: Creates sub-narratives from the narratives (Qwen3-30B-A3B-Instruct-2507) -> Uses the list of sub-narratives to allocate relevance labels (Qwen3-30B-A3B-Instruct-2507)
Qwen3-30BThink-sn¶
Participants | Input | qrel_eval | Appendix
- Run ID: Qwen3-30BThink-sn
- Participant: coordinators
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-09-11
- Task: trec2025-rag-qrels
- MD5:
cbf3fc5449f7626528c106ba8834d0fb - Run description: Creates sub-narratives from the narratives (Qwen3-30B-A3B-Thinking-2507) -> Uses the list of sub-narratives to allocate relevance labels (Qwen3-30B-A3B-Thinking-2507)
qwen_splade¶
Participants | Proceedings | Input | trec_eval | Appendix
- Run ID: qwen_splade
- Participant: UTokyo
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-16
- Task: trec2025-rag-retrieval
- MD5:
704911d252b742856c1980552e4b65f7 - Run description: Hybrid retrieval pipeline leveraging HyDE (Hypothetical Document Embeddings) with Qwen3-Embedding-0.6B for dense retrieval (query:HyDE 0.3:0.7 weighted combination) and SPLADE sparse retrieval. Results are fused using Reciprocal Rank Fusion (RRF, k=60) to combine complementary signals from dense and sparse methods. Final ranking performed by GPT-4.1-mini with sliding window reranking (window=10, stride=5, 3 passes) incorporating document title, URL, and segment content for enhanced contextual understanding.
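The RRF fusion step named here (k=60) can be sketched as a plain reciprocal-rank score accumulation over the input rankings; the function name and toy document IDs below are illustrative:

```python
from collections import defaultdict

def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: each document scores sum(1 / (k + rank))
    over every ranked list it appears in (rank starting at 1)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

# Toy fusion of a dense and a sparse ranking.
fused = rrf_fuse([["d1", "d2", "d3"], ["d3", "d1", "d4"]], k=60)
```

With k=60 the contribution of each list decays slowly with rank, so documents that appear in both lists ("d1", "d3") rise above documents that appear in only one.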
r_2method_ag_gpt41¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: r_2method_ag_gpt41
- Participant: UTokyo
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-16
- Task: trec2025-rag-generation
- MD5:
25678ddbacd6bd95f92a45625b8d7961 - Run description: Complete RAG pipeline integrating hybrid retrieval (Qwen3-0.6B HyDE dense + SPLADE sparse with RRF fusion, GPT-4.1-mini reranking) with GPT-4.1 generation. The retrieval stage employs HyDE-enhanced dense search with query:HyDE 0.3:0.7 weighting, complemented by SPLADE's learned sparse representations. After RRF fusion (k=60), GPT-4.1-mini performs listwise reranking using sliding windows (w=10, s=5) with enriched context (title+URL+segment). The generation stage uses GPT-4.1 with Ragnarök baseline prompting to synthesize coherent answers from top-ranked passages.
r_4method_ag_gpt41¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: r_4method_ag_gpt41
- Participant: UTokyo
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-16
- Task: trec2025-rag-generation
- MD5:
ead3a28a44e2cdbfafeb9eab605f864d - Run description: Full-scale RAG system integrating a sophisticated 4-method retrieval ensemble with GPT-4.1 generation. Retrieval combines dual dense approaches (Qwen3-0.6B and BGE-small-en-v1.5, both using HyDE with GPT-4.1-generated hypothetical answers, query:HyDE 0.3:0.7 weighted) and dual sparse methods (SPLADE's contextualized sparse representations and BM25 enhanced with GPT-4.1 keyword expansion producing 10 relevant terms per query). The 4000 total candidates (1000 per method) undergo RRF fusion (k=60) to create a unified ranking, followed by GPT-4.1-mini listwise reranking with sliding windows (w=10, s=5, 3 passes) using enriched context. Generation employs GPT-4.1 with Ragnarök baseline prompting, processing the top-reranked passages to produce comprehensive, grounded answers.
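The sliding-window reranking schedule these runs describe (window=10, stride=5, several passes) can be sketched independently of the LLM; `rerank_window` below is a hypothetical stand-in for the GPT-4.1-mini listwise call:

```python
def sliding_window_rerank(docs, rerank_window, window=10, stride=5, passes=3):
    """Rerank a list by sliding an overlapping window from back to front.

    rerank_window: callable that reorders a short list in place of the
    listwise LLM reranker used by the actual runs.
    """
    docs = list(docs)
    for _ in range(passes):
        start = max(len(docs) - window, 0)
        while True:
            # Rerank the current window and write it back in place.
            docs[start:start + window] = rerank_window(docs[start:start + window])
            if start == 0:
                break
            start = max(start - stride, 0)
    return docs

# Toy check: with an oracle that sorts its window descending, repeated
# back-to-front passes let strong items bubble toward the top.
ranked = sliding_window_rerank(list(range(15)), lambda w: sorted(w, reverse=True))
```

The overlap (stride < window) is what lets a document promoted at the tail of one window keep climbing in the next; multiple passes extend that climb beyond a single sweep.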
rag-v1-gpt¶
Participants | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: rag-v1-gpt
- Participant: cfdalab
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-16
- Task: trec2025-rag-generation
- MD5:
fb1fd9638bd669d708ae40d47ae975c5 - Run description: The workflow combines retrieval, reranking, sub-query answer generation, and final article integration. It first acquires the top 100 documents via a multi-stage retrieval pipeline. For each main query, decomposed sub-queries are retrieved for the top-10 segments, concise factual answers are generated, and citation support is analyzed with the LLM (gpt-4.1-mini). All sub-answers are ordered and refined by the LLM, and output as a structured, multi-paragraph answer in JSON format with citations, ensuring clear logic, traceable sources, precision, and structure.
rag-v2-gpt¶
Participants | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: rag-v2-gpt
- Participant: cfdalab
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-16
- Task: trec2025-rag-generation
- MD5:
de4b196db4f37dc19e8b01388a47e56d - Run description: This workflow integrates retrieval, reranking, sub-query answer generation, and final article assembly. It first obtains the top 100 documents via a multi-stage retrieval pipeline. For each query ID, the cross-encoder retrieves the top-10 documents per sub-query, generates concise sub-query answers, and finally the LLM (gpt-4.1-mini) integrates them into a complete article with citation markers, splitting the output into sentence-level answers. The final output is in JSON format, including query information, total word count, citation list, and citation-annotated answers, fully presenting the retrieval-to-answer process.
rag-v2-llama¶
Participants | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: rag-v2-llama
- Participant: cfdalab
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-16
- Task: trec2025-rag-generation
- MD5:
cacb1ed600648c827e02138ae675c188 - Run description: This workflow combines retrieval, reranking, sub-query answer generation, and final article assembly. First, it acquires the top 100 documents through a multi-stage retrieval pipeline. For each query ID, the cross-encoder retrieves the top-10 documents per sub-query, concise sub-query answers are generated, and the LLM (Llama 3.1 8b) merges them into a complete article with citation markers, splitting the output into sentence-level answers. The final result is output in JSON format, including query information, total word count, citation list, and citation-annotated answers—fully presenting the workflow from retrieval to answer generation.
rag25_qwen3_20_ag¶
Participants | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: rag25_qwen3_20_ag
- Participant: coordinators
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-auggen
- MD5:
f96387a177e0f89646c80d591ce7be72 - Run description: Stage 1 (top-1K): RRF(Splade v3, Snowflake's Arctic embed l) Stage 2 (top-100): RankQwen Generation (top-20): Qwen3
rag25_qwen3_50_ag¶
Participants | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: rag25_qwen3_50_ag
- Participant: coordinators
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-auggen
- MD5:
6bfa165adf69fbd066db545dc975eaf7 - Run description: Stage 1 (top-1K): RRF(Splade v3, Snowflake's Arctic embed l) Stage 2 (top-100): RankQwen Generation (top-50): Qwen3
rag25_test_arctic-l¶
Participants | Input | trec_eval | Appendix
- Run ID: rag25_test_arctic-l
- Participant: coordinators
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-retrieval
- MD5:
5c1da23ebd3a67b2af93c77ec3ba27b7 - Run description: Uses Snowflake's Arctic embed l
rag25_test_arctic-m¶
Participants | Input | trec_eval | Appendix
- Run ID: rag25_test_arctic-m
- Participant: coordinators
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-retrieval
- MD5:
2bf6ca5361029893ea836bad041fb09c - Run description: Uses Snowflake's Arctic embed m
rag25_test_qwen3_20¶
Participants | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: rag25_test_qwen3_20
- Participant: coordinators
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-generation
- MD5:
f96387a177e0f89646c80d591ce7be72 - Run description: Stage 1 (top-1K): RRF(Splade v3, Snowflake's Arctic embed l) Stage 2 (top-100): RankQwen Generation (top-20): Qwen3
rag25_test_qwen3_50¶
Participants | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: rag25_test_qwen3_50
- Participant: coordinators
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-generation
- MD5:
6bfa165adf69fbd066db545dc975eaf7 - Run description: Stage 1 (top-1K): RRF(Splade v3, Snowflake's Arctic embed l) Stage 2 (top-100): RankQwen Generation (top-50): Qwen3
rag25_test_rankqwen3¶
Participants | Input | trec_eval | Appendix
- Run ID: rag25_test_rankqwen3
- Participant: coordinators
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-retrieval
- MD5:
9f33d21b14911991ff8a801f40b4d5fb - Run description: Stage 1 (top-1K): RRF(Splade v3, Snowflake's Arctic embed l) Stage 2 (top-100): RankQwen
rag25_test_splade-v3¶
Participants | Input | trec_eval | Appendix
- Run ID: rag25_test_splade-v3
- Participant: coordinators
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-retrieval
- MD5:
76452dc5a3457d0b62a157faffeb9dae - Run description: Anserini Splade v3
rag_v4¶
Participants | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: rag_v4
- Participant: cfdalab
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-generation
- MD5:
b40ca1291d8f3ad5f2728462ae815d9d - Run description: We submit a retrieval-augmented generation run using precomputed retrieval (TREC runfile) over the MSMARCO v2.1 segmented corpus. For each decomposed subquery, we feed the top-k passages to OpenAI gpt-4.1-mini with strict, evidence-only prompts that return JSON {answer, citations} with exact # passage IDs; we then do a short coherence rewrite (fixed-length array) and reattach filtered citations. No open-weight models are used; low temperature and light post-processing (dedupe, min/max citations) ensure stable, submission-ready outputs.
Rerank-Top50_v2¶
Participants | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: Rerank-Top50_v2
- Participant: clip2025
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-auggen
- MD5:
3c269f2ce9ca4bb9319349e3577234c3 - Run description: Uses an LLM to rerank the top-50 segments and generates the answer from them.
Rerank-Top50_v3¶
Participants | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: Rerank-Top50_v3
- Participant: clip2025
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-auggen
- MD5:
a4b41843a612e26a9aed8a4e0a0388be - Run description: v3
ret-gemma¶
Participants | Proceedings | Input | trec_eval | Appendix
- Run ID: ret-gemma
- Participant: MITLL
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-retrieval
- MD5:
cec1aeca5ca65c7c527771d915a46e12 - Run description: Splade -> query decomposition (gemma-3-27b-it) -> reranker (gemma-3-27b-it) -> reciprocal rank fusion
ret-no-decomp¶
Participants | Proceedings | Input | trec_eval | Appendix
- Run ID: ret-no-decomp
- Participant: MITLL
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-retrieval
- MD5:
f67ae4ceef9ffb7d1fdae49a5707e5f7 - Run description: Splade -> Qwen3-Reranker-8B
ret-no-reranker¶
Participants | Proceedings | Input | trec_eval | Appendix
- Run ID: ret-no-reranker
- Participant: MITLL
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-retrieval
- MD5:
638cf3ec7378bfdec87aedff7c107736 - Run description: Splade -> query decomposition (gemma-3-27b-it) -> reciprocal rank fusion
ret-splade-only¶
Participants | Proceedings | Input | trec_eval | Appendix
- Run ID: ret-splade-only
- Participant: MITLL
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-retrieval
- MD5:
ee266323e5527bc2090d5c26014d60a9 - Run description: Splade only.
ronly_auto_plan¶
Participants | Proceedings | Input | trec_eval | Appendix
- Run ID: ronly_auto_plan
- Participant: WaterlooClarke
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-retrieval
- MD5:
392dcada37cae6ae1b5e8d35f3a196f4 - Run description: Automatic planning
ronly_auto_selected¶
Participants | Proceedings | Input | trec_eval | Appendix
- Run ID: ronly_auto_selected
- Participant: WaterlooClarke
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-retrieval
- MD5:
b534614227e0581b3014aa80ea33fb8c - Run description: Backward from answer to retrieval
ronly_combined¶
Participants | Proceedings | Input | trec_eval | Appendix
- Run ID: ronly_combined
- Participant: WaterlooClarke
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-retrieval
- MD5:
63b3e7fdd6a4a31b130ab333a876afed - Run description: Combining retrieved passages from multiple techniques + reranking
ronly_garag¶
Participants | Proceedings | Input | trec_eval | Appendix
- Run ID: ronly_garag
- Participant: WaterlooClarke
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-retrieval
- MD5:
72fbbaa42d64459ecafb1de50f90fb34 - Run description: Generate first
ronly_nuggetizer¶
Participants | Proceedings | Input | trec_eval | Appendix
- Run ID: ronly_nuggetizer
- Participant: WaterlooClarke
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-retrieval
- MD5:
879d76a29c54d70c7e28eb4c6d9251cc - Run description: Mono + Duo
RRF_all¶
Participants | Input | trec_eval | Appendix
- Run ID: RRF_all
- Participant: cfdalab
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-16
- Task: trec2025-rag-retrieval
- MD5:
9daf535a729df579814cbc8708a9d334 - Run description: A hybrid retrieval pipeline combining BM25 with query expansion, dense embedding search, and neural rerankers (ColBERTv2, MiniLM). Results are fused via RRF, with the top-100 outputs submitted.
RRF_colbert_minlm¶
Participants | Input | trec_eval | Appendix
- Run ID: RRF_colbert_minlm
- Participant: cfdalab
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-retrieval
- MD5:
760a2c60620cdef5c12ae35ca4427035 - Run description: This run combines ColBERTv2 results with MiniLM cross-encoder reranking using Reciprocal Rank Fusion (RRF). Final ranking selects the top 100 documents per query.
RRF_colert_bm25¶
Participants | Input | trec_eval | Appendix
- Run ID: RRF_colert_bm25
- Participant: cfdalab
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-16
- Task: trec2025-rag-retrieval
- MD5:
77f1d6d3176ad88c774280a2820fb252 - Run description: A multi-stage automatic retrieval run combining BM25 with ColBERT reranking, fused via RRF, and outputting the top-100 results.
RRF_minilm_bm25¶
Participants | Input | trec_eval | Appendix
- Run ID: RRF_minilm_bm25
- Participant: cfdalab
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-16
- Task: trec2025-rag-retrieval
- MD5:
33fc6e836bc70a27b816a528276b7217 - Run description: A multi-stage retrieval run combining BM25 with MiniLM pointwise reranking, fused via RRF, with top-100 results submitted.
selector-agent-trim¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: selector-agent-trim
- Participant: ncsu-las
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-16
- Task: trec2025-rag-generation
- MD5:
888b35018bdd101e3834756e2854ed9b - Run description: This run uses a selector group chat. The selector is gpt-4.1-mini. It assigns each message turn by choosing among a planner, a research assistant, a writer, and a reviewer. The planner and reviewer use gpt-4.1, while the research assistant and writer use gpt-4.1-mini. The output is trimmed using the organizers' script and reformatted to format 1.
sentence-transformers-all-MiniLM-L6-v2¶
Participants | Input | trec_eval | Appendix
- Run ID: sentence-transformers-all-MiniLM-L6-v2
- Participant: clip2025
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-15
- Task: trec2025-rag-retrieval
- MD5:
a9e385d6990cb1b68f78a6eb97ac5d87 - Run description: sentence-transformers/all-MiniLM-L6-v2 retrieval for the original queries.
single-agent-trim¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: single-agent-trim
- Participant: ncsu-las
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-16
- Task: trec2025-rag-generation
- MD5:
fb58466227615672ecdf86b3a4670ca1 - Run description: This run uses a single agent to decompose the user input into multiple queries, call the search_splade tool, and write a report. The output is trimmed using the organizers' script and reformatted to format 1.
splade-v3-arctic-l¶
Participants | Input | trec_eval | Appendix
- Run ID: splade-v3-arctic-l
- Participant: coordinators
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-retrieval
- MD5:
385a0ba9bc099487742d0587d8d97517 - Run description: Stage 1: RRF(Splade v3, Snowflake's Arctic embed l)
splade-v3-arctic-m¶
Participants | Input | trec_eval | Appendix
- Run ID: splade-v3-arctic-m
- Participant: coordinators
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-retrieval
- MD5:
69dccf5941af099abf559bde3638c188 - Run description: Stage 1: RRF(Splade v3, Snowflake's Arctic embed m 1.5)
standard_roll¶
Participants | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: standard_roll
- Participant: IRIT-ISIR-EV
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-20
- Task: trec2025-rag-generation
- MD5:
92f0bf95bf0d003d05205490f7f81110 - Run description: This run uses the full document as input instead of the segment
strd_roll_segment¶
Participants | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: strd_roll_segment
- Participant: IRIT-ISIR-EV
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-generation
- MD5:
3dbb7b4b8894cfb045327708531795b1 - Run description: This run leverages segments from input documents instead of the full document
sub_query_entities¶
Participants | Input | nist-post-edit | nuggetizer_eval | full_manual | Appendix
- Run ID: sub_query_entities
- Participant: clip2025
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-generation
- MD5:
eb8344cfce7dc3e3545bb3471cad05ef - Run description: Preprocessing: I first use an LLM to extract entities from each sentence in the segments. Then, for every entity, I use an embedding model to retrieve the top 10 most relevant sentences from the segments as its description.
Retrieval Task: For each query, I compute the relevance between the query (together with its sub-queries) and every entity description using an embedding model. If the same segment ID appears under multiple entities, I keep the highest score among them as the score for that segment ID. This score is used to rank the segments for retrieval.
AG Task: For each query, I take the retrieved entities and send their descriptions to an LLM to generate an answer. The answer is split into sentences. For each sentence, I again use the embedding model to retrieve the most relevant entity, and the segment IDs linked to that entity are used as the citation for that sentence.
swarm¶
Participants | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: swarm
- Participant: hltcoe-multiagt
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-generation
- MD5:
f86a5f6c46598e836d11f82928eb2b71 - Run description: This run leverages the AutoGen framework. Multiple agents work on the answer generation: a query decomposition agent, a document retrieval agent, a report writing agent, a report editing agent, and a report publishing agent. The query decomposer creates 3 queries at a time. Retrieval is done with SPLADE v3 and the top 6 documents are included. The document retrieval agent uses a fact extractor agent to identify key parts of documents. The report writing agent removes redundant facts. The report editing agent compiles the facts into an answer and determines whether more information is needed or the answer is ready to be published. After publishing, an independent editor is used to fit the answer into the length limit. All LLM calls are serviced with llama-3.3-70B-instruct.
swarm_c¶
Participants | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: swarm_c
- Participant: hltcoe-multiagt
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-generation
- MD5:
071a22280dcc7a4be911174b0d8be951 - Run description: This run leverages the AutoGen framework. Multiple agents work on the answer generation: a query decomposition agent, a document retrieval agent, a report writing agent, a report editing agent, and a report publishing agent. The query decomposer creates 3 queries at a time. Retrieval is done with SPLADE v3 and the top 6 documents are included. The document retrieval agent uses a fact extractor agent to identify key parts of documents. The report writing agent removes redundant facts. The report editing agent compiles the facts into an answer and determines whether more information is needed or the answer is ready to be published. After publishing, an independent editor is used to ensure citation accuracy and fit the answer into the length limit. All LLM calls are serviced with llama-3.3-70B-instruct.
uema2lab_B4¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | full_manual | Appendix
- Run ID: uema2lab_B4
- Participant: tus
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-17
- Task: trec2025-rag-generation
- MD5:
1e3cba12b9251d0b944b7435f5a1470d - Run description: This run decomposes the narrative query into sub-queries. These sub-queries are then used in a hybrid retrieval system, combining dense and sparse methods, to gather relevant documents for the LLM. Subsequently, we leverage the LLM to score each document's relevance against the sub-queries. The documents are then ranked and provided to the LLM in descending order of their scores, and the prompt explicitly instructs the model that the documents are sorted by relevance to inform the final answer generation.
uema2lab_base¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | full_manual | Appendix
- Run ID: uema2lab_base
- Participant: tus
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-generation
- MD5:
8a02790c4007ca840c10ba387954ffd5 - Run description: Our system for the TREC 2025 RAG Track is designed to effectively handle long-context retrieval-augmented generation (RAG) tasks. We adopt a hybrid document-level retrieval pipeline that combines BM25-based sparse retrieval with dense vector search using OpenAI embeddings. To improve the relevance and diversity of retrieved results, we decompose each original query into several sub-queries and retrieve the top-K candidate documents for each. The selected top-k documents are then used as input to a Gemini 1.5 Pro LLM to generate answers. Segmentation IDs are assigned after the answer is produced.
uema2lab_narrative¶
Participants | Proceedings | Input | trec_eval | Appendix
- Run ID: uema2lab_narrative
- Participant: tus
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-retrieval
- MD5:
9f2b77fa772b1a7e21927d3b90ca242a - Run description: We applied a hybrid search combining BM25 and dense retrieval (OpenAI text-embedding-3-small, 1536-dim) to retrieve the top-20 documents per narrative. From each document, the segment most similar in embedding space to the narrative text was selected, and the ranked document lists were thus converted into segment-level lists for evaluation. This run serves as a comparison baseline to the subquery-expansion approach. The results are compared against other runs (runid: uema2lab_rrf, uema2lab_rrf_k10, uema2lab_segment).
uema2lab_rag_fewdoc¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | full_manual | Appendix
- Run ID: uema2lab_rag_fewdoc
- Participant: tus
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-generation
- MD5:
6db979087469a2f2aadfdfc53e594e38 - Run description: Our system for the TREC 2025 RAG Track is designed to effectively handle long-context retrieval-augmented generation (RAG) tasks. Here we plug the organizers' baseline retrieval into our system to compare against the effectiveness of sub-query decomposition. The answer-generation part is otherwise the same as in our baseline system, described in a different run.
uema2lab_rag_org¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | Appendix
- Run ID: uema2lab_rag_org
- Participant: tus
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-generation
- MD5:
80eeb2f05334fdbd10fe7fb304ffa3d5 - Run description: Our system for the TREC 2025 RAG Track is designed to effectively handle long-context retrieval-augmented generation (RAG) tasks. Here we plug the organizers' baseline retrieval into our system to compare against the effectiveness of sub-query decomposition. The answer-generation part is otherwise the same as in our baseline system, described in a different run.
uema2lab_rrf¶
Participants | Proceedings | Input | trec_eval | Appendix
- Run ID: uema2lab_rrf
- Participant: tus
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-retrieval
- MD5:
2ce3ffd6cfd2d2f833d91f5eccb287ec - Run description: In this system, each narrative query was first decomposed into multiple sub-queries. For each sub-query, both sparse (BM25) and dense (vector-based) retrieval were performed on a document-level index, and the top 100 documents were retrieved. These ranked lists for each sub-query were then merged using Reciprocal Rank Fusion (RRF) with rrf_K=60 to produce a final ranked list of documents at the narrative-query level. Each document in the ranked list was segmented into smaller textual segments, and each segment was considered a candidate. We computed embedding-based similarity between each segment and the original narrative query, and selected the most relevant segment for each document. This final ranked list of segments was submitted as the run.
uema2lab_rrf_k10¶
Participants | Proceedings | Input | trec_eval | Appendix
- Run ID: uema2lab_rrf_k10
- Participant: tus
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-retrieval
- MD5:
3916923664d2611e7a47c1746f954b95 - Run description: In this system, each narrative query was first decomposed into multiple sub-queries. For each sub-query, both sparse (BM25) and dense (vector-based) retrieval were performed on a document-level index, and the top 100 documents were retrieved. These ranked lists for each sub-query were then merged using Reciprocal Rank Fusion (RRF) with rrf_K=10 to produce a final ranked list of documents at the narrative-query level. Each document in the ranked list was segmented into smaller textual segments, and each segment was considered a candidate. We computed embedding-based similarity between each segment and the original narrative query, and selected the most relevant segment for each document. This final ranked list of segments was submitted as the run.
uema2lab_segment¶
Participants | Proceedings | Input | trec_eval | Appendix
- Run ID: uema2lab_segment
- Participant: tus
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-retrieval
- MD5:
b41156614eed6c7d28458d3c518cc59b - Run description: In this system, each narrative query was first decomposed into multiple sub-queries. For each sub-query, both sparse (BM25) and dense (vector-based) retrieval were performed on a document-level index. The retrieval results were then fused using Reciprocal Rank Fusion (RRF) to produce a ranked list of 100 documents per sub-query. These sub-query-level ranked lists were further merged using RRF at the narrative-query level to produce a final ranked list of documents. Next, each document in the ranked list was segmented into smaller textual segments. For each segment, we computed the embedding-based similarity to the original narrative query, and selected the most relevant segment per document. The final ranked list of segments was submitted as the run.
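The per-document segment selection used by these uema2lab runs (picking, for each document, the segment whose embedding is most similar to the narrative query) can be sketched as follows; the cosine helper and toy vectors are illustrative, not the actual embeddings:

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def best_segment(query_vec, segment_vecs):
    """Index of the segment embedding most similar to the query embedding."""
    return max(range(len(segment_vecs)),
               key=lambda i: cosine(query_vec, segment_vecs[i]))

# The third segment points almost exactly along the query direction.
idx = best_segment([1.0, 0.0], [[0.0, 1.0], [1.0, 1.0], [1.0, 0.1]])
```

Applying this per document converts a document-level ranking into the segment-level run format the track evaluates.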
unique_cluster_cnt¶
Participants | Proceedings | Input | qrel_eval | Appendix
- Run ID: unique_cluster_cnt
- Participant: GenAIus
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-25
- Task: trec2025-rag-qrels
- MD5:
1898d306ee979ef708a9ab1f77c17681 - Run description: Count of unique clusters
wingii-3-rl-refined¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | full_manual | Appendix
- Run ID: wingii-3-rl-refined
- Participant: WING-II
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-auggen
- MD5:
3d843c728caf14fccc60724f91fb5e54 - Run description: Submodular evidence selection (K=24) + small-model evidence-card compression + citation-first claims with strict JSON output, followed by a lexical self-check/post-fixer (citation strengthening, length control) + Refinement.
wingii-v3-gpt¶
Participants | Proceedings | Input | nist-post-edit | nuggetizer_eval | full_manual | Appendix
- Run ID: wingii-v3-gpt
- Participant: WING-II
- Track: Retrieval Augmented Generation (RAG)
- Year: 2025
- Submission: 2025-08-18
- Task: trec2025-rag-auggen
- MD5:
c3ad8f01fa37643990a9e515966b4e5a - Run description: Submodular evidence selection (K=24) + small-model evidence-card compression + citation-first claims with strict JSON output, followed by a lexical self-check/post-fixer (citation strengthening, length control).