Proceedings 2024
Ad-hoc Video Search
WHU-NERCMS AT TRECVID2024: AD-HOC VIDEO SEARCH TASK
Heng Liu (Hubei Key Laboratory of Multimedia and Network Communication Engineering, NERCMS, WHU), Jiangshan He (NERCMS), Zeyuan Zhang (Hubei Key Laboratory of Multimedia and Network Communication Engineering, NERCMS, WHU), Yuanyuan Xu (NERCMS), Chao Liang (Hubei Key Laboratory of Multimedia and Network Communication Engineering, NERCMS, WHU)
- Participant: WHU-NERCMS
- Paper: https://trec.nist.gov/pubs/trec33/papers/WHU-NERCMS.avs.pdf
- Runs: run4 | run3 | run2 | Manual_run1 | relevance_feedback_run4 | relevance_feedback_run1 | auto_run1 | rf_run2 | RF_run3
Abstract
The WHU-NERCMS team participated in the ad-hoc video search (AVS) task of TRECVID 2024. In this year’s AVS task, we continued to use multiple visual semantic embedding methods, combined with interactive feedback-guided ranking aggregation techniques to integrate different models and their outputs to generate the final ranked video shot list. We submitted 4 runs each for the automatic and interactive tasks, along with one attempt for the manual assistance task. Table 1 shows our results for this year.
Bibtex
@inproceedings{WHU-NERCMS-trec2024-papers-proc-1,
author = {Heng Liu (Hubei Key Laboratory of Multimedia and Network Communication Engineering, NERCMS, WHU) and Jiangshan He (NERCMS) and Zeyuan Zhang (Hubei Key Laboratory of Multimedia and Network Communication Engineering, NERCMS, WHU) and Yuanyuan Xu (NERCMS) and Chao Liang (Hubei Key Laboratory of Multimedia and Network Communication Engineering, NERCMS, WHU)},
title = {WHU-NERCMS AT TRECVID2024: AD-HOC VIDEO SEARCH TASK},
booktitle = {The Thirty-Third Text REtrieval Conference Proceedings (TREC 2024), Gaithersburg, MD, USA, November 15-18, 2024},
series = {NIST Special Publication},
volume = {xxx-xxx},
publisher = {National Institute of Standards and Technology (NIST)},
year = {2024},
trec_org = {WHU-NERCMS},
trec_runs = {run4, run3, run2, Manual_run1, relevance_feedback_run4, relevance_feedback_run1, auto_run1, rf_run2, RF_run3},
trec_tracks = {avs},
url = {https://trec.nist.gov/pubs/trec33/papers/WHU-NERCMS.avs.pdf}
}
ITI-CERTH participation in ActEV and AVS Tracks of TRECVID 2024
Konstantinos Gkountakos, Damianos Galanopoulos, Antonios Leventakis, Georgios Tsionkis, Klearchos Stavrothanasopoulos, Konstantinos Ioannidis, Stefanos Vrochidis, Vasileios Mezaris, Ioannis Kompatsiaris
- Participant: CERTH-ITI
- Paper: https://trec.nist.gov/pubs/trec33/papers/CERTH-ITI.avs.actev.pdf
- Runs: certh.iti.avs.24.main.run.1 | certh.iti.avs.24.main.run.2 | certh.iti.avs.24.main.run.3 | certh.iti.avs.24.progress.run.1 | certh.iti.avs.24.progress.run.2 | certh.iti.avs.24.progress.run.3
Abstract
This report presents the overview of the runs related to Ad-hoc Video Search (AVS) and Activities in Extended Video (ActEV) tasks on behalf of the ITI-CERTH team. Our participation in the AVS task involves a collection of five cross-modal deep network architectures and numerous pre-trained models, which are used to calculate the similarities between video shots and queries. These calculated similarities serve as input to a trainable neural network that effectively combines them. During the retrieval stage, we also introduce a normalization step that utilizes both the current and previous AVS queries for revising the combined video shot-query similarities. For the ActEV task, we adapt our framework to support a rule-based classification to overcome the challenges of detecting and recognizing activities in a multi-label manner while experimenting with two separate activity classifiers.
Bibtex
@inproceedings{CERTH-ITI-trec2024-papers-proc-1,
author = {Konstantinos Gkountakos and Damianos Galanopoulos and Antonios Leventakis and Georgios Tsionkis and Klearchos Stavrothanasopoulos and Konstantinos Ioannidis and Stefanos Vrochidis and Vasileios Mezaris and Ioannis Kompatsiaris},
title = {ITI-CERTH participation in ActEV and AVS Tracks of TRECVID 2024},
booktitle = {The Thirty-Third Text REtrieval Conference Proceedings (TREC 2024), Gaithersburg, MD, USA, November 15-18, 2024},
series = {NIST Special Publication},
volume = {xxx-xxx},
publisher = {National Institute of Standards and Technology (NIST)},
year = {2024},
trec_org = {CERTH-ITI},
trec_runs = {certh.iti.avs.24.main.run.1, certh.iti.avs.24.main.run.2, certh.iti.avs.24.main.run.3, certh.iti.avs.24.progress.run.1, certh.iti.avs.24.progress.run.2, certh.iti.avs.24.progress.run.3},
trec_tracks = {avs.actev},
url = {https://trec.nist.gov/pubs/trec33/papers/CERTH-ITI.avs.actev.pdf}
}
RUC_AIM3 at TRECVID 2024: Ad-hoc Video Search
Xueyan Wang (Renmin University of China), Yang Du (Renmin University of China), Yuqi Liu (Renmin University of China), Qin Jin (Renmin University of China)
- Participant: ruc_aim3
- Paper: https://trec.nist.gov/pubs/trec33/papers/ruc_aim3.avs.pdf
- Runs: add_captioning | baseline | add_QArerank | add_captioning_QArerank
Abstract
This report presents our solution for the Ad-hoc Video Search (AVS) task of TRECVID 2024. Building on our baseline AVS model from TRECVID 2023, we further improve search performance by integrating multiple visual-embedding models, performing video captioning for topic-to-caption search, and applying a re-ranking strategy to the top candidate results. Our submissions from the improved AVS model rank 3rd in TRECVID AVS 2024 on mean average precision (mAP) in the main task, with our best run achieving 36.8.
Bibtex
@inproceedings{ruc_aim3-trec2024-papers-proc-1,
author = {Xueyan Wang (Renmin University of China) and Yang Du (Renmin University of China) and Yuqi Liu (Renmin University of China) and Qin Jin (Renmin University of China)},
title = {RUC_AIM3 at TRECVID 2024: Ad-hoc Video Search},
booktitle = {The Thirty-Third Text REtrieval Conference Proceedings (TREC 2024), Gaithersburg, MD, USA, November 15-18, 2024},
series = {NIST Special Publication},
volume = {xxx-xxx},
publisher = {National Institute of Standards and Technology (NIST)},
year = {2024},
trec_org = {ruc_aim3},
trec_runs = {add_captioning, baseline, add_QArerank, add_captioning_QArerank, VTM and VTC for two model, VTC for two model, VTM for two model, VTM and VTC for videollama2 robust, VTM and VTC for two model primary, VTC for two model primary, VTM for two model primary, VTM and VTC for videollama2 primary},
trec_tracks = {avs},
url = {https://trec.nist.gov/pubs/trec33/papers/ruc_aim3.avs.pdf}
}
Softbank-Meisei at TREC 2024 Ad-hoc Video Search and Video to Text Tasks
Kazuya Ueki (Meisei University), Yuma Suzuki (SoftBank Corp.), Hiroki Takushima (SoftBank Corp.), Haruki Sato (Agoop Corp.), Takumi Takada (SB Intuitions Corp.), Aiswariya Manoj Kumar (SoftBank Corp.), Hayato Tanoue (SoftBank Corp.), Hiroki Nishihara (SoftBank Corp.), Yuki Shibata (SoftBank Corp.), Takayuki Hori (SoftBank Corp.)
- Participant: softbank-meisei
- Paper: https://trec.nist.gov/pubs/trec33/papers/softbank-meisei.avs.vtt.pdf
- Runs: SoftbankMeisei - Main Run 1 | SoftbankMeisei - Main Run 2 | SoftbankMeisei - Main Run 3 | SoftbankMeisei - Main Run 4 | SoftbankMeisei - Progress Run 1 | SoftbankMeisei - Progress Run 2 | SoftbankMeisei - Progress Run 3 | SoftbankMeisei - Progress Run 4
Abstract
The Softbank-Meisei team participated in the ad-hoc video search (AVS) and video-to-text (VTT) tasks at TREC 2024. In this year's AVS task, we submitted four fully automatic systems for both the main and progress tasks. Our systems utilized pre-trained vision and language models, including CLIP, BLIP, and BLIP-2, along with several other advanced models. We also expanded the original query texts using text generation and image generation techniques to enhance data diversity. The integration ratios of these models were optimized based on results from previous benchmark test datasets. In this year's VTT, as last year, we submitted four main task methods using multiple model captioning, reranking, and generative AI for summarization. For the subtasks, we submitted three methods using the output of each model. Last year's test data for the main task showed improvements of about 0.04 points in CIDEr-D and about 0.03 points in SPICE, based on the indices we had on hand.
Bibtex
@inproceedings{softbank-meisei-trec2024-papers-proc-3,
author = {Kazuya Ueki (Meisei University) and Yuma Suzuki (SoftBank Corp.) and Hiroki Takushima (SoftBank Corp.) and Haruki Sato (Agoop Corp.) and Takumi Takada (SB Intuitions Corp.) and Aiswariya Manoj Kumar (SoftBank Corp.) and Hayato Tanoue (SoftBank Corp.) and Hiroki Nishihara (SoftBank Corp.) and Yuki Shibata (SoftBank Corp.) and Takayuki Hori (SoftBank Corp.)},
title = {Softbank-Meisei at TREC 2024 Ad-hoc Video Search and Video to Text Tasks},
booktitle = {The Thirty-Third Text REtrieval Conference Proceedings (TREC 2024), Gaithersburg, MD, USA, November 15-18, 2024},
series = {NIST Special Publication},
volume = {xxx-xxx},
publisher = {National Institute of Standards and Technology (NIST)},
year = {2024},
trec_org = {softbank-meisei},
trec_runs = {SoftbankMeisei - Progress Run 1, SoftbankMeisei - Progress Run 2, SoftbankMeisei - Progress Run 3, SoftbankMeisei - Progress Run 4, SoftbankMeisei - Main Run 1, SoftbankMeisei - Main Run 2, SoftbankMeisei - Main Run 3, SoftbankMeisei - Main Run 4, rtask-bm25-colbert_faiss, rtask-bm25-rank_zephyr, rag_bm25-colbert_faiss-gpt4o-llama70b, ragtask-bm25-rank_zephyr-gpt4o-llama70b, agtask-bm25-colbert_faiss-gpt4o-llama70b, SoftbankMeisei_vtt_main_run1, SoftbankMeisei_vtt_main_run2, SoftbankMeisei_vtt_main_run3, SoftbankMeisei_vtt_main_run4, SoftbankMeisei_vtt_sub_run2, SoftbankMeisei_vtt_sub_run3, SoftbankMeisei_vtt_sub_run1},
trec_tracks = {avs.vtt},
url = {https://trec.nist.gov/pubs/trec33/papers/softbank-meisei.avs.vtt.pdf}
}
AToMiC
Biomedical Generative Retrieval (BioGen) Track
Exploring the Few-Shot Performance of Low-Cost Proprietary Models in the 2024 TREC BioGen Track
Samy Ateia (University of Regensburg), Udo Kruschwitz (University of Regensburg)
- Participant: ur-iw
- Paper: https://trec.nist.gov/pubs/trec33/papers/ur-iw.biogen.pdf
- Runs: zero-shot-gpt4o-mini | zero-shot-gemini-flash | ten-shot-gpt4o-mini | ten-shot-gemini-flash | ten-shot-gpt4o-mini-wiki | ten-shot-gemini-flash-wiki
Abstract
For the 2024 TREC Biomedical Generative Retrieval (BioGen) Track, we evaluated proprietary low-cost large language models (LLMs) in few-shot and zero-shot settings for biomedical question answering. Building upon our prior competitive approach from the CLEF 2024 BioASQ challenge, we adapted our methods to the BioGen task. We reused few-shot examples from BioASQ and generated additional ones from the test set for the BioGen specific answer format, by using an LLM judge to select examples. Our approach involved query expansion, BM25-based retrieval using Elasticsearch, snippet extraction, reranking, and answer generation both with and without 10-shot learning and additional relevant context from Wikipedia. The results are in line with our findings at BioASQ, indicating that additional Wikipedia context did not improve the results, while 10-shot learning did. An interactive reference implementation that showcases Google's Gemini-1.5-flash performance with 3-shot learning is available online and the source code of this demo is available on GitHub.
Bibtex
@inproceedings{ur-iw-trec2024-papers-proc-1,
author = {Samy Ateia (University of Regensburg) and Udo Kruschwitz (University of Regensburg)},
title = {Exploring the Few-Shot Performance of Low-Cost Proprietary Models in the 2024 TREC BioGen Track},
booktitle = {The Thirty-Third Text REtrieval Conference Proceedings (TREC 2024), Gaithersburg, MD, USA, November 15-18, 2024},
series = {NIST Special Publication},
volume = {xxx-xxx},
publisher = {National Institute of Standards and Technology (NIST)},
year = {2024},
trec_org = {ur-iw},
trec_runs = {zero-shot-gpt4o-mini, zero-shot-gemini-flash, ten-shot-gpt4o-mini, ten-shot-gemini-flash, ten-shot-gpt4o-mini-wiki, ten-shot-gemini-flash-wiki},
trec_tracks = {biogen},
url = {https://trec.nist.gov/pubs/trec33/papers/ur-iw.biogen.pdf}
}
Webis at TREC 2024: Biomedical Generative Retrieval, Retrieval-Augmented Generation, and Tip-of-the-Tongue Tracks
Maik Fröbe (Friedrich-Schiller-Universität), Lukas Gienapp (Leipzig University, ScaDS.AI), Harrisen Scells (Universität Kassel), Eric Oliver Schmidt (Martin-Luther-Universität Halle), Matti Wiegmann (Bauhaus-Universität Weimar), Martin Potthast (Universität Kassel, hessian.AI, ScaDS.AI), Matthias Hagen (Friedrich-Schiller-Universität Jena)
- Participant: webis
- Paper: https://trec.nist.gov/pubs/trec33/papers/webis.biogen.rag.tot.pdf
- Runs: webis-1 | webis-2 | webis-3 | webis-gpt-1 | webis-gpt-4 | webis-gpt-6 | webis-5
Abstract
In this paper, we describe the Webis Group's participation in the 2024 edition of TREC. We participated in the Biomedical Generative Retrieval track, the Retrieval-Augmented Generation track, and the Tip-of-the-Tongue track. For the biomedical track, we applied different paradigms of retrieval-augmented generation with open- and closed-source LLMs. For the Retrieval-Augmented Generation track, we aimed to contrast manual response submissions with fully automated responses. For the Tip-of-the-Tongue track, we employed query relaxation as in last year's submission (i.e., leaving out terms that likely reduce retrieval effectiveness), combined with a new cross-encoder trained on an enriched version of the TOMT-KIS dataset.
Bibtex
@inproceedings{webis-trec2024-papers-proc-1,
author = {Maik Fröbe (Friedrich-Schiller-Universität) and Lukas Gienapp (Leipzig University & ScaDS.AI) and Harrisen Scells (Universität Kassel) and Eric Oliver Schmidt (Martin-Luther-Universität Halle) and Matti Wiegmann (Bauhaus-Universität Weimar) and Martin Potthast (Universität Kassel & hessian.AI & ScaDS.AI) and Matthias Hagen (Friedrich-Schiller-Universität Jena)},
title = {Webis at TREC 2024: Biomedical Generative Retrieval, Retrieval-Augmented Generation, and Tip-of-the-Tongue Tracks},
booktitle = {The Thirty-Third Text REtrieval Conference Proceedings (TREC 2024), Gaithersburg, MD, USA, November 15-18, 2024},
series = {NIST Special Publication},
volume = {xxx-xxx},
publisher = {National Institute of Standards and Technology (NIST)},
year = {2024},
trec_org = {webis},
trec_runs = {webis-01, webis-02, webis-03, webis-04, webis-05, webis-ag-run0-taskrag, webis-ag-run1-taskrag, webis-manual, webis-rag-run0-taskrag, webis-rag-run1-taskrag, webis-rag-run3-taskrag, webis-ag-run3-reuserag, webis-rag-run4-reuserag, webis-rag-run5-reuserag, webis-ag-run2-reuserag, webis-1, webis-2, webis-3, webis-gpt-1, webis-gpt-4, webis-gpt-6, webis-5, webis-base, webis-tot-01, webis-tot-02, webis-tot-04, webis-tot-03},
trec_tracks = {biogen.rag.tot},
url = {https://trec.nist.gov/pubs/trec33/papers/webis.biogen.rag.tot.pdf}
}
Interactive Knowledge Assistance
Passage Query Methods for Retrieval and Reranking in Conversational Agents
Victor De Lima (Georgetown InfoSense), Grace Hui Yang (Georgetown InfoSense)
- Participant: infosenselab
- Paper: https://trec.nist.gov/pubs/trec33/papers/infosenselab.ikat.pdf
- Runs: infosense_llama_pssgqrs_wghtdrerank_2 | infosense_llama_pssgqrs_wghtdrerank_1 | infosense_llama_short_long_qrs_2 | infosense_llama_short_long_qrs_3
Abstract
This paper presents our approach to the TREC Interactive Knowledge Assistance Track (iKAT), which focuses on improving conversational information-seeking (CIS) systems. While recent advancements in CIS have improved conversational agents' ability to assist users, significant challenges remain in understanding context and retrieving relevant documents across domains and dialogue turns. To address these issues, we extend the Generate-Retrieve-Generate pipeline by developing passage queries (PQs) that align with the target document's expected format to improve query-document matching during retrieval. We propose two variations of this approach: Weighted Reranking and Short and Long Passages. Each method leverages a Meta Llama model for context understanding and generating queries and responses. Passage ranking evaluation results show that the Short and Long Passages approach outperformed the organizers' baselines, performed best among Llama-based systems in the track, and achieved results comparable to GPT-4-based systems. These results indicate that the method effectively balances efficiency and performance. Findings suggest that PQs improve semantic alignment with target documents and demonstrate their potential to improve multi-turn dialogue systems.
Bibtex
@inproceedings{infosenselab-trec2024-papers-proc-1,
author = {Victor De Lima (Georgetown InfoSense) and Grace Hui Yang (Georgetown InfoSense)},
title = {Passage Query Methods for Retrieval and Reranking in Conversational Agents},
booktitle = {The Thirty-Third Text REtrieval Conference Proceedings (TREC 2024), Gaithersburg, MD, USA, November 15-18, 2024},
series = {NIST Special Publication},
volume = {xxx-xxx},
publisher = {National Institute of Standards and Technology (NIST)},
year = {2024},
trec_org = {infosenselab},
trec_runs = {infosense_llama_pssgqrs_wghtdrerank_2, infosense_llama_pssgqrs_wghtdrerank_1, infosense_llama_short_long_qrs_2, infosense_llama_short_long_qrs_3},
trec_tracks = {ikat},
url = {https://trec.nist.gov/pubs/trec33/papers/infosenselab.ikat.pdf}
}
NII@TREC IKAT 2024: LLM-Based Pipelines for Personalized Conversational Information Seeking
Xiao Fu (UCL), Navdeep Singh Bedi (USI), Praveen Acharya (DCU), Noriko Kando (NII)
- Participant: nii
- Paper: https://trec.nist.gov/pubs/trec33/papers/nii.ikat.pdf
- Runs: nii_res_gen | nii_auto_base | nii_manu_base | nii_auto_ptkb_rr | nii_manu_ptkb_rr | NII_automatic_GeRe
Abstract
In this paper, we propose two novel pipelines—Retrieve-then-Generate (RtG) and Generate-then-Retrieve (GtR)—to enhance conversational information seeking (CIS) systems, evaluated within the TREC iKAT 2023 framework. The RtG pipeline emphasizes brevity in rewriting user utterances and generates multiple query groups to maximize the retrieval of relevant documents. This approach leads to improved recall in the final results compared to the best submission in 2023. Additionally, it incorporates a chain-of-thought methodology through a two-stage response generation process. In a zero-shot setting, the GtR pipeline introduces a hybrid approach by ensembling state-of-the-art Large Language Models (LLMs), specifically GPT-4o and Claude-3-opus. By leveraging the strengths of multiple LLMs, the GtR pipeline achieves high recall while maintaining competitive precision and ranking performance in both document retrieval and Personal Task Knowledge Base (PTKB) statement classification tasks. Our experimental results demonstrate that both pipelines significantly enhance retrieval effectiveness, offering robust solutions for future CIS systems.
Bibtex
@inproceedings{nii-trec2024-papers-proc-1,
author = {Xiao Fu (UCL) and Navdeep Singh Bedi (USI) and Praveen Acharya (DCU) and Noriko Kando (NII)},
title = {NII@TREC IKAT 2024: LLM-Based Pipelines for Personalized Conversational Information Seeking},
booktitle = {The Thirty-Third Text REtrieval Conference Proceedings (TREC 2024), Gaithersburg, MD, USA, November 15-18, 2024},
series = {NIST Special Publication},
volume = {xxx-xxx},
publisher = {National Institute of Standards and Technology (NIST)},
year = {2024},
trec_org = {nii},
trec_runs = {nii_res_gen, nii_auto_base, nii_manu_base, nii_auto_ptkb_rr, nii_manu_ptkb_rr, NII_automatic_GeRe},
trec_tracks = {ikat},
url = {https://trec.nist.gov/pubs/trec33/papers/nii.ikat.pdf}
}
RALI@TREC iKAT 2024: Achieving Personalization via Retrieval Fusion in Conversational Search
Yuchen Hui (RALI Lab, Université de Montréal), Fengran Mo (RALI Lab, Université de Montréal), Milan Mao (RALI Lab, Université de Montréal), Jian-Yun Nie (RALI Lab, Université de Montréal)
- Participant: rali lab
- Paper: https://trec.nist.gov/pubs/trec33/papers/rali lab.ikat.pdf
- Runs: RALI_manual_monot5 | RALI_manual_rankllama | RALI_gpt4o_fusion_rerank | RALI_gpt4o_no_personalize_fusion_rerank | RALI_gpt4o_no_personalize_fusion_norerank | RALI_gpt4o_fusion_norerank
Abstract
The Recherche Appliquée en Linguistique Informatique (RALI) team participated in the 2024 TREC Interactive Knowledge Assistance (iKAT) Track. In personalized conversational search, effectively capturing a user's complex search intent requires incorporating both contextual information and key elements from the user profile into query reformulation. The user profile often contains many relevant pieces, and each could potentially complement the user's information needs. It is difficult to disregard any of them, whereas introducing an excessive number of these pieces risks drifting from the original query and hinders search performance. This is a challenge we denote as over-personalization. In this paper, we tackle the problem via employing different strategies based on fusing ranking lists generated from the queries with different levels of personalization.
Bibtex
@inproceedings{rali lab-trec2024-papers-proc-1,
author = {Yuchen Hui (RALI Lab, Université de Montréal) and Fengran Mo (RALI Lab, Université de Montréal) and Milan Mao (RALI Lab, Université de Montréal) and Jian-Yun Nie (RALI Lab, Université de Montréal)},
title = {RALI@TREC iKAT 2024: Achieving Personalization via Retrieval Fusion in Conversational Search},
booktitle = {The Thirty-Third Text REtrieval Conference Proceedings (TREC 2024), Gaithersburg, MD, USA, November 15-18, 2024},
series = {NIST Special Publication},
volume = {xxx-xxx},
publisher = {National Institute of Standards and Technology (NIST)},
year = {2024},
trec_org = {rali lab},
trec_runs = {RALI_gpt4o_fusion_rerank, RALI_gpt4o_no_personalize_fusion_rerank, RALI_gpt4o_no_personalize_fusion_norerank, RALI_gpt4o_fusion_norerank, RALI_manual_monot5, RALI_manual_rankllama},
trec_tracks = {ikat},
url = {https://trec.nist.gov/pubs/trec33/papers/rali lab.ikat.pdf}
}
IIUoT at TREC 2024 Interactive Knowledge Assistance Track
Yating Zhang (University of Tsukuba), Haitao Yu (University of Tsukuba)
- Participant: ii_research
- Paper: https://trec.nist.gov/pubs/trec33/papers/ii_research.ikat.pdf
- Runs: iiresearch_ikat2024_rag_top5_bge_reranker | iiresearch_ikat2024_rag_top5_monot5_reranker
Abstract
In conversational information-seeking (CIS), the ability to tailor responses to individual user contexts is essential for enhancing relevance and accuracy. The TREC Interactive Knowledge Assistance Track addresses this need by advancing research in personalized conversational agents that adapt dynamically to user-specific details and preferences. Our study aligns with this framework, which involves three core tasks: personal textual knowledge base (PTKB) statement ranking, passage ranking, and response generation. To address these tasks, we propose a comprehensive framework that incorporates user context at each stage. For PTKB statement ranking, we integrate embedding models with large language models (LLMs) to optimize relevance-based ranking precision, allowing for more nuanced alignment of user characteristics with retrieved information. In the passage ranking stage, our adaptive retrieval strategy combines BM25 with iterative contextual refinement, enhancing the relevance and accuracy of retrieved passages. Finally, our response generation module leverages a Retrieval-Augmented Generation (RAG) model that dynamically synthesizes user-specific context and external knowledge, producing responses that are both precise and contextually relevant. Experimental results demonstrate that our framework effectively addresses the complexities of personalized CIS, achieving notable improvements over traditional static retrieval methods.
Bibtex
@inproceedings{ii_research-trec2024-papers-proc-1,
author = {Yating Zhang (University of Tsukuba) and Haitao Yu (University of Tsukuba)},
title = {IIUoT at TREC 2024 Interactive Knowledge Assistance Track},
booktitle = {The Thirty-Third Text REtrieval Conference Proceedings (TREC 2024), Gaithersburg, MD, USA, November 15-18, 2024},
series = {NIST Special Publication},
volume = {xxx-xxx},
publisher = {National Institute of Standards and Technology (NIST)},
year = {2024},
trec_org = {ii_research},
trec_runs = {iiresearch-bm25-top10-llama3-8b-instruct, iiresearch_ikat2024_rag_top5_bge_reranker, iiresearch_trec_bio2024_t5base_run, iiresearch_ikat2024_rag_top5_monot5_reranker},
trec_tracks = {ikat},
url = {https://trec.nist.gov/pubs/trec33/papers/ii_research.ikat.pdf}
}
DCU-ADAPT@TREC iKAT 2024: Incorporating Retrieved Knowledge for Enhanced Conversational Search
Praveen Acharya (Dublin City University), Xiao Fu (University College London), Noriko Kando (National Institute of Informatics), Gareth J. F. Jones (Dublin City University)
- Participant: DCU-ADAPT
- Paper: https://trec.nist.gov/pubs/trec33/papers/DCU-ADAPT.ikat.pdf
- Runs: dcu_manual_qe_summ_TopP_3 | dcu_manual_qe_summ_ptkb_TopP_3 | dcu_auto_qe_key_topP-50_topK-5 | dcu_auto_qre_sim | dcu_auto_qe_summ_TopP_3 | dcu_auto_qe_summ_ptkb_TopP_
Abstract
Users of search applications often encounter difficulties in expressing their information needs effectively. Conversational search (CS) can potentially support users in creating effective queries by enabling a multi-turn, iterative dialogue between a User and the search System. These dialogues help users to refine and build their understanding of their information need through a series of query-response exchanges. However, current CS systems generally do not accumulate knowledge about the user's information needs or the content with which they have engaged during this dialogue. This limitation can hinder the system's ability to support users effectively. To address this issue, we propose an approach that seeks to model and utilize knowledge gained from each interaction to enhance future user queries. Our method focuses on incorporating knowledge from retrieved documents to enrich subsequent user queries, ultimately improving query comprehension and retrieval outcomes. We test the effectiveness of our proposed approach in our TREC iKAT 2024 participation.
Bibtex
@inproceedings{DCU-ADAPT-trec2024-papers-proc-1,
author = {Praveen Acharya (Dublin City University) and Xiao Fu (University College London) and Noriko Kando (National Institute of Informatics) and Gareth J. F. Jones (Dublin City University)},
title = {DCU-ADAPT@TREC iKAT 2024: Incorporating Retrieved Knowledge for Enhanced Conversational Search},
booktitle = {The Thirty-Third Text REtrieval Conference Proceedings (TREC 2024), Gaithersburg, MD, USA, November 15-18, 2024},
series = {NIST Special Publication},
volume = {xxx-xxx},
publisher = {National Institute of Standards and Technology (NIST)},
year = {2024},
trec_org = {DCU-ADAPT},
trec_runs = {dcu_manual_qe_summ_TopP_3, dcu_manual_qe_summ_ptkb_TopP_3, dcu_auto_qe_key_topP-50_topK-5, dcu_auto_qre_sim, dcu_auto_qe_summ_TopP_3, dcu_auto_qe_summ_ptkb_TopP_},
trec_tracks = {ikat},
url = {https://trec.nist.gov/pubs/trec33/papers/DCU-ADAPT.ikat.pdf}
}
IRLab@iKAT24: Learned Sparse Retrieval with Multi-aspect LLM Query Generation for Conversational Search
Simon Lupart (University of Amsterdam), Zahra Abbasiantaeb (University of Amsterdam), Mohammad Aliannejadi (University of Amsterdam)
- Participant: uva
- Paper: https://trec.nist.gov/pubs/trec33/papers/uva.ikat.pdf
- Runs: manual-splade-fusion | manual-splade-debertav3 | gpt4-MQ-debertav3 | gpt4-mq-rr-fusion | gpt-single-QR-rr-debertav3 | qd1
Abstract
The Interactive Knowledge Assistant Track (iKAT) 2024 focuses on advancing conversational assistants, able to adapt their interaction and responses from personalized user knowledge. The track incorporates a Personal Textual Knowledge Base (PTKB) alongside Conversational AI tasks, such as passage ranking and response generation. Query Rewrite being an effective approach for resolving conversational context, we explore Large Language Models (LLMs), as query rewriters. Specifically, our submitted runs explore multi-aspect query generation using the MQ4CS framework, which we further enhance with Learned Sparse Retrieval via the SPLADE architecture, coupled with robust cross-encoder models. We also propose an alternative to the previous interleaving strategy, aggregating multiple aspects during the reranking phase. Our findings indicate that multi-aspect query generation is effective in enhancing performance when integrated with advanced retrieval and reranking models. Our results also lead the way for better personalization in Conversational Search, relying on LLMs to integrate personalization within query rewrite, and outperforming human rewrite performance.
Bibtex
@inproceedings{uva-trec2024-papers-proc-1,
author = {Simon Lupart (University of Amsterdam) and Zahra Abbasiantaeb (University of Amsterdam) and Mohammad Aliannejadi (University of Amsterdam)},
title = {IRLab@iKAT24: Learned Sparse Retrieval with Multi-aspect LLM Query Generation for Conversational Search},
booktitle = {The Thirty-Third Text REtrieval Conference Proceedings (TREC 2024), Gaithersburg, MD, USA, November 15-18, 2024},
series = {NIST Special Publication},
volume = {xxx-xxx},
publisher = {National Institute of Standards and Technology (NIST)},
year = {2024},
trec_org = {uva},
trec_runs = {gpt4-MQ-debertav3, gpt4-mq-rr-fusion, gpt-single-QR-rr-debertav3, qd1, manual-splade-fusion, manual-splade-debertav3},
trec_tracks = {ikat},
url = {https://trec.nist.gov/pubs/trec33/papers/uva.ikat.pdf}
}
Lateral Reading¶
Monster Ranking¶
Charles L. A. Clarke (University of Waterloo)Siqing Huo (University of Waterloo)Negar Arabzadeh (University of Waterloo)
- Participant: WaterlooClarke
- Paper: https://trec.nist.gov/pubs/trec33/papers/WaterlooClarke.lateral.rag.pdf
- Runs: uwclarke_auto | uwclarke_auto_summarized | UWClarke_rerank
Abstract
Participating as the UWClarke group, we focused on the RAG track; we also submitted runs for the Lateral Reading Track. For the retrieval task (R) of the RAG Track, we attempted what we have come to call “monster ranking”. Largely ignoring cost and computational resources, monster ranking attempts to determine the best possible ranked list for a query by whatever means possible, including explicit LLM-based relevance judgments, both pointwise and pairwise. While a monster ranker could never be deployed in a production environment, its output may be valuable for evaluating cheaper and faster rankers. For the full retrieval augmented generation (RAG) task we explored two general approaches, depending on whether generation happens first or second: 1) Generate an Answer and support with Retrieved Evidence (GARE). 2) Retrieve And Generate with Evidence (RAGE).
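The pairwise part of such a "monster ranking" amounts to sorting candidates with an LLM preference oracle as the comparator. A minimal sketch, where the oracle is a stand-in function (a real system would prompt an LLM with the query and both passages):

```python
from functools import cmp_to_key

def monster_rank(docs, prefers):
    """Rank docs with a pairwise preference oracle.

    prefers(a, b) returns True if document a should rank above b;
    in a real system this would be an (expensive) LLM judgment,
    which is why the approach is only viable offline, for building
    reference rankings rather than serving queries.
    """
    def cmp(a, b):
        return -1 if prefers(a, b) else 1
    return sorted(docs, key=cmp_to_key(cmp))

# Stand-in oracle: prefer longer documents. A real oracle would be
# an LLM relevance comparison, costing one call per comparison.
docs = ["short", "a much longer passage", "mid length"]
print(monster_rank(docs, lambda a, b: len(a) > len(b)))
```

Sorting needs O(n log n) comparisons, which is exactly why cost has to be "largely ignored" when each comparison is an LLM call.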
Bibtex
@inproceedings{WaterlooClarke-trec2024-papers-proc-1,
author = {Charles L. A. Clarke (University of Waterloo)
Siqing Huo (University of Waterloo)
Negar Arabzadeh (University of Waterloo)},
title = {Monster Ranking},
booktitle = {The Thirty-Third Text REtrieval Conference Proceedings (TREC 2024), Gaithersburg, MD, USA, November 15-18, 2024},
series = {NIST Special Publication},
volume = {xxx-xxx},
publisher = {National Institute of Standards and Technology (NIST)},
year = {2024},
trec_org = {WaterlooClarke},
trec_runs = {uwclarke_auto, uwclarke_auto_summarized, UWCrag, UWCrag_stepbystep, UWCgarag, monster, uwc1, uwc2, uwc0, uwcCQAR, uwcCQA, uwcCQR, uwcCQ, uwcBA, uwcBQ, UWClarke_rerank},
trec_tracks = {lateral.rag},
url = {https://trec.nist.gov/pubs/trec33/papers/WaterlooClarke.lateral.rag.pdf}
}
Medical Video Question Answering¶
Doshisha University, Universität zu Lübeck and German Research Center for Artificial Intelligence at TRECVID 2024: QFISC Task¶
Zihao Chen (Doshisha University)Falco Lentzsch (German Research Center for Artificial Intelligence)Nele S. Brügge (German Research Center for Artificial Intelligence)Frédéric Li (German Research Center for Artificial Intelligence)Miho Ohsaki (Doshisha University)Heinz Handels (German Research Center for Artificial Intelligence, University of Luebeck)Marcin Grzegorzek (German Research Center for Artificial Intelligence, University of Luebeck)Kimiaki Shirahama (Doshisha University)
- Participant: DoshishaUzlDfki
- Paper: https://trec.nist.gov/pubs/trec33/papers/DoshishaUzlDfki.medvidqa.pdf
- Runs: chatGPT_zeroshot_prompt | mistral_meta_prompt | mistral_fewshot_prompt | GPT_meta_prompt | CoSeg_meta_prompt
Abstract
This paper presents the approaches proposed by the DoshishaUzlDfki team to address the Query-Focused Instructional Step Captioning (QFISC) task of TRECVID 2024. Given RGB videos containing stepwise instructions, we explored several techniques to automatically identify the boundaries of each step and provide a caption for it. More specifically, two different types of methods were investigated for temporal video segmentation. The first uses the CoSeg approach proposed by Wang et al. [9], based on Event Segmentation Theory, which hypothesises that video frames at the boundaries of steps are harder to predict since they tend to contain more significant visual changes. In detail, CoSeg detects event boundaries in the RGB video stream by finding local maxima in the reconstruction error of a model trained to reconstruct the temporal contrastive embeddings of video snippets. The second type of approach we tested relies exclusively on the audio modality and is based on the hypothesis that information about step transitions is often semantically contained in the verbal transcripts of the videos. In detail, we used the WhisperX model [3], which isolates speech parts in the audio tracks of the videos and converts them into timestamped text transcripts. The latter were then sent as input to a Large Language Model (LLM) with a carefully designed prompt requesting the LLM to identify step boundaries. Once the temporal video segmentation was performed, we sent the WhisperX transcripts corresponding to the video segments determined by both methods to an LLM instructed to caption them. The GPT4o and Mistral Large 2 LLMs were employed in our experiments for both segmentation and captioning. Our results show that the temporal segmentation methods based on audio processing significantly outperform the video-based one. More specifically, our best performance is obtained by the approach using GPT4o with zero-shot prompting for temporal segmentation. It achieves the top global performance of all runs submitted to the QFISC task in all evaluation metrics, except for precision, where the best performance is obtained by our run using Mistral Large 2 with chain-of-thought prompting.
Bibtex
@inproceedings{DoshishaUzlDfki-trec2024-papers-proc-1,
author = {Zihao Chen (Doshisha University)
Falco Lentzsch (German Research Center for Artificial Intelligence)
Nele S. Brügge (German Research Center for Artificial Intelligence)
Frédéric Li (German Research Center for Artificial Intelligence)
Miho Ohsaki (Doshisha University)
Heinz Handels (German Research Center for Artificial Intelligence, University of Luebeck)
Marcin Grzegorzek (German Research Center for Artificial Intelligence, University of Luebeck)
Kimiaki Shirahama (Doshisha University)},
title = {Doshisha University, Universität zu Lübeck and German Research Center for Artificial Intelligence at TRECVID 2024: QFISC Task},
booktitle = {The Thirty-Third Text REtrieval Conference Proceedings (TREC 2024), Gaithersburg, MD, USA, November 15-18, 2024},
series = {NIST Special Publication},
volume = {xxx-xxx},
publisher = {National Institute of Standards and Technology (NIST)},
year = {2024},
trec_org = {DoshishaUzlDfki},
trec_runs = {chatGPT_zeroshot_prompt, mistral_meta_prompt, mistral_fewshot_prompt, GPT_meta_prompt, CoSeg_meta_prompt},
trec_tracks = {medvidqa},
url = {https://trec.nist.gov/pubs/trec33/papers/DoshishaUzlDfki.medvidqa.pdf}
}
NeuCLIR¶
Plain-Language Adaptation of Biomedical Abstracts¶
Enhancing Accessibility of Medical Texts through Large Language Model-Driven Plain Language Adaptation¶
Ting-Wei Chang (Department of Computer Science and Information Engineering, National Taiwan University, Taiwan) Hen-Hsen Huang (Institute of Information Science, Academia Sinica, Taiwan) Hsin-Hsi Chen (Department of Computer Science and Information Engineering, National Taiwan University, Taiwan, AI Research Center (AINTU), National Taiwan University, Taiwan)
- Participant: ntu_nlp
- Paper: https://trec.nist.gov/pubs/trec33/papers/ntu_nlp.plaba.pdf
- Runs: gemini-1.5-pro_demon5_replace-demon5 | gemini-1.5-flash_demon5_replace-demon5 | gpt-4o-mini _demon5_replace-demon5 | task2_moa_tier3_post | task2_moa_tier1_post | task2_moa_tier2_post
Abstract
This paper addresses the challenge of making complex healthcare information more accessible through automated Plain Language Adaptation (PLA). PLA aims to simplify technical medical language, bridging a critical gap between the complexity of healthcare texts and patients’ reading comprehension. Recent advances in Large Language Models (LLMs), such as GPT and BART, have opened new possibilities for PLA, especially in zero-shot and few-shot learning contexts where task-specific data is limited. In this work, we leverage the capabilities of LLMs such as GPT-4o-mini, Gemini-1.5-pro, and LLaMA for text simplification. Additionally, we incorporate Mixture-of-Agents (MoA) techniques to enhance adaptability and robustness in PLA tasks. Key contributions include a comparative analysis of prompting strategies, fine-tuning with QLoRA on different LLMs, and the integration of the MoA technique. Our findings demonstrate the effectiveness of LLM-driven PLA, showcasing its potential in making healthcare information more comprehensible while preserving essential content.
Bibtex
@inproceedings{ntu_nlp-trec2024-papers-proc-1,
author = {Ting-Wei Chang (Department of Computer Science and Information Engineering, National Taiwan University, Taiwan)
Hen-Hsen Huang (Institute of Information Science, Academia Sinica, Taiwan)
Hsin-Hsi Chen (Department of Computer Science and Information Engineering, National Taiwan University, Taiwan, AI Research Center (AINTU), National Taiwan University, Taiwan)},
title = {Enhancing Accessibility of Medical Texts through Large Language Model-Driven Plain Language Adaptation},
booktitle = {The Thirty-Third Text REtrieval Conference Proceedings (TREC 2024), Gaithersburg, MD, USA, November 15-18, 2024},
series = {NIST Special Publication},
volume = {xxx-xxx},
publisher = {National Institute of Standards and Technology (NIST)},
year = {2024},
trec_org = {ntu_nlp},
trec_runs = {task2_moa_tier3_post, task2_moa_tier1_post, task2_moa_tier2_post, gemini-1.5-pro_demon5_replace-demon5, gemini-1.5-flash_demon5_replace-demon5, gpt-4o-mini _demon5_replace-demon5},
trec_tracks = {plaba},
url = {https://trec.nist.gov/pubs/trec33/papers/ntu_nlp.plaba.pdf}
}
MaLei at the PLABA Track of TAC-2024: RoBERTa for Task 1 – LLaMA3.1 and GPT-4o for Task 2¶
Zhidong Ling, Zihao Li, Pablo Romero, Lifeng Han, Goran Nenadic
- Participant: UM
- Paper: https://trec.nist.gov/pubs/trec33/papers/UM.plaba.pdf
- Runs: Roberta-base | GPT | LLaMa 3.1 70B instruction (2nd run)
Abstract
This report is the system description of the \textsc{MaLei} team (\textbf{Manchester} and \textbf{Leiden}) for the shared task Plain Language Adaptation of Biomedical Abstracts (PLABA) 2024 (the team participated under the name BeeManc last year).
This report contains two sections corresponding to the two sub-tasks of PLABA-2024. In Task 1, we applied fine-tuned RoBERTa-Base models to identify and classify the difficult terms, jargon, and acronyms in the biomedical abstracts, and we report the F1 score. Due to time constraints, we did not finish the replacement task. In Task 2, we leveraged Llama-3.1-70B-Instruct and GPT-4o with one-shot prompts to complete the abstract adaptation and report scores in BLEU, SARI, BERTScore, LENS, and SALSA. In the official PLABA-2024 evaluation of Tasks 1A and 1B, our \textbf{much smaller fine-tuned RoBERTa-Base} model ranked 3rd and 2nd respectively on the two sub-tasks, and \textbf{1st on averaged F1 score across the two tasks} among the 9 evaluated systems. Our LLaMA-3.1-70B-Instruct model achieved the \textbf{highest Completeness} score for Task 2. We share our source code, fine-tuned models, and related resources at \url{https://github.com/HECTA-UoM/PLABA-2024}
Bibtex
@inproceedings{UM-trec2024-papers-proc-1,
author = {Zhidong Ling, Zihao Li, Pablo Romero, Lifeng Han, Goran Nenadic},
title = {{MaLei} at the PLABA Track of TAC-2024: RoBERTa for Task 1 -- LLaMA3.1 and GPT-4o for Task 2},
booktitle = {The Thirty-Third Text REtrieval Conference Proceedings (TREC 2024), Gaithersburg, MD, USA, November 15-18, 2024},
series = {NIST Special Publication},
volume = {xxx-xxx},
publisher = {National Institute of Standards and Technology (NIST)},
year = {2024},
trec_org = {UM},
trec_runs = {GPT, LLaMa 3.1 70B instruction (2nd run), Roberta-base},
trec_tracks = {plaba},
url = {https://trec.nist.gov/pubs/trec33/papers/UM.plaba.pdf}
}
UM_FHS at TREC 2024 PLABA: Exploration of Fine-tuning and AI agent approach for plain language adaptations of biomedical text¶
Primoz Kocbek (University of Maribor)Leon Kopitar (University of Maribor)Zhihong Zhang (Columbia University)Emirhan Aydın (Manisa Celal Bayar University)Maxim Topaz (Columbia University)Gregor Stiglic (University of Maribor)
- Participant: um_fhs
- Paper: https://trec.nist.gov/pubs/trec33/papers/um_fhs.plaba.pdf
- Runs: plaba_um_fhs_sub1 | plaba_um_fhs_sub2 | plaba_um_fhs_sub3
Abstract
This paper describes our submissions to the TREC 2024 PLABA track with the aim to simplify biomedical abstracts for a K8-level audience (13–14-year-old students). We tested three approaches using OpenAI’s gpt-4o and gpt-4o-mini models: baseline prompt engineering, a two-AI-agent approach, and fine-tuning. Adaptations were evaluated using qualitative metrics (5-point Likert scales for simplicity, accuracy, completeness, and brevity) and quantitative readability scores (Flesch-Kincaid grade level, SMOG Index). Results indicated that the two-agent approach and baseline prompt engineering with gpt-4o-mini models show superior qualitative performance, while fine-tuned models excelled in accuracy and completeness but were less simple. The evaluation results demonstrated that prompt engineering with gpt-4o-mini outperforms both iterative improvement via the two-agent approach and fine-tuning with gpt-4o. We intend to expand our investigation of the results and explore advanced evaluations.
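The two-AI-agent idea (one model adapts the abstract, another critiques the draft and requests revisions) can be sketched as a simple loop. The `adapt` and `critique` callables below are hypothetical stand-ins for the gpt-4o / gpt-4o-mini API calls, not the team's actual code:

```python
def two_agent_adapt(abstract, adapt, critique, max_rounds=3):
    """Iteratively simplify a biomedical abstract.

    adapt(text, feedback) -> simplified text (the writer agent);
    critique(text) -> feedback string, or "" when satisfied
    (the reviewer agent). Both would be LLM calls in practice;
    max_rounds caps the number of revision cycles (and API cost).
    """
    draft = adapt(abstract, feedback="")
    for _ in range(max_rounds):
        feedback = critique(draft)
        if not feedback:  # reviewer has no further objections
            break
        draft = adapt(draft, feedback=feedback)
    return draft
```

The abstract's finding that single-shot prompt engineering beat this loop suggests the critique signal did not add much beyond a well-designed initial prompt.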
Bibtex
@inproceedings{um_fhs-trec2024-papers-proc-1,
author = {Primoz Kocbek (University of Maribor)
Leon Kopitar (University of Maribor)
Zhihong Zhang (Columbia University)
Emirhan Aydın (Manisa Celal Bayar University)
Maxim Topaz (Columbia University)
Gregor Stiglic (University of Maribor)},
title = {UM_FHS at TREC 2024 PLABA: Exploration of Fine-tuning and AI agent approach for plain language adaptations of biomedical text},
booktitle = {The Thirty-Third Text REtrieval Conference Proceedings (TREC 2024), Gaithersburg, MD, USA, November 15-18, 2024},
series = {NIST Special Publication},
volume = {xxx-xxx},
publisher = {National Institute of Standards and Technology (NIST)},
year = {2024},
trec_org = {um_fhs},
trec_runs = {plaba_um_fhs_sub1, plaba_um_fhs_sub2, plaba_um_fhs_sub3},
trec_tracks = {plaba},
url = {https://trec.nist.gov/pubs/trec33/papers/um_fhs.plaba.pdf}
}
MaLei at the PLABA Track of TREC 2024: RoBERTa for Term Replacement – LLaMA3.1 and GPT-4o for Complete Abstract Adaptation¶
Zhidong Ling, Zihao Li, Pablo Romero, Lifeng Han, Goran Nenadic
- Participant: UM
- Paper: https://trec.nist.gov/pubs/trec33/papers/UM.plaba.pdf
- Runs: Roberta-base | GPT | LLaMa 3.1 70B instruction (2nd run)
Abstract
This report is the system description of the \textsc{MaLei} team (\textbf{Manchester} and \textbf{Leiden}) for the shared task Plain Language Adaptation of Biomedical Abstracts (PLABA) 2024 (the team participated under the name BeeManc last year).
This report contains two sections corresponding to the two sub-tasks of PLABA-2024. In Task 1, we applied fine-tuned RoBERTa-Base models to identify and classify the difficult terms, jargon, and acronyms in the biomedical abstracts, and we report the F1 score. Due to time constraints, we did not finish the replacement task. In Task 2, we leveraged Llama-3.1-70B-Instruct and GPT-4o with one-shot prompts to complete the abstract adaptation and report scores in BLEU, SARI, BERTScore, LENS, and SALSA. In the official PLABA-2024 evaluation of Tasks 1A and 1B, our \textbf{much smaller fine-tuned RoBERTa-Base} model ranked 3rd and 2nd respectively on the two sub-tasks, and \textbf{1st on averaged F1 score across the two tasks} among the 9 evaluated systems. Our LLaMA-3.1-70B-Instruct model achieved the \textbf{highest Completeness} score for Task 2. We share our source code, fine-tuned models, and related resources at \url{https://github.com/HECTA-UoM/PLABA2024}
Bibtex
@inproceedings{UM-trec2024-papers-proc-3,
author = {Zhidong Ling, Zihao Li, Pablo Romero, Lifeng Han, Goran Nenadic},
title = {{MaLei} at the PLABA Track of TREC 2024: RoBERTa for Term Replacement -- LLaMA3.1 and GPT-4o for Complete Abstract Adaptation},
booktitle = {The Thirty-Third Text REtrieval Conference Proceedings (TREC 2024), Gaithersburg, MD, USA, November 15-18, 2024},
series = {NIST Special Publication},
volume = {xxx-xxx},
publisher = {National Institute of Standards and Technology (NIST)},
year = {2024},
trec_org = {UM},
trec_runs = {GPT, LLaMa 3.1 70B instruction (2nd run), Roberta-base},
trec_tracks = {plaba},
url = {https://trec.nist.gov/pubs/trec33/papers/UM.plaba.pdf}
}
Biomedical Text Simplification Models Trained on Aligned Abstracts and Lay Summaries¶
Jan Bakker (University of Amsterdam)Taiki Papandreou-Lazos (University of Amsterdam)Jaap Kamps (University of Amsterdam)
- Participant: UAmsterdam
- Paper: https://trec.nist.gov/pubs/trec33/papers/UAmsterdam.plaba.pdf
- Runs: UAms-ConBART-Cochrane | UAms-BART-Cochrane
Abstract
This paper documents the University of Amsterdam’s participation in the TREC 2024 Plain Language Adaptation of Biomedical Abstracts (PLABA) Track. We investigated the effectiveness of text simplification models trained on aligned pairs of sentences in biomedical abstracts and plain language summaries. We participated in Task 2 on Complete Abstract Adaptation and conducted post-submission experiments in Task 1 on Term Replacement. Our main findings are the following. First, we used text simplification models trained on aligned real-world scientific abstracts and plain language summaries. We observed better performance for the context-aware model relative to the sentence-level model. Second, our experiments show the value of training on external corpora and demonstrate very reasonable out-of-domain performance on the PLABA data. Third, more generally, our models are conservative, avoiding gratuitous edits or information insertions. This approach ensures the fidelity of the generated output and limits the risk of overgeneration or hallucination.
Bibtex
@inproceedings{UAmsterdam-trec2024-papers-proc-2,
author = {Jan Bakker (University of Amsterdam)
Taiki Papandreou-Lazos (University of Amsterdam)
Jaap Kamps (University of Amsterdam)},
title = {Biomedical Text Simplification Models Trained on Aligned Abstracts and Lay Summaries},
booktitle = {The Thirty-Third Text REtrieval Conference Proceedings (TREC 2024), Gaithersburg, MD, USA, November 15-18, 2024},
series = {NIST Special Publication},
volume = {xxx-xxx},
publisher = {National Institute of Standards and Technology (NIST)},
year = {2024},
trec_org = {UAmsterdam},
trec_runs = {UAms-ConBART-Cochrane, UAms-BART-Cochrane},
trec_tracks = {plaba},
url = {https://trec.nist.gov/pubs/trec33/papers/UAmsterdam.plaba.pdf}
}
Product Search¶
JBNU at TREC 2024 Product Search Track¶
Gi-taek An (Jeonbuk National University)Seong-Hyuk Yim (Jeonbuk National University)Jun-Yong Park (Jeonbuk National University)Woo-Seok Choi (Jeonbuk National University)Kyung-Soon Lee (Jeonbuk National University)
- Participant: jbnu
- Paper: https://trec.nist.gov/pubs/trec33/papers/jbnu.product.pdf
- Runs: jbnu08 | jbnu04 | jbnu09 | jbnu01 | jbnu07 | jbnu10 | jbnu03 | jbnu02 | jbnu11 | jbnu12 | jbnu05 | jbnu06
Abstract
This paper describes the participation of the jbnu team in the TREC 2024 Product Search Track. This study addresses two key challenges in product search related to sparse and dense retrieval models. For sparse retrieval models, we propose changing the activation function to GELU in order to filter out products that are retrieved through token expansion but are irrelevant for recommendation according to the scoring mechanism. For dense retrieval models, product search document indexing data was generated with the generative model T5 to address input-token limitations. Experimental results demonstrate that both proposed methods yield performance improvements over baseline models.
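To see why swapping the activation matters, consider a SPLADE-style log-saturated term weighting (an assumption about the team's setup; the function names and logits below are illustrative). ReLU passes weak positive logits through unchanged, while GELU smoothly damps them, so marginal expansion terms contribute almost nothing to the score:

```python
import math

def gelu(x):
    # tanh approximation of GELU; unlike ReLU it smoothly damps
    # small positive logits instead of passing them through.
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

def term_weights(logits, activation):
    # SPLADE-style weighting: log(1 + activation(logit)) per vocabulary
    # term, clamped at zero. With GELU, weakly activated expansion terms
    # get weights near zero, effectively filtering products matched only
    # via marginal expansions.
    return {t: math.log1p(max(activation(x), 0.0)) for t, x in logits.items()}

logits = {"laptop": 4.0, "sleeve": 0.3, "bag": -1.0}
relu = lambda x: max(x, 0.0)
print(term_weights(logits, relu))
print(term_weights(logits, gelu))  # "sleeve" weight shrinks, "bag" stays 0
```

The strong term ("laptop") keeps essentially the same weight under both activations; only the weak expansion term is suppressed.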
Bibtex
@inproceedings{jbnu-trec2024-papers-proc-1,
author = {Gi-taek An (Jeonbuk National University)
Seong-Hyuk Yim (Jeonbuk National University)
Jun-Yong Park (Jeonbuk National University)
Woo-Seok Choi (Jeonbuk National University)
Kyung-Soon Lee (Jeonbuk National University)},
title = {JBNU at TREC 2024 Product Search Track},
booktitle = {The Thirty-Third Text REtrieval Conference Proceedings (TREC 2024), Gaithersburg, MD, USA, November 15-18, 2024},
series = {NIST Special Publication},
volume = {xxx-xxx},
publisher = {National Institute of Standards and Technology (NIST)},
year = {2024},
trec_org = {jbnu},
trec_runs = {jbnu08, jbnu04, jbnu09, jbnu01, jbnu07, jbnu10, jbnu03, jbnu02, jbnu11, jbnu12, jbnu05, jbnu06},
trec_tracks = {product},
url = {https://trec.nist.gov/pubs/trec33/papers/jbnu.product.pdf}
}
Retrieval-Augmented Generation¶
TREMA-UNH at TREC: RAG Systems and RUBRIC-style Evaluation¶
Naghmeh Farzi, Laura Dietz
- Participant: TREMA-UNH
- Paper: https://trec.nist.gov/pubs/trec33/papers/TREMA-UNH.rag.pdf
- Runs: Ranked_Iterative_Fact_Extraction_and_Refinement | Enhanced_Iterative_Fact_Refinement_and_Prioritization | Ranked_Iterative_Fact_Extraction_and_Refinement_RIFER_-_bm25
Abstract
The TREMA-UNH team participated in the TREC Retrieval-Augmented Generation track (RAG). In Part 1 we describe the RAG systems submitted to the Augmented Generation Task (AG) and the Retrieval-Augmented Generation Task (RAG), the latter using a BM25 retrieval model. In Part 2 we describe an alternative LLM-based evaluation method for this track using the RUBRIC Autograder Workbench approach, which won the SIGIR’24 best paper award.
Bibtex
@inproceedings{TREMA-UNH-trec2024-papers-proc-1,
author = {Naghmeh Farzi
Laura Dietz},
title = {TREMA-UNH at TREC: RAG Systems and RUBRIC-style Evaluation},
booktitle = {The Thirty-Third Text REtrieval Conference Proceedings (TREC 2024), Gaithersburg, MD, USA, November 15-18, 2024},
series = {NIST Special Publication},
volume = {xxx-xxx},
publisher = {National Institute of Standards and Technology (NIST)},
year = {2024},
trec_org = {TREMA-UNH},
trec_runs = {Ranked_Iterative_Fact_Extraction_and_Refinement, Enhanced_Iterative_Fact_Refinement_and_Prioritization, Ranked_Iterative_Fact_Extraction_and_Refinement_RIFER_-_bm25},
trec_tracks = {rag},
url = {https://trec.nist.gov/pubs/trec33/papers/TREMA-UNH.rag.pdf}
}
CIR at TREC 2024 RAG: Task 2 - Augmented Generation with Diversified Segments and Knowledge Adaption¶
Jüri Keller (TH Köln - University of Applied Sciences) Björn Engelmann (TH Köln - University of Applied Sciences) Fabian Haak (TH Köln - University of Applied Sciences) Philipp Schaer (TH Köln - University of Applied Sciences) Hermann Kroll (TU Braunschweig) Christin Katharina Kreutz (TH Mittelhessen - University of Applied Sciences, Herder Institute)
- Participant: CIR
- Paper: https://trec.nist.gov/pubs/trec33/papers/CIR.rag.pdf
- Runs: cir_gpt-4o-mini_Jaccard_50_0.5_100_301_p0 | cir_gpt-4o-mini_Jaccard_50_1.0_100_301_p0 | cir_gpt-4o-mini_Cosine_50_0.5_100_301_p1 | cir_gpt-4o-mini_Cosine_50_0.25_100_301_p1 | cir_gpt-4o-mini_Cosine_50_0.75_100_301_p1 | cir_gpt-4o-mini_Cosine_50_1.0_100_301_p1 | cir_gpt-4o-mini_Cosine_20_0.5_100_301_p1 | cir_gpt-4o-mini_Cosine_50_0.5_100_301_p2 | cir_gpt-4o-mini_Cosine_50_0.5_100_301_p3 | cir_gpt-4o-mini_no_reranking_50_0.5_100_301_p1
Abstract
This paper describes the CIR team’s participation in the TREC 2024 RAG track for task 2, augmented generation. With our approach, we intended to explore the effects of diversifying the segments considered in the generation, as well as variations in the depth of users’ knowledge of a query topic. We describe a two-step approach that first reranks input segments so that they are as similar as possible to the query while being as dissimilar as possible from higher-ranked relevant segments. In the second step, these reranked segments are relayed to an LLM, which uses them to generate an answer to the query while referencing the segments that contributed to specific parts of the answer. The LLM accounts for the varying background knowledge of potential users through our prompts.
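The two-step reranking described above resembles greedy maximal-marginal-relevance selection; a minimal sketch under that assumption (the similarity functions and the trade-off weight are illustrative, mirroring the Jaccard/Cosine choices and the 0.25–1.0 values in the run names):

```python
def diversified_rerank(query_sim, seg_sim, segments, lam=0.5, k=3):
    """Greedy MMR-style reranking.

    query_sim: dict segment -> similarity to the query.
    seg_sim:   function (a, b) -> similarity between two segments
               (e.g. Jaccard or cosine, as in the run names).
    Each step picks the segment most similar to the query and least
    similar to anything already selected (weighted by lam).
    """
    selected = []
    candidates = set(segments)
    while candidates and len(selected) < k:
        def score(s):
            redundancy = max((seg_sim(s, t) for t in selected), default=0.0)
            return lam * query_sim[s] - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

The selected segments would then be passed, in this order, to the LLM prompt for answer generation with citations.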
Bibtex
@inproceedings{CIR-trec2024-papers-proc-1,
author = {Jüri Keller (TH Köln - University of Applied)
Björn Engelmann (TH Köln - University of Applied)
Fabian Haak (TH Köln - University of Applied)
Philipp Schaer (TH Köln - University of Applied)
Hermann Kroll (TU Braunschweig)
Christin Katharina Kreutz (TH Mittelhessen - University of Applied Sciences, Herder Institute)},
title = {CIR at TREC 2024 RAG: Task 2 - Augmented Generation with Diversified Segments and Knowledge Adaption},
booktitle = {The Thirty-Third Text REtrieval Conference Proceedings (TREC 2024), Gaithersburg, MD, USA, November 15-18, 2024},
series = {NIST Special Publication},
volume = {xxx-xxx},
publisher = {National Institute of Standards and Technology (NIST)},
year = {2024},
trec_org = {CIR},
trec_runs = {cir_gpt-4o-mini_Jaccard_50_0.5_100_301_p0, cir_gpt-4o-mini_Jaccard_50_1.0_100_301_p0, cir_gpt-4o-mini_Cosine_50_0.5_100_301_p1, cir_gpt-4o-mini_Cosine_50_0.25_100_301_p1, cir_gpt-4o-mini_Cosine_50_0.75_100_301_p1, cir_gpt-4o-mini_Cosine_50_1.0_100_301_p1, cir_gpt-4o-mini_Cosine_20_0.5_100_301_p1, cir_gpt-4o-mini_Cosine_50_0.5_100_301_p2, cir_gpt-4o-mini_Cosine_50_0.5_100_301_p3, cir_gpt-4o-mini_no_reranking_50_0.5_100_301_p1},
trec_tracks = {rag},
url = {https://trec.nist.gov/pubs/trec33/papers/CIR.rag.pdf}
}
Monster Ranking¶
Charles L. A. Clarke (University of Waterloo)Siqing Huo (University of Waterloo)Negar Arabzadeh (University of Waterloo)
- Participant: WaterlooClarke
- Paper: https://trec.nist.gov/pubs/trec33/papers/WaterlooClarke.lateral.rag.pdf
- Runs: monster | uwc1 | uwc2 | uwc0 | uwcCQAR | uwcCQA | uwcCQR | uwcCQ | uwcBA | uwcBQ | UWCrag | UWCrag_stepbystep | UWCgarag
Abstract
Participating as the UWClarke group, we focused on the RAG track; we also submitted runs for the Lateral Reading Track. For the retrieval task (R) of the RAG Track, we attempted what we have come to call “monster ranking”. Largely ignoring cost and computational resources, monster ranking attempts to determine the best possible ranked list for a query by whatever means possible, including explicit LLM-based relevance judgments, both pointwise and pairwise. While a monster ranker could never be deployed in a production environment, its output may be valuable for evaluating cheaper and faster rankers. For the full retrieval augmented generation (RAG) task we explored two general approaches, depending on whether generation happens first or second: 1) Generate an Answer and support with Retrieved Evidence (GARE). 2) Retrieve And Generate with Evidence (RAGE).
Bibtex
@inproceedings{WaterlooClarke-trec2024-papers-proc-1,
author = {Charles L. A. Clarke (University of Waterloo)
Siqing Huo (University of Waterloo)
Negar Arabzadeh (University of Waterloo)},
title = {Monster Ranking},
booktitle = {The Thirty-Third Text REtrieval Conference Proceedings (TREC 2024), Gaithersburg, MD, USA, November 15-18, 2024},
series = {NIST Special Publication},
volume = {xxx-xxx},
publisher = {National Institute of Standards and Technology (NIST)},
year = {2024},
trec_org = {WaterlooClarke},
trec_runs = {uwclarke_auto, uwclarke_auto_summarized, UWCrag, UWCrag_stepbystep, UWCgarag, monster, uwc1, uwc2, uwc0, uwcCQAR, uwcCQA, uwcCQR, uwcCQ, uwcBA, uwcBQ, UWClarke_rerank},
trec_tracks = {lateral.rag},
url = {https://trec.nist.gov/pubs/trec33/papers/WaterlooClarke.lateral.rag.pdf}
}
softbank-meisei-trec2024-papers-proc-2¶
Aiswariya Manoj Kumar (Softbank Corp.)Hiroki Takushima (Softbank Corp.)Yuma Suzuki (Softbank Corp.)Hayato Tanoue (Softbank Corp.)Hiroki Nishihara (Softbank Corp.)Yuki Shibata (Softbank Corp.)Haruki Sato (Agoop Corp.)Takumi Takada (SB Intuitions Corp.)Takayuki Hori (Softbank Corp.)Kazuya Ueki (Meisei Univ.)
- Participant: softbank-meisei
- Paper: https://trec.nist.gov/pubs/trec33/papers/softbank-meisei.rag.pdf
- Runs: rtask-bm25-colbert_faiss | rtask-bm25-rank_zephyr | agtask-bm25-colbert_faiss-gpt4o-llama70b | rag_bm25-colbert_faiss-gpt4o-llama70b | ragtask-bm25-rank_zephyr-gpt4o-llama70b
Abstract
The SoftBank-Meisei team participated in the Retrieval (R), Augmented Generation (AG), and Retrieval Augmented Generation (RAG) tasks at TREC RAG 2024. In the retrieval task, we employed a hierarchical retrieval process combining sparse and dense retrieval methods. We submitted two runs for the task: one with the baseline implementation plus additional preprocessing of the topic list, and the other with the hierarchical retrieval results.
In the Augmented Generation task, we used the GPT-4o API as well as the LLama3-70b model, along with our custom prompt, for generation. For the Retrieval Augmented Generation task, we submitted two runs, the same as for the R-task. The prompt used for the AG-task was also used for the generation stage of the RAG-task.
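A hierarchical sparse-then-dense cascade of the kind described above can be sketched as follows. The team's actual pipeline uses BM25 followed by ColBERT/FAISS; both scoring functions here are toy stand-ins, kept only to show the two-stage structure.

```python
# Minimal sketch of a hierarchical retrieval cascade: a cheap sparse pass
# narrows the pool, then a dense pass reranks only the survivors.
import math
from collections import Counter

DOCS = {
    "d1": "bm25 is a sparse ranking function",
    "d2": "dense retrieval uses learned embeddings",
    "d3": "hybrid search combines sparse and dense retrieval",
}

def sparse_score(query: str, doc: str) -> float:
    # Toy term-overlap score standing in for BM25.
    q, d = Counter(query.split()), Counter(doc.split())
    return sum(min(q[t], d[t]) for t in q)

def dense_score(query: str, doc: str) -> float:
    # Toy similarity: cosine over character-trigram counts, standing in
    # for a ColBERT/FAISS dense stage.
    grams = lambda s: Counter(s[i:i + 3] for i in range(len(s) - 2))
    qg, dg = grams(query), grams(doc)
    dot = sum(qg[g] * dg[g] for g in qg)
    norm = (math.sqrt(sum(v * v for v in qg.values()))
            * math.sqrt(sum(v * v for v in dg.values())))
    return dot / norm if norm else 0.0

def hierarchical_retrieve(query: str, k_sparse: int = 2, k_final: int = 1):
    pool = sorted(DOCS, key=lambda d: sparse_score(query, DOCS[d]),
                  reverse=True)[:k_sparse]
    return sorted(pool, key=lambda d: dense_score(query, DOCS[d]),
                  reverse=True)[:k_final]

print(hierarchical_retrieve("sparse and dense retrieval"))
```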
Bibtex
@inproceedings{softbank-meisei-trec2024-papers-proc-2,
author = {Aiswariya Manoj Kumar (Softbank Corp.) and
Hiroki Takushima (Softbank Corp.) and
Yuma Suzuki (Softbank Corp.) and
Hayato Tanoue (Softbank Corp.) and
Hiroki Nishihara (Softbank Corp.) and
Yuki Shibata (Softbank Corp.) and
Haruki Sato (Agoop Corp.) and
Takumi Takada (SB Intuitions Corp.) and
Takayuki Hori (Softbank Corp.) and
Kazuya Ueki (Meisei Univ.)},
title = {softbank-meisei-trec2024-papers-proc-2},
booktitle = {The Thirty-Third Text REtrieval Conference Proceedings (TREC 2024), Gaithersburg, MD, USA, November 15-18, 2024},
series = {NIST Special Publication},
volume = {xxx-xxx},
publisher = {National Institute of Standards and Technology (NIST)},
year = {2024},
trec_org = {softbank-meisei},
trec_runs = {SoftbankMeisei - Progress Run 1, SoftbankMeisei - Progress Run 2, SoftbankMeisei - Progress Run 3, SoftbankMeisei - Progress Run 4, SoftbankMeisei - Main Run 1, SoftbankMeisei - Main Run 2, SoftbankMeisei - Main Run 3, SoftbankMeisei - Main Run 4, rtask-bm25-colbert_faiss, rtask-bm25-rank_zephyr, rag_bm25-colbert_faiss-gpt4o-llama70b, ragtask-bm25-rank_zephyr-gpt4o-llama70b, agtask-bm25-colbert_faiss-gpt4o-llama70b, SoftbankMeisei_vtt_main_run1, SoftbankMeisei_vtt_main_run2, SoftbankMeisei_vtt_main_run3, SoftbankMeisei_vtt_main_run4, SoftbankMeisei_vtt_sub_run2, SoftbankMeisei_vtt_sub_run3, SoftbankMeisei_vtt_sub_run1},
trec_tracks = {rag},
url = {https://trec.nist.gov/pubs/trec33/papers/softbank-meisei.rag.pdf}
}
Laboratory for Analytic Sciences in TREC 2024 Retrieval Augmented Generation Track¶
Yue Wang (UNC at Chapel Hill)John M. Conroy (IDA Center for Computing Sciences)Neil Molino (IDA Center for Computing Sciences)Julia Yang (U.S. Department of Defense)Mike Green (U.S. Department of Defense)
- Participant: ncsu-las
- Paper: https://trec.nist.gov/pubs/trec33/papers/ncsu-las.rag.pdf
- Runs: LAS_ENN_T5_RERANKED_MXBAI | LAS-splade-mxbai-rrf | LAS-splade-mxbai | LAS_enn_t5 | LAS_ann_t5_qdrant | LAS-splade-mxbai-rrf-mmr8 | LAS-splade-mxbai-mmr8-RAG | LAS-T5-mxbai-mmr8-RAG | LAS-splade-mxbai-rrf-mmr8-doc | LAS_splad_mxbai-rrf-occams_50_RAG
Abstract
We report on our approach to the NIST TREC 2024 retrieval-augmented generation (RAG) track. The goal of this track was to build and evaluate systems that can answer complex questions by 1) retrieving excerpts of webpages from a large text collection (hundreds of millions of excerpts taken from tens of millions of webpages); 2) summarizing relevant information within retrieved excerpts into an answer containing up to 400 words; 3) attributing each sentence in the generated summary to one or more retrieved excerpts. We participated in the retrieval (R) task and retrieval augmented generation (RAG) task.
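The third step of the track, attributing each answer sentence to supporting excerpts, can be illustrated with a simple overlap matcher. This is a hedged sketch under assumed names (`attribute`, `min_overlap`), not the team's system, which the paper describes in full; real systems use learned matchers rather than word overlap.

```python
# Illustrative per-sentence attribution: map each sentence of an answer to
# the excerpt ids whose word overlap clears a threshold.
import re

def attribute(answer, excerpts, min_overlap=3):
    citations = []
    for sent in re.split(r"(?<=[.!?])\s+", answer.strip()):
        words = set(re.findall(r"\w+", sent.lower()))
        support = [eid for eid, text in excerpts.items()
                   if len(words & set(re.findall(r"\w+", text.lower())))
                   >= min_overlap]
        citations.append((sent, support))
    return citations

excerpts = {"e1": "the moon orbits the earth every 27 days",
            "e2": "tides are caused by the moon's gravity"}
answer = "The moon orbits the earth. Tides are caused by the moon's gravity."
for sent, cites in attribute(answer, excerpts):
    print(sent, "->", cites)
```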
Bibtex
@inproceedings{ncsu-las-trec2024-papers-proc-1,
author = {Yue Wang (UNC at Chapel Hill) and
John M. Conroy (IDA Center for Computing Sciences) and
Neil Molino (IDA Center for Computing Sciences) and
Julia Yang (U.S. Department of Defense) and
Mike Green (U.S. Department of Defense)},
title = {Laboratory for Analytic Sciences in TREC 2024 Retrieval Augmented Generation Track},
booktitle = {The Thirty-Third Text REtrieval Conference Proceedings (TREC 2024), Gaithersburg, MD, USA, November 15-18, 2024},
series = {NIST Special Publication},
volume = {xxx-xxx},
publisher = {National Institute of Standards and Technology (NIST)},
year = {2024},
trec_org = {ncsu-las},
trec_runs = {LAS_ENN_T5_RERANKED_MXBAI, LAS-splade-mxbai-rrf, LAS-splade-mxbai, LAS-splade-mxbai-rrf-mmr8, LAS-splade-mxbai-mmr8-RAG, LAS-T5-mxbai-mmr8-RAG, LAS_enn_t5, LAS_ann_t5_qdrant, LAS-splade-mxbai-rrf-mmr8-doc, LAS_splad_mxbai-rrf-occams_50_RAG},
trec_tracks = {rag},
url = {https://trec.nist.gov/pubs/trec33/papers/ncsu-las.rag.pdf}
}
The University of Stavanger (IAI) at the TREC 2024 Retrieval-Augmented Generation Track¶
Weronika Lajewska (University of Stavanger)Krisztian Balog (University of Stavanger)
- Participant: uis-iai
- Paper: https://trec.nist.gov/pubs/trec33/papers/uis-iai.rag.pdf
- Runs: ginger_top_5 | baseline_top_5 | ginger-fluency_top_5 | ginger-fluency_top_10 | ginger-fluency_top_20
Abstract
This paper describes the participation of the IAI group at the University of Stavanger in the TREC 2024 Retrieval-Augmented Generation track. We employ a modular pipeline for Grounded Information Nugget-based GEneration of Conversational Information-Seeking Responses (GINGER) to ensure factual correctness and source attribution. The multistage process includes detecting, clustering, and ranking information nuggets, summarizing top clusters, and generating follow-up questions based on uncovered subspaces of relevant information. In our runs, we experiment with different lengths of responses and different numbers of input passages. Preliminary results indicate that ours was one of the top performing systems in the augmented generation task.
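The nugget clustering-and-ranking stage can be sketched with a toy single-link clusterer. This is an assumption-laden illustration, not the GINGER code: `jaccard`, `cluster_nuggets`, and the 0.4 threshold are all hypothetical stand-ins for the pipeline's learned components.

```python
# Sketch of the nugget stage: cluster short information nuggets by word
# overlap and rank clusters by size, so the best-supported clusters feed
# the summarizer first.
def jaccard(a: str, b: str) -> float:
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

def cluster_nuggets(nuggets, threshold=0.4):
    clusters = []
    for n in nuggets:
        for c in clusters:
            if jaccard(n, c[0]) >= threshold:  # single-link to cluster seed
                c.append(n)
                break
        else:
            clusters.append([n])
    # Rank clusters by how many nuggets support them.
    return sorted(clusters, key=len, reverse=True)

nuggets = ["paris is the capital of france",
           "the capital of france is paris",
           "the eiffel tower is in paris"]
print(cluster_nuggets(nuggets)[0])
```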
Bibtex
@inproceedings{uis-iai-trec2024-papers-proc-1,
author = {Weronika Lajewska (University of Stavanger) and
Krisztian Balog (University of Stavanger)},
title = {The University of Stavanger (IAI) at the TREC 2024 Retrieval-Augmented Generation Track},
booktitle = {The Thirty-Third Text REtrieval Conference Proceedings (TREC 2024), Gaithersburg, MD, USA, November 15-18, 2024},
series = {NIST Special Publication},
volume = {xxx-xxx},
publisher = {National Institute of Standards and Technology (NIST)},
year = {2024},
trec_org = {uis-iai},
trec_runs = {ginger_top_5, baseline_top_5, ginger-fluency_top_5, ginger-fluency_top_10, ginger-fluency_top_20},
trec_tracks = {rag},
url = {https://trec.nist.gov/pubs/trec33/papers/uis-iai.rag.pdf}
}
Webis at TREC 2024: Biomedical Generative Retrieval, Retrieval-Augmented Generation, and Tip-of-the-Tongue Tracks¶
Maik Fröbe (Friedrich-Schiller-Universität)Lukas Gienapp (Leipzig University & ScaDS.AI)Harrisen Scells (Universität Kassel)Eric Oliver Schmidt (Martin-Luther-Universität Halle)Matti Wiegmann (Bauhaus-Universität Weimar)Martin Potthast (Universität Kassel & hessian.AI & ScaDS.AI)Matthias Hagen (Friedrich-Schiller-Universität Jena)
- Participant: webis
- Paper: https://trec.nist.gov/pubs/trec33/papers/webis.biogen.rag.tot.pdf
- Runs: webis-01 | webis-02 | webis-03 | webis-04 | webis-05 | webis-ag-run0-taskrag | webis-ag-run1-taskrag | webis-ag-run3-reuserag | webis-ag-run2-reuserag | webis-manual | webis-rag-run0-taskrag | webis-rag-run1-taskrag | webis-rag-run3-taskrag | webis-rag-run4-reuserag | webis-rag-run5-reuserag
Abstract
In this paper, we describe the Webis Group's participation in the 2024 edition of TREC. We participated in the Biomedical Generative Retrieval track, the Retrieval-Augmented Generation track, and the Tip-of-the-Tongue track. For the biomedical track, we applied different paradigms of retrieval-augmented generation with open- and closed-source LLMs. For the Retrieval-Augmented Generation track, we aimed to contrast manual response submissions with fully automated responses. For the Tip-of-the-Tongue track, we employed query relaxation as in last year's submission (i.e., leaving out terms that likely reduce retrieval effectiveness), combined with a new cross-encoder trained on an enriched version of the TOMT-KIS dataset.
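The query-relaxation idea, leaving out terms that likely hurt retrieval, can be sketched as follows. This is a minimal illustration under assumed names (`relax`, `best_score`): the toy scorer stands in for a real retrieval model, and dropping one term at a time is just the simplest relaxation strategy.

```python
# Sketch of query relaxation: generate variants of the query with one term
# left out and keep the variant whose best-matching document scores highest.
DOCS = ["long movie about a ship that sinks",
        "documentary about ships"]

def best_score(query: str) -> float:
    terms = query.split()
    return max(sum(t in doc.split() for t in terms) / len(terms)
               for doc in DOCS)

def relax(query: str) -> str:
    terms = query.split()
    candidates = [query] + [" ".join(terms[:i] + terms[i + 1:])
                            for i in range(len(terms))]
    return max(candidates, key=best_score)

# "maybe" matches no document, so dropping it sharpens the query.
print(relax("movie ship sinks maybe"))
```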
Bibtex
@inproceedings{webis-trec2024-papers-proc-1,
author = {Maik Fröbe (Friedrich-Schiller-Universität) and
Lukas Gienapp (Leipzig University & ScaDS.AI) and
Harrisen Scells (Universität Kassel) and
Eric Oliver Schmidt (Martin-Luther-Universität Halle) and
Matti Wiegmann (Bauhaus-Universität Weimar) and
Martin Potthast (Universität Kassel & hessian.AI & ScaDS.AI) and
Matthias Hagen (Friedrich-Schiller-Universität Jena)},
title = {Webis at TREC 2024: Biomedical Generative Retrieval, Retrieval-Augmented Generation, and Tip-of-the-Tongue Tracks},
booktitle = {The Thirty-Third Text REtrieval Conference Proceedings (TREC 2024), Gaithersburg, MD, USA, November 15-18, 2024},
series = {NIST Special Publication},
volume = {xxx-xxx},
publisher = {National Institute of Standards and Technology (NIST)},
year = {2024},
trec_org = {webis},
trec_runs = {webis-01, webis-02, webis-03, webis-04, webis-05, webis-ag-run0-taskrag, webis-ag-run1-taskrag, webis-manual, webis-rag-run0-taskrag, webis-rag-run1-taskrag, webis-rag-run3-taskrag, webis-ag-run3-reuserag, webis-rag-run4-reuserag, webis-rag-run5-reuserag, webis-ag-run2-reuserag, webis-1, webis-2, webis-3, webis-gpt-1, webis-gpt-4, webis-gpt-6, webis-5, webis-base, webis-tot-01, webis-tot-02, webis-tot-04, webis-tot-03},
trec_tracks = {biogen.rag.tot},
url = {https://trec.nist.gov/pubs/trec33/papers/webis.biogen.rag.tot.pdf}
}
Tip-of-the-Tongue¶
IISERK@ToT_2024: Query Reformulation and Layered Retrieval for Tip-of-Tongue Items¶
Subinay Adhikary (IISER-K), Shuvam Banerji Seal (IISER-K), Soumyadeep Sar (IISER-K), Dwaipayan Roy (IISER-K)
- Participant: IISER-K
- Paper: https://trec.nist.gov/pubs/trec33/papers/IISER-K.tot.pdf
- Runs: ThinkIR_BM25 | ThinIR_BM25_layer_2 | ThinkIR_semantic | ThinkIR_4_layer_2_w_small
Abstract
In this study, we explore various approaches for known-item retrieval, referred to as "Tip-of-the-Tongue" (ToT). The TREC 2024 ToT track involves retrieving previously encountered items, such as movie names or landmarks, when the searcher struggles to recall their exact identifiers. In this paper, we (ThinkIR) focus on four different approaches to retrieving the correct item for each query, including BM25 with optimized parameters and leveraging Large Language Models (LLMs) to reformulate the queries. Subsequently, we utilize these reformulated queries during retrieval with the BM25 model for each method. The four-step query reformulation technique, combined with two-layer retrieval, enhanced retrieval performance in terms of NDCG and Recall. Ultimately, two-layer retrieval achieves the best performance among all the runs, with a Recall@1000 of 0.8067.
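A two-layer scheme of the kind described, a first pass over the collection followed by a second pass over the survivors with a reformulated query, can be sketched as follows. This is a hypothetical illustration: the overlap scorer stands in for BM25, and `reformulate` is a placeholder for the paper's four-step LLM prompt chain.

```python
# Sketch of two-layer retrieval for ToT queries: layer 1 scores the whole
# collection with the raw query; layer 2 rescores only the layer-1
# candidates with a reformulated (expanded) query.
from collections import Counter

DOCS = {
    "m1": "animated film about a lost clownfish searching the ocean",
    "m2": "thriller about a shark terrorizing a beach town",
    "m3": "comedy about office workers",
}

def overlap(query: str, doc: str) -> int:
    # Toy term-overlap score standing in for BM25.
    q, d = Counter(query.split()), Counter(doc.split())
    return sum(min(q[t], d[t]) for t in q)

def reformulate(query: str) -> str:
    # Stand-in for LLM reformulation of a vague ToT description.
    return query + " clownfish ocean"

def two_layer(query: str, k1: int = 2) -> str:
    layer1 = sorted(DOCS, key=lambda d: overlap(query, DOCS[d]),
                    reverse=True)[:k1]
    q2 = reformulate(query)
    return max(layer1, key=lambda d: overlap(q2, DOCS[d]))

print(two_layer("film about a fish in the ocean"))
```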
Bibtex
@inproceedings{IISER-K-trec2024-papers-proc-1,
author = {Subinay Adhikary (IISER-K) and
Shuvam Banerji Seal (IISER-K) and
Soumyadeep Sar (IISER-K) and
Dwaipayan Roy (IISER-K)},
title = {IISERK@ToT_2024: Query Reformulation and Layered Retrieval for Tip-of-Tongue Items},
booktitle = {The Thirty-Third Text REtrieval Conference Proceedings (TREC 2024), Gaithersburg, MD, USA, November 15-18, 2024},
series = {NIST Special Publication},
volume = {xxx-xxx},
publisher = {National Institute of Standards and Technology (NIST)},
year = {2024},
trec_org = {IISER-K},
trec_runs = {ThinkIR_BM25, ThinIR_BM25_layer_2, ThinkIR_semantic, ThinkIR_4_layer_2_w_small},
trec_tracks = {tot},
url = {https://trec.nist.gov/pubs/trec33/papers/IISER-K.tot.pdf}
}
Yale NLP at TREC 2024: Tip-of-the-Tongue Track¶
Rohan Phanse (Yale University)Gabrielle Kaili-May Liu (Yale University)Arman Cohan (Yale University)
- Participant: yalenlp
- Paper: https://trec.nist.gov/pubs/trec33/papers/yalenlp.tot.pdf
- Runs: dpr-lst-rerank | dpr-pnt-lst-rerank | dpr-router-lst-rerank
Abstract
This paper describes our submissions to the TREC 2024 Tip-of-the-Tongue (ToT) track. We use a two-stage pipeline consisting of DPR-based retrieval followed by reranking with GPT-4o mini to answer ToT queries across three domains: movies, celebrities, and landmarks. Two of our runs performed retrieval using a "general" DPR model trained to handle queries from all domains. For our third run, we developed an approach to route queries to multiple "expert" DPR models each trained on a single domain. To build training sets for our DPR models, we collected existing ToT queries and generated over 100k synthetic queries using few-shot prompting with LLMs. After retrieval, results were reranked either listwise or using a combined pointwise and listwise approach. Our results demonstrate the efficacy of our three submitted approaches, which achieved NDCG@1000 scores ranging from 0.51 to 0.60.
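The routing idea, dispatching each query to a domain-specific "expert" retriever, can be sketched as follows. Everything here is an illustrative stand-in: the keyword classifier replaces the paper's trained router, and `expert_retrieve` stands in for a per-domain DPR index lookup.

```python
# Sketch of query routing to per-domain expert retrievers.
DOMAIN_KEYWORDS = {
    "movies": {"movie", "film", "actor", "scene"},
    "celebrities": {"singer", "famous", "celebrity"},
    "landmarks": {"building", "tower", "monument", "mountain"},
}

def route(query: str) -> str:
    # Pick the domain with the most keyword hits.
    words = set(query.lower().split())
    return max(DOMAIN_KEYWORDS, key=lambda d: len(words & DOMAIN_KEYWORDS[d]))

def expert_retrieve(domain: str, query: str) -> list[str]:
    # Stub standing in for a domain-specific DPR index lookup.
    return [f"{domain}-result-for:{query}"]

q = "old film with a scene on a sinking ship"
print(expert_retrieve(route(q), q))
```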
Bibtex
@inproceedings{yalenlp-trec2024-papers-proc-1,
author = {Rohan Phanse (Yale University) and
Gabrielle Kaili-May Liu (Yale University) and
Arman Cohan (Yale University)},
title = {Yale NLP at TREC 2024: Tip-of-the-Tongue Track},
booktitle = {The Thirty-Third Text REtrieval Conference Proceedings (TREC 2024), Gaithersburg, MD, USA, November 15-18, 2024},
series = {NIST Special Publication},
volume = {xxx-xxx},
publisher = {National Institute of Standards and Technology (NIST)},
year = {2024},
trec_org = {yalenlp},
trec_runs = {dpr-lst-rerank, dpr-pnt-lst-rerank, dpr-router-lst-rerank},
trec_tracks = {tot},
url = {https://trec.nist.gov/pubs/trec33/papers/yalenlp.tot.pdf}
}
Webis at TREC 2024: Biomedical Generative Retrieval, Retrieval-Augmented Generation, and Tip-of-the-Tongue Tracks¶
Maik Fröbe (Friedrich-Schiller-Universität)Lukas Gienapp (Leipzig University & ScaDS.AI)Harrisen Scells (Universität Kassel)Eric Oliver Schmidt (Martin-Luther-Universität Halle)Matti Wiegmann (Bauhaus-Universität Weimar)Martin Potthast (Universität Kassel & hessian.AI & ScaDS.AI)Matthias Hagen (Friedrich-Schiller-Universität Jena)
- Participant: webis
- Paper: https://trec.nist.gov/pubs/trec33/papers/webis.biogen.rag.tot.pdf
- Runs: webis-base | webis-tot-01 | webis-tot-02 | webis-tot-04 | webis-tot-03
Abstract
In this paper, we describe the Webis Group's participation in the 2024 edition of TREC. We participated in the Biomedical Generative Retrieval track, the Retrieval-Augmented Generation track, and the Tip-of-the-Tongue track. For the biomedical track, we applied different paradigms of retrieval-augmented generation with open- and closed-source LLMs. For the Retrieval-Augmented Generation track, we aimed to contrast manual response submissions with fully automated responses. For the Tip-of-the-Tongue track, we employed query relaxation as in last year's submission (i.e., leaving out terms that likely reduce retrieval effectiveness), combined with a new cross-encoder trained on an enriched version of the TOMT-KIS dataset.
Bibtex
@inproceedings{webis-trec2024-papers-proc-1,
author = {Maik Fröbe (Friedrich-Schiller-Universität) and
Lukas Gienapp (Leipzig University & ScaDS.AI) and
Harrisen Scells (Universität Kassel) and
Eric Oliver Schmidt (Martin-Luther-Universität Halle) and
Matti Wiegmann (Bauhaus-Universität Weimar) and
Martin Potthast (Universität Kassel & hessian.AI & ScaDS.AI) and
Matthias Hagen (Friedrich-Schiller-Universität Jena)},
title = {Webis at TREC 2024: Biomedical Generative Retrieval, Retrieval-Augmented Generation, and Tip-of-the-Tongue Tracks},
booktitle = {The Thirty-Third Text REtrieval Conference Proceedings (TREC 2024), Gaithersburg, MD, USA, November 15-18, 2024},
series = {NIST Special Publication},
volume = {xxx-xxx},
publisher = {National Institute of Standards and Technology (NIST)},
year = {2024},
trec_org = {webis},
trec_runs = {webis-01, webis-02, webis-03, webis-04, webis-05, webis-ag-run0-taskrag, webis-ag-run1-taskrag, webis-manual, webis-rag-run0-taskrag, webis-rag-run1-taskrag, webis-rag-run3-taskrag, webis-ag-run3-reuserag, webis-rag-run4-reuserag, webis-rag-run5-reuserag, webis-ag-run2-reuserag, webis-1, webis-2, webis-3, webis-gpt-1, webis-gpt-4, webis-gpt-6, webis-5, webis-base, webis-tot-01, webis-tot-02, webis-tot-04, webis-tot-03},
trec_tracks = {biogen.rag.tot},
url = {https://trec.nist.gov/pubs/trec33/papers/webis.biogen.rag.tot.pdf}
}
Video-To-Text¶
softbank-meisei-trec2024-papers-proc-2¶
Aiswariya Manoj Kumar (Softbank Corp.)Hiroki Takushima (Softbank Corp.)Yuma Suzuki (Softbank Corp.)Hayato Tanoue (Softbank Corp.)Hiroki Nishihara (Softbank Corp.)Yuki Shibata (Softbank Corp.)Haruki Sato (Agoop Corp.)Takumi Takada (SB Intuitions Corp.)Takayuki Hori (Softbank Corp.)Kazuya Ueki (Meisei Univ.)
- Participant: softbank-meisei
- Paper: https://trec.nist.gov/pubs/trec33/papers/softbank-meisei.avs.vtt.pdf
- Runs: SoftbankMeisei_vtt_main_run1 | SoftbankMeisei_vtt_main_run2 | SoftbankMeisei_vtt_main_run3 | SoftbankMeisei_vtt_main_run4 | SoftbankMeisei_vtt_sub_run2 | SoftbankMeisei_vtt_sub_run3 | SoftbankMeisei_vtt_sub_run1
Abstract
The Softbank-Meisei team participated in the ad-hoc video search (AVS) and video-to-text (VTT) tasks at TREC 2024. In this year's AVS task, we submitted four fully automatic systems for both the main and progress tasks. Our systems utilized pre-trained vision-and-language models, including CLIP, BLIP, and BLIP-2, along with several other advanced models. We also expanded the original query texts using text-generation and image-generation techniques to enhance data diversity. The integration ratios of these models were optimized based on results from previous benchmark test datasets. In this year's VTT task, as last year, we submitted four main-task methods using multiple-model captioning, reranking, and generative AI for summarization. For the subtasks, we submitted three methods using the output of each model. On last year's main-task test data, our methods improved by about 0.04 points in CIDEr-D and about 0.03 points in SPICE, according to the metrics we had on hand.
Bibtex
@inproceedings{softbank-meisei-trec2024-papers-proc-2,
author = {Aiswariya Manoj Kumar (Softbank Corp.) and
Hiroki Takushima (Softbank Corp.) and
Yuma Suzuki (Softbank Corp.) and
Hayato Tanoue (Softbank Corp.) and
Hiroki Nishihara (Softbank Corp.) and
Yuki Shibata (Softbank Corp.) and
Haruki Sato (Agoop Corp.) and
Takumi Takada (SB Intuitions Corp.) and
Takayuki Hori (Softbank Corp.) and
Kazuya Ueki (Meisei Univ.)},
title = {softbank-meisei-trec2024-papers-proc-2},
booktitle = {The Thirty-Third Text REtrieval Conference Proceedings (TREC 2024), Gaithersburg, MD, USA, November 15-18, 2024},
series = {NIST Special Publication},
volume = {xxx-xxx},
publisher = {National Institute of Standards and Technology (NIST)},
year = {2024},
trec_org = {softbank-meisei},
trec_runs = {SoftbankMeisei - Progress Run 1, SoftbankMeisei - Progress Run 2, SoftbankMeisei - Progress Run 3, SoftbankMeisei - Progress Run 4, SoftbankMeisei - Main Run 1, SoftbankMeisei - Main Run 2, SoftbankMeisei - Main Run 3, SoftbankMeisei - Main Run 4, rtask-bm25-colbert_faiss, rtask-bm25-rank_zephyr, rag_bm25-colbert_faiss-gpt4o-llama70b, ragtask-bm25-rank_zephyr-gpt4o-llama70b, agtask-bm25-colbert_faiss-gpt4o-llama70b, SoftbankMeisei_vtt_main_run1, SoftbankMeisei_vtt_main_run2, SoftbankMeisei_vtt_main_run3, SoftbankMeisei_vtt_main_run4, SoftbankMeisei_vtt_sub_run2, SoftbankMeisei_vtt_sub_run3, SoftbankMeisei_vtt_sub_run1},
trec_tracks = {rag},
url = {https://trec.nist.gov/pubs/trec33/papers/softbank-meisei.rag.pdf}
}
Softbank-Meisei at TREC 2024 Ad-hoc Video Search and Video to Text Tasks¶
Kazuya Ueki (Meisei University)Yuma Suzuki (SoftBank Corp.)Hiroki Takushima (SoftBank Corp.)Haruki Sato (Agoop Corp.)Takumi Takada (SB Intuitions Corp.)Aiswariya Manoj Kumar (SoftBank Corp.)Hayato Tanoue (SoftBank Corp.)Hiroki Nishihara (SoftBank Corp.)Yuki Shibata (SoftBank Corp.)Takayuki Hori (SoftBank Corp.)
- Participant: softbank-meisei
- Paper: https://trec.nist.gov/pubs/trec33/papers/softbank-meisei.avs.vtt.pdf
- Runs: SoftbankMeisei_vtt_main_run1 | SoftbankMeisei_vtt_main_run2 | SoftbankMeisei_vtt_main_run3 | SoftbankMeisei_vtt_main_run4 | SoftbankMeisei_vtt_sub_run2 | SoftbankMeisei_vtt_sub_run3 | SoftbankMeisei_vtt_sub_run1
Abstract
The Softbank-Meisei team participated in the ad-hoc video search (AVS) and video-to-text (VTT) tasks at TREC 2024. In this year's AVS task, we submitted four fully automatic systems for both the main and progress tasks. Our systems utilized pre-trained vision-and-language models, including CLIP, BLIP, and BLIP-2, along with several other advanced models. We also expanded the original query texts using text-generation and image-generation techniques to enhance data diversity. The integration ratios of these models were optimized based on results from previous benchmark test datasets. In this year's VTT task, as last year, we submitted four main-task methods using multiple-model captioning, reranking, and generative AI for summarization. For the subtasks, we submitted three methods using the output of each model. On last year's main-task test data, our methods improved by about 0.04 points in CIDEr-D and about 0.03 points in SPICE, according to the metrics we had on hand.
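The multiple-model captioning plus reranking recipe can be sketched with a toy consensus scorer. This is an illustration only, not the team's system: `pick_caption` and its word-overlap agreement score are crude stand-ins for pooling real captioning models and applying a learned reranker.

```python
# Sketch of caption ensembling: pool candidate captions from several
# models, then rerank each candidate by its agreement with the others,
# so the consensus caption wins.
def agreement(caption: str, others) -> int:
    cw = set(caption.lower().split())
    return sum(len(cw & set(o.lower().split())) for o in others)

def pick_caption(candidates) -> str:
    # Score each candidate against the remaining candidates; word overlap
    # is a crude proxy for a learned reranking stage.
    return max(candidates,
               key=lambda c: agreement(c, [o for o in candidates if o != c]))

captions = ["a man rides a bike down a hill",
            "a man riding a bicycle down a hill",
            "a person cooking in a kitchen"]
print(pick_caption(captions))  # the outlier caption loses
```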
Bibtex
@inproceedings{softbank-meisei-trec2024-papers-proc-3,
author = {Kazuya Ueki (Meisei University) and
Yuma Suzuki (SoftBank Corp.) and
Hiroki Takushima (SoftBank Corp.) and
Haruki Sato (Agoop Corp.) and
Takumi Takada (SB Intuitions Corp.) and
Aiswariya Manoj Kumar (SoftBank Corp.) and
Hayato Tanoue (SoftBank Corp.) and
Hiroki Nishihara (SoftBank Corp.) and
Yuki Shibata (SoftBank Corp.) and
Takayuki Hori (SoftBank Corp.)},
title = {Softbank-Meisei at TREC 2024 Ad-hoc Video Search and Video to Text Tasks},
booktitle = {The Thirty-Third Text REtrieval Conference Proceedings (TREC 2024), Gaithersburg, MD, USA, November 15-18, 2024},
series = {NIST Special Publication},
volume = {xxx-xxx},
publisher = {National Institute of Standards and Technology (NIST)},
year = {2024},
trec_org = {softbank-meisei},
trec_runs = {SoftbankMeisei - Progress Run 1, SoftbankMeisei - Progress Run 2, SoftbankMeisei - Progress Run 3, SoftbankMeisei - Progress Run 4, SoftbankMeisei - Main Run 1, SoftbankMeisei - Main Run 2, SoftbankMeisei - Main Run 3, SoftbankMeisei - Main Run 4, rtask-bm25-colbert_faiss, rtask-bm25-rank_zephyr, rag_bm25-colbert_faiss-gpt4o-llama70b, ragtask-bm25-rank_zephyr-gpt4o-llama70b, agtask-bm25-colbert_faiss-gpt4o-llama70b, SoftbankMeisei_vtt_main_run1, SoftbankMeisei_vtt_main_run2, SoftbankMeisei_vtt_main_run3, SoftbankMeisei_vtt_main_run4, SoftbankMeisei_vtt_sub_run2, SoftbankMeisei_vtt_sub_run3, SoftbankMeisei_vtt_sub_run1},
trec_tracks = {avs.vtt},
url = {https://trec.nist.gov/pubs/trec33/papers/softbank-meisei.avs.vtt.pdf}
}