Proceedings 2024

TRECVID 2024 - Evaluating video search, captioning, and activity recognition

George Awad, Jonathan Fiscus, Afzal Godil, Lukas Diduch, Yvette Graham, Georges Quénot

Abstract

The TREC Video Retrieval Evaluation (TRECVID) is a TREC-style video analysis and retrieval evaluation with the goal of promoting progress in research and development of content-based exploitation and retrieval of information from digital video via open, tasks-based evaluation supported by metrology.

Bibtex
@inproceedings{coordinators-trec2024-papers-proc-6,
    title = {TRECVID 2024 - Evaluating video search, captioning, and activity recognition},
    author = {George Awad and Jonathan Fiscus and Afzal Godil and Lukas Diduch and Yvette Graham and Georges Quénot},
    booktitle = {Proceedings of the 33rd Text {REtrieval} Conference (TREC 2024)},
    year = {2024},
    address = {Gaithersburg, Maryland},
    series = {NIST SP 1329}
}

Softbank-Meisei at TREC 2024

Kazuya Ueki, Yuma Suzuki, Hiroki Takushima, Haruki Sato, Takumi Takada, Aiswariya Manoj Kumar, Hayato Tanoue, Hiroki Nishihara, Yuki Shibata, Takayuki Hori

Abstract

The Softbank-Meisei team participated in the ad-hoc video search (AVS) and video-to-text (VTT) tasks at TREC 2024. In this year’s AVS task, we submitted four fully automatic systems for both the main and progress tasks. Our systems utilized pre-trained vision and language models, including CLIP, BLIP, and BLIP-2, along with several other advanced models. We also expanded the original query texts using text generation and image generation techniques to enhance data diversity. The integration ratios of these models were optimized based on results from previous benchmark test datasets. In this year’s VTT task, as last year, we submitted four main-task methods using multi-model captioning, reranking, and generative AI for summarization. For the subtasks, we submitted three methods using the output of each model. On last year’s main-task test data, our methods improved by about 0.04 points in CIDEr-D and about 0.03 points in SPICE, based on the metrics we had on hand.

Bibtex
@inproceedings{softbank-meisei-trec2024-papers-proc-2,
    title = {Softbank-Meisei at TREC 2024},
    author = {Kazuya Ueki and Yuma Suzuki and Hiroki Takushima and Haruki Sato and Takumi Takada and Aiswariya Manoj Kumar and Hayato Tanoue and Hiroki Nishihara and Yuki Shibata and Takayuki Hori},
    booktitle = {Proceedings of the 33rd Text {REtrieval} Conference (TREC 2024)},
    year = {2024},
    address = {Gaithersburg, Maryland},
    series = {NIST SP 1329}
}

Softbank-Meisei at TREC 2024 Ad-hoc Video Search and Video to Text Tasks

Kazuya Ueki, Yuma Suzuki, Hiroki Takushima, Haruki Sato, Takumi Takada, Aiswariya Manoj Kumar, Hayato Tanoue, Hiroki Nishihara, Yuki Shibata, Takayuki Hori

Abstract

The Softbank-Meisei team participated in the ad-hoc video search (AVS) and video-to-text (VTT) tasks at TREC 2024. In this year’s AVS task, we submitted four fully automatic systems for both the main and progress tasks. Our systems utilized pre-trained vision and language models, including CLIP, BLIP, and BLIP-2, along with several other advanced models. We also expanded the original query texts using text generation and image generation techniques to enhance data diversity. The integration ratios of these models were optimized based on results from previous benchmark test datasets. In this year’s VTT task, as last year, we submitted four main-task methods using multi-model captioning, reranking, and generative AI for summarization. For the subtasks, we submitted three methods using the output of each model. On last year’s main-task test data, our methods improved by about 0.04 points in CIDEr-D and about 0.03 points in SPICE, based on the metrics we had on hand.
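The integration of per-model similarities described in the abstract can be illustrated with a simple weighted late-fusion sketch. This is not the team's implementation; the model names and weights below are placeholders for the tuned integration ratios the abstract mentions:

```python
# Illustrative late fusion of text-video similarity scores from
# several vision-language models. Weights are hypothetical, not
# the ratios optimized on benchmark datasets in the paper.
def fuse_scores(model_scores, weights):
    """model_scores: {model_name: {shot_id: similarity}}.
    Returns shot ids ranked by the weighted sum of per-model scores."""
    fused = {}
    for model, scores in model_scores.items():
        w = weights[model]
        for shot, s in scores.items():
            fused[shot] = fused.get(shot, 0.0) + w * s
    return sorted(fused, key=fused.get, reverse=True)

ranking = fuse_scores(
    {"clip": {"shot1": 0.8, "shot2": 0.3},
     "blip2": {"shot1": 0.6, "shot2": 0.9}},
    weights={"clip": 0.7, "blip2": 0.3},
)
```

In practice the weights would be tuned on a held-out benchmark, as the abstract describes.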

Bibtex
@inproceedings{softbank-meisei-trec2024-papers-proc-3,
    title = {Softbank-Meisei at TREC 2024 Ad-hoc Video Search and Video to Text Tasks},
    author = {Kazuya Ueki and Yuma Suzuki and Hiroki Takushima and Haruki Sato and Takumi Takada and Aiswariya Manoj Kumar and Hayato Tanoue and Hiroki Nishihara and Yuki Shibata and Takayuki Hori},
    booktitle = {Proceedings of the 33rd Text {REtrieval} Conference (TREC 2024)},
    year = {2024},
    address = {Gaithersburg, Maryland},
    series = {NIST SP 1329}
}

RUC_AIM3 at TRECVID 2024: Ad-hoc Video Search

Xueyan Wang, Yang Du, Yuqi Liu, Qin Jin

Abstract

This report presents our solution for the Ad-hoc Video Search (AVS) task of TRECVID 2024. Based on our baseline AVS model from TRECVID 2023, we further improve search performance by integrating multiple visual-embedding models, performing video captioning for topic-to-caption searches, and applying a re-ranking strategy for top-candidate selection. Our submissions from the improved AVS model ranked 3rd in the TRECVID 2024 AVS main task on mean average precision (mAP), with our best run achieving 36.8.

Bibtex
@inproceedings{ruc_aim3-trec2024-papers-proc-1,
    title = {RUC\_AIM3 at TRECVID 2024: Ad-hoc Video Search},
    author = {Xueyan Wang and Yang Du and Yuqi Liu and Qin Jin},
    booktitle = {Proceedings of the 33rd Text {REtrieval} Conference (TREC 2024)},
    year = {2024},
    address = {Gaithersburg, Maryland},
    series = {NIST SP 1329}
}

ITI-CERTH participation in ActEV and AVS Tracks of TRECVID 2024

Konstantinos Gkountakos, Damianos Galanopoulos, Antonios Leventakis, Georgios Tsionkis, Klearchos Stavrothanasopoulos, Konstantinos Ioannidis, Stefanos Vrochidis, Vasileios Mezaris, Ioannis Kompatsiaris

Abstract

This report presents the overview of the runs related to Ad-hoc Video Search (AVS) and Activities in Extended Video (ActEV) tasks on behalf of the ITI-CERTH team. Our participation in the AVS task involves a collection of five cross-modal deep network architectures and numerous pre-trained models, which are used to calculate the similarities between video shots and queries. These calculated similarities serve as input to a trainable neural network that effectively combines them. During the retrieval stage, we also introduce a normalization step that utilizes both the current and previous AVS queries for revising the combined video shot-query similarities. For the ActEV task, we adapt our framework to support a rule-based classification to overcome the challenges of detecting and recognizing activities in a multi-label manner while experimenting with two separate activity classifiers.
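The normalization step described above, which revises combined shot-query similarities using both the current and previous AVS queries, could look roughly like per-shot z-scoring across the query pool. This is a hypothetical sketch; the function name, data layout, and the choice of z-scoring are assumptions, not the team's code:

```python
import statistics

def normalize_over_queries(sims, current_query):
    """sims: {query_id: {shot_id: similarity}} over the current and
    previous queries. Returns the current query's scores z-scored
    per shot across the whole query pool."""
    revised = {}
    for shot, score in sims[current_query].items():
        # pool of this shot's similarities over all queries seen so far
        pool = [qs[shot] for qs in sims.values() if shot in qs]
        mu = statistics.mean(pool)
        sigma = statistics.pstdev(pool)
        revised[shot] = (score - mu) / sigma if sigma > 0 else 0.0
    return revised
```

The idea is that a shot scoring highly against *every* query is less informative than one scoring highly against this query alone.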

Bibtex
@inproceedings{CERTH-ITI-trec2024-papers-proc-1,
    title = {ITI-CERTH participation in ActEV and AVS Tracks of TRECVID 2024},
    author = {Konstantinos Gkountakos and Damianos Galanopoulos and Antonios Leventakis and Georgios Tsionkis and Klearchos Stavrothanasopoulos and Konstantinos Ioannidis and Stefanos Vrochidis and Vasileios Mezaris and Ioannis Kompatsiaris},
    booktitle = {Proceedings of the 33rd Text {REtrieval} Conference (TREC 2024)},
    year = {2024},
    address = {Gaithersburg, Maryland},
    series = {NIST SP 1329}
}

WHU-NERCMS AT TRECVID2024: AD-HOC VIDEO SEARCH TASK

Heng Liu, Jiangshan He, Zeyuan Zhang, Yuanyuan Xu, Chao Liang

Abstract

The WHU-NERCMS team participated in the ad-hoc video search (AVS) task of TRECVID 2024. In this year’s AVS task, we continued to use multiple visual semantic embedding methods, combined with interactive feedback-guided ranking aggregation techniques to integrate different models and their outputs to generate the final ranked video shot list. We submitted 4 runs each for automatic and interactive tasks, along with one attempt for a manual assistance task. Table 1 shows our results for this year.
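Aggregating the ranked outputs of several embedding models, as described above, can be sketched with a simple Borda count. This is an illustrative stand-in, not the WHU-NERCMS feedback-guided aggregation itself:

```python
# Borda-count rank aggregation: each input ranking awards
# len(list) - rank points to every item; items are re-ranked
# by total points across all input lists.
def borda_fuse(rankings):
    points = {}
    for ranking in rankings:
        n = len(ranking)
        for rank, item in enumerate(ranking):
            points[item] = points.get(item, 0) + (n - rank)
    return sorted(points, key=points.get, reverse=True)
```

Interactive feedback could then be folded in by boosting the points of items a user marks relevant before the final sort.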

Bibtex
@inproceedings{WHU-NERCMS-trec2024-papers-proc-1,
    title = {WHU-NERCMS AT TRECVID2024: AD-HOC VIDEO SEARCH TASK},
    author = {Heng Liu and Jiangshan He and Zeyuan Zhang and Yuanyuan Xu and Chao Liang},
    booktitle = {Proceedings of the 33rd Text {REtrieval} Conference (TREC 2024)},
    year = {2024},
    address = {Gaithersburg, Maryland},
    series = {NIST SP 1329}
}

Biomedical Generative Retrieval (BioGen) Track

Overview of TREC 2024 Biomedical Generative Retrieval (BioGen) Track

Deepak Gupta, Dina Demner-Fushman, William Hersh, Steven Bedrick, Kirk Roberts

Abstract

With the advancement of large language models (LLMs), the biomedical domain has seen significant progress and improvement in multiple tasks such as biomedical question answering, lay language summarization of the biomedical literature, clinical note summarization, etc. However, hallucinations or confabulations remain one of the key challenges when using LLMs in the biomedical and other domains.

Bibtex
@inproceedings{coordinators-trec2024-papers-proc-1,
    title = {Overview of TREC 2024 Biomedical Generative Retrieval (BioGen) Track},
    author = {Deepak Gupta and Dina Demner-Fushman and William Hersh and Steven Bedrick and Kirk Roberts},
    booktitle = {Proceedings of the 33rd Text {REtrieval} Conference (TREC 2024)},
    year = {2024},
    address = {Gaithersburg, Maryland},
    series = {NIST SP 1329}
}

Webis at TREC 2024: Biomedical Generative Retrieval, Retrieval-Augmented Generation, and Tip-of-the-Tongue Tracks

Lukas Gienapp, Maik Fröbe, Jan Heinrich Merker, Harrisen Scells, Eric Oliver Schmidt, Matti Wiegmann, Martin Potthast, Matthias Hagen

Abstract

In this paper, we describe the Webis Group’s participation in the 2024 edition of TREC. We participated in the Biomedical Generative Retrieval track, the Retrieval-Augmented Generation track, and the Tip-of-the-Tongue track. For the biomedical track, we applied different paradigms of retrieval-augmented generation with open- and closed-source LLMs. For the Retrieval-Augmented Generation track, we aimed to contrast manual response submissions with fully-automated responses. For the Tip-of-the-Tongue track, we employed query relaxation as in our last year’s submission (i.e., leaving out terms that likely reduce the retrieval effectiveness) that we combine with a new cross-encoder that we trained on an enriched version of the TOMT-KIS dataset.

Bibtex
@inproceedings{webis-trec2024-papers-proc-1,
    title = {Webis at TREC 2024: Biomedical Generative Retrieval, Retrieval-Augmented Generation, and Tip-of-the-Tongue Tracks},
    author = {Lukas Gienapp and Maik Fröbe and Jan Heinrich Merker and Harrisen Scells and Eric Oliver Schmidt and Matti Wiegmann and Martin Potthast and Matthias Hagen},
    booktitle = {Proceedings of the 33rd Text {REtrieval} Conference (TREC 2024)},
    year = {2024},
    address = {Gaithersburg, Maryland},
    series = {NIST SP 1329}
}

Exploring the Few-Shot Performance of Low-Cost Proprietary Models in the 2024 TREC BioGen Track

Samy Ateia, Udo Kruschwitz

Abstract

For the 2024 TREC Biomedical Generative Retrieval (BioGen) Track, we evaluated proprietary low-cost large language models (LLMs) in few-shot and zero-shot settings for biomedical question answering. Building upon our prior competitive approach from the CLEF 2024 BioASQ challenge, we adapted our methods to the BioGen task. We reused few-shot examples from BioASQ and generated additional ones from the test set for the BioGen specific answer format, by using an LLM judge to select examples. Our approach involved query expansion, BM25-based retrieval using Elasticsearch, snippet extraction, reranking, and answer generation both with and without 10-shot learning and additional relevant context from Wikipedia. The results are in line with our findings at BioASQ, indicating that additional Wikipedia context did not improve the results, while 10-shot learning did. An interactive reference implementation that showcases Google’s Gemini-1.5-flash performance with 3-shot learning is available online and the source code of this demo is available on GitHub.
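The BM25-based retrieval stage mentioned above (backed by Elasticsearch in the paper) can be illustrated with a self-contained implementation of the standard BM25 scoring formula. The parameter defaults k1=1.2, b=0.75 are the commonly used ones, not necessarily the paper's settings:

```python
import math

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score whitespace-tokenized docs against a query with the
    standard BM25 formula (a toy stand-in for an Elasticsearch index)."""
    docs_tok = [d.lower().split() for d in docs]
    N = len(docs_tok)
    avgdl = sum(len(d) for d in docs_tok) / N
    scores = [0.0] * N
    for term in query.lower().split():
        df = sum(term in d for d in docs_tok)  # document frequency
        if df == 0:
            continue
        idf = math.log(1 + (N - df + 0.5) / (df + 0.5))
        for i, d in enumerate(docs_tok):
            tf = d.count(term)
            if tf:
                # length-normalized term-frequency saturation
                scores[i] += idf * tf * (k1 + 1) / (
                    tf + k1 * (1 - b + b * len(d) / avgdl))
    return scores
```

Snippet extraction and reranking would then operate on the top-scoring documents, as in the pipeline the abstract describes.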

Bibtex
@inproceedings{ur-iw-trec2024-papers-proc-1,
    title = {Exploring the Few-Shot Performance of Low-Cost Proprietary Models in the 2024 TREC BioGen Track},
    author = {Samy Ateia and Udo Kruschwitz},
    booktitle = {Proceedings of the 33rd Text {REtrieval} Conference (TREC 2024)},
    year = {2024},
    address = {Gaithersburg, Maryland},
    series = {NIST SP 1329}
}

Interactive Knowledge Assistance

TREC iKAT 2024: The Interactive Knowledge Assistance Track Overview

Mohammad Aliannejadi, Zahra Abbasiantaeb, Simon Lupart, Shubham Chatterjee, Jeffrey Dalton, Leif Azzopardi

Abstract

Conversational information seeking has evolved rapidly in the last few years with the development of large language models (LLMs), which provide the basis for interpreting and responding in a naturalistic manner to user requests. iKAT emphasizes the creation and research of conversational search agents that adapt responses based on the user’s prior interactions and present context, maintaining a long-term memory of user-system interactions. This means that the same question might yield varied answers, contingent on the user’s profile and preferences. The challenge lies in enabling conversational search agents (CSAs) to incorporate personalized context to guide users through the relevant information effectively. iKAT’s second year attracted seven teams and a total of 31 runs. Most of the runs leveraged LLMs in their pipelines, some using an LLM for a single query rewrite and others for multiple query rewrites.

Bibtex
@inproceedings{coordinators-trec2024-papers-proc-4,
    title = {TREC iKAT 2024: The Interactive Knowledge Assistance Track Overview},
    author = {Mohammad Aliannejadi and Zahra Abbasiantaeb and Simon Lupart and Shubham Chatterjee and Jeffrey Dalton and Leif Azzopardi},
    booktitle = {Proceedings of the 33rd Text {REtrieval} Conference (TREC 2024)},
    year = {2024},
    address = {Gaithersburg, Maryland},
    series = {NIST SP 1329}
}

IRLab@iKAT24: Learned Sparse Retrieval with Multi-aspect LLM Query Generation for Conversational Search

Simon Lupart, Zahra Abbasiantaeb, Mohammad Aliannejadi

Abstract

The Interactive Knowledge Assistant Track (iKAT) 2024 focuses on advancing conversational assistants that adapt their interactions and responses based on personalized user knowledge. The track incorporates a Personal Textual Knowledge Base (PTKB) alongside conversational AI tasks such as passage ranking and response generation. Since query rewriting is an effective approach for resolving conversational context, we explore Large Language Models (LLMs) as query rewriters. Specifically, our submitted runs explore multi-aspect query generation using the MQ4CS framework, which we further enhance with Learned Sparse Retrieval via the SPLADE architecture, coupled with robust cross-encoder models. We also propose an alternative to the previous interleaving strategy, aggregating multiple aspects during the reranking phase. Our findings indicate that multi-aspect query generation is effective in enhancing performance when integrated with advanced retrieval and reranking models. Our results also lead the way for better personalization in conversational search, relying on LLMs to integrate personalization within the query rewrite and outperforming human rewrites.
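The interleaving strategy that the runs contrast with reranking-stage aggregation can be sketched as a duplicate-aware round-robin merge of per-aspect rankings. This is an illustrative sketch, not the MQ4CS implementation:

```python
from itertools import zip_longest

def interleave(rankings):
    """Round-robin merge of several per-aspect rankings, taking one
    item from each list in turn and skipping duplicates."""
    merged, seen = [], set()
    for group in zip_longest(*rankings):
        for item in group:
            if item is not None and item not in seen:
                seen.add(item)
                merged.append(item)
    return merged
```

The reranking-stage alternative described in the abstract would instead pool all aspect candidates and let a cross-encoder produce a single ordering.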

Bibtex
@inproceedings{uva-trec2024-papers-proc-1,
    title = {IRLab@iKAT24: Learned Sparse Retrieval with Multi-aspect LLM Query Generation for Conversational Search},
    author = {Simon Lupart and Zahra Abbasiantaeb and Mohammad Aliannejadi},
    booktitle = {Proceedings of the 33rd Text {REtrieval} Conference (TREC 2024)},
    year = {2024},
    address = {Gaithersburg, Maryland},
    series = {NIST SP 1329}
}

DCU-ADAPT@TREC iKAT 2024: Incorporating Retrieved Knowledge for Enhanced Conversational Search

Praveen Acharya, Xiao Fu, Noriko Kando, Gareth J. F. Jones

Abstract

Users of search applications often encounter difficulties in expressing their information needs effectively. Conversational search (CS) can potentially support users in creating effective queries by enabling a multi-turn, iterative dialogue between a User and the search Systems. These dialogues help users to refine and build their understanding of their information need through a series of query-response exchanges. However, current CS systems generally do not accumulate knowledge about the user’s information needs or the content with which they have engaged during this dialogue. This limitation can hinder the system’s ability to support users effectively. To address this issue, we propose an approach that seeks to model and utilize knowledge gained from each interaction to enhance future user queries. Our method focuses on incorporating knowledge from retrieved documents to enrich subsequent user queries, ultimately improving query comprehension and retrieval outcomes. We test the effectiveness of our proposed approach in our TREC iKAT 2024 participation.

Bibtex
@inproceedings{DCU-ADAPT-trec2024-papers-proc-1,
    title = {DCU-ADAPT@TREC iKAT 2024: Incorporating Retrieved Knowledge for Enhanced Conversational Search},
    author = {Praveen Acharya and Xiao Fu and Noriko Kando and Gareth J. F. Jones},
    booktitle = {Proceedings of the 33rd Text {REtrieval} Conference (TREC 2024)},
    year = {2024},
    address = {Gaithersburg, Maryland},
    series = {NIST SP 1329}
}

IIUoT at TREC 2024 Interactive Knowledge Assistance Track

Yating Zhang, Haitao Yu

Abstract

In conversational information-seeking (CIS), the ability to tailor responses to individual user contexts is essential for enhancing relevance and accuracy. The TREC Interactive Knowledge Assistance Track addresses this need by advancing research in personalized conversational agents that adapt dynamically to user-specific details and preferences. Our study aligns with this framework, which involves three core tasks: personal textual knowledge base (PTKB) statement ranking, passage ranking, and response generation. To address these tasks, we propose a comprehensive framework that incorporates user context at each stage. For PTKB statement ranking, we integrate embedding models with large language models (LLMs) to optimize relevance-based ranking precision, allowing for more nuanced alignment of user characteristics with retrieved information. In the passage ranking stage, our adaptive retrieval strategy combines BM25 with iterative contextual refinement, enhancing the relevance and accuracy of retrieved passages. Finally, our response generation module leverages a Retrieval-Augmented Generation (RAG) model that dynamically synthesizes user-specific context and external knowledge, producing responses that are both precise and contextually relevant. Experimental results demonstrate that our framework effectively addresses the complexities of personalized CIS, achieving notable improvements over traditional static retrieval methods.

Bibtex
@inproceedings{ii_research-trec2024-papers-proc-1,
    title = {IIUoT at TREC 2024 Interactive Knowledge Assistance Track},
    author = {Yating Zhang and Haitao Yu},
    booktitle = {Proceedings of the 33rd Text {REtrieval} Conference (TREC 2024)},
    year = {2024},
    address = {Gaithersburg, Maryland},
    series = {NIST SP 1329}
}

RALI@TREC iKAT 2024: Achieving Personalization via Retrieval Fusion in Conversational Search

Yuchen Hui, Fengran Mo, Milan Mao, Jian-Yun Nie

Abstract

The Recherche Appliquée en Linguistique Informatique (RALI) team participated in the 2024 TREC Interactive Knowledge Assistance (iKAT) Track. In personalized conversational search, effectively capturing a user’s complex search intent requires incorporating both contextual information and key elements from the user profile into query reformulation. The user profile often contains many relevant pieces, and each could potentially complement the user’s information needs. It is difficult to disregard any of them, whereas introducing an excessive number of these pieces risks drifting from the original query and hinders search performance. This is a challenge we denote as over-personalization. In this paper, we tackle the problem via employing different strategies based on fusing ranking lists generated from the queries with different levels of personalization.
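Fusing ranking lists produced from queries with different levels of personalization, as described above, is often done with reciprocal rank fusion (RRF). A minimal sketch; the abstract does not specify which fusion formula the team used:

```python
def rrf(rankings, k=60):
    """Reciprocal rank fusion: each item's score is the sum over
    input lists of 1 / (k + rank). k=60 is the commonly used
    constant; items absent from a list contribute nothing."""
    scores = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

RRF rewards items that appear near the top of several lists, which suits the over-personalization problem: a passage favored by both the lightly and heavily personalized rewrites rises, while one favored by only a single rewrite is damped.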

Bibtex
@inproceedings{rali-lab-trec2024-papers-proc-1,
    title = {RALI@TREC iKAT 2024: Achieving Personalization via Retrieval Fusion in Conversational Search},
    author = {Yuchen Hui and Fengran Mo and Milan Mao and Jian-Yun Nie},
    booktitle = {Proceedings of the 33rd Text {REtrieval} Conference (TREC 2024)},
    year = {2024},
    address = {Gaithersburg, Maryland},
    series = {NIST SP 1329}
}

NII@TREC IKAT 2024: LLM-Based Pipelines for Personalized Conversational Information Seeking

Noriko Kando, Praveen Acharya, Navdeep Singh Bedi, Xiao Fu

Abstract

In this paper, we propose two novel pipelines—Retrieve-then-Generate (RtG) and Generate-then-Retrieve (GtR)—to enhance conversational information seeking (CIS) systems, evaluated within the TREC iKAT 2023 framework. The RtG pipeline emphasizes brevity in rewriting user utterances and generates multiple query groups to maximize the retrieval of relevant documents. This approach leads to improved recall in the final results compared to the best submission in 2023. Additionally, it incorporates a chain-of-thought methodology through a two-stage response generation process. In a zero-shot setting, the GtR pipeline introduces a hybrid approach by ensembling state-of-the-art Large Language Models (LLMs), specifically GPT-4o and Claude-3-opus. By leveraging the strengths of multiple LLMs, the GtR pipeline achieves high recall while maintaining competitive precision and ranking performance in both document retrieval and Personal Task Knowledge Base (PTKB) statement classification tasks. Our experimental results demonstrate that both pipelines significantly enhance retrieval effectiveness, offering robust solutions for future CIS systems.

Bibtex
@inproceedings{nii-trec2024-papers-proc-1,
    title = {NII@TREC IKAT 2024: LLM-Based Pipelines for Personalized Conversational Information Seeking},
    author = {Noriko Kando and Praveen Acharya and Navdeep Singh Bedi and Xiao Fu},
    booktitle = {Proceedings of the 33rd Text {REtrieval} Conference (TREC 2024)},
    year = {2024},
    address = {Gaithersburg, Maryland},
    series = {NIST SP 1329}
}

Passage Query Methods for Retrieval and Reranking in Conversational Agents

Victor De Lima, Grace Hui Yang

Abstract

This paper presents our approach to the TREC Interactive Knowledge Assistance Track (iKAT), which focuses on improving conversational information-seeking (CIS) systems. While recent advancements in CIS have improved conversational agents' ability to assist users, significant challenges remain in understanding context and retrieving relevant documents across domains and dialogue turns. To address these issues, we extend the Generate-Retrieve-Generate pipeline by developing passage queries (PQs) that align with the target document's expected format to improve query-document matching during retrieval. We propose two variations of this approach: Weighted Reranking and Short and Long Passages. Each method leverages a Meta Llama model for context understanding and generating queries and responses. Passage ranking evaluation results show that the Short and Long Passages approach outperformed the organizers' baselines, performed best among Llama-based systems in the track, and achieved results comparable to GPT-4-based systems. These results indicate that the method effectively balances efficiency and performance. Findings suggest that PQs improve semantic alignment with target documents and demonstrate their potential to improve multi-turn dialogue systems.

Bibtex
@inproceedings{infosenselab-trec2024-papers-proc-1,
    title = {Passage Query Methods for Retrieval and Reranking in Conversational Agents},
    author = {Victor De Lima and Grace Hui Yang},
    booktitle = {Proceedings of the 33rd Text {REtrieval} Conference (TREC 2024)},
    year = {2024},
    address = {Gaithersburg, Maryland},
    series = {NIST SP 1329}
}

Lateral Reading

Overview of the TREC 2024 Lateral Reading Track

Dake Zhang, Mark D. Smucker, Charles L. A. Clarke

Abstract

The current web landscape, characterized by abundant information and widespread misinformation, highlights the pressing need for people to evaluate the trustworthiness of online content effectively. However, this remains a daunting challenge for many internet users. The TREC 2024 Lateral Reading Track seeks to address this issue by supporting the use of lateral reading, a proven strategy used by professional fact-checkers, to help users evaluate news articles more effectively and efficiently. In its first year, the track had two tasks: (1) generating questions that readers should consider when assessing the trustworthiness of the given news articles, and (2) retrieving documents to help answer these questions. This paper presents an overview of the track, including its objectives, methodologies, resources, and evaluation results. Our evaluation of the submitted runs shows the significant challenges these tasks pose to existing approaches, including state-of-the-art large language models. Further details on this track can be found on its website: https://trec-dragun.github.io/.

Bibtex
@inproceedings{coordinators-trec2024-papers-proc-5,
    title = {Overview of the TREC 2024 Lateral Reading Track},
    author = {Dake Zhang and Mark D. Smucker and Charles L. A. Clarke},
    booktitle = {Proceedings of the 33rd Text {REtrieval} Conference (TREC 2024)},
    year = {2024},
    address = {Gaithersburg, Maryland},
    series = {NIST SP 1329}
}

Monster Ranking

Charles L. A. Clarke, Siqing Huo, Negar Arabzadeh

Bibtex
@inproceedings{WaterlooClarke-trec2024-papers-proc-1,
    title = {Monster Ranking},
    author = {Charles L. A. Clarke and Siqing Huo and Negar Arabzadeh},
    booktitle = {Proceedings of the 33rd Text {REtrieval} Conference (TREC 2024)},
    year = {2024},
    address = {Gaithersburg, Maryland},
    series = {NIST SP 1329}
}

Medical Video Question Answering

Overview of TREC 2024 Medical Video Question Answering (MedVidQA) Track

Deepak Gupta, Dina Demner-Fushman

Abstract

One of the key goals of artificial intelligence (AI) is the development of a multimodal system that facilitates communication with the visual world (image and video) using a natural language query. Earlier works on medical question answering primarily focused on textual and visual (image) modalities, which may be inefficient in answering questions requiring demonstration. In recent years, significant progress has been achieved due to the introduction of large-scale language-vision datasets and the development of efficient deep neural techniques that bridge the gap between language and visual understanding.

Bibtex
@inproceedings{coordinators-trec2024-papers-proc-2,
    title = {Overview of TREC 2024 Medical Video Question Answering (MedVidQA) Track},
    author = {Deepak Gupta and Dina Demner-Fushman},
    booktitle = {Proceedings of the 33rd Text {REtrieval} Conference (TREC 2024)},
    year = {2024},
    address = {Gaithersburg, Maryland},
    series = {NIST SP 1329}
}

Doshisha University, Universität zu Lübeck and German Research Center for Artificial Intelligence at TRECVID 2024: QFISC Task

Zihao Chen, Falco Lentzsch, Nele S. Brügge, Frédéric Li, Miho Ohsaki, Heinz Handels, Marcin Grzegorzek, Kimiaki Shirahama

Abstract

This paper presents the approaches proposed by the DoshishaUzlDfki team to address the Query-Focused Instructional Step Captioning (QFISC) task of TRECVID 2024. Given RGB videos containing stepwise instructions, we explored several techniques to automatically identify the boundaries of each step and provide a caption for it. More specifically, two different types of methods were investigated for temporal video segmentation. The first uses the CoSeg approach proposed by Wang et al. [9], based on Event Segmentation Theory, which hypothesises that video frames at the boundaries of steps are harder to predict since they tend to contain more significant visual changes. In detail, CoSeg detects event boundaries in the RGB video stream by finding the local maxima in the reconstruction error of a model trained to reconstruct the temporal contrastive embeddings of video snippets. The second type of approach we tested relies exclusively on the audio modality and is based on the hypothesis that information about step transitions is often semantically contained in the verbal transcripts of the videos. In detail, we used the WhisperX model [3], which isolates speech parts in the audio tracks of the videos and converts them into timestamped text transcripts. The latter were then sent as input to a Large Language Model (LLM) with a carefully designed prompt requesting the LLM to identify step boundaries. Once the temporal video segmentation was performed, we sent the WhisperX transcripts corresponding to the video segments determined by both methods to an LLM instructed to caption them. The GPT4o and Mistral Large 2 LLMs were employed in our experiments for both segmentation and captioning. Our results show that the temporal segmentation methods based on audio processing significantly outperform the video-based one. More specifically, the best performance we obtained is yielded by our approach using GPT4o with zero-shot prompting for temporal segmentation. It achieves the top global performance among all runs submitted to the QFISC task in all evaluation metrics, except for precision, whose best value is obtained by our run using Mistral Large 2 with chain-of-thought prompting.
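The CoSeg-style cue described in the abstract reduces, at its core, to finding local maxima in a per-snippet reconstruction-error signal. A minimal illustrative sketch, not the paper's implementation (which operates on temporal contrastive embeddings of video snippets):

```python
def boundary_indices(errors, threshold=0.0):
    """Return indices where a per-snippet reconstruction error has a
    strict local maximum above `threshold` -- the cue that a snippet
    is hard to predict and may mark the start of a new step."""
    return [i for i in range(1, len(errors) - 1)
            if errors[i] > errors[i - 1]
            and errors[i] > errors[i + 1]
            and errors[i] > threshold]
```

Each returned index would be mapped back to a timestamp to delimit a candidate instructional step.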

Bibtex
@inproceedings{DoshishaUzlDfki-trec2024-papers-proc-1,
    title = {Doshisha University, Universität zu Lübeck and German Research Center for Artificial Intelligence at TRECVID 2024: QFISC Task},
    author = {Zihao Chen and Falco Lentzsch and Nele S. Brügge and Frédéric Li and Miho Ohsaki and Heinz Handels and Marcin Grzegorzek and Kimiaki Shirahama},
    booktitle = {Proceedings of the 33rd Text {REtrieval} Conference (TREC 2024)},
    year = {2024},
    address = {Gaithersburg, Maryland},
    series = {NIST SP 1329}
}

NeuCLIR

IRLab-AMS at TREC’24 NeuCLIR Track

Andrew Yates, Jia-Huei Ju

Abstract

In this notebook paper, we describe our participation as IRLab-AMS in the NeuCLIR track. We submitted results for two tasks: multi-lingual information retrieval (MLIR) and cross-language report generation (ReportGen). For MLIR, we explore learned sparse representations in multi-lingual retrieval settings. For ReportGen, we experiment with several pipelines for generating long-form reports, including standard retrieval-augmented generation (RAG) and post-hoc citation methods. Additionally, we add an extra retrieval augmentation module to handle the limitations of the ad-hoc retriever. The module can serve distinct purposes, including relevance ranking, novelty ranking, and summarization, or a combination of them.

Bibtex
@inproceedings{IRLab-Amsterdam-trec2024-papers-proc-1,
    title = {IRLab-AMS at TREC’24 NeuCLIR Track},
    author = {Andrew Yates and Jia-Huei Ju},
    booktitle = {Proceedings of the 33rd Text {REtrieval} Conference (TREC 2024)},
    year = {2024},
    address = {Gaithersburg, Maryland},
    series = {NIST SP 1329}
}

Overview of the TREC 2024 NeuCLIR Track

Dawn Lawrie, Sean MacAvaney, James Mayfield, Paul McNamee, Douglas W. Oard, Luca Soldaini, Eugene Yang

Abstract

The principal goal of the TREC Neural Cross-Language Information Retrieval (NeuCLIR) track is to study the effect of neural approaches on cross-language information access. The track has created test collections containing Chinese, Persian, and Russian news stories and Chinese academic abstracts. NeuCLIR includes four task types: Cross-Language Information Retrieval (CLIR) from news, Multilingual Information Retrieval (MLIR) from news, Report Generation from news, and CLIR from technical documents. A total of 274 runs were submitted by five participating teams (and as baselines by the track coordinators) for eight tasks across these four task types. Task descriptions and the available results are presented.

Bibtex
@inproceedings{hltcoe-trec2024-papers-proc-2,
    title = {Overview of the TREC 2024 NeuCLIR Track},
    author = {Dawn Lawrie and Sean MacAvaney and James Mayfield and Paul McNamee and Douglas W. Oard and Luca Soldaini and Eugene Yang},
    booktitle = {Proceedings of the 33rd Text {REtrieval} Conference (TREC 2024)},
    year = {2024},
    address = {Gaithersburg, Maryland},
    series = {NIST SP 1329}
}

HLTCOE at TREC 2024 NeuCLIR Track

Eugene Yang, Dawn Lawrie, Orion Weller, James Mayfield

Abstract

The HLTCOE team applied PLAID, an mT5 reranker, GPT-4 reranker, score fusion, and document translation to the TREC 2024 NeuCLIR track. For PLAID we included a variety of models and training techniques – Translate-Distill (TD), Generate-Distill (GD) and Multilingual Translate-Distill (MTD). TD uses scores from the mT5 model over English MS MARCO query-document pairs to learn how to score query-document pairs where the documents are translated to match the CLIR setting. GD follows TD but uses passages from the collection and queries generated by an LLM for training examples. MTD uses MS MARCO translated into multiple languages, allowing experiments on how to batch the data during training. Finally, for report generation we experimented with system combination over different runs. One family of systems used either GPT-4o or Claude-3.5-Sonnet to summarize the retrieved results from a series of decomposed sub-questions. Another system took the output from those two models and verified/combined them with Claude-3.5-Sonnet. The other family used GPT-4o and GPT-3.5-Turbo to extract and group relevant facts from the retrieved documents based on the decomposed queries. The resulting submissions directly concatenate the grouped facts to form the report and their documents of origin as the citations. The team submitted runs to all NeuCLIR tasks: the CLIR and MLIR news tasks as well as the technical documents task and the report generation task.
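The Translate-Distill objective can be pictured as a distribution-matching loss: the student's scores over a batch of (query, translated document) pairs are pulled toward the teacher mT5 scores on the corresponding English pairs. The sketch below is a generic KL-based distillation loss under that reading, not the team's actual training code.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def kl_distill_loss(teacher_scores, student_scores):
    """KL(teacher || student) over in-batch relevance-score distributions."""
    p = softmax(teacher_scores)
    q = softmax(student_scores)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Teacher mT5 scores the English (query, document) pairs; the student scores
# the translated documents and is trained to match the teacher's distribution.
loss = kl_distill_loss([3.0, 1.0, -2.0], [2.5, 1.5, -1.0])
```

The loss is zero exactly when the two score distributions agree, so gradient steps push the CLIR student toward the teacher's English-side ranking behavior.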

Bibtex
@inproceedings{hltcoe-trec2024-papers-proc-1,
    title = {HLTCOE at TREC 2024 NeuCLIR Track},
    author = {Eugene Yang and Dawn Lawrie and Orion Weller and James Mayfield},
    booktitle = {Proceedings of the 33rd Text {REtrieval} Conference (TREC 2024)},
    year = {2024},
    address = {Gaithersburg, Maryland},
    series = {NIST SP 1329}
}

Plain-Language Adaptation of Biomedical Abstracts

SIB Text-Mining at TREC PLABA 2024

Luc Mottin, Anaïs Mottaz, Julien Knafou, Alexandre Flament, Julien Gobeill, Patrick Ruch

Abstract

The comprehension of health information by patients has a real influence on the efficacy of their treatment. However, while more health resources are increasingly available to the public, the use of medical jargon and complex syntax makes them difficult to understand. Recent advances in machine translation and text simplification may help to make these resources more accessible by adapting biomedical text into plain language. In this context, the TREC 2024 Plain Language Adaptation of Biomedical Abstracts track sought to develop specialized algorithms able to adapt biomedical abstracts into plain language for the general public. The SIB Text Mining group participated in the “Complete Abstract Adaptation” subtask. Our first approach examines how specific prompting of a state-of-the-art Large Language Model performs in the global adaptation of biomedical documents, with the intention of providing a baseline, without technical improvements, against which more advanced strategies can be compared. The second approach investigates how fine-tuning of the transformer handles the task, and the third approach integrates a Retrieval-Augmented Generation function to help generate a new document based on information from relevant sources.

Bibtex
@inproceedings{SIB-trec2024-papers-proc-7,
    title = {SIB Text-Mining at TREC PLABA 2024},
    author = {Luc Mottin and Anaïs Mottaz and Julien Knafou and Alexandre Flament and Julien Gobeill and Patrick Ruch},
    booktitle = {Proceedings of the 33rd Text {REtrieval} Conference (TREC 2024)},
    year = {2024},
    address = {Gaithersburg, Maryland},
    series = {NIST SP 1329}
}

Biomedical Text Simplification Models Trained on Aligned Abstracts and Lay Summaries

Jan Bakker, Taiki Papandreou-Lazos, Jaap Kamps

Abstract

This paper documents the University of Amsterdam’s participation in the TREC 2024 Plain Language Adaptation of Biomedical Abstracts (PLABA) Track. We investigated the effectiveness of text simplification models trained on aligned pairs of sentences in biomedical abstracts and plain language summaries. We participated in Task 2 on Complete Abstract Adaptation and conducted post-submission experiments in Task 1 on Term Replacement.

Bibtex
@inproceedings{UAmsterdam-trec2024-papers-proc-2,
    title = {Biomedical Text Simplification Models Trained on Aligned Abstracts and Lay Summaries},
    author = {Jan Bakker and Taiki Papandreou-Lazos and Jaap Kamps},
    booktitle = {Proceedings of the 33rd Text {REtrieval} Conference (TREC 2024)},
    year = {2024},
    address = {Gaithersburg, Maryland},
    series = {NIST SP 1329}
}

MALEI at the PLABA Track of TREC 2024: RoBERTa for Term Replacement – LLaMA3.1 and GPT-4o for Complete Abstract Adaptation

Zhidong Ling, Zihao Li, Pablo Romero, Lifeng Han, Goran Nenadic

Abstract

Health literacy, or the ability of individuals to comprehend and apply health information for informed decision-making, is one of the central focuses of the Healthy People 2030 framework in the US. Even though biomedical information is highly accessible online, patients and caregivers often struggle with language barriers, even when the content is presented in their native language. The shared task PLABA aims to harness advances in deep learning to enable the automatic simplification of complex scientific texts into language that is more understandable for patients and caregivers. Despite substantial obstacles to effective implementation, the goal of the PLABA track is to improve health literacy by translating biomedical abstracts into plain language, making them more accessible and understandable to the general public. Following our participation in the PLABA-2023 shared task using large language models (LLMs) such as ChatGPT, BioGPT, and Flan-T5, and Control Mechanisms (Li et al., 2024), in this work we introduce our system participation in PLABA-2024. Instead of end-to-end biomedical abstract simplification as in PLABA-2023, this year PLABA-2024 introduced more granular steps: Term Replacement for Task 1 and Complete Abstract Adaptation for Task 2. We describe in detail our methodologies of fine-tuning a RoBERTa-Base model for Task 1 and prompting LLMs (LLaMA-3.1-70B and GPT-4o) for Task 2.

Bibtex
@inproceedings{UM-trec2024-papers-proc-3,
    title = {MALEI at the PLABA Track of TREC 2024: RoBERTa for Term Replacement – LLaMA3.1 and GPT-4o for Complete Abstract Adaptation},
    author = {Zhidong Ling and Zihao Li and Pablo Romero and Lifeng Han and Goran Nenadic},
    booktitle = {Proceedings of the 33rd Text {REtrieval} Conference (TREC 2024)},
    year = {2024},
    address = {Gaithersburg, Maryland},
    series = {NIST SP 1329}
}

UM_FHS at TREC 2024 PLABA: Exploration of Fine-tuning and AI agent approach for plain language adaptations of biomedical text

Primoz Kocbek, Leon Kopitar, Zhihong Zhang, Emirhan Aydın, Maxim Topaz, Gregor Stiglic

Abstract

This paper describes our submissions to the TREC 2024 PLABA track with the aim to simplify biomedical abstracts for a K8-level audience (13–14 years old students). We tested three approaches using OpenAI’s gpt-4o and gpt-4o-mini models: baseline prompt engineering, a two-AI agent approach, and fine-tuning. Adaptations were evaluated using qualitative metrics (5-point Likert scales for simplicity, accuracy, completeness, and brevity) and quantitative readability scores (Flesch-Kincaid grade level, SMOG Index). Results indicated that the two-agent approach and baseline prompt engineering with gpt-4o-mini models showed superior qualitative performance, while fine-tuned models excelled in accuracy and completeness but were less simple.

Bibtex
@inproceedings{um_fhs-trec2024-papers-proc-1,
    title = {UM\_FHS at TREC 2024 PLABA: Exploration of Fine-tuning and AI agent approach for plain language adaptations of biomedical text},
    author = {Primoz Kocbek and Leon Kopitar and Zhihong Zhang and Emirhan Aydın and Maxim Topaz and Gregor Stiglic},
    booktitle = {Proceedings of the 33rd Text {REtrieval} Conference (TREC 2024)},
    year = {2024},
    address = {Gaithersburg, Maryland},
    series = {NIST SP 1329}
}

MALEI at the PLABA Track of TAC-2024: RoBERTa for Task 1 – LLaMA3.1 and GPT-4o for Task 2

Zhidong Ling, Zihao Li, Pablo Romero, Lifeng Han, Goran Nenadic

Abstract

Health literacy, or the ability of individuals to comprehend and apply health information for informed decision-making, is one of the central focuses of the Healthy People 2030 framework in the US. Even though biomedical information is highly accessible online, patients and caregivers often struggle with language barriers, even when the content is presented in their native language. The shared task PLABA aims to harness advances in deep learning to enable the automatic simplification of complex scientific texts into language that is more understandable for patients and caregivers. Despite substantial obstacles to effective implementation, the goal of the PLABA track is to improve health literacy by translating biomedical abstracts into plain language, making them more accessible and understandable to the general public. Following our participation in the PLABA-2023 shared task using large language models (LLMs) such as ChatGPT, BioGPT, and Flan-T5, and Control Mechanisms (Li et al., 2024), in this work we introduce our system participation in PLABA-2024. Instead of end-to-end biomedical abstract simplification as in PLABA-2023, this year PLABA-2024 introduced more granular steps: Term Replacement for Task 1 and Complete Abstract Adaptation for Task 2. We describe in detail our methodologies of fine-tuning a RoBERTa-Base model for Task 1 and prompting LLMs (LLaMA-3.1-70B and GPT-4o) for Task 2.

Bibtex
@inproceedings{UM-trec2024-papers-proc-2,
    title = {MALEI at the PLABA Track of TAC-2024: RoBERTa for Task 1 – LLaMA3.1 and GPT-4o for Task 2},
    author = {Zhidong Ling and Zihao Li and Pablo Romero and Lifeng Han and Goran Nenadic},
    booktitle = {Proceedings of the 33rd Text {REtrieval} Conference (TREC 2024)},
    year = {2024},
    address = {Gaithersburg, Maryland},
    series = {NIST SP 1329}
}

Enhancing Accessibility of Medical Texts through Large Language Model-Driven Plain Language Adaptation

Ting-Wei Chang, Hen-Hsen Huang, Hsin-Hsi Chen

Abstract

This paper addresses the challenge of making complex healthcare information more accessible through automated Plain Language Adaptation (PLA). PLA aims to simplify technical medical language, bridging a critical gap between the complexity of healthcare texts and patients’ reading comprehension. Recent advances in Large Language Models (LLMs), such as GPT and BART, have opened new possibilities for PLA, especially in zero-shot and few-shot learning contexts where task-specific data is limited. In this work, we leverage the capabilities of LLMs such as GPT-4o-mini, Gemini-1.5-pro, and LLaMA for text simplification. Additionally, we incorporate Mixture-of-Agents (MoA) techniques to enhance adaptability and robustness in PLA tasks. Key contributions include a comparative analysis of prompting strategies, fine-tuning with QLoRA on different LLMs, and the integration of the MoA technique. Our findings demonstrate the effectiveness of LLM-driven PLA, showcasing its potential in making healthcare information more comprehensible while preserving essential content.

Bibtex
@inproceedings{ntu_nlp-trec2024-papers-proc-1,
    title = {Enhancing Accessibility of Medical Texts through Large Language Model-Driven Plain Language Adaptation},
    author = {Ting-Wei Chang and Hen-Hsen Huang and Hsin-Hsi Chen},
    booktitle = {Proceedings of the 33rd Text {REtrieval} Conference (TREC 2024)},
    year = {2024},
    address = {Gaithersburg, Maryland},
    series = {NIST SP 1329}
}

JBNU at TREC 2024 Product Search Track

Gi-taek An, Seong-Hyuk Yim, Jun-Yong Park, Woo-Seok Choi, Kyung-Soon Lee

Abstract

This paper describes the participation of the jbnu team in the TREC 2024 Product Search Track. This study addresses two key challenges in product search related to sparse and dense retrieval models. For sparse retrieval models, we propose modifying the activation function to GELU to filter out products that, despite being retrieved due to token expansion, are irrelevant for recommendation based on the scoring mechanism. For dense retrieval models, product search document indexing data was generated using the generative model T5 to address input token limitations. Experimental results demonstrate that both proposed methods yield performance improvements over baseline models.
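The activation swap can be illustrated with a SPLADE-style term-weighting formula, in which log-saturated activations of the expansion logits become term weights; replacing ReLU with GELU damps weak expansion terms while leaving strong ones nearly untouched. A toy comparison, assuming (not taken from the paper) a log(1 + act(x)) weighting:

```python
import math

def relu(x):
    return max(0.0, x)

def gelu(x):
    # Exact GELU via the Gaussian CDF.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def term_weight(logit, activation):
    # SPLADE-style saturation log(1 + act(logit)); clamped at zero so a
    # slightly negative GELU output cannot yield a negative term weight.
    return math.log1p(max(0.0, activation(logit)))

# Weak expansion terms (small logits) are damped more strongly under GELU,
# while strong terms keep nearly the same weight.
for logit in (0.1, 1.0, 3.0):
    print(logit, round(term_weight(logit, relu), 3), round(term_weight(logit, gelu), 3))
```

Under this reading, the GELU swap acts as a soft filter on marginal expansion tokens, which matches the paper's stated goal of suppressing irrelevant expanded products.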

Bibtex
@inproceedings{jbnu-trec2024-papers-proc-1,
    title = {JBNU at TREC 2024 Product Search Track},
    author = {Gi-taek An and Seong-Hyuk Yim and Jun-Yong Park and Woo-Seok Choi and Kyung-Soon Lee},
    booktitle = {Proceedings of the 33rd Text {REtrieval} Conference (TREC 2024)},
    year = {2024},
    address = {Gaithersburg, Maryland},
    series = {NIST SP 1329}
}

Retrieval-Augmented Generation

Webis at TREC 2024: Biomedical Generative Retrieval, Retrieval-Augmented Generation, and Tip-of-the-Tongue Tracks

Lukas Gienapp, Maik Fröbe, Jan Heinrich Merker, Harrisen Scells, Eric Oliver Schmidt, Matti Wiegmann, Martin Potthast, Matthias Hagen

Abstract

In this paper, we describe the Webis Group’s participation in the 2024 edition of TREC. We participated in the Biomedical Generative Retrieval track, the Retrieval-Augmented Generation track, and the Tip-of-the-Tongue track. For the biomedical track, we applied different paradigms of retrieval-augmented generation with open- and closed-source LLMs. For the Retrieval-Augmented Generation track, we aimed to contrast manual response submissions with fully automated responses. For the Tip-of-the-Tongue track, we employed query relaxation as in last year’s submission (i.e., leaving out terms that likely reduce retrieval effectiveness), combined with a new cross-encoder trained on an enriched version of the TOMT-KIS dataset.

Bibtex
@inproceedings{webis-trec2024-papers-proc-1,
    title = {Webis at TREC 2024: Biomedical Generative Retrieval, Retrieval-Augmented Generation, and Tip-of-the-Tongue Tracks},
    author = {Lukas Gienapp and Maik Fröbe and Jan Heinrich Merker and Harrisen Scells and Eric Oliver Schmidt and Matti Wiegmann and Martin Potthast and Matthias Hagen},
    booktitle = {Proceedings of the 33rd Text {REtrieval} Conference (TREC 2024)},
    year = {2024},
    address = {Gaithersburg, Maryland},
    series = {NIST SP 1329}
}

The University of Stavanger (IAI) at the TREC 2024 Retrieval-Augmented Generation Track

Weronika Lajewska, Krisztian Balog

Abstract

This paper describes the participation of the IAI group at the University of Stavanger in the TREC 2024 Retrieval-Augmented Generation track. We employ a modular pipeline for Grounded Information Nugget-based GEneration of Conversational Information-Seeking Responses (GINGER) to ensure factual correctness and source attribution. The multistage process includes detecting, clustering, and ranking information nuggets, summarizing top clusters, and generating follow-up questions based on uncovered subspaces of relevant information. In our runs, we experiment with different length of the responses and different number of input passages. Preliminary results indicate that ours was one of the top performing systems in the augmented generation task.

Bibtex
@inproceedings{uis-iai-trec2024-papers-proc-1,
    title = {The University of Stavanger (IAI) at the TREC 2024 Retrieval-Augmented Generation Track},
    author = {Weronika Lajewska and Krisztian Balog},
    booktitle = {Proceedings of the 33rd Text {REtrieval} Conference (TREC 2024)},
    year = {2024},
    address = {Gaithersburg, Maryland},
    series = {NIST SP 1329}
}

Laboratory for Analytic Sciences in TREC 2024 Retrieval Augmented Generation Track

Yue Wang, John M. Conroy, Neil Molino, Julia Yang, Mike Green

Abstract

We report on our approach to the NIST TREC 2024 retrieval-augmented generation (RAG) track. The goal of this track was to build and evaluate systems that can answer complex questions by 1) retrieving excerpts of webpages from a large text collection (hundreds of millions of excerpts taken from tens of millions of webpages); 2) summarizing relevant information within retrieved excerpts into an answer containing up to 400 words; 3) attributing each sentence in the generated summary to one or more retrieved excerpts. We participated in the retrieval (R) task and retrieval augmented generation (RAG) task.

Bibtex
@inproceedings{ncsu-las-trec2024-papers-proc-1,
    title = {Laboratory for Analytic Sciences in TREC 2024 Retrieval Augmented Generation Track},
    author = {Yue Wang and John M. Conroy and Neil Molino and Julia Yang and Mike Green},
    booktitle = {Proceedings of the 33rd Text {REtrieval} Conference (TREC 2024)},
    year = {2024},
    address = {Gaithersburg, Maryland},
    series = {NIST SP 1329}
}

Monster Ranking

Charles L. A. Clarke, Siqing Huo, Negar Arabzadeh

Bibtex
@inproceedings{WaterlooClarke-trec2024-papers-proc-1,
    title = {Monster Ranking},
    author = {Charles L. A. Clarke and Siqing Huo and Negar Arabzadeh},
    booktitle = {Proceedings of the 33rd Text {REtrieval} Conference (TREC 2024)},
    year = {2024},
    address = {Gaithersburg, Maryland},
    series = {NIST SP 1329}
}

CIR at TREC 2024 RAG: Task 2 - Augmented Generation with Diversified Segments and Knowledge Adaption

Jüri Keller, Björn Engelmann, Fabian Haak, Philipp Schaer, Hermann Kroll, Christin Katharina Kreutz

Abstract

This paper describes the CIR team’s participation in the TREC 2024 RAG track for task 2, augmented generation. With our approach, we intended to explore the effects of diversification of the segments that are considered in the generation as well as variations in the depths of users’ knowledge on a query topic. We describe a two-step approach that first reranks input segments such that they are as similar as possible to a query while also being as dissimilar as possible from higher ranked relevant segments. In the second step, these reranked segments are relayed to an LLM, which uses them to generate an answer to the query while referencing the segments that have contributed to specific parts of the answer. The LLM considers the varying background knowledge of potential users through our prompts.
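The first reranking step described here (as similar as possible to the query while as dissimilar as possible from higher-ranked segments) has the shape of maximal marginal relevance (MMR). A minimal sketch with a toy token-overlap similarity, not the team's actual scoring models:

```python
def overlap(a, b):
    """Jaccard overlap of tokens; a toy stand-in for a learned similarity."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def mmr_rerank(query, segments, lam=0.5, k=3):
    """Greedy MMR: trade query relevance against redundancy with picks so far."""
    selected, pool = [], list(segments)
    while pool and len(selected) < k:
        def score(seg):
            redundancy = max((overlap(seg, s) for s in selected), default=0.0)
            return lam * overlap(seg, query) - (1 - lam) * redundancy
        best = max(pool, key=score)
        selected.append(best)
        pool.remove(best)
    return selected

segments = ["neural retrieval models",
            "neural retrieval models today",   # near-duplicate of the first
            "sparse lexical retrieval"]
picks = mmr_rerank("neural retrieval", segments, k=2)
```

With `lam=0.5` the near-duplicate second segment is skipped in favor of the more novel third one, which is the diversification effect the run is after.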

Bibtex
@inproceedings{CIR-trec2024-papers-proc-1,
    title = {CIR at TREC 2024 RAG: Task 2 - Augmented Generation with Diversified Segments and Knowledge Adaption},
    author = {Jüri Keller and Björn Engelmann and Fabian Haak and Philipp Schaer and Hermann Kroll and Christin Katharina Kreutz},
    booktitle = {Proceedings of the 33rd Text {REtrieval} Conference (TREC 2024)},
    year = {2024},
    address = {Gaithersburg, Maryland},
    series = {NIST SP 1329}
}

TREMA-UNH at TREC: RAG Systems and RUBRIC-style Evaluation

Naghmeh Farzi, Laura Dietz

Abstract

The TREMA-UNH team participated in the TREC Retrieval-Augmented Generation track (RAG). In Part 1 we describe the RAG systems submitted to the Augmented Generation Task (AG) and the Retrieval-Augmented Generation Task (RAG), the latter using a BM25 retrieval model. In Part 2 we describe an alternative LLM-based evaluation method for this track using the RUBRIC Autograder Workbench approach, which won the SIGIR’24 best paper award.

Bibtex
@inproceedings{TREMA-UNH-trec2024-papers-proc-1,
    title = {TREMA-UNH at TREC: RAG Systems and RUBRIC-style Evaluation},
    author = {Naghmeh Farzi and Laura Dietz},
    booktitle = {Proceedings of the 33rd Text {REtrieval} Conference (TREC 2024)},
    year = {2024},
    address = {Gaithersburg, Maryland},
    series = {NIST SP 1329}
}

Tip-of-the-Tongue

OVERVIEW OF THE TREC 2024 TIP-OF-THE-TONGUE TRACK

Jaime Arguello, Samarth Bhargav, Fernando Diaz, To Eun Kim, Yifan He, Evangelos Kanoulas, Bhaskar Mitra

Abstract

Tip-of-the-tongue (ToT) known-item retrieval involves re-finding an item for which the searcher does not reliably recall an identifier. ToT information requests (or queries) are verbose and tend to include several complex phenomena, making them especially difficult for existing information retrieval systems. The TREC 2024 ToT track focused on a single ad-hoc retrieval task. Participants were provided with training and development data in the movie domain. Conversely, systems were tested on data that combined three domains: movies, celebrities, and landmarks. This year, 6 groups (including the track coordinators) submitted 18 runs.

Bibtex
@inproceedings{coordinators-trec2024-papers-proc-3,
    title = {OVERVIEW OF THE TREC 2024 TIP-OF-THE-TONGUE TRACK},
    author = {Jaime Arguello and Samarth Bhargav and Fernando Diaz and To Eun Kim and Yifan He and Evangelos Kanoulas and Bhaskar Mitra},
    booktitle = {Proceedings of the 33rd Text {REtrieval} Conference (TREC 2024)},
    year = {2024},
    address = {Gaithersburg, Maryland},
    series = {NIST SP 1329}
}

Yale NLP at TREC 2024: Tip-of-the-Tongue Track

Rohan Phanse, Gabrielle Kaili-May Liu, Arman Cohan

Abstract

After preparing training sets for each domain, we trained a single “general” DPR model to handle queries from all domains and used it in our first two runs. In addition, we developed an approach to route queries to multiple single-domain “expert” DPR models for our third run. We used GPT-4o mini to rerank the results retrieved by our DPR models. We developed an initial pointwise reranking stage that we used along with Borges et al.’s [3] list-wise round-robin approach in our first run. We only performed listwise reranking in our other two runs to measure the specific contribution of our proposed pointwise reranking stage to overall performance.
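The list-wise round-robin step can be sketched as position-by-position interleaving of the per-model ranked lists with de-duplication. This is a generic illustration of round-robin merging, not the exact procedure of Borges et al.:

```python
def round_robin(ranked_lists):
    """Interleave ranked lists position by position, skipping duplicates."""
    merged, seen = [], set()
    depth = max((len(l) for l in ranked_lists), default=0)
    for i in range(depth):
        for l in ranked_lists:
            if i < len(l) and l[i] not in seen:
                merged.append(l[i])
                seen.add(l[i])
    return merged

# Three toy ranked lists, e.g. one per expert DPR model.
print(round_robin([["a", "b", "c"], ["b", "d"], ["e"]]))
```

Each list contributes its top remaining item in turn, so agreement across models surfaces early in the merged ranking.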

Bibtex
@inproceedings{yalenlp-trec2024-papers-proc-1,
    title = {Yale NLP at TREC 2024: Tip-of-the-Tongue Track},
    author = {Rohan Phanse and Gabrielle Kaili-May Liu and Arman Cohan},
    booktitle = {Proceedings of the 33rd Text {REtrieval} Conference (TREC 2024)},
    year = {2024},
    address = {Gaithersburg, Maryland},
    series = {NIST SP 1329}
}

Webis at TREC 2024: Biomedical Generative Retrieval, Retrieval-Augmented Generation, and Tip-of-the-Tongue Tracks

Lukas Gienapp, Maik Fröbe, Jan Heinrich Merker, Harrisen Scells, Eric Oliver Schmidt, Matti Wiegmann, Martin Potthast, Matthias Hagen

Abstract

In this paper, we describe the Webis Group’s participation in the 2024 edition of TREC. We participated in the Biomedical Generative Retrieval track, the Retrieval-Augmented Generation track, and the Tip-of-the-Tongue track. For the biomedical track, we applied different paradigms of retrieval-augmented generation with open- and closed-source LLMs. For the Retrieval-Augmented Generation track, we aimed to contrast manual response submissions with fully automated responses. For the Tip-of-the-Tongue track, we employed query relaxation as in last year’s submission (i.e., leaving out terms that likely reduce retrieval effectiveness), combined with a new cross-encoder trained on an enriched version of the TOMT-KIS dataset.

Bibtex
@inproceedings{webis-trec2024-papers-proc-1,
    title = {Webis at TREC 2024: Biomedical Generative Retrieval, Retrieval-Augmented Generation, and Tip-of-the-Tongue Tracks},
    author = {Lukas Gienapp and Maik Fröbe and Jan Heinrich Merker and Harrisen Scells and Eric Oliver Schmidt and Matti Wiegmann and Martin Potthast and Matthias Hagen},
    booktitle = {Proceedings of the 33rd Text {REtrieval} Conference (TREC 2024)},
    year = {2024},
    address = {Gaithersburg, Maryland},
    series = {NIST SP 1329}
}

IISERK@ToT_2024: Query Reformulation and Layered Retrieval for Tip-of-Tongue Items

Subinay Adhikary, Shuvam Banerji Seal, Soumyadeep Sar, Dwaipayan Roy

Abstract

In this study, we explore various approaches for known-item retrieval, referred to as “Tip-of-the-Tongue” (ToT). The TREC 2024 ToT track involves retrieving previously encountered items, such as movie names or landmarks, when the searcher struggles to recall their exact identifiers. In this paper, we (ThinkIR) focus on four different approaches to retrieve the correct item for each query, including BM25 with optimized parameters and leveraging Large Language Models (LLMs) to reformulate the queries. Subsequently, we use these reformulated queries during retrieval with the BM25 model for each method. The four-step query reformulation technique, combined with two-layer retrieval, enhanced retrieval performance in terms of NDCG and Recall. Ultimately, two-layer retrieval achieves the best performance among all our runs, with a Recall@1000 of 0.8067.
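As a reference point for the BM25 runs, a minimal BM25 scorer over tokenized documents looks like the following; `k1` and `b` are the parameters such runs typically tune, and the corpus here is a toy example, not the track data:

```python
import math
from collections import Counter

def bm25_score(query, doc, corpus, k1=1.2, b=0.75):
    """Score one tokenized document against a tokenized query."""
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N
    tf = Counter(doc)
    score = 0.0
    for term in query:
        df = sum(1 for d in corpus if term in d)          # document frequency
        idf = math.log((N - df + 0.5) / (df + 0.5) + 1.0)  # smoothed idf
        f = tf[term]
        # Term-frequency saturation (k1) and length normalization (b).
        score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(doc) / avgdl))
    return score

corpus = [["old", "black", "white", "film"],
          ["recent", "color", "film"],
          ["old", "film", "about", "a", "heist"]]
ranked = sorted(corpus, key=lambda d: bm25_score(["old", "film"], d, corpus),
                reverse=True)
```

In the reformulation approaches described above, only the query tokens change; the scorer itself stays fixed.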

Bibtex
@inproceedings{IISER-K-trec2024-papers-proc-1,
    title = {IISERK@ToT\_2024: Query Reformulation and Layered Retrieval for Tip-of-Tongue Items},
    author = {Subinay Adhikary and Shuvam Banerji Seal and Soumyadeep Sar and Dwaipayan Roy},
    booktitle = {Proceedings of the 33rd Text {REtrieval} Conference (TREC 2024)},
    year = {2024},
    address = {Gaithersburg, Maryland},
    series = {NIST SP 1329}
}

Video-To-Text

TRECVID 2024 - Evaluating video search, captioning, and activity recognition

George Awad, Jonathan Fiscus, Afzal Godil, Lukas Diduch, Yvette Graham, Georges Quénot

Abstract

The TREC Video Retrieval Evaluation (TRECVID) is a TREC-style video analysis and retrieval evaluation with the goal of promoting progress in research and development of content-based exploitation and retrieval of information from digital video via open, tasks-based evaluation supported by metrology.

Bibtex
@inproceedings{coordinators-trec2024-papers-proc-6,
    title = {TRECVID 2024 - Evaluating video search, captioning, and activity recognition},
    author = {George Awad and Jonathan Fiscus and Afzal Godil and Lukas Diduch and Yvette Graham and Georges Quénot},
    booktitle = {Proceedings of the 33rd Text {REtrieval} Conference (TREC 2024)},
    year = {2024},
    address = {Gaithersburg, Maryland},
    series = {NIST SP 1329}
}

Softbank-Meisei at TREC 2024

Kazuya Ueki, Yuma Suzuki, Hiroki Takushima, Haruki Sato, Takumi Takada, Aiswariya Manoj Kumar, Hayato Tanoue, Hiroki Nishihara, Yuki Shibata, Takayuki Hori

Abstract

The Softbank-Meisei team participated in the ad-hoc video search (AVS) and video-to-text (VTT) tasks at TREC 2024. In this year’s AVS task, we submitted four fully automatic systems for both the main and progress tasks. Our systems utilized pre-trained vision and language models, including CLIP, BLIP, and BLIP-2, along with several other advanced models. We also expanded the original query texts using text generation and image generation techniques to enhance data diversity. The integration ratios of these models were optimized based on results from previous benchmark test datasets. In this year’s VTT task, as last year, we submitted four main-task methods using multi-model captioning, reranking, and generative AI for summarization. For the subtasks, we submitted three methods using the output of each model. On last year’s test data for the main task, our methods showed improvements of about 0.04 points in CIDEr-D and about 0.03 points in SPICE, based on the metrics we had on hand.

Bibtex
@inproceedings{softbank-meisei-trec2024-papers-proc-2,
    title = {Softbank-Meisei at TREC 2024},
    author = {Kazuya Ueki and Yuma Suzuki and Hiroki Takushima and Haruki Sato and Takumi Takada and Aiswariya Manoj Kumar and Hayato Tanoue and Hiroki Nishihara and Yuki Shibata and Takayuki Hori},
    booktitle = {Proceedings of the 33rd Text {REtrieval} Conference (TREC 2024)},
    year = {2024},
    address = {Gaithersburg, Maryland},
    series = {NIST SP 1329}
}

Softbank-Meisei at TREC 2024 Ad-hoc Video Search and Video to Text Tasks

Kazuya Ueki, Yuma Suzuki, Hiroki Takushima, Haruki Sato, Takumi Takada, Aiswariya Manoj Kumar, Hayato Tanoue, Hiroki Nishihara, Yuki Shibata, Takayuki Hori

Abstract

The Softbank-Meisei team participated in the ad-hoc video search (AVS) and video-to-text (VTT) tasks at TREC 2024. In this year’s AVS task, we submitted four fully automatic systems for both the main and progress tasks. Our systems utilized pre-trained vision and language models, including CLIP, BLIP, and BLIP-2, along with several other advanced models. We also expanded the original query texts using text generation and image generation techniques to enhance data diversity. The integration ratios of these models were optimized based on results from previous benchmark test datasets. In this year’s VTT task, as last year, we submitted four main-task methods using multi-model captioning, reranking, and generative AI for summarization. For the subtasks, we submitted three methods using the output of each model. On last year’s test data for the main task, our methods showed improvements of about 0.04 points in CIDEr-D and about 0.03 points in SPICE, based on the metrics we had on hand.

Bibtex
@inproceedings{softbank-meisei-trec2024-papers-proc-3,
    title = {Softbank-Meisei at TREC 2024 Ad-hoc Video Search and Video to Text Tasks},
    author = {Kazuya Ueki and Yuma Suzuki and Hiroki Takushima and Haruki Sato and Takumi Takada and Aiswariya Manoj Kumar and Hayato Tanoue and Hiroki Nishihara and Yuki Shibata and Takayuki Hori},
    booktitle = {Proceedings of the 33rd Text {REtrieval} Conference (TREC 2024)},
    year = {2024},
    address = {Gaithersburg, Maryland},
    series = {NIST SP 1329}
}