Proceedings - NeuCLIR 2023

Overview of the TREC 2023 NeuCLIR Track

Dawn J. Lawrie, Sean MacAvaney, James Mayfield, Paul McNamee, Douglas W. Oard, Luca Soldaini, Eugene Yang

Abstract

The principal goal of the TREC Neural Cross-Language Information Retrieval (NeuCLIR) track is to study the impact of neural approaches to cross-language information retrieval. The track has created four collections, large collections of Chinese, Persian, and Russian newswire and a smaller collection of Chinese scientific abstracts. The principal tasks are ranked retrieval of news in one of the three languages, using English topics. Results for a multilingual task, also with English topics but with documents from all three newswire collections, are also reported. New in this second year of the track is a pilot technical documents CLIR task for ranked retrieval of Chinese technical documents using English topics. A total of 220 runs across all tasks were submitted by six participating teams and, as baselines, by track coordinators. Task descriptions and results are presented.
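
As a rough illustration of how the track's collections and English topics can be consumed programmatically, the following sketch uses the ir_datasets package. The dataset identifiers (e.g. neuclir/1/zh/trec-2023) and record fields are assumptions about how the 2023 data is registered, not something stated in the abstract.

import ir_datasets

# Minimal sketch: iterate the three newswire collections and their English topics.
# The dataset identifiers below are assumed; check the ir_datasets catalogue for
# the exact names of the TREC 2023 NeuCLIR registrations.
for lang in ("zh", "fa", "ru"):
    dataset = ir_datasets.load(f"neuclir/1/{lang}/trec-2023")  # assumed ID

    # English topics for the news task (typically title/description/narrative fields).
    for query in dataset.queries_iter():
        print(query)
        break

    # Newswire documents in the target language.
    for doc in dataset.docs_iter():
        print(doc.doc_id)
        break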

Bibtex
@inproceedings{DBLP:conf/trec/LawrieMMMOSY23,
    author = {Dawn J. Lawrie and Sean MacAvaney and James Mayfield and Paul McNamee and Douglas W. Oard and Luca Soldaini and Eugene Yang},
    editor = {Ian Soboroff and Angela Ellis},
    title = {Overview of the {TREC} 2023 NeuCLIR Track},
    booktitle = {The Thirty-Second Text REtrieval Conference Proceedings {(TREC} 2023), Gaithersburg, MD, USA, November 14-17, 2023},
    series = {{NIST} Special Publication},
    volume = {1328},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2023},
    url = {https://trec.nist.gov/pubs/trec32/papers/Overview\_neuclir.pdf},
    timestamp = {Tue, 26 Nov 2024 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/LawrieMMMOSY23.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Naverloo @ TREC Deep Learning and NeuCLIR 2023: As Easy as Zero, One, Two, Three - Cascading Dual Encoders, Mono, Duo, and Listo for Ad-Hoc Retrieval

Carlos Lassance, Ronak Pradeep, Jimmy Lin

Abstract

In this notebook, we outline the architecture and evaluation of our TREC 2023 submissions, which employ a sophisticated cascading multi-stage ranking framework comprising four distinct steps. Through experimentation across multiple configurations, we validate the efficacy of each stage within this hierarchy. Our findings demonstrate the high effectiveness of our pipeline, consistently outperforming median benchmarks and approaching the maximal aggregate scores. Notably, reproducibility is a key outcome of our methodology. Nevertheless, the reproducibility of the final component, termed “listo”, is contingent upon interactions with the proprietary and inherently non-deterministic GPT4, raising salient questions about its consistency and reliability in a research context.
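
The four-stage cascade described above can be pictured with the following sketch. It is not the authors' code: dense_retriever, mono_reranker, duo_reranker, and listo_reranker are hypothetical stand-ins for a first-stage dual encoder, a pointwise ("mono") reranker, a pairwise ("duo") reranker, and an LLM-based listwise ("listo") step.

def cascade(query, dense_retriever, mono_reranker, duo_reranker, listo_reranker,
            k0=1000, k1=100, k2=20):
    # Stage 0: dual-encoder retrieval over the full collection.
    candidates = dense_retriever.search(query, k=k0)  # list of (doc_id, score)

    # Stage 1: pointwise reranking ("mono") of the top-k0 candidates.
    doc_ids = [d for d, _ in candidates]
    mono = sorted(zip(doc_ids, mono_reranker.score(query, doc_ids)),
                  key=lambda x: x[1], reverse=True)[:k1]

    # Stage 2: pairwise reranking ("duo") of the top-k1 candidates.
    doc_ids = [d for d, _ in mono]
    duo = sorted(zip(doc_ids, duo_reranker.score(query, doc_ids)),
                 key=lambda x: x[1], reverse=True)[:k2]

    # Stage 3: listwise reordering ("listo") of the top-k2 documents by a
    # generative LLM; non-deterministic when backed by a proprietary API.
    return listo_reranker.rerank(query, [d for d, _ in duo])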

Bibtex
@inproceedings{DBLP:conf/trec/LassancePL23,
    author = {Carlos Lassance and Ronak Pradeep and Jimmy Lin},
    editor = {Ian Soboroff and Angela Ellis},
    title = {Naverloo @ {TREC} Deep Learning and Neuclir 2023: As Easy as Zero, One, Two, Three - Cascading Dual Encoders, Mono, Duo, and Listo for Ad-Hoc Retrieval},
    booktitle = {The Thirty-Second Text REtrieval Conference Proceedings {(TREC} 2023), Gaithersburg, MD, USA, November 14-17, 2023},
    series = {{NIST} Special Publication},
    volume = {1328},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2023},
    url = {https://trec.nist.gov/pubs/trec32/papers/h2oloo.DN.pdf},
    timestamp = {Tue, 26 Nov 2024 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/LassancePL23.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

ISI's SEARCHER II System for TREC's 2023 NeuCLIR Track

Scott Miller, Shantanu Agarwal, Joel Barry

Bibtex
@inproceedings{DBLP:conf/trec/MillerAB23,
    author = {Scott Miller and Shantanu Agarwal and Joel Barry},
    editor = {Ian Soboroff and Angela Ellis},
    title = {ISI's {SEARCHER} {II} System for TREC's 2023 NeuCLIR Track},
    booktitle = {The Thirty-Second Text REtrieval Conference Proceedings {(TREC} 2023), Gaithersburg, MD, USA, November 14-17, 2023},
    series = {{NIST} Special Publication},
    volume = {1328},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2023},
    url = {https://trec.nist.gov/pubs/trec32/papers/ISI\_SEARCHER.N.pdf},
    timestamp = {Tue, 26 Nov 2024 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/MillerAB23.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

UMass at TREC 2023 NeuCLIR Track

Zhiqi Huang, Puxuan Yu, James Allan

Abstract

This paper overviews the University of Massachusetts’s cross-lingual retrieval run submissions for the TREC 2023 NeuCLIR Track. In this cross-lingual information retrieval (CLIR) task, the search queries are written in English, and the three target collections are in Chinese, Persian, and Russian. We focus on building strong ensembles of initial ranking models, including dense and sparse retrievers.
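
A common way to ensemble dense and sparse first-stage rankers is reciprocal rank fusion over their run files; the sketch below illustrates that general idea only and is not the fusion method used in the paper.

from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked doc_id lists (one per retriever) into one ranking."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Toy example: fuse a dense run and a sparse run for one query.
dense_run = ["d3", "d1", "d7", "d2"]
sparse_run = ["d1", "d3", "d9", "d7"]
print(reciprocal_rank_fusion([dense_run, sparse_run]))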

Bibtex
@inproceedings{DBLP:conf/trec/HuangYA23,
    author = {Zhiqi Huang and Puxuan Yu and James Allan},
    editor = {Ian Soboroff and Angela Ellis},
    title = {UMass at {TREC} 2023 NeuCLIR Track},
    booktitle = {The Thirty-Second Text REtrieval Conference Proceedings {(TREC} 2023), Gaithersburg, MD, USA, November 14-17, 2023},
    series = {{NIST} Special Publication},
    volume = {1328},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2023},
    url = {https://trec.nist.gov/pubs/trec32/papers/CIIR.N.pdf},
    timestamp = {Tue, 26 Nov 2024 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/HuangYA23.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

HLTCOE at TREC 2023 NeuCLIR Track

Eugene Yang, Dawn J. Lawrie, James Mayfield

Abstract

The HLTCOE team applied PLAID, an mT5 reranker, and document translation to the TREC 2023 NeuCLIR track. For PLAID we included a variety of models and training techniques – the English model released with ColBERT v2, translate-train (TT), Translate-Distill (TD) and multilingual translate-train (MTT). TT trains a ColBERT model with English queries and passages automatically translated into the document language from the MS-MARCO v1 collection. This results in three cross-language models for the track, one per language. MTT creates a single model for all three document languages by combining the translations of MS-MARCO passages in all three languages into mixed-language batches. Thus the model learns about matching queries to passages simultaneously in all languages. Distillation uses scores from the mT5 model over non-English translated document pairs to learn how to score query-document pairs. The team submitted runs to all NeuCLIR tasks: the CLIR and MLIR news task as well as the technical documents task.
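
The distillation step, in which a cross-language retrieval model learns from mT5 reranker scores over translated query-passage pairs, can be sketched as a listwise KL-divergence objective. This is a generic sketch under assumed tensor shapes, not the HLTCOE training code; the score tensors are placeholders for one query's candidate passages.

import torch
import torch.nn.functional as F

def distill_loss(student_scores, teacher_scores, temperature=1.0):
    """KL(teacher || student) over one query's candidate passages.

    Both arguments are 1-D tensors of per-passage scores: the student comes from
    the retrieval model being trained, the teacher from the mT5 reranker.
    """
    teacher_probs = F.softmax(teacher_scores / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_scores / temperature, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs, reduction="sum")

# Dummy scores for 8 candidate passages of one training query.
loss = distill_loss(torch.randn(8, requires_grad=True), torch.randn(8))
loss.backward()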

Bibtex
@inproceedings{DBLP:conf/trec/YangLM23,
    author = {Eugene Yang and Dawn J. Lawrie and James Mayfield},
    editor = {Ian Soboroff and Angela Ellis},
    title = {{HLTCOE} at {TREC} 2023 NeuCLIR Track},
    booktitle = {The Thirty-Second Text REtrieval Conference Proceedings {(TREC} 2023), Gaithersburg, MD, USA, November 14-17, 2023},
    series = {{NIST} Special Publication},
    volume = {1328},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2023},
    url = {https://trec.nist.gov/pubs/trec32/papers/hltcoe.N.pdf},
    timestamp = {Tue, 26 Nov 2024 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/YangLM23.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

BLADE: The University of Maryland at the TREC 2023 NeuCLIR Track

Suraj Nair, Douglas W. Oard

Abstract

The University of Maryland submitted three runs to the Ad Hoc CLIR Task of the TREC 2023 NeuCLIR track. This paper describes three systems that cross the language barrier using a learned sparse retrieval model built on bilingual embeddings.
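
At retrieval time, a learned sparse bilingual model reduces scoring to a dot product between sparse term-weight vectors over a shared bilingual vocabulary. The sketch below shows only that scoring step, with hypothetical term weights rather than output from the BLADE model itself.

def sparse_score(query_weights, doc_weights):
    """Dot product of two sparse term-weight dictionaries."""
    small, large = sorted((query_weights, doc_weights), key=len)
    return sum(w * large.get(term, 0.0) for term, w in small.items())

# Toy example: an English query and a Chinese document that the (hypothetical)
# bilingual encoder has projected onto a shared weighted vocabulary.
query_vec = {"climate": 1.8, "policy": 1.2, "气候": 0.9}
doc_vec = {"气候": 2.1, "政策": 1.5, "climate": 0.7}
print(sparse_score(query_vec, doc_vec))  # only shared terms contribute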

Bibtex
@inproceedings{DBLP:conf/trec/NairO23,
    author = {Suraj Nair and Douglas W. Oard},
    editor = {Ian Soboroff and Angela Ellis},
    title = {{BLADE:} The University of Maryland at the {TREC} 2023 NeuCLIR Track},
    booktitle = {The Thirty-Second Text REtrieval Conference Proceedings {(TREC} 2023), Gaithersburg, MD, USA, November 14-17, 2023},
    series = {{NIST} Special Publication},
    volume = {1328},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2023},
    url = {https://trec.nist.gov/pubs/trec32/papers/umd\_hcil.N.pdf},
    timestamp = {Tue, 26 Nov 2024 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/NairO23.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}