Proceedings - News 2018

TREC 2018 News Track Overview

Ian Soboroff, Shudong Huang, Donna Harman

Paper: 10.6028/NIST.SP.500-331.news-overview

Abstract

The News track is a new track for TREC 2018, focused on information retrieval in the service of helping people read the news. In cooperation with the Washington Post, we released a new collection of 600,000 news articles, and crafted two tasks related to how news is presented on the web.

Bibtex

@inproceedings{DBLP:conf/trec/SoboroffHH18,
    author = {Ian Soboroff and Shudong Huang and Donna Harman},
    editor = {Ellen M. Voorhees and Angela Ellis},
    title = {{TREC} 2018 News Track Overview},
    booktitle = {Proceedings of the Twenty-Seventh Text REtrieval Conference, {TREC} 2018, Gaithersburg, Maryland, USA, November 14-16, 2018},
    series = {{NIST} Special Publication},
    volume = {500-331},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2018},
    url = {https://trec.nist.gov/pubs/trec27/papers/Overview-News.pdf},
    timestamp = {Wed, 03 Feb 2021 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/SoboroffHH18.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org},
    doi = {10.6028/NIST.SP.500-331.news-overview}
}

htw saar @ TREC 2018 News Track

Agra Bimantara, Michelle Blau, Kevin Engelhardt, Johannes Gerwert, Tobias Gottschalk, Philipp Lukosz, Shenna Piri, Nima Saken Shaft, Klaus Berberich

Participant: htwsaar
Paper: 10.6028/NIST.SP.500-331.news-htwsaar
Runs: htwsaar1 | htwsaar2 | htwsaar3 | htwsaar4

Abstract

This paper describes our participation in the background linking task of the TREC 2018 News Track. We explored four different methods to address the task. All of our methods largely rely on off-the-shelf open-source components (e.g., Apache Lucene for indexing the documents). The methods differ in how they analyze the given input document to obtain a query (e.g., by keyword extraction or named entity recognition) and to what extent the returned results are re-ranked taking meta data of the documents (e.g., publication dates) into account.

Bibtex

@inproceedings{DBLP:conf/trec/BimantaraBEGGLP18,
    author = {Agra Bimantara and Michelle Blau and Kevin Engelhardt and Johannes Gerwert and Tobias Gottschalk and Philipp Lukosz and Shenna Piri and Nima Saken Shaft and Klaus Berberich},
    editor = {Ellen M. Voorhees and Angela Ellis},
    title = {htw saar @ {TREC} 2018 News Track},
    booktitle = {Proceedings of the Twenty-Seventh Text REtrieval Conference, {TREC} 2018, Gaithersburg, Maryland, USA, November 14-16, 2018},
    series = {{NIST} Special Publication},
    volume = {500-331},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2018},
    url = {https://trec.nist.gov/pubs/trec27/papers/htwsaar-N.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/BimantaraBEGGLP18.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org},
    doi = {10.6028/NIST.SP.500-331.news-htwsaar}
}

TREMA-UNH at TREC 2018: Complex Answer Retrieval and News Track

Sumanta Kashyapi, Shubham Chatterjee, Jordan Ramsdell, Laura Dietz

Participant: trema-unh
Paper: 10.6028/NIST.SP.500-331.news-trema-unh
Runs: UNH-ParaBm25Ecm | UNH-ParaBm25 | UNH-TitleBm25

Abstract

This notebook describes the submission of team TREMA-UNH to the TREC Complex Answer Retrieval track and the TREC News track in 2018. Our methods focus on passage retrieval, entity-aware passage retrieval, and entity retrieval.

Bibtex

@inproceedings{DBLP:conf/trec/KashyapiCRD18,
    author = {Sumanta Kashyapi and Shubham Chatterjee and Jordan Ramsdell and Laura Dietz},
    editor = {Ellen M. Voorhees and Angela Ellis},
    title = {{TREMA-UNH} at {TREC} 2018: Complex Answer Retrieval and News Track},
    booktitle = {Proceedings of the Twenty-Seventh Text REtrieval Conference, {TREC} 2018, Gaithersburg, Maryland, USA, November 14-16, 2018},
    series = {{NIST} Special Publication},
    volume = {500-331},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2018},
    url = {https://trec.nist.gov/pubs/trec27/papers/TREMA-UNH-CAR-N.pdf},
    timestamp = {Tue, 21 Mar 2023 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/KashyapiCRD18.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org},
    doi = {10.6028/NIST.SP.500-331.news-trema-unh}
}

Using clustering to filter results of an Information Retrieval system

Pilar López-Úbeda, Manuel Carlos Díaz-Galiano, María Teresa Martín Valdivia, Luis Alfonso Ureña López

Participant: SINAI
Paper: 10.6028/NIST.SP.500-331.news-SINAI
Runs: SINAI_base_A | SINAI_base_T | SINAI_base_TA | SINAI_cluster_A | SINAI_cluster_T | SINAI_clusterTA

Abstract

In this paper we present our participation, as the SINAI research group from the Universidad de Jaén, in the News task of the Text REtrieval Conference (TREC). Specifically, we participated in sub-task 1, called Background Linking. In this task we apply K-means clustering algorithms to obtain related news from different domains and topics. We also use document reordering techniques to obtain a new ordered list of relevant articles. For text processing we use the popular TF-IDF technique. The results obtained did not surpass the proposed baseline, although we are usually above average, in some cases improving on the average by 78%.

Bibtex

@inproceedings{DBLP:conf/trec/Lopez-UbedaDVL18,
    author = {Pilar L{\'{o}}pez{-}{\'{U}}beda and Manuel Carlos D{\'{\i}}az{-}Galiano and Mar{\'{\i}}a Teresa Mart{\'{\i}}n Valdivia and Luis Alfonso Ure{\~{n}}a L{\'{o}}pez},
    editor = {Ellen M. Voorhees and Angela Ellis},
    title = {Using clustering to filter results of an Information Retrieval system},
    booktitle = {Proceedings of the Twenty-Seventh Text REtrieval Conference, {TREC} 2018, Gaithersburg, Maryland, USA, November 14-16, 2018},
    series = {{NIST} Special Publication},
    volume = {500-331},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2018},
    url = {https://trec.nist.gov/pubs/trec27/papers/SINAI-N.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/Lopez-UbedaDVL18.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org},
    doi = {10.6028/NIST.SP.500-331.news-SINAI}
}

Paragraph as Lead - Finding Background Documents for News Articles

Kuang Lu, Hui Fang

Participant: udel_fang
Paper: 10.6028/NIST.SP.500-331.news-udel_fang
Runs: UDInfolab_kweh | UDInfolab_kwh | UDInfolab_kwef | UDInfolab_kwf | UDInfolab_kwev

Abstract

When reading a news article, it is very useful to be given articles about the background of various aspects of the story, so that readers can better understand it. In this year's News Track, we tried to use paragraphs as leads to find background articles, since they tend to cover different aspects of the main story [3]. More specifically, the keywords in the paragraphs were extracted and used as queries to find background articles. Entities were also leveraged to improve the retrieval performance of the keyword queries.

Bibtex

@inproceedings{DBLP:conf/trec/Lu018,
    author = {Kuang Lu and Hui Fang},
    editor = {Ellen M. Voorhees and Angela Ellis},
    title = {Paragraph as Lead - Finding Background Documents for News Articles},
    booktitle = {Proceedings of the Twenty-Seventh Text REtrieval Conference, {TREC} 2018, Gaithersburg, Maryland, USA, November 14-16, 2018},
    series = {{NIST} Special Publication},
    volume = {500-331},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2018},
    url = {https://trec.nist.gov/pubs/trec27/papers/udel\_fang-N.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/Lu018.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org},
    doi = {10.6028/NIST.SP.500-331.news-udel_fang}
}

UMass at TREC 2018: CAR, Common Core and News Tracks

Shahrzad Naseri, John Foley, James Allan

Participant: UMass
Paper: 10.6028/NIST.SP.500-331.news-UMass
Runs: umass_cbrdm | umass_rdm | umass_rm

Abstract

UMass participated in three TREC tasks in 2018: TREC CAR, TREC Common Core, and TREC News (Background Linking). In this paper we detail the contents of our submissions and our lessons learned from this year's participation.

Bibtex

@inproceedings{DBLP:conf/trec/NaseriFA18,
    author = {Shahrzad Naseri and John Foley and James Allan},
    editor = {Ellen M. Voorhees and Angela Ellis},
    title = {UMass at {TREC} 2018: CAR, Common Core and News Tracks},
    booktitle = {Proceedings of the Twenty-Seventh Text REtrieval Conference, {TREC} 2018, Gaithersburg, Maryland, USA, November 14-16, 2018},
    series = {{NIST} Special Publication},
    volume = {500-331},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2018},
    url = {https://trec.nist.gov/pubs/trec27/papers/UMass-CAR-CC-N.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/NaseriFA18.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org},
    doi = {10.6028/NIST.SP.500-331.news-UMass}
}

Signal at TREC 2018 News Track

Dwane van der Sluis, Dyaa Albakour, Miguel Martinez

Participant: signal
Paper: 10.6028/NIST.SP.500-331.news-signal
Runs: signal-ucl-slst | signal-ucl-sel | signal-ucl-eff

Abstract

This paper provides an overview of the experiments we carried out for the entity ranking task at the TREC 2018 News Track. In particular, we experimented with adapting the supervised salience component of Salient Entity Linking (SEL), a state-of-the-art unified framework for entity linking and salience ranking. In our adaptation, we assume perfect entity linking performance and rank the entities using the salience components of SEL. Furthermore, in this adaptation, we aim to enhance the efficiency of the supervised salience ranking, and also to introduce sentiment-based features for entity salience.

Bibtex

@inproceedings{DBLP:conf/trec/SluisAM18,
    author = {Dwane van der Sluis and Dyaa Albakour and Miguel Martinez},
    editor = {Ellen M. Voorhees and Angela Ellis},
    title = {Signal at {TREC} 2018 News Track},
    booktitle = {Proceedings of the Twenty-Seventh Text REtrieval Conference, {TREC} 2018, Gaithersburg, Maryland, USA, November 14-16, 2018},
    series = {{NIST} Special Publication},
    volume = {500-331},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2018},
    url = {https://trec.nist.gov/pubs/trec27/papers/signal.N.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/SluisAM18.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org},
    doi = {10.6028/NIST.SP.500-331.news-signal}
}

Anserini at TREC 2018: CENTRE, Common Core, and News Tracks

Peilin Yang, Jimmy Lin

Participant: Anserini
Paper: 10.6028/NIST.SP.500-331.centre-Anserini
Runs: anserini_1000w | anserini_nsdm | anserini_nax | anserini_sdmp | anserini_axp

Abstract

Anserini is an open-source information retrieval toolkit built on Lucene [3, 4]. The goal of our effort is to support information retrieval research using the popular open-source Lucene search library by allowing researchers to easily replicate results with modern ranking models on diverse test collections. Although there are many open-source search engines developed and maintained by academic research groups, most of them are designed primarily to facilitate the publication of research papers, and as such, they often suffer from poor usability, incomplete documentation, and a host of other issues. The growing complexity of modern software ecosystems and the diverse capabilities that are required to build useful end-to-end search applications place academic research groups at a huge disadvantage relative to Lucene. Except for a handful of commercial web search engines that deploy custom infrastructure, Lucene has become the de facto platform in industry for building production search applications, used by organizations as diverse as Twitter, Reddit, Bloomberg, and Target. It has an active developer base, diverse features and capabilities, and lies at the center of a vibrant ecosystem. However, Lucene lacks systematic support for information retrieval research, in particular ad hoc experimentation using standard test collections. This is where Anserini comes in: we enable cutting-edge information retrieval research using Lucene. At TREC 2018, we participated in the CENTRE, Common Core, and News Tracks. Each is described in its own section below. Our development efforts centered around the v0.1.0 release of Anserini, which is based on Lucene 6.3.0 (not the latest release).

Bibtex

@inproceedings{yang2018anserini,
    title = {Anserini at {TREC} 2018: {CENTRE}, Common Core, and News Tracks},
    author = {Yang, Peilin and Lin, Jimmy},
    booktitle = {Proceedings of the Twenty-Seventh Text REtrieval Conference (TREC 2018), Gaithersburg, MD},
    year = {2018},
    url = {https://trec.nist.gov/pubs/trec27/papers/anserini-CTR-CC-N.pdf},
    biburl = {https://dblp.org/},
    doi = {10.6028/NIST.SP.500-331.centre-Anserini}
}