Proceedings - Novelty 2002

Overview of the TREC 2002 Novelty Track

Donna Harman

Paper: http://trec.nist.gov/pubs/trec11/papers/NOVELTY.OVER.pdf

Abstract

The novelty track was a new track in TREC-11. The basic task was as follows: given a TREC topic and an ordered list of relevant documents (ordered by relevance ranking), find the relevant and 'novel' sentences that should be returned to the user from this set. There were 13 groups that participated in this new task.

Bibtex

@inproceedings{DBLP:conf/trec/Harman02,
    author = {Donna Harman},
    editor = {Ellen M. Voorhees and Lori P. Buckland},
    title = {Overview of the {TREC} 2002 Novelty Track},
    booktitle = {Proceedings of The Eleventh Text REtrieval Conference, {TREC} 2002, Gaithersburg, Maryland, USA, November 19-22, 2002},
    series = {{NIST} Special Publication},
    volume = {500-251},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2002},
    url = {http://trec.nist.gov/pubs/trec11/papers/NOVELTY.OVER.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/Harman02.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

A Machine Learning Approach for QA and Novelty Tracks: NTT System Description

Hideto Kazawa, Tsutomu Hirao, Hideki Isozaki, Eisaku Maeda

Participant: nttcom_kazawa
Paper: http://trec.nist.gov/pubs/trec11/papers/nttcom.pdf
Runs: nttcslabnvp | nttcslabnvr2

Abstract

In one sense, the goals of the QA and Novelty tasks are the same: extracting small document parts that are relevant to users' queries. Additionally, the unit of extraction is almost always fixed in both tasks. For QA, an answer is a noun phrase in most cases, and for Novelty, a sentence is recognized as the basic information unit. This observation leads us to the following unified approach to both QA and Novelty tasks: first identify information units in documents, then judge whether each unit is relevant to the query. This two-step approach is amenable to machine learning methods because each step can be cast as a classification problem. For example, noun phrase identification can be achieved by classifying each word into the start/middle/end/exterior of a noun phrase; sentence identification by classifying whether each period marks the end of a sentence. Additionally, relevance judgment can be regarded as the classification of a pair consisting of a query and an information unit as a relevant pair or a non-relevant pair. In the QA and Novelty Tracks at TREC 2002, we studied the feasibility of this two-step approach, using Support Vector Machines as the learning algorithm of the classifiers. Since many studies on identifying information units have already been reported, we concentrate on the relevance judgment step in QA and Novelty tasks in this paper.

Bibtex

@inproceedings{DBLP:conf/trec/KazawaHIM02,
    author = {Hideto Kazawa and Tsutomu Hirao and Hideki Isozaki and Eisaku Maeda},
    editor = {Ellen M. Voorhees and Lori P. Buckland},
    title = {A Machine Learning Approach for {QA} and Novelty Tracks: {NTT} System Description},
    booktitle = {Proceedings of The Eleventh Text REtrieval Conference, {TREC} 2002, Gaithersburg, Maryland, USA, November 19-22, 2002},
    series = {{NIST} Special Publication},
    volume = {500-251},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2002},
    url = {http://trec.nist.gov/pubs/trec11/papers/nttcom.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/KazawaHIM02.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

TREC 2002 Web, Novelty and Filtering Track Experiments using PIRCS

Kui-Lam Kwok, Peter Deng, Norbert Dinstl, M. Chan

Participant: cuny
Paper: http://trec.nist.gov/pubs/trec11/papers/queens.kwok.pdf
Runs: pircs2N01 | pircs2N02 | pircs2N03 | pircs2N04 | pircs2N05

Abstract

In TREC 2002, we participated in three tracks: web, novelty and adaptive filtering. The Web track has two tasks: distillation and named-page retrieval. Distillation is a new utility concept for ranking documents, and requires a new design for the ranked output list after an ad-hoc retrieval from the web (.gov) collection. The Novelty track is a new task that involves identifying sentences relevant to a question and removing duplicate or non-novel entries from the answer list. The third track is adaptive filtering. We revived a filtering program that was functional at TREC-9 with some added capability. Sections 2, 3, and 4 describe our participation in these tracks respectively. Section 5 has our conclusion.

Bibtex

@inproceedings{DBLP:conf/trec/KwokDDC02,
    author = {Kui{-}Lam Kwok and Peter Deng and Norbert Dinstl and M. Chan},
    editor = {Ellen M. Voorhees and Lori P. Buckland},
    title = {{TREC} 2002 Web, Novelty and Filtering Track Experiments using {PIRCS}},
    booktitle = {Proceedings of The Eleventh Text REtrieval Conference, {TREC} 2002, Gaithersburg, Maryland, USA, November 19-22, 2002},
    series = {{NIST} Special Publication},
    volume = {500-251},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2002},
    url = {http://trec.nist.gov/pubs/trec11/papers/queens.kwok.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/KwokDDC02.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Information Filtering, Novelty Detection, and Named-Page Finding

Kevyn Collins-Thompson, Paul Ogilvie, Yi Zhang, Jamie Callan

Participant: cmu_lti
Paper: http://trec.nist.gov/pubs/trec11/papers/cmu.collins-thompson.pdf
Runs: cmu02t300rCv | cmu02t300rCw | cmu02t300rCb | cmu02t300rBw | cmu02t300rAs

Abstract

In TREC 11, our group participated in the Novelty track, Filtering track, and the Named-Page Finding task of the Web track. This paper describes our approaches, experiments, and results. As the approach for each task is quite different, the paper contains a section for each of the tasks. The following section describes our experiments in adaptive filtering, Section 3 describes named-page finding, and Section 4 discusses the Novelty track.

Bibtex

@inproceedings{DBLP:conf/trec/Collins-ThompsonOZC02,
    author = {Kevyn Collins{-}Thompson and Paul Ogilvie and Yi Zhang and Jamie Callan},
    editor = {Ellen M. Voorhees and Lori P. Buckland},
    title = {Information Filtering, Novelty Detection, and Named-Page Finding},
    booktitle = {Proceedings of The Eleventh Text REtrieval Conference, {TREC} 2002, Gaithersburg, Maryland, USA, November 19-22, 2002},
    series = {{NIST} Special Publication},
    volume = {500-251},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2002},
    url = {http://trec.nist.gov/pubs/trec11/papers/cmu.collins-thompson.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/Collins-ThompsonOZC02.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Novelty Track at IRIT-SIG

Taoufiq Dkaki, Josiane Mothe, Jérôme Augé

Participant: irit
Paper: http://trec.nist.gov/pubs/trec11/papers/irit.novelty.pdf
Runs: dumbrun

Abstract

IRIT developed a new strategy for detecting relevant sentences, one that we had not tried in the more general context of document retrieval but had previously tried, in part, in document categorization. In our approach a sentence is considered relevant if it matches the topic with a certain level of coverage. This level of coverage depends on the category of the terms used in the texts. Three types of terms have been defined: highly relevant, lowly relevant, and non-relevant. With regard to the novelty part, a sentence is considered novel when its levels of coverage with the previously processed sentences and with the best-matching sentences do not exceed certain thresholds.
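
The threshold-based novelty test described in this abstract can be illustrated with a minimal sketch. It is not the IRIT system: the whitespace tokenizer, the definition of coverage as the fraction of a sentence's terms found in another sentence, the default threshold of 0.6, and all function names are assumptions made for illustration. The sketch also ignores the term categories and the comparison with best-matching sentences mentioned in the abstract.

```python
def tokens(sentence):
    """Lowercased whitespace tokens; a placeholder for real term extraction."""
    return set(sentence.lower().split())


def coverage(candidate, other):
    """Fraction of the candidate's terms that also occur in `other` (assumed definition)."""
    cand = tokens(candidate)
    if not cand:
        return 0.0
    return len(cand & tokens(other)) / len(cand)


def novel_sentences(sentences, threshold=0.6):
    """Return sentences whose coverage by every previously processed sentence stays at or below the threshold."""
    novel = []
    seen = []  # all previously processed sentences, novel or not
    for sentence in sentences:
        if all(coverage(sentence, prev) <= threshold for prev in seen):
            novel.append(sentence)
        seen.append(sentence)
    return novel
```

With this definition, a near-duplicate of an earlier sentence is dropped while a sentence sharing no terms with earlier ones is kept; raising the threshold makes the filter more permissive.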

Bibtex

@inproceedings{DBLP:conf/trec/DkakiMA02,
    author = {Taoufiq Dkaki and Josiane Mothe and J{\'{e}}r{\^{o}}me Aug{\'{e}}},
    editor = {Ellen M. Voorhees and Lori P. Buckland},
    title = {Novelty Track at {IRIT-SIG}},
    booktitle = {Proceedings of The Eleventh Text REtrieval Conference, {TREC} 2002, Gaithersburg, Maryland, USA, November 19-22, 2002},
    series = {{NIST} Special Publication},
    volume = {500-251},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2002},
    url = {http://trec.nist.gov/pubs/trec11/papers/irit.novelty.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/DkakiMA02.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

THU TREC 2002: Novelty Track Experiments

Min Zhang, Ruihua Song, Chuan Lin, Shaoping Ma, Zhe Jiang, Yijiang Jin, Yiqun Liu, Le Zhao

Participant: tsinghua
Paper: http://trec.nist.gov/pubs/trec11/papers/tsinghuau.novelty2.pdf
Runs: thunv1 | thunv2 | thunv3 | thunv4 | thunv5

Abstract

This is the first time that Tsinghua University took part in TREC. In this year's novelty track, our basic idea is to find the key factors that help people find relevant and new information in a set of documents with noise. We paid attention to three points: 1. how to get full information from a short sentence; 2. how to supplement the sentences with hidden well-known knowledge; 3. how to determine duplication. Accordingly, expansion-based technologies are the key points. Studies of expansion technologies have been performed on three levels: efficient query expansion based on thesaurus and statistics, replacement-based document expansion, and a term-expansion-related duplication elimination strategy based on overlapping measurement. Besides, two issues have been studied: finding key information in topics, and dynamic result selection. A new IR system has been developed for the task. In the system, four weighting strategies have been implemented: ltn.lnu [1], BM2500 [2], FUB1 [3], FUB2 [3]. It provides both similarity and overlapping measurements, based on term expansion. Comparisons can be made at the sentence-to-sentence or sentence-to-pool level.

Bibtex

@inproceedings{DBLP:conf/trec/ZhangSLMJJLZ02,
    author = {Min Zhang and Ruihua Song and Chuan Lin and Shaoping Ma and Zhe Jiang and Yijiang Jin and Yiqun Liu and Le Zhao},
    editor = {Ellen M. Voorhees and Lori P. Buckland},
    title = {{THU} {TREC} 2002: Novelty Track Experiments},
    booktitle = {Proceedings of The Eleventh Text REtrieval Conference, {TREC} 2002, Gaithersburg, Maryland, USA, November 19-22, 2002},
    series = {{NIST} Special Publication},
    volume = {500-251},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2002},
    url = {http://trec.nist.gov/pubs/trec11/papers/tsinghuau.novelty2.pdf},
    timestamp = {Wed, 16 Sep 2020 01:00:00 +0200},
    biburl = {https://dblp.org/rec/conf/trec/ZhangSLMJJLZ02.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Some Similarity Computation Methods in Novelty Detection

Ming-Feng Tsai, Hsin-Hsi Chen

Participant: NTU
Paper: http://trec.nist.gov/pubs/trec11/papers/ntu.feng.final2.pdf
Runs: ntu1 | ntu2 | ntu3

Abstract

In the novelty task, the amount of information in a sentence that can be used in similarity computation is the major challenge. Several information expansion methods were introduced to tackle this problem. Our approach to relevance identification was to expand the information of a sentence with the context of this sentence using a sliding window method. The similarity was measured by the number of words of a topic description that match the sentences within a window. In addition, WordNet was employed to relax the word match operation to an inexact match. In the novelty detection part, we first applied a coherent text segmentation algorithm to partition the sentences extracted from the relevance identification part into several coherent segments denoting sub-topics. Then we computed the similarity of each sentence with each segment. A sentence was represented in terms of a sentence-segment similarity vector. Two sentences were regarded as similar if they were related to the same sub-topics. In this way, the redundant sentences were filtered out.
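
The sliding-window scoring described in this abstract can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: tokenization is plain lowercased whitespace splitting, the window is symmetric with a hypothetical `radius` parameter, matching is exact (the WordNet relaxation is omitted), and the function name is invented here.

```python
def window_scores(sentences, topic, radius=1):
    """Score each sentence by how many topic words occur in its surrounding window."""
    topic_terms = set(topic.lower().split())
    token_sets = [set(s.lower().split()) for s in sentences]
    scores = []
    for i in range(len(sentences)):
        # The window covers the sentence itself plus `radius` neighbours on each side.
        start = max(0, i - radius)
        end = min(len(sentences), i + radius + 1)
        window_terms = set().union(*token_sets[start:end])
        scores.append(len(topic_terms & window_terms))
    return scores
```

Setting `radius=0` reduces the score to a plain per-sentence word match, which makes the effect of adding context easy to compare.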

Bibtex

@inproceedings{DBLP:conf/trec/TsaiC02,
    author = {Ming{-}Feng Tsai and Hsin{-}Hsi Chen},
    editor = {Ellen M. Voorhees and Lori P. Buckland},
    title = {Some Similarity Computation Methods in Novelty Detection},
    booktitle = {Proceedings of The Eleventh Text REtrieval Conference, {TREC} 2002, Gaithersburg, Maryland, USA, November 19-22, 2002},
    series = {{NIST} Special Publication},
    volume = {500-251},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2002},
    url = {http://trec.nist.gov/pubs/trec11/papers/ntu.feng.final2.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/TsaiC02.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Experiments in Novelty Detection at Columbia University

Barry Schiffman

Participant: columbia_novelty
Paper: http://trec.nist.gov/pubs/trec11/papers/columbia.schiffman.pdf
Runs: novcolcl35 | novcolclfx | novcolmerg | novcolsent | novcolcl85

Abstract

This paper describes the method we used for the Novelty Track for the 2002 Text Retrieval Conference (TREC). We tried to adapt tools we are developing for a task closely related to the novelty part of this track. The system we are building will scan a stream of documents and present to the user only the new information it finds. For the 'relevance' part of the TREC, we decided to test the applicability of some of these tools. Since information retrieval is not a focus of our research, we thought it would be more interesting to use something new rather than try to hurriedly catch up. The results were far from satisfactory, but it is clear from the overall results that novelty detection remains a difficult and unsolved problem.

Bibtex

@inproceedings{DBLP:conf/trec/Schiffman02,
    author = {Barry Schiffman},
    editor = {Ellen M. Voorhees and Lori P. Buckland},
    title = {Experiments in Novelty Detection at Columbia University},
    booktitle = {Proceedings of The Eleventh Text REtrieval Conference, {TREC} 2002, Gaithersburg, Maryland, USA, November 19-22, 2002},
    series = {{NIST} Special Publication},
    volume = {500-251},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2002},
    url = {http://trec.nist.gov/pubs/trec11/papers/columbia.schiffman.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/Schiffman02.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

The University of Michigan at TREC 2002: Question Answering and Novelty Tracks

Hong Qi, Jahna Otterbacher, Adam Winkel, Dragomir R. Radev

Participant: umich
Paper: http://trec.nist.gov/pubs/trec11/papers/umichigan.radev.pdf
Runs: umich1 | UMICH4 | UMich3 | UMich5 | UMIch2

Abstract

The University of Michigan participated in two evaluations this year. In the Question Answering Track, we entered three different versions of our system, NSIR, previously described in (1). For the Novelty Track, we modified our multi-document summarizer, MEAD (www.summarization.com/mead), and submitted five runs with different input parameters.

Bibtex

@inproceedings{DBLP:conf/trec/QiOWR02,
    author = {Hong Qi and Jahna Otterbacher and Adam Winkel and Dragomir R. Radev},
    editor = {Ellen M. Voorhees and Lori P. Buckland},
    title = {The University of Michigan at {TREC} 2002: Question Answering and Novelty Tracks},
    booktitle = {Proceedings of The Eleventh Text REtrieval Conference, {TREC} 2002, Gaithersburg, Maryland, USA, November 19-22, 2002},
    series = {{NIST} Special Publication},
    volume = {500-251},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2002},
    url = {http://trec.nist.gov/pubs/trec11/papers/umichigan.radev.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/QiOWR02.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

A Crude Cut at Query Expansion

Philip Rennert

Participant: streamsage
Paper: http://trec.nist.gov/pubs/trec11/papers/streamsage.rennert.pdf
Runs: ss1

Bibtex

@inproceedings{DBLP:conf/trec/Rennert02,
    author = {Philip Rennert},
    editor = {Ellen M. Voorhees and Lori P. Buckland},
    title = {A Crude Cut at Query Expansion},
    booktitle = {Proceedings of The Eleventh Text REtrieval Conference, {TREC} 2002, Gaithersburg, Maryland, USA, November 19-22, 2002},
    series = {{NIST} Special Publication},
    volume = {500-251},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2002},
    url = {http://trec.nist.gov/pubs/trec11/papers/streamsage.rennert.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/Rennert02.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

The University of Amsterdam at TREC 2002

Christof Monz, Jaap Kamps, Maarten de Rijke

Participant: uva
Paper: http://trec.nist.gov/pubs/trec11/papers/uamsterdam.derijke.pdf
Runs: UAmsT11ntste | UAmsT11ntlem | UAmsT11ntcom

Abstract

We describe our participation in the TREC 2002 Novelty, Question Answering, and Web tracks. We provide a detailed account of the ideas underlying our approaches to these tasks. All our runs used the FlexIR information retrieval system.

Bibtex

@inproceedings{DBLP:conf/trec/MonzKR02,
    author = {Christof Monz and Jaap Kamps and Maarten de Rijke},
    editor = {Ellen M. Voorhees and Lori P. Buckland},
    title = {The University of Amsterdam at {TREC} 2002},
    booktitle = {Proceedings of The Eleventh Text REtrieval Conference, {TREC} 2002, Gaithersburg, Maryland, USA, November 19-22, 2002},
    series = {{NIST} Special Publication},
    volume = {500-251},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2002},
    url = {http://trec.nist.gov/pubs/trec11/papers/uamsterdam.derijke.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/MonzKR02.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

UMass at TREC 2002: Cross Language and Novelty Tracks

Leah S. Larkey, James Allan, Margaret E. Connell, Alvaro Bolivar, Courtney Wade

Participant: umass
Paper: http://trec.nist.gov/pubs/trec11/papers/umass.wade.pdf
Runs: CIIR02tfkl | CIIR02tfnew

Abstract

The University of Massachusetts participated in the cross-language and novelty tracks this year. The cross-language submission was characterized by combination of evidence to merge results from two different retrieval engines and a variety of different resources: stemmers, dictionaries, machine translation, and an acronym database. We found that proper names were extremely important in this year's queries. For the novelty track, we applied variants of techniques that have been employed for other problems. We also created additional training data by manually annotating 48 more topics.

Bibtex

@inproceedings{DBLP:conf/trec/LarkeyACBW02,
    author = {Leah S. Larkey and James Allan and Margaret E. Connell and Alvaro Bolivar and Courtney Wade},
    editor = {Ellen M. Voorhees and Lori P. Buckland},
    title = {UMass at {TREC} 2002: Cross Language and Novelty Tracks},
    booktitle = {Proceedings of The Eleventh Text REtrieval Conference, {TREC} 2002, Gaithersburg, Maryland, USA, November 19-22, 2002},
    series = {{NIST} Special Publication},
    volume = {500-251},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2002},
    url = {http://trec.nist.gov/pubs/trec11/papers/umass.wade.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/LarkeyACBW02.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}