Proceedings - Filtering 1995

The TREC-4 Filtering Track

David D. Lewis

Abstract

The TREC-4 filtering track was an experiment in the evaluation of binary text classification systems. In contrast to ranking systems, binary text classification systems may need to produce result sets of any size, requiring that sampling be used to estimate their effectiveness. We present an effectiveness measure based on utility, and two sampling strategies (pooling and stratified sampling) for estimating utility of submitted sets. An evaluation of four sites was successfully carried out using this approach.
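
The abstract does not reproduce the utility measure or the sampling details; as a minimal sketch (the linear form u = a*R - b*N and simple random sampling within each stratum are assumptions, not details taken from the paper), a stratified estimate of a submitted set's utility might look like this in Python:

    import random

    def estimate_utility(strata, judge, a=1.0, b=1.0, sample_size=20):
        """Estimate a linear utility u = a*R - b*N for a submitted set.

        strata: list of lists of document ids, one list per stratum
        judge:  callable doc_id -> bool; stands in for human relevance
                assessment of the sampled documents
        """
        est_relevant = 0.0
        total_docs = 0
        for stratum in strata:
            total_docs += len(stratum)
            if not stratum:
                continue
            n = min(sample_size, len(stratum))
            sample = random.sample(stratum, n)
            p_rel = sum(judge(d) for d in sample) / n
            # scale the sampled relevance rate up to the whole stratum
            est_relevant += p_rel * len(stratum)
        est_nonrelevant = total_docs - est_relevant
        return a * est_relevant - b * est_nonrelevant

Sampling only a fixed number of documents per stratum is what makes the evaluation feasible when submitted sets can be arbitrarily large.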

Bibtex
@inproceedings{DBLP:conf/trec/Lewis95,
    author = {David D. Lewis},
    editor = {Donna K. Harman},
    title = {The {TREC-4} Filtering Track},
    booktitle = {Proceedings of The Fourth Text REtrieval Conference, {TREC} 1995, Gaithersburg, Maryland, USA, November 1-3, 1995},
    series = {{NIST} Special Publication},
    volume = {500-236},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1995},
    url = {http://trec.nist.gov/pubs/trec4/papers/lewis.ps.gz},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/Lewis95.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

TREC-4 Ad-Hoc, Routing Retrieval and Filtering Experiments using PIRCS

K. L. Kwok, Laszlo Grunfeld

Abstract

Our ad-hoc submissions are pircs, which is fully automatic, and pircs2, which involves manually weighting some terms and adding some new words to the original topic descriptions. The number of words added is minimal. Both methods involve training and query expansion using the best-ranked subdocuments from an initial retrieval as feedback. For our routing experiments we use massive query expansion (350 terms) in pircsL, with emphasis on expansion with low-frequency terms. Training is done using short, top-ranked, known relevant subdocuments. In pircsC, we define four different 'expert' queries (pircsL being one of them) for each topic by using different subsets of the training documents, and later combine their retrieval results into one. The filtering experiment is done with the retrieval lists of pircsL. For each query, we use the training collections to find the retrieval status values (RSVs) at which utility is maximized for each of the three precision types. These RSVs are then used as thresholds for the new collections. Evaluation shows that both ad-hoc and routing retrievals perform substantially better than the median.
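
The thresholding procedure sketched in the abstract can be illustrated; as a rough sketch (the linear utility u = a*R - b*N, the function names, and the keep-if-RSV-clears-threshold rule are assumptions, not details from the paper), selecting an RSV cutoff on training data and applying it to a new collection might look like:

    def best_rsv_threshold(scored_training, a=1.0, b=1.0):
        """Pick the RSV cutoff maximizing utility u = a*R - b*N on
        judged training documents.

        scored_training: list of (rsv, is_relevant) pairs
        Returns (threshold, utility) for the best cutoff found.
        """
        ranked = sorted(scored_training, key=lambda x: -x[0])
        best_u, best_t = 0.0, float("inf")  # retrieving nothing scores 0
        u = 0.0
        for rsv, rel in ranked:
            # walking down the ranking, each kept document adds a if
            # relevant and subtracts b if not
            u += a if rel else -b
            if u > best_u:
                best_u, best_t = u, rsv
        return best_t, best_u

    def filter_new_collection(scored_docs, threshold):
        """Keep documents from a new collection whose RSV clears the
        threshold learned on the training collection."""
        return [doc for doc, rsv in scored_docs if rsv >= threshold]

Because the utility is linear, scanning the ranked training list once is enough to find the optimal cutoff; a different threshold would be trained per query and per utility function.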

Bibtex
@inproceedings{DBLP:conf/trec/KwokG95,
    author = {K. L. Kwok and Laszlo Grunfeld},
    editor = {Donna K. Harman},
    title = {{TREC-4} Ad-Hoc, Routing Retrieval and Filtering Experiments using {PIRCS}},
    booktitle = {Proceedings of The Fourth Text REtrieval Conference, {TREC} 1995, Gaithersburg, Maryland, USA, November 1-3, 1995},
    series = {{NIST} Special Publication},
    volume = {500-236},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1995},
    url = {http://trec.nist.gov/pubs/trec4/papers/queenst4.ps.gz},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/KwokG95.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}