Proceedings - Filtering 1998

The TREC-7 Filtering Track: Description and Analysis

David A. Hull

Paper: 10.6028/NIST.SP.500-242.filtering-overview

Abstract

This article describes the experiments conducted in the TREC-7 filtering track, which consisted of three subtasks: adaptive filtering, batch filtering, and routing. The focus this year is on adaptive filtering, where the system begins with only the topic statement and must interactively adjust a filtering profile constructed from that topic in response to on-line feedback. In addition to motivating the task and describing the practical details of participating in the track, this document includes a detailed graphical presentation of the experimental results and provides a brief overall analysis of the performance data.
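
The adaptive filtering setting described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Profile` class, the term-overlap `score` function, the fixed threshold `step`, and the toy document stream are hypothetical and are not taken from the track guidelines or from any participant's system. The sketch only shows the protocol shape: documents arrive in order, feedback is revealed only for retrieved documents, and that feedback may affect later decisions only.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    """Hypothetical filtering profile: topic terms plus a score threshold."""
    terms: set
    threshold: float = 0.3
    step: float = 0.3  # assumed adjustment size, chosen only for the example


def score(profile, text):
    """Fraction of profile terms that occur in the document text."""
    words = set(text.lower().split())
    return len(profile.terms & words) / len(profile.terms)


def adaptive_filter(profile, stream, judgments):
    """Process documents in arrival order and return the retrieved ids.

    A relevance judgment is looked up only for documents the system
    retrieves, and it influences decisions on later documents only.
    """
    retrieved = []
    for doc_id, text in stream:
        if score(profile, text) < profile.threshold:
            continue  # rejected: no feedback is available for this document
        retrieved.append(doc_id)
        if judgments.get(doc_id, False):
            # Relevant document: relax the threshold.
            profile.threshold = max(0.0, profile.threshold - profile.step)
        else:
            # Non-relevant document: tighten the threshold.
            profile.threshold = min(1.0, profile.threshold + profile.step)
    return retrieved


if __name__ == "__main__":
    profile = Profile(terms={"oil", "spill", "tanker"})
    stream = [
        ("d1", "oil prices rise"),
        ("d2", "oil market news"),
        ("d3", "tanker oil spill reported"),
    ]
    judgments = {"d1": False, "d3": True}
    print(adaptive_filter(profile, stream, judgments))
```

In this toy run the non-relevant judgment on the first document raises the threshold, so the second document is rejected and its relevance is never observed.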

Bibtex

@inproceedings{DBLP:conf/trec/Hull98,
    author = {David A. Hull},
    editor = {Ellen M. Voorhees and Donna K. Harman},
    title = {The {TREC-7} Filtering Track: Description and Analysis},
    booktitle = {Proceedings of The Seventh Text REtrieval Conference, {TREC} 1998, Gaithersburg, Maryland, USA, November 9-11, 1998},
    series = {{NIST} Special Publication},
    volume = {500-242},
    pages = {9--32},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1998},
    url = {https://trec.nist.gov/pubs/trec7/papers/tr7filter/paper.ps},
    timestamp = {Tue, 07 Apr 2015 01:00:00 +0200},
    biburl = {https://dblp.org/rec/conf/trec/Hull98.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org},
    doi = {10.6028/NIST.SP.500-242.filtering-overview}
}

INQUERY and TREC-7

James Allan, James P. Callan, Mark Sanderson, Jinxi Xu, Steven Wegmann

Participant: UMass
Paper: 10.6028/NIST.SP.500-242.sdr-UMass
Runs: INQ510 | INQ511 | INQ512

Abstract

This year the Center for Intelligent Information Retrieval (CIIR) at the University of Massachusetts participated in only four of the tracks that were part of the TREC-7 workshop. We worked on ad-hoc retrieval, filtering, VLC, and the SDR track. This report covers the work done on each track successively. We start with a discussion of IR tools that were broadly applied in our work.

Bibtex

@inproceedings{DBLP:conf/trec/AllanCSXW98,
    author = {James Allan and James P. Callan and Mark Sanderson and Jinxi Xu and Steven Wegmann},
    editor = {Ellen M. Voorhees and Donna K. Harman},
    title = {{INQUERY} and {TREC-7}},
    booktitle = {Proceedings of The Seventh Text REtrieval Conference, {TREC} 1998, Gaithersburg, Maryland, USA, November 9-11, 1998},
    series = {{NIST} Special Publication},
    volume = {500-242},
    pages = {148--163},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1998},
    url = {https://trec.nist.gov/pubs/trec7/papers/umass-trec7.pdf.gz},
    timestamp = {Tue, 07 Apr 2015 01:00:00 +0200},
    biburl = {https://dblp.org/rec/conf/trec/AllanCSXW98.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org},
    doi = {10.6028/NIST.SP.500-242.sdr-UMass}
}

Mercure at TREC7

Mohand Boughanem, Taoufiq Dkaki, Josiane Mothe, Chantal Soulé-Dupuy

Participant: IRIT
Paper: 10.6028/NIST.SP.500-242.filtering-IRIT
Runs: MerBF1 | MerBF3 | MerRou | MerBF3R | MerAGbR

Abstract

The tests we performed for TREC-7 focused on the automatic ad hoc and filtering tasks. For the automatic ad hoc task we assessed two query modification strategies, both based on blind relevance feedback. The first continued the TREC6 tests, with newly tuned parameter values for the relevance backpropagation formulas. The second is a new query modification strategy that uses a text mining approach. Three runs were sent. We sent two runs for the relevance backpropagation strategy: one used long topics (titles, descriptions and narratives) and the other used titles and descriptions. We sent one run for the text mining strategy using long topics. For the filtering task, we sent runs in batch filtering and routing using both relevance backpropagation and gradient neural backpropagation.

Bibtex

@inproceedings{DBLP:conf/trec/BoughanemDMS98,
    author = {Mohand Boughanem and Taoufiq Dkaki and Josiane Mothe and Chantal Soul{\'{e}}{-}Dupuy},
    editor = {Ellen M. Voorhees and Donna K. Harman},
    title = {Mercure at {TREC7}},
    booktitle = {Proceedings of The Seventh Text REtrieval Conference, {TREC} 1998, Gaithersburg, Maryland, USA, November 9-11, 1998},
    series = {{NIST} Special Publication},
    volume = {500-242},
    pages = {355--360},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1998},
    url = {https://trec.nist.gov/pubs/trec7/papers/IRITtrec7.pdf.gz},
    timestamp = {Mon, 07 Oct 2019 01:00:00 +0200},
    biburl = {https://dblp.org/rec/conf/trec/BoughanemDMS98.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org},
    doi = {10.6028/NIST.SP.500-242.filtering-IRIT}
}

Cluster-Based Adaptive and Batch Filtering

David Eichmann, Miguel E. Ruiz, Padmini Srinivasan

Participant: iowa
Paper: 10.6028/NIST.SP.500-242.filtering-iowa
Runs: IAHKaf11 | IAHKaf31 | IAHKbf32 | IAHKbf11 | IAHKaf12 | IAHKaf32

Abstract

Information filtering is increasingly critical to knowledge workers drowning in a growing flood of byte streams [6, 8, 9]. Our interest in the filtering track for TREC-7 grew out of work originally designed for information retrieval on the Web, using both 'traditional' search engine [5] and agent-based techniques [6, 7]. In particular, the work by Cutting et al. in clustering [3, 4] has great appeal in the potential for synergistic interaction between user and retrieval system. Our efforts for TREC-7 included two distinct filtering architectures, as well as a more traditional approach to the adhoc track (which used SMART 11.3). The filtering work was done using TRECcer, our Java-based clustering environment, alone for adaptive filtering and a combination of TRECcer and SMART for batch filtering.

Bibtex

@inproceedings{DBLP:conf/trec/EichmannRS98,
    author = {David Eichmann and Miguel E. Ruiz and Padmini Srinivasan},
    editor = {Ellen M. Voorhees and Donna K. Harman},
    title = {Cluster-Based Adaptive and Batch Filtering},
    booktitle = {Proceedings of The Seventh Text REtrieval Conference, {TREC} 1998, Gaithersburg, Maryland, USA, November 9-11, 1998},
    series = {{NIST} Special Publication},
    volume = {500-242},
    pages = {211--220},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1998},
    url = {https://trec.nist.gov/pubs/trec7/papers/Iowa.pdf.gz},
    timestamp = {Tue, 07 Apr 2015 01:00:00 +0200},
    biburl = {https://dblp.org/rec/conf/trec/EichmannRS98.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org},
    doi = {10.6028/NIST.SP.500-242.filtering-iowa}
}

TNO TREC7 Site Report: SDR and Filtering

Rudie Ekkelenkamp, Wessel Kraaij, David A. van Leeuwen

Participant: TNO
Paper: 10.6028/NIST.SP.500-242.sdr-TNO
Runs: TNOAF103 | TNOAF102

Abstract

This paper reports on experiments in the SDR and filtering tracks, carried out at TNO-TPD and TNO-TM. TNO-TPD is also a member of the TwentyOne consortium and as such participated in the AdHoc task and the CLIR track. These experiments are discussed in a separate paper (cf. [Hiemstra and Kraaij98]) elsewhere in this volume.

Bibtex

@inproceedings{DBLP:conf/trec/EkkelenkampKL98,
    author = {Rudie Ekkelenkamp and Wessel Kraaij and David A. van Leeuwen},
    editor = {Ellen M. Voorhees and Donna K. Harman},
    title = {{TNO} {TREC7} Site Report: {SDR} and Filtering},
    booktitle = {Proceedings of The Seventh Text REtrieval Conference, {TREC} 1998, Gaithersburg, Maryland, USA, November 9-11, 1998},
    series = {{NIST} Special Publication},
    volume = {500-242},
    pages = {455--462},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1998},
    url = {https://trec.nist.gov/pubs/trec7/papers/tnotrec7.pdf.gz},
    timestamp = {Tue, 07 Apr 2015 01:00:00 +0200},
    biburl = {https://dblp.org/rec/conf/trec/EkkelenkampKL98.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org},
    doi = {10.6028/NIST.SP.500-242.sdr-TNO}
}

Threshold Calibration in CLARIT Adaptive Filtering

ChengXiang Zhai, Peter Jansen, Emilia Stoica, Norbert Grot, David A. Evans

Participant: clarit
Paper: 10.6028/NIST.SP.500-242.filtering-clarit
Runs: CLARITafF1a | CLARITafF1b | CLARITafF3a | CLARITafF3b | CLARITbfF1 | CLARITbfF3

Abstract

In this paper, we describe the system and methods used for the CLARITECH entries in the TREC-7 Filtering Track. Our main aim was to study algorithms, designs, and parameters for Adaptive Filtering, as this comes closest to actual applications. For efficiency's sake, however, we adapted a system largely geared towards retrieval and introduced a few critical new components. The first of these components, the delivery ratio mechanism, is used to obtain a profile threshold when no feedback has been received. A second method, which we call beta gamma regulation, is used for threshold updating. It takes into account the number of judged documents processed by the system as well as an expected bias in optimal threshold calculation. Several parameters were determined empirically: apart from the parameters pertaining to the new components, we also experimented with different choices for the reference corpus, and different 'chunk' sizes for processing news stories. Gradually increasing chunk sizes over 'time' appears to help profile learning. Finally, we examined the effect of terminating underperforming queries over the AP90 corpus and found that the utility metric over AP88-AP89 was a good predictor. All of the above innovations contributed to the success of the CLARITECH system in the adaptive filtering track.
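
The threshold-updating idea in the abstract (interpolating according to the number of judged documents and an expected bias in the optimal threshold) can be sketched as follows. This is an illustrative reading, not the CLARITECH implementation: the interpolation form `alpha = beta + (1 - beta) * exp(-gamma * n)`, the definitions of `theta_opt` and `theta_zero`, the utility weights `credit` and `penalty`, and all default parameter values are assumptions that the abstract does not confirm.

```python
import math


def beta_gamma_threshold(judged, beta=0.1, gamma=0.05, credit=3.0, penalty=2.0):
    """Interpolate between a utility-optimal and a zero-utility threshold.

    judged: list of (score, is_relevant) pairs for documents already judged.
    beta:   assumed correction for bias in the optimal threshold estimate.
    gamma:  assumed rate at which trust in the estimate grows with len(judged).
    """
    if not judged:
        raise ValueError("at least one judged document is required")
    ranked = sorted(judged, key=lambda pair: pair[0], reverse=True)

    # Cumulative linear utility of delivering the top i+1 judged documents.
    utilities = []
    total = 0.0
    for _, relevant in ranked:
        total += credit if relevant else -penalty
        utilities.append(total)

    # theta_opt: score at the cutoff with the highest cumulative utility.
    i_opt = max(range(len(ranked)), key=lambda i: utilities[i])

    # theta_zero: lowest score below the optimum reached before the
    # cumulative utility first turns negative.
    i_zero = i_opt
    for i in range(i_opt + 1, len(ranked)):
        if utilities[i] < 0:
            break
        i_zero = i

    theta_opt = ranked[i_opt][0]
    theta_zero = ranked[i_zero][0]

    # With few judged documents alpha is near 1 and the threshold stays low;
    # as evidence accumulates alpha decays toward beta.
    alpha = beta + (1.0 - beta) * math.exp(-gamma * len(ranked))
    return alpha * theta_zero + (1.0 - alpha) * theta_opt


if __name__ == "__main__":
    judged = [(0.9, True), (0.8, True), (0.7, False),
              (0.6, False), (0.5, False), (0.4, False)]
    print(beta_gamma_threshold(judged))
```

Under these assumptions the returned threshold always lies between `theta_zero` and `theta_opt`, so a profile with little feedback delivers more liberally than the training-set optimum would suggest.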

Bibtex

@inproceedings{DBLP:conf/trec/ZhaiJSGE98,
    author = {ChengXiang Zhai and Peter Jansen and Emilia Stoica and Norbert Grot and David A. Evans},
    editor = {Ellen M. Voorhees and Donna K. Harman},
    title = {Threshold Calibration in {CLARIT} Adaptive Filtering},
    booktitle = {Proceedings of The Seventh Text REtrieval Conference, {TREC} 1998, Gaithersburg, Maryland, USA, November 9-11, 1998},
    series = {{NIST} Special Publication},
    volume = {500-242},
    pages = {96--103},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1998},
    url = {https://trec.nist.gov/pubs/trec7/papers/CLARIT_filtering.pdf.gz},
    timestamp = {Thu, 05 May 2022 01:00:00 +0200},
    biburl = {https://dblp.org/rec/conf/trec/ZhaiJSGE98.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org},
    doi = {10.6028/NIST.SP.500-242.filtering-clarit}
}

AT&T at TREC-7

Amit Singhal, John Choi, Donald Hindle, David D. Lewis, Fernando C. N. Pereira

Participant: ATT
Paper: 10.6028/NIST.SP.500-242.sdr-ATT
Runs: att98fb5 | att98fr4 | att98ft1 | att98fb6 | att98fr5

Abstract

This year AT&T participated in the ad-hoc task and the Filtering, SDR, and VLC tracks. Most of our effort for TREC-7 was concentrated on the SDR and VLC tracks. On the filtering track, we tested a preliminary version of a text classification toolkit that we have been developing over the last year. In the ad-hoc task, we introduce a new tf-factor in our term weighting scheme and use a simplified retrieval algorithm. The same weighting scheme and algorithm are used in the SDR and the VLC tracks. The results from the SDR track show that retrieval from automatic transcriptions of speech is quite competitive with retrieval from human transcriptions. Our experiments indicate that document expansion can be used to further improve retrieval from automatic transcripts. Results of the filtering track are in line with our expectations given the early developmental stage of our classification software. The results of the VLC track do not support our hypothesis that retrieval lists from a distributed search can be effectively merged using only the initial part of the documents.

Bibtex

@inproceedings{DBLP:conf/trec/SinghalCHLP98,
    author = {Amit Singhal and John Choi and Donald Hindle and David D. Lewis and Fernando C. N. Pereira},
    editor = {Ellen M. Voorhees and Donna K. Harman},
    title = {AT{\&}T at {TREC-7}},
    booktitle = {Proceedings of The Seventh Text REtrieval Conference, {TREC} 1998, Gaithersburg, Maryland, USA, November 9-11, 1998},
    series = {{NIST} Special Publication},
    volume = {500-242},
    pages = {186--198},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1998},
    url = {https://trec.nist.gov/pubs/trec7/papers/att.pdf.gz},
    timestamp = {Fri, 30 Aug 2019 01:00:00 +0200},
    biburl = {https://dblp.org/rec/conf/trec/SinghalCHLP98.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org},
    doi = {10.6028/NIST.SP.500-242.sdr-ATT}
}

Okapi at TREC-7: Automatic Ad Hoc, Filtering, VLC and Interactive

Stephen E. Robertson, Steve Walker, Micheline Hancock-Beaulieu

Participant: okapi
Paper: 10.6028/NIST.SP.500-242.interactive-okapi
Runs: ok7ff12 | ok7ff32 | ok7ff13 | ok7ff33

Abstract

This year saw a major departure from the previous Okapi approach to routing and filtering. We concentrated our efforts on the twin problems of (a) starting from scratch, with no assumed history of relevance judgments for each topic, and (b) having to define a threshold for retrieval. The thresholding problem is interesting and difficult; we associate it with the problem of assigning an explicit estimate of the probability of relevance to each document. We describe and test a set of methods for initializing the profile for each topic and for modifying it as time passes. Two pairs of runs were submitted: in one pair queries remained constant, but in the other query terms were reweighted when fresh relevance information became available.

Bibtex

@inproceedings{DBLP:conf/trec/RobertsonWB98,
    author = {Stephen E. Robertson and Steve Walker and Micheline Hancock{-}Beaulieu},
    editor = {Ellen M. Voorhees and Donna K. Harman},
    title = {Okapi at {TREC-7:} Automatic Ad Hoc, Filtering, {VLC} and Interactive},
    booktitle = {Proceedings of The Seventh Text REtrieval Conference, {TREC} 1998, Gaithersburg, Maryland, USA, November 9-11, 1998},
    series = {{NIST} Special Publication},
    volume = {500-242},
    pages = {199--210},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1998},
    url = {https://trec.nist.gov/pubs/trec7/papers/okapi_proc.pdf.gz},
    timestamp = {Tue, 07 Apr 2015 01:00:00 +0200},
    biburl = {https://dblp.org/rec/conf/trec/RobertsonWB98.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org},
    doi = {10.6028/NIST.SP.500-242.interactive-okapi}
}

TREC-7 Ad-Hoc, High Precision and Filtering Experiments using PIRCS

Kui-Lam Kwok, Laszlo Grunfeld, M. Chan, Norbert Dinstl, Colleen Cool

Participant: CUNY
Paper: 10.6028/NIST.SP.500-242.filtering-CUNY
Runs: pirc8R1 | pirc8R2 | pirc8FA1 | pirc8FA3 | pirc8FB3 | pirc8FB1

Abstract

In TREC-7, we participated in the main task of automatic ad-hoc retrieval as well as the high precision and filtering tracks. For ad-hoc, three experiments were done with query types of short (title section of a topic), medium (description section) and long (all sections) lengths. We used a sequence of five methods to handle the short and medium length queries. For long queries we employed a re-ranking method based on evidence from matching query phrases in document windows in both stages of a 2-stage retrieval. Results are well above median. For the high precision track, we employed our interactive PIRCS system for the first time. In adaptive filtering, we concentrated on dynamically varying the retrieval status value threshold used for deciding selection during the course of filtering. Query weights were trained but expansion was not done. We also submitted results for batch filtering and standard routing based on methods evolved from previous TREC experiments.

Bibtex

@inproceedings{DBLP:conf/trec/KwokGCDC98,
    author = {Kui{-}Lam Kwok and Laszlo Grunfeld and M. Chan and Norbert Dinstl and Colleen Cool},
    editor = {Ellen M. Voorhees and Donna K. Harman},
    title = {{TREC-7} Ad-Hoc, High Precision and Filtering Experiments using {PIRCS}},
    booktitle = {Proceedings of The Seventh Text REtrieval Conference, {TREC} 1998, Gaithersburg, Maryland, USA, November 9-11, 1998},
    series = {{NIST} Special Publication},
    volume = {500-242},
    pages = {287--297},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1998},
    url = {https://trec.nist.gov/pubs/trec7/papers/queens.pdf.gz},
    timestamp = {Tue, 07 Apr 2015 01:00:00 +0200},
    biburl = {https://dblp.org/rec/conf/trec/KwokGCDC98.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org},
    doi = {10.6028/NIST.SP.500-242.filtering-CUNY}
}

NTT DATA at TREC-7: System Approach for Ad-Hoc and Filtering

Hiroyuke Nakajima, Toru Takaki, Tsutomu Hirao, Akira Kitauchi

Participant: NTT
Paper: 10.6028/NIST.SP.500-242.filtering-NTT
Runs: nttd7bf1 | nttd7bf3 | nttd7rk | nttd7rt2 | nttd7rt1

Abstract

In TREC-7, we participated in the ad-hoc task (main task) and the filtering track (sub task). In the ad-hoc task, we adopted a scoring method that used co-occurrence term relations in a document and specific processing in order to determine which conceptual parts of the documents should be targeted for query expansion. In filtering, we adopted a machine-readable dictionary for detecting idioms and an inductive learning algorithm for detecting important co-occurrences of terms. In this paper, we describe the system approach and briefly discuss the evaluation results for our ad-hoc and filtering runs in TREC-7.

Bibtex

@inproceedings{DBLP:conf/trec/NakajimaTHK98,
    author = {Hiroyuke Nakajima and Toru Takaki and Tsutomu Hirao and Akira Kitauchi},
    editor = {Ellen M. Voorhees and Donna K. Harman},
    title = {{NTT} {DATA} at {TREC-7:} System Approach for Ad-Hoc and Filtering},
    booktitle = {Proceedings of The Seventh Text REtrieval Conference, {TREC} 1998, Gaithersburg, Maryland, USA, November 9-11, 1998},
    series = {{NIST} Special Publication},
    volume = {500-242},
    pages = {420--428},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1998},
    url = {https://trec.nist.gov/pubs/trec7/papers/nttdata_at_TREC_7.pdf.gz},
    timestamp = {Tue, 07 Apr 2015 01:00:00 +0200},
    biburl = {https://dblp.org/rec/conf/trec/NakajimaTHK98.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org},
    doi = {10.6028/NIST.SP.500-242.filtering-NTT}
}

Applying SIGMA to the TREC-7 Filtering Track

Grigoris I. Karakoulas, Innes A. Ferguson

Participant: iowa
Paper: 10.6028/NIST.SP.500-242.sdr-cambridge
Runs: IAHKaf11 | IAHKaf31 | IAHKbf32 | IAHKbf11 | IAHKaf12 | IAHKaf32

Abstract

This paper presents work done at Cambridge University, on the TREC-7 Spoken Document Retrieval (SDR) Track. The broadcast news audio was transcribed using a 2-pass gender-dependent HTK speech recogniser which ran at 50 times real time and gave an overall word error rate of 24.8%, the lowest in the track. The Okapi-based retrieval engine used in TREC-6 by the City/Cambridge University collaboration was supplemented by improving the stop-list, adding a bad-spelling mapper and stemmer exceptions list, adding word-pair information, integrating part-of-speech weighting on query terms and including some pre-search statistical expansion. The final system gave an average precision of 0.4817 on the reference and 0.4509 on the automatic transcription, with the R-precision being 0.4603 and 0.4330 respectively. The paper also presents results on a new set of 60 queries with assessments for the TREC-6 test document data used for development purposes, and analyses the relationship between recognition accuracy, as defined by a pre-processed term error rate, and retrieval performance for both sets of data.

Bibtex

@inproceedings{DBLP:conf/trec/KarakoulasF98,
    author = {Grigoris I. Karakoulas and Innes A. Ferguson},
    editor = {Ellen M. Voorhees and Donna K. Harman},
    title = {Applying {SIGMA} to the {TREC-7} Filtering Track},
    booktitle = {Proceedings of The Seventh Text REtrieval Conference, {TREC} 1998, Gaithersburg, Maryland, USA, November 9-11, 1998},
    series = {{NIST} Special Publication},
    volume = {500-242},
    pages = {258--263},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1998},
    url = {https://trec.nist.gov/pubs/trec7/papers/cuhtk-trec98-uspaper.pdf.gz},
    timestamp = {Tue, 07 Apr 2015 01:00:00 +0200},
    biburl = {https://dblp.org/rec/conf/trec/KarakoulasF98.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org},
    doi = {10.6028/NIST.SP.500-242.sdr-cambridge}
}