Proceedings - Routing 1993

Combining Evidence for Information Retrieval

Nicholas J. Belkin, Paul B. Kantor, Colleen Cool, Richard Quatrain

Abstract

This study investigated the effect on retrieval performance of two methods of combination of multiple representations of TREC topics. Five separate Boolean queries for each of the 50 TREC routing topics and 25 of the TREC ad hoc topics were generated by 75 experienced online searchers. Using the INQUERY retrieval system, these queries were both combined into single queries and used to produce five separate retrieval results for each topic. In the former case, results indicate that progressive combination of queries leads to progressively improving retrieval performance, significantly better than that of single queries, and at least as good as the best individual single query formulations. In the latter case, data fusion of the ranked lists also led to performance better than that of any single list.
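
The abstract names the two combination methods but not their mechanics. As a rough, hypothetical sketch of the second one (data fusion of the ranked lists), the Python fragment below merges the five searchers' result lists by summed inverse rank, so documents ranked well by several searchers rise to the top; the 1/rank weighting and cutoff are illustrative assumptions, not details from the paper.

# Hypothetical sketch: fuse several ranked lists by summed inverse rank.
# Each run is a list of doc_ids, best first.
def fuse_by_rank(runs, depth=1000):
    score = {}
    for run in runs:
        for rank, doc_id in enumerate(run[:depth], start=1):
            score[doc_id] = score.get(doc_id, 0.0) + 1.0 / rank
    return sorted(score, key=score.get, reverse=True)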

Bibtex
@inproceedings{DBLP:conf/trec/BelkinKCQ93,
    author = {Nicholas J. Belkin and Paul B. Kantor and Colleen Cool and Richard Quatrain},
    editor = {Donna K. Harman},
    title = {Combining Evidence for Information Retrieval},
    booktitle = {Proceedings of The Second Text REtrieval Conference, {TREC} 1993, Gaithersburg, Maryland, USA, August 31 - September 2, 1993},
    series = {{NIST} Special Publication},
    volume = {500-215},
    pages = {35--44},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1993},
    url = {http://trec.nist.gov/pubs/trec2/papers/txt/03.txt},
    timestamp = {Wed, 07 Jul 2021 16:44:22 +0200},
    biburl = {https://dblp.org/rec/conf/trec/BelkinKCQ93.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Automatic Routing and Ad-hoc Retrieval Using SMART: TREC 2

Chris Buckley, James Allan, Gerard Salton

Abstract

The Smart information retrieval project emphasizes completely automatic approaches to the understanding and retrieval of large quantities of text. We continue our work in the TREC 2 environment, performing both routing and ad-hoc experiments. The ad-hoc work extends our investigations into combining global similarities, giving an overall indication of how a document matches a query, with local similarities identifying a smaller part of the document which matches the query. The performance of the ad-hoc runs is good, but it is clear we are not yet taking full advantage of the available local information. Our routing experiments use conventional relevance feedback approaches to routing, but with a much greater degree of query expansion than was done in TREC 1. The length of a query vector is increased by a factor of 5 to 10 by adding terms found in previously seen relevant documents. This approach improves effectiveness by 30-40% over the original query.
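
As a hedged illustration of this scale of relevance-feedback expansion, the sketch below grows a query by adding terms frequent in judged-relevant documents until the vector is roughly five times its original length. The Rocchio-style weighting, the factor, and beta are illustrative assumptions; the paper's actual SMART weighting scheme is not reproduced here.

from collections import Counter

# Hypothetical Rocchio-style expansion: grow the query vector about
# 'factor' times by adding terms frequent in judged-relevant documents.
def expand_query(query_terms, relevant_docs, factor=5, beta=0.5):
    target = factor * len(query_terms)
    pool = Counter()
    for doc in relevant_docs:              # doc: list of stemmed tokens
        pool.update(doc)
    expanded = {t: 1.0 for t in query_terms}
    for term, freq in pool.most_common():
        if len(expanded) >= target:
            break
        expanded.setdefault(term, 0.0)
        expanded[term] += beta * freq / max(len(relevant_docs), 1)
    return expanded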

Bibtex
@inproceedings{DBLP:conf/trec/BuckleyAS93,
    author = {Chris Buckley and James Allan and Gerard Salton},
    editor = {Donna K. Harman},
    title = {Automatic Routing and Ad-hoc Retrieval Using {SMART:} {TREC} 2},
    booktitle = {Proceedings of The Second Text REtrieval Conference, {TREC} 1993, Gaithersburg, Maryland, USA, August 31 - September 2, 1993},
    series = {{NIST} Special Publication},
    volume = {500-215},
    pages = {45--56},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1993},
    url = {http://trec.nist.gov/pubs/trec2/papers/ps/cornell.ps.gz},
    timestamp = {Wed, 07 Jul 2021 01:00:00 +0200},
    biburl = {https://dblp.org/rec/conf/trec/BuckleyAS93.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

N-Gram-Based Text Filtering For TREC-2

William B. Cavnar

Abstract

Most text retrieval and filtering systems depend heavily on the accuracy of the text they process. In other words, the various mechanisms that they use depend on every word in the text and in the queries being correctly and completely spelled. To get around this limitation, our experimental text filtering system uses N-gram-based matching for document retrieval and routing tasks. The system's first application was for the TREC-2 retrieval and routing task. Its performance on this task was promising, pointing the way for several types of enhancements, both for speed and effectiveness.
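
A minimal sketch of why character-N-gram matching tolerates misspellings: a single typo disturbs only a few of a word's N-grams, so most of them still match. The trigram choice and overlap measure below are illustrative assumptions, not the system's actual scoring.

# Hypothetical character-trigram matching, robust to spelling errors.
def ngrams(text, n=3):
    padded = f" {text.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}

def ngram_score(query, document, n=3):
    q, d = ngrams(query, n), ngrams(document, n)
    return len(q & d) / len(q) if q else 0.0

# 'retreival' still matches 'retrieval' on most of its trigrams:
print(ngram_score("retreival", "information retrieval"))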

Bibtex
@inproceedings{DBLP:conf/trec/Cavnar93,
    author = {William B. Cavnar},
    editor = {Donna K. Harman},
    title = {N-Gram-Based Text Filtering For {TREC-2}},
    booktitle = {Proceedings of The Second Text REtrieval Conference, {TREC} 1993, Gaithersburg, Maryland, USA, August 31 - September 2, 1993},
    series = {{NIST} Special Publication},
    volume = {500-215},
    pages = {171--180},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1993},
    url = {http://trec.nist.gov/pubs/trec2/papers/ps/erim.ps.gz},
    timestamp = {Wed, 07 Jul 2021 01:00:00 +0200},
    biburl = {https://dblp.org/rec/conf/trec/Cavnar93.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Full Text Retrieval based on Probabilistic Equations with Coefficients fitted by Logistic Regression

William S. Cooper, Aitao Chen, Fredric C. Gey

Abstract

The experiments described here are part of a research program whose objective is to develop a full-text retrieval methodology that is statistically sound and powerful, yet reasonably simple. The methodology is based on the use of a probabilistic model whose parameters are fitted empirically to a learning set of relevance judgements by logistic regression. The method was applied to the TIPSTER data with optimally relativized frequencies of occurrence of match stems as the regression variables. In a routing retrieval experiment, these were supplemented by other variables corresponding to sums of log-odds associated with particular match stems.
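
A minimal sketch of the scoring idea, with made-up coefficients: logistic regression fitted on relevance judgements yields a linear predictor of the log-odds of relevance, which can rank documents directly, since the monotone logistic transform does not change the ordering. The feature vectors stand in for the paper's regression variables.

import math

# Hypothetical log-odds scoring with illustrative fitted coefficients.
def log_odds(intercept, coef, features):
    return intercept + sum(c * x for c, x in zip(coef, features))

def prob_relevant(intercept, coef, features):
    return 1.0 / (1.0 + math.exp(-log_odds(intercept, coef, features)))

def rank(docs, intercept, coef):
    # docs: {doc_id: feature vector}; ranking by log-odds == ranking by probability
    return sorted(docs, key=lambda d: log_odds(intercept, coef, docs[d]), reverse=True)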

Bibtex
@inproceedings{DBLP:conf/trec/CooperCG93,
    author = {William S. Cooper and Aitao Chen and Fredric C. Gey},
    editor = {Donna K. Harman},
    title = {Full Text Retrieval based on Probabilistic Equations with Coefficients fitted by Logistic Regression},
    booktitle = {Proceedings of The Second Text REtrieval Conference, {TREC} 1993, Gaithersburg, Maryland, USA, August 31 - September 2, 1993},
    series = {{NIST} Special Publication},
    volume = {500-215},
    pages = {57--66},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1993},
    url = {http://trec.nist.gov/pubs/trec2/papers/ps/berkeley.ps.gz},
    timestamp = {Wed, 07 Jul 2021 01:00:00 +0200},
    biburl = {https://dblp.org/rec/conf/trec/CooperCG93.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

TREC-2 Routing and Ad-Hoc Retrieval Evaluation using the INQUERY System

W. Bruce Croft, James P. Callan, John Broglio

Abstract

The ARPA TIPSTER project, which is the source of the data and funding for TREC, has involved four sites in the area of text retrieval and routing. The TIPSTER project in the Information Retrieval Laboratory of the Computer Science Department, University of Massachusetts, Amherst (which includes MCC as a subcontractor), has focused on the following goals: improving the effectiveness of information retrieval techniques for large, full-text databases; improving the effectiveness of routing techniques appropriate for long-term information needs; and demonstrating the effectiveness of these retrieval and routing techniques for Japanese full-text databases [4]. Our general approach to achieving these goals has been to use improved representations of text and information needs in the framework of a new model of retrieval. This model uses Bayesian networks to describe how text and queries should be used to identify relevant documents [6, 3, 7]. Retrieval (and routing) is viewed as a probabilistic inference process which compares text representations based on different forms of linguistic and statistical evidence to representations of information needs based on similar evidence from natural language queries and user interaction. Learning techniques are used to modify the initial queries both for short-term and long-term information needs (relevance feedback and routing, respectively).
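
For flavor, the fragment below sketches how an inference-network system can combine per-term beliefs (probabilities of relevance given individual pieces of evidence) with probabilistic operators. These are simplified stand-ins in the spirit of INQUERY's query operators, not their exact semantics.

# Simplified inference-net evidence combination over per-term beliefs.
def bel_sum(beliefs):
    return sum(beliefs) / len(beliefs)     # average belief, as in a #sum node

def bel_and(beliefs):
    p = 1.0
    for b in beliefs:
        p *= b                             # all evidence must hold
    return p

def bel_or(beliefs):
    q = 1.0
    for b in beliefs:
        q *= 1.0 - b                       # at least one source suffices
    return 1.0 - q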

Bibtex
@inproceedings{DBLP:conf/trec/CroftCB93,
    author = {W. Bruce Croft and James P. Callan and John Broglio},
    editor = {Donna K. Harman},
    title = {{TREC-2} Routing and Ad-Hoc Retrieval Evaluation using the {INQUERY} System},
    booktitle = {Proceedings of The Second Text REtrieval Conference, {TREC} 1993, Gaithersburg, Maryland, USA, August 31 - September 2, 1993},
    series = {{NIST} Special Publication},
    volume = {500-215},
    pages = {75--84},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1993},
    url = {http://trec.nist.gov/pubs/trec2/papers/ps/umass.ps.gz},
    timestamp = {Wed, 07 Jul 2021 01:00:00 +0200},
    biburl = {https://dblp.org/rec/conf/trec/CroftCB93.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Latent Semantic Indexing (LSI) and TREC-2

Susan T. Dumais

Abstract

Latent Semantic Indexing (LSI) is an extension of the vector retrieval method (e.g., Salton & McGill, 1983) in which the dependencies between terms are explicitly taken into account in the representation and exploited in retrieval. This is done by simultaneously modeling all the interrelationships among terms and documents. We assume that there is some underlying or 'latent' structure in the pattern of word usage across documents, and use statistical techniques to estimate this latent structure. A description of terms, documents and user queries based on the underlying, 'latent semantic', structure (rather than surface level word choice) is used for representing and retrieving information. One advantage of the LSI representation is that a query can be very similar to a document even when they share no words.
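
A minimal sketch of the underlying machinery, assuming a plain term-document count matrix: a truncated SVD gives the latent space, queries are folded in, and similarity is measured there, which is how a query and document can score as similar without sharing any terms. The rank k and the absence of term weighting are illustrative choices, not the paper's setup.

import numpy as np

# Hypothetical LSI sketch over a raw term-document matrix A (terms x docs).
def lsi_factors(A, k=100):
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

def doc_vectors(s_k, Vt_k):
    return (Vt_k * s_k[:, None]).T         # one row per document in latent space

def fold_in_query(q, U_k, s_k):
    return (q @ U_k) / s_k                 # project a term-space query vector

def cosine(a, b):
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)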

Bibtex
@inproceedings{DBLP:conf/trec/Dumais93,
    author = {Susan T. Dumais},
    editor = {Donna K. Harman},
    title = {Latent Semantic Indexing {(LSI)} and {TREC-2}},
    booktitle = {Proceedings of The Second Text REtrieval Conference, {TREC} 1993, Gaithersburg, Maryland, USA, August 31 - September 2, 1993},
    series = {{NIST} Special Publication},
    volume = {500-215},
    pages = {105--116},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1993},
    url = {http://trec.nist.gov/pubs/trec2/papers/ps/bellcore.ps.gz},
    timestamp = {Wed, 07 Jul 2021 01:00:00 +0200},
    biburl = {https://dblp.org/rec/conf/trec/Dumais93.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Okapi at TREC-2

Stephen E. Robertson, Steve Walker, Susan Jones, Micheline Hancock-Beaulieu, Mike Gatford

Abstract

This paper reports on City University's work on the TREC-2 project from its commencement up to November 1993. It includes many results which were obtained after the August 1993 deadline for submission of official results. For TREC-2, as for TREC-1, City University used versions of the Okapi text retrieval system much as described in [2] (see also [3, 4]). Okapi is a simple and robust set-oriented system based on a generalised probabilistic model with facilities for relevance feedback, but also supporting a full range of deterministic Boolean and quasi-Boolean operations. For TREC-1 [1] the 'standard' Robertson-Sparck Jones weighting function was used for all runs (equation 1, see also [5]). City's performance was not outstandingly good among comparable systems, and the intention for TREC-2 was to develop and investigate a number of alternative probabilistic term-weighting functions. Other possibilities included varieties of query expansion, database models enabling paragraph retrieval and the use of phrases obtained by query parsing. Unfortunately, a prolonged disk failure prevented realistic test runs until almost the deadline for submission of results. A full inversion of the disks 1 and 2 database was only achieved a few hours before the final automatic runs. None of the new weighting functions (Section 1.1) was properly evaluated until after the results had been submitted to NIST; we have since discovered that several of these models perform much better than the weighting functions used for the official runs, and most of the results reported herein are from these later runs.
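
For reference, the 'standard' Robertson-Sparck Jones weight mentioned above has a widely cited closed form with 0.5 point estimates, transcribed below; whether it matches the paper's equation 1 exactly cannot be confirmed from the abstract alone.

import math

# Robertson-Sparck Jones relevance weight with the usual 0.5 smoothing.
# N: collection size, n: docs containing the term,
# R: known relevant docs, r: relevant docs containing the term.
def rsj_weight(N, n, R=0, r=0):
    return math.log(((r + 0.5) * (N - n - R + r + 0.5)) /
                    ((n - r + 0.5) * (R - r + 0.5)))

# With no relevance information (R = r = 0) this reduces to an idf-like
# weight log((N - n + 0.5) / (n + 0.5)).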

Bibtex
@inproceedings{DBLP:conf/trec/RobertsonWJHG93,
    author = {Stephen E. Robertson and Steve Walker and Susan Jones and Micheline Hancock{-}Beaulieu and Mike Gatford},
    editor = {Donna K. Harman},
    title = {Okapi at {TREC-2}},
    booktitle = {Proceedings of The Second Text REtrieval Conference, {TREC} 1993, Gaithersburg, Maryland, USA, August 31 - September 2, 1993},
    series = {{NIST} Special Publication},
    volume = {500-215},
    pages = {21--34},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1993},
    url = {http://trec.nist.gov/pubs/trec2/papers/ps/city.ps.gz},
    timestamp = {Wed, 07 Jul 2021 01:00:00 +0200},
    biburl = {https://dblp.org/rec/conf/trec/RobertsonWJHG93.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Design and Evaluation of the CLARIT-TREC-2 System

David A. Evans, Robert G. Lefferts

Abstract

The CLARIT team used the opportunity of the TREC-2 evaluations to explore several facets of the CLARIT system. In particular, given the performance of the CLARIT system on TREC-1 tasks (Evans et al. 1993), we focused our attention on evaluating 1. fully-automatic processing of topics and potentially-relevant documents and 2. topic/query augmentation using CLARIT thesaurus-discovery techniques. All of the results we report in this paper follow from straightforward applications of base-level CLARIT processing, utilizing essentially the same CLARIT components that were employed in the CLARIT-TREC-1 system. The general improvements we observe in CLARIT-TREC-2 processing are attributable to modifications (especially simplifications) in processing steps and in the settings of system variables. In the following sections, we describe the CLARIT-TREC-2 system, report our official processing results, and offer a brief analysis of performance. In addition, we report on several subsequent experiments we have conducted on the TREC-2 collection that test the parameters of the CLARIT-TREC-2 system and identify sources of immediate improvements in processing.

Bibtex
@inproceedings{DBLP:conf/trec/EvansL93,
    author = {David A. Evans and Robert G. Lefferts},
    editor = {Donna K. Harman},
    title = {Design and Evaluation of the {CLARIT-TREC-2} System},
    booktitle = {Proceedings of The Second Text REtrieval Conference, {TREC} 1993, Gaithersburg, Maryland, USA, August 31 - September 2, 1993},
    series = {{NIST} Special Publication},
    volume = {500-215},
    pages = {137--150},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1993},
    url = {http://trec.nist.gov/pubs/trec2/papers/ps/clarit.ps.gz},
    timestamp = {Wed, 07 Jul 2021 01:00:00 +0200},
    biburl = {https://dblp.org/rec/conf/trec/EvansL93.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Combination of Multiple Searches

Edward A. Fox, Joseph A. Shaw

Abstract

The TREC-2 project at Virginia Tech focused on methods for combining the evidence from multiple retrieval runs to improve retrieval performance over any single retrieval method. This paper describes one such method that has been shown to increase performance by combining the similarity values from five different retrieval runs using both vector space and P-norm extended Boolean retrieval methods.
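
As a hedged sketch of combining similarity values across runs (the scheme later widely known as CombSUM), the fragment below sums each document's normalized similarity over the runs; the per-run min-max normalization is an added assumption for comparability, not a detail from the abstract.

# Hypothetical CombSUM-style fusion over per-run similarity values.
def comb_sum(runs):
    combined = {}
    for run in runs:                       # run: {doc_id: similarity}
        if not run:
            continue
        lo, hi = min(run.values()), max(run.values())
        span = (hi - lo) or 1.0
        for doc, sim in run.items():
            combined[doc] = combined.get(doc, 0.0) + (sim - lo) / span
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)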

Bibtex
@inproceedings{DBLP:conf/trec/FoxS93,
    author = {Edward A. Fox and Joseph A. Shaw},
    editor = {Donna K. Harman},
    title = {Combination of Multiple Searches},
    booktitle = {Proceedings of The Second Text REtrieval Conference, {TREC} 1993, Gaithersburg, Maryland, USA, August 31 - September 2, 1993},
    series = {{NIST} Special Publication},
    volume = {500-215},
    pages = {243--252},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1993},
    url = {http://trec.nist.gov/pubs/trec2/papers/ps/vpi.ps.gz},
    timestamp = {Wed, 07 Jul 2021 01:00:00 +0200},
    biburl = {https://dblp.org/rec/conf/trec/FoxS93.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Probabilistic Learning Approaches for Indexing and Retrieval with the TREC-2 Collection

Norbert Fuhr, Ulrich Pfeifer, C. Bremkamp, Michael Pollmann

Abstract

In this paper, we describe the application of probabilistic models for indexing and retrieval with the TREC-2 collection. This database consists of about a million documents (2 gigabytes of data) and 100 queries (50 routing and 50 ad hoc topics). For document indexing, we use a description-oriented approach which exploits relevance feedback data in order to produce a probabilistic indexing with single terms as well as with phrases. With the ad hoc queries, we present a new query term weighting method based on a training sample of other queries. For the routing queries, the RPI model is applied, which combines probabilistic indexing with query term weighting based on query-specific feedback data. The experimental results of our approach show very good performance for both types of queries.

Bibtex
@inproceedings{DBLP:conf/trec/FuhrPBP93,
    author = {Norbert Fuhr and Ulrich Pfeifer and C. Bremkamp and Michael Pollmann},
    editor = {Donna K. Harman},
    title = {Probabilistic Learning Approaches for Indexing and Retrieval with the {TREC-2} Collection},
    booktitle = {Proceedings of The Second Text REtrieval Conference, {TREC} 1993, Gaithersburg, Maryland, USA, August 31 - September 2, 1993},
    series = {{NIST} Special Publication},
    volume = {500-215},
    pages = {67--74},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1993},
    url = {http://trec.nist.gov/pubs/trec2/papers/ps/dortmund.ps.gz},
    timestamp = {Wed, 07 Jul 2021 01:00:00 +0200},
    biburl = {https://dblp.org/rec/conf/trec/FuhrPBP93.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Feedback and Mixing Experiments with MatchPlus

Stephen I. Gallant, William R. Caid, Joel Carleton, T. Gutschow, Robert Hecht-Nielsen, Kent Pu Qing, David Sudbeck

Abstract

We briefly review the MatchPlus system and describe recent developments with learning word representations, experiments with relevance feedback using neural network learning algorithms, and methods for combining different output lists.

Bibtex
@inproceedings{DBLP:conf/trec/GallantCCGHQS93,
    author = {Stephen I. Gallant and William R. Caid and Joel Carleton and T. Gutschow and Robert Hecht{-}Nielsen and Kent Pu Qing and David Sudbeck},
    editor = {Donna K. Harman},
    title = {Feedback and Mixing Experiments with MatchPlus},
    booktitle = {Proceedings of The Second Text REtrieval Conference, {TREC} 1993, Gaithersburg, Maryland, USA, August 31 - September 2, 1993},
    series = {{NIST} Special Publication},
    volume = {500-215},
    pages = {101--104},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1993},
    url = {http://trec.nist.gov/pubs/trec2/papers/txt/09.txt},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/GallantCCGHQS93.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

GE in TREC-2: Results of a Boolean Approximation Method for Routing and Retrieval

Paul S. Jacobs

Abstract

This report describes a few experiments aimed at producing high accuracy routing and retrieval with a simple Boolean engine. There are several motivations behind this work, including a tie-in of Boolean term combinations to our extraction methods, the existence and persistence of many 'legacy' Boolean systems, and the recognition that many information retrieval problems stem from bad queries rather than bad retrieval. The results show very high accuracy, and significant progress, using a Boolean engine for routing based on queries that are manually generated with the help of corpus data. In addition, the results of a straightforward implementation of a fully automatic ad hoc method show some promise of being able to do good automatic query construction within the context of a Boolean system.

Bibtex
@inproceedings{DBLP:conf/trec/Jacobs93,
    author = {Paul S. Jacobs},
    editor = {Donna K. Harman},
    title = {{GE} in {TREC-2:} Results of a Boolean Approximation Method for Routing and Retrieval},
    booktitle = {Proceedings of The Second Text REtrieval Conference, {TREC} 1993, Gaithersburg, Maryland, USA, August 31 - September 2, 1993},
    series = {{NIST} Special Publication},
    volume = {500-215},
    pages = {191--200},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1993},
    url = {http://trec.nist.gov/pubs/trec2/papers/ps/ge-trec2.ps.gz},
    timestamp = {Wed, 07 Jul 2021 01:00:00 +0200},
    biburl = {https://dblp.org/rec/conf/trec/Jacobs93.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Effective and Efficient Retrieval from Large and Dynamic Document Collections

Daniel Knaus, Peter Schäuble

Abstract

A new retrieval method together with a new access structure is presented that is aimed at a high update efficiency, a high retrieval efficiency and a high retrieval effectiveness. The access structure consists of signatures and non-inverted descriptions. This access structure can be updated efficiently because the description of a single document is stored in a compact form. The signatures are used to compute approximate retrieval status values first, and the non-inverted descriptions are then used to determine the final list of documents ranked by the exact retrieval status values. Our basic approach based on the standard tf.idf weighting scheme has been improved in both retrieval effectiveness and retrieval efficiency. On average, the time for retrieving the top ranked document is clearly below two seconds, while the document collection can be updated in 10 msec (inserting, deleting, or modifying a document description).
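
A minimal sketch of the two-stage scheme as described: cheap bit-signature overlap proposes a candidate pool, then the compact non-inverted descriptions of the surviving candidates yield exact retrieval status values. Signature construction, the pool size, and the scoring are all illustrative assumptions.

# Hypothetical two-stage search over signatures and non-inverted descriptions.
def search(query_sig, query_terms, signatures, descriptions, k=10, pool=100):
    # Stage 1: approximate RSVs from bit-signature overlap (cheap).
    candidates = sorted(
        signatures,                        # {doc_id: int bitmask}
        key=lambda d: bin(signatures[d] & query_sig).count("1"),
        reverse=True,
    )[:pool]
    # Stage 2: exact RSVs from each candidate's stored term weights.
    def exact_rsv(doc):
        weights = descriptions[doc]        # {term: tf.idf weight}
        return sum(weights.get(t, 0.0) for t in query_terms)
    return sorted(candidates, key=exact_rsv, reverse=True)[:k]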

Bibtex
@inproceedings{DBLP:conf/trec/KnausS93,
    author = {Daniel Knaus and Peter Sch{\"{a}}uble},
    editor = {Donna K. Harman},
    title = {Effective and Efficient Retrieval from Large and Dynamic Document Collections},
    booktitle = {Proceedings of The Second Text REtrieval Conference, {TREC} 1993, Gaithersburg, Maryland, USA, August 31 - September 2, 1993},
    series = {{NIST} Special Publication},
    volume = {500-215},
    pages = {163--170},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1993},
    url = {http://trec.nist.gov/pubs/trec2/papers/ps/ETHatTREC2.ps.gz},
    timestamp = {Wed, 07 Jul 2021 01:00:00 +0200},
    biburl = {https://dblp.org/rec/conf/trec/KnausS93.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

TREC-2 Document Retrieval Experiments using PIRCS

K. L. Kwok, Laszlo Grunfeld

Abstract

We performed the full experiments using our network implementation of the component probabilistic indexing and retrieval model. Documents were enhanced with a list of semi-automatically generated two-word phrases, and queries with automatic Boolean expressions. An item self-learning procedure was used to initiate network edge weights for retrieval. Initial results submitted were above median for ad hoc, and below median for routing. They were not up to expectation because of a bad choice of high-frequency cutoff for terms, and no query expansion for routing. Later experiments showed that our system does return very good results after correcting the earlier problems and adjusting some parameters. We also redesigned our system to handle virtually any number of large files in an incremental fashion, and to do retrieval and learning by initiating our network on demand, without first creating a full inverted file.

Bibtex
@inproceedings{DBLP:conf/trec/KwokG93,
    author = {K. L. Kwok and Laszlo Grunfeld},
    editor = {Donna K. Harman},
    title = {{TREC-2} Document Retrieval Experiments using {PIRCS}},
    booktitle = {Proceedings of The Second Text REtrieval Conference, {TREC} 1993, Gaithersburg, Maryland, USA, August 31 - September 2, 1993},
    series = {{NIST} Special Publication},
    volume = {500-215},
    pages = {233--242},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1993},
    url = {http://trec.nist.gov/pubs/trec2/papers/ps/queens.ps.gz},
    timestamp = {Wed, 07 Jul 2021 01:00:00 +0200},
    biburl = {https://dblp.org/rec/conf/trec/KwokG93.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Knowledge-Based Searching with TOPIC

J. Lehman, C. Reid

Abstract

Verity, Inc. is the first major commercial product participant in TREC. Verity's product is TOPIC. Verity participated in TREC-2 as a Category A site. This participation was Verity's first TREC, and we encountered many of the logistical problems of other sites in their TREC-1 experience. TOPIC's search users wish to understand the search result quality to expect in their personal searches on their (large) collections. Verity also expects to obtain insights for future product improvements. TOPIC is a mature commercial-off-the-shelf manual text search program combining the results of human expertise with a powerful search expression language and fast search algorithms. TOPIC's installations use manually or semi-automatically developed libraries of searches (topics), which are instances of the search expression language and which are supplied to all users. Verity begins its TREC experiments with a gathering of 'ground truth' regarding unaided ad hoc end user search result quality. Future experiments will incorporate predefined searches (topics) and other TOPIC search aids to determine their level of improvement/impact on search result quality.

Bibtex
@inproceedings{DBLP:conf/trec/LehmanR93,
    author = {J. Lehman and C. Reid},
    editor = {Donna K. Harman},
    title = {Knowledge-Based Searching with {TOPIC}},
    booktitle = {Proceedings of The Second Text REtrieval Conference, {TREC} 1993, Gaithersburg, Maryland, USA, August 31 - September 2, 1993},
    series = {{NIST} Special Publication},
    volume = {500-215},
    pages = {209--222},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1993},
    url = {http://trec.nist.gov/pubs/trec2/papers/txt/20.txt},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/LehmanR93.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

An Information Retrieval Test-bed on the CM-5

B. Massand, Craig Stanfill

Abstract

For many years, research on information retrieval was mostly confined to a few relatively small test collections such as the Cranfield collection [1], the NPL collection [2], and the CACM collection [3]. Over the years, results on those collections accumulated, with the aim of determining which technique or combination of techniques resulted in the best precision/recall figures on those collections. Gradually, a 'standard model' more-or-less emerged: for the test collections under study, consistently good results are obtained by vector-model retrieval using a cosine similarity measure, tf.idf weighting, and a stemming algorithm (e.g. Chapter 9 of [4], [5]). Out-performing this model on the old test collections has proved extremely difficult. This has led to a danger of stagnation in the field of IR, and a feeling that the majority of what can be learned from precision-recall experiments on the old collections has been learned. [...]
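
For concreteness, the 'standard model' named here can be sketched in a few lines. The log-tf and idf variants below are one common instantiation, not necessarily the ones used on the CM-5, and stemming is omitted.

import math
from collections import Counter

# One common instantiation of the standard model: log-tf, idf, cosine.
def tfidf_vector(tokens, df, n_docs):
    counts = Counter(tokens)               # df: {term: document frequency}
    return {t: (1 + math.log(c)) * math.log(n_docs / df[t])
            for t, c in counts.items() if t in df}

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0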

Bibtex
@inproceedings{DBLP:conf/trec/MassandS93,
    author = {B. Massand and Craig Stanfill},
    editor = {Donna K. Harman},
    title = {An Information Retrieval Test-bed on the {CM-5}},
    booktitle = {Proceedings of The Second Text REtrieval Conference, {TREC} 1993, Gaithersburg, Maryland, USA, August 31 - September 2, 1993},
    series = {{NIST} Special Publication},
    volume = {500-215},
    pages = {117--122},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1993},
    url = {http://trec.nist.gov/pubs/trec2/papers/ps/tmc.ps.gz},
    timestamp = {Wed, 07 Jul 2021 01:00:00 +0200},
    biburl = {https://dblp.org/rec/conf/trec/MassandS93.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

TREC-II Routing Experiments with the TRW/Paracel Fast Data Finder

Matt Mettler

Abstract

For TREC-II, we were interested in experimenting with improved methods of constructing queries for the Fast Data Finder (FDF) text search coprocessor. We learned from TREC-I that while the pattern matching ability of the FDF can sometimes be put to significant advantage (we had the high score on 8 of the 50 routing topics in TREC-I), this wasn't sufficient overall to overcome the weaknesses traditionally associated with the Boolean approach to text retrieval. Many of the TREC topics are too abstract and ambiguous to respond well to a Boolean query formulation. Our goal for this year, therefore, was to apply the FDF hardware to a more statistical or soft-Boolean retrieval approach while not giving up on our ability to make use of specific features or patterns in the text when they are obviously important. We experimented with two different schemes. In the first scheme, we utilized subquery proximity to rank hit documents. We developed the subqueries manually, then determined the optimum proximity values by test runs on the training data. The most effective values were then used in the official routing queries. The second scheme was an FDF adaptation of the traditional Information Retrieval (IR) term weighting approach. In addition to single-word terms, we also included two- and three-word phrases, and FDF subqueries designed to detect special features in the text. While in the terminology of TREC both are examples of manual query formulation with feedback, we believe these techniques can be evolved to create queries automatically from samples of relevant text and also to incorporate user knowledge of specific text features of interest when it exists. We also continue to believe that the utilization of a hardware accelerator such as the Fast Data Finder enables the implementation of high-performance routing or dissemination applications at a far lower cost than can be achieved with conventional general-purpose processors.

Bibtex
@inproceedings{DBLP:conf/trec/Mettler93,
    author = {Matt Mettler},
    editor = {Donna K. Harman},
    title = {{TREC-II} Routing Experiments with the TRW/Paracel Fast Data Finder},
    booktitle = {Proceedings of The Second Text REtrieval Conference, {TREC} 1993, Gaithersburg, Maryland, USA, August 31 - September 2, 1993},
    series = {{NIST} Special Publication},
    volume = {500-215},
    pages = {201--208},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1993},
    url = {http://trec.nist.gov/pubs/trec2/papers/ps/trw.ps.gz},
    timestamp = {Wed, 07 Jul 2021 01:00:00 +0200},
    biburl = {https://dblp.org/rec/conf/trec/Mettler93.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

On Expanding Query Vectors with Lexically Related Words

Ellen M. Voorhees

Abstract

Experiments performed on small collections suggest that expanding query vectors with words that are lexically related to the original query words can improve retrieval effectiveness. Prior experiments using WordNet to automatically expand vectors in the large TREC-1 collection were inconclusive regarding effectiveness gains from lexically related words since any such effects were dominated by the choice of words to expand. This paper specifically investigates the effect of expansion by selecting query concepts to be expanded by hand. Concepts are represented by WordNet synonym sets and are expanded by following the typed links included in WordNet. Experimental results suggest that this query expansion technique makes little difference in retrieval effectiveness within the TREC environment, presumably because the TREC topic statements provide such a rich description of the information being sought.
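
A rough modern approximation of the expansion step, using NLTK's WordNet interface as a stand-in for the WordNet version used in the paper: a hand-chosen concept is expanded with its synset members and, optionally, their hypernyms. The paper's link types, hand selection procedure, and weighting are richer than this sketch.

from nltk.corpus import wordnet as wn      # requires nltk.download('wordnet')

# Hypothetical synset expansion for a hand-selected query concept.
def expand_concept(word, follow_hypernyms=True):
    expanded = {word}
    for syn in wn.synsets(word):
        expanded.update(l.replace("_", " ") for l in syn.lemma_names())
        if follow_hypernyms:
            for hyper in syn.hypernyms():
                expanded.update(l.replace("_", " ") for l in hyper.lemma_names())
    return expanded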

Bibtex
@inproceedings{DBLP:conf/trec/Voorhees93,
    author = {Ellen M. Voorhees},
    editor = {Donna K. Harman},
    title = {On Expanding Query Vectors with Lexically Related Words},
    booktitle = {Proceedings of The Second Text REtrieval Conference, {TREC} 1993, Gaithersburg, Maryland, USA, August 31 - September 2, 1993},
    series = {{NIST} Special Publication},
    volume = {500-215},
    pages = {223--232},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1993},
    url = {http://trec.nist.gov/pubs/trec2/papers/ps/siemens.ps.gz},
    timestamp = {Wed, 07 Jul 2021 01:00:00 +0200},
    biburl = {https://dblp.org/rec/conf/trec/Voorhees93.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}