Proceedings 1994

Overview of the Third Text REtrieval Conference (TREC-3)

Donna Harman

Abstract

In November of 1992 the first Text REtrieval Conference (TREC-1) was held at NIST [Harman 1993]. The conference, co-sponsored by ARPA and NIST, brought together information retrieval researchers to discuss their system results on a new large test collection (the TIPSTER collection). This conference became the first in a series of ongoing conferences dedicated to encouraging research in retrieval from large-scale test collections, and to encouraging increased interaction among research groups in industry and academia. From the beginning there has been an almost equal number of universities and companies participating, with an emphasis on exploring many different types of approaches to the text retrieval problem. [...]

Bibtex
@inproceedings{DBLP:conf/trec/Harman94,
    author = {Donna Harman},
    editor = {Donna K. Harman},
    title = {Overview of the Third Text REtrieval Conference {(TREC-3)}},
    booktitle = {Proceedings of The Third Text REtrieval Conference, {TREC} 1994, Gaithersburg, Maryland, USA, November 2-4, 1994},
    series = {{NIST} Special Publication},
    volume = {500-225},
    pages = {1--20},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1994},
    url = {http://trec.nist.gov/pubs/trec3/overview.ps},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/Harman94.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Adhoc

Automatic Query Expansion Using SMART: TREC 3

Chris Buckley, Gerard Salton, James Allan, Amit Singhal

Abstract

The Smart information retrieval project emphasizes completely automatic approaches to the understanding and retrieval of large quantities of text. We continue our work in TREC 3, performing runs in the routing, ad-hoc, and foreign language environments. Our major focus is massive query expansion: adding from 300 to 530 terms to each query. These terms come from known relevant documents in the case of routing, and from just the top retrieved documents in the case of ad-hoc and Spanish. This approach improves effectiveness from 7% to 25% in the various experiments. Other ad-hoc work extends our investigations into combining global similarities, giving an overall indication of how a document matches a query, with local similarities identifying a smaller part of the document which matches the query. Using an overlapping text window definition of 'local', we achieve a 16% improvement.
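
As a rough illustration of the massive query expansion described above, the sketch below implements Rocchio-style expansion from feedback documents: the most frequent terms of the (assumed) relevant documents are added to the original query with new weights. The function names, weighting, and the alpha/beta parameters are illustrative assumptions, not the SMART implementation itself.

```python
from collections import Counter

def expand_query(query_terms, feedback_docs, n_expansion=300, alpha=1.0, beta=2.0):
    """Rocchio-style massive query expansion: add the highest-weighted terms
    from assumed-relevant feedback documents to the original query.

    query_terms   -- dict of term -> weight for the original query
    feedback_docs -- list of documents, each a list of tokens
    n_expansion   -- how many new terms to add (the paper adds 300 to 530)
    """
    # Accumulate length-normalised term frequencies from the feedback documents.
    feedback_weights = Counter()
    for doc in feedback_docs:
        tf = Counter(doc)
        for term, freq in tf.items():
            feedback_weights[term] += freq / len(doc)

    # Start from the reweighted original query.
    expanded = {t: alpha * w for t, w in query_terms.items()}

    # Add the top-ranked feedback terms not already in the query.
    new_terms = [t for t, _ in feedback_weights.most_common()
                 if t not in query_terms][:n_expansion]
    for term in new_terms:
        expanded[term] = beta * feedback_weights[term] / len(feedback_docs)
    return expanded

# Example: expand a two-term query with terms from two "relevant" documents.
query = {"query": 1.0, "expansion": 1.0}
docs = [["automatic", "query", "expansion", "smart", "retrieval"],
        ["relevance", "feedback", "expansion", "terms", "retrieval"]]
print(sorted(expand_query(query, docs, n_expansion=5).items()))
```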

Bibtex
@inproceedings{DBLP:conf/trec/BuckleySAS94,
    author = {Chris Buckley and Gerard Salton and James Allan and Amit Singhal},
    editor = {Donna K. Harman},
    title = {Automatic Query Expansion Using {SMART:} {TREC} 3},
    booktitle = {Proceedings of The Third Text REtrieval Conference, {TREC} 1994, Gaithersburg, Maryland, USA, November 2-4, 1994},
    series = {{NIST} Special Publication},
    volume = {500-225},
    pages = {69--80},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1994},
    url = {http://trec.nist.gov/pubs/trec3/papers/cornell.new.ps.gz},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/BuckleySAS94.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Experiments in the Probabilistic Retrieval of Full Text Documents

William S. Cooper, Aitao Chen, Fredric C. Gey

Abstract

The experiments described here constitute a continuation of a research program whose object is to find probabilistically sound, yet simple and powerful, ways of combining search clues in full-text retrieval. The methodology investigated for ad hoc retrieval is that of logistic regression, in which the retrieval rule takes the form of a regression equation fitted to learning data. Most of the variables used in the regression take the form of means rather than the more customary sums, and it is argued that this is logically preferable. Radical manual reformulations of the topics were tried out and found to boost retrieval effectiveness. For routing retrieval, an approach based on the Assumption of Linked Dependence, involving the extraction of relevance-associated stems from feedback documents, is investigated. One characteristic of this approach is that only a very minimal use is made of the original topic formulation.
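
As a small illustration of this style of retrieval rule, the sketch below scores a document with a logistic regression equation whose predictors are means over the matching query terms rather than sums, in line with the argument above. The coefficient values and clue names are invented for the example, not the fitted equation from the paper.

```python
import math

# Hypothetical fitted coefficients for a logistic regression retrieval rule;
# real coefficients would be estimated from learning (training) data.
COEF = {"intercept": -3.5, "mean_qtf": 0.8, "mean_dtf": 1.2, "mean_idf": 0.9}

def log_odds_of_relevance(matching_clues):
    """matching_clues is a list of (query_tf, doc_tf, idf) triples, one per
    query term that also occurs in the document.  Each clue enters as a MEAN
    over matching terms, not a sum."""
    if not matching_clues:
        return COEF["intercept"]
    n = len(matching_clues)
    mean_qtf = sum(c[0] for c in matching_clues) / n
    mean_dtf = sum(math.log(1 + c[1]) for c in matching_clues) / n
    mean_idf = sum(c[2] for c in matching_clues) / n
    return (COEF["intercept"] + COEF["mean_qtf"] * mean_qtf
            + COEF["mean_dtf"] * mean_dtf + COEF["mean_idf"] * mean_idf)

def probability_of_relevance(matching_clues):
    z = log_odds_of_relevance(matching_clues)
    return 1.0 / (1.0 + math.exp(-z))

# Rank two documents: one with three matching terms, one with a single match.
doc_a = [(1, 4, 2.3), (2, 1, 3.1), (1, 2, 1.7)]
doc_b = [(1, 1, 2.3)]
print(probability_of_relevance(doc_a), probability_of_relevance(doc_b))
```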

Bibtex
@inproceedings{DBLP:conf/trec/CooperCG94,
    author = {William S. Cooper and Aitao Chen and Fredric C. Gey},
    editor = {Donna K. Harman},
    title = {Experiments in the Probabilistic Retrieval of Full Text Documents},
    booktitle = {Proceedings of The Third Text REtrieval Conference, {TREC} 1994, Gaithersburg, Maryland, USA, November 2-4, 1994},
    series = {{NIST} Special Publication},
    volume = {500-225},
    pages = {127--134},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1994},
    url = {http://trec.nist.gov/pubs/trec3/papers/berkeley.ps.gz},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/CooperCG94.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Latent Semantic Indexing (LSI): TREC-3 Report

Susan T. Dumais

Abstract

This paper reports on recent developments of the Latent Semantic Indexing (LSI) retrieval method for TREC-3. LSI uses a reduced-dimension vector space to represent words and documents. An important aspect of this representation is that the association between terms is automatically captured, explicitly represented, and used to improve retrieval. We used LSI for both TREC-3 routing and adhoc tasks. For the routing tasks an LSI space was constructed using the training documents. We compared profiles constructed using just the topic words (no training) with profiles constructed using the average of relevant documents (no use of the topic words). Not surprisingly, the centroid of the relevant documents was 30% better than the topic words. This simple feedback method was quite good compared to the routing performance of other systems. Various combinations of information from the topic words and relevant documents provide small additional improvements in performance. For the adhoc task we compared LSI to keyword vector matching (i.e. using no dimension reduction). Small advantages were obtained for LSI even with the long TREC topic statements.
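
A minimal sketch of the reduced-dimension representation: a truncated SVD of a toy term-document matrix, fold-in of a query for ad hoc scoring, and a routing profile formed as the centroid of relevant documents. The toy matrix, the choice of k, and the helper names are illustrative assumptions only.

```python
import numpy as np

# Toy term-by-document matrix (rows = terms, columns = documents).
A = np.array([[2., 0., 1., 0.],
              [1., 1., 0., 0.],
              [0., 2., 0., 1.],
              [0., 0., 1., 2.]])

# Reduced-dimension LSI space: truncated SVD keeping k dimensions.
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
Uk, sk, Vk = U[:, :k], s[:k], Vt[:k, :].T   # rows of Vk are document vectors

def fold_in(term_vector):
    """Project a term-space vector (query or new document) into the k-dim space."""
    return term_vector @ Uk / sk

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Ad hoc query: score documents by cosine similarity in the reduced space.
q = fold_in(np.array([1., 1., 0., 0.]))
scores = [cosine(q, d) for d in Vk]

# Routing profile built as the centroid of known relevant documents
# (documents 0 and 2 here), which the abstract reports worked clearly
# better than using the topic words alone.
profile = Vk[[0, 2]].mean(axis=0)
print(scores, [cosine(profile, d) for d in Vk])
```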

Bibtex
@inproceedings{DBLP:conf/trec/Dumais94,
    author = {Susan T. Dumais},
    editor = {Donna K. Harman},
    title = {Latent Semantic Indexing {(LSI):} {TREC-3} Report},
    booktitle = {Proceedings of The Third Text REtrieval Conference, {TREC} 1994, Gaithersburg, Maryland, USA, November 2-4, 1994},
    series = {{NIST} Special Publication},
    volume = {500-225},
    pages = {219--230},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1994},
    url = {http://trec.nist.gov/pubs/trec3/papers/lsi.ps.gz},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/Dumais94.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

The Collection Fusion Problem

Ellen M. Voorhees, Narendra Kumar Gupta, Ben Johnson-Laird

Abstract

This paper examines the feasibility of merging the results of retrieval runs on separate, autonomous document collections into an effective combined result. In particular, we examine two collection fusion techniques that use the results of past queries to compute the number of documents to retrieve from each of a set of subcollections such that the total number of retrieved documents is equal to N, the number of documents to be returned to the user. The fusion techniques are independent of the particular weighting schemes, similarity measures, and retrieval models used by the component collections. Our official TREC-3 runs are fusion runs in which N = 1000; other runs investigate the effects of varying N. These results show that the precision averaged over the 50 queries is within 10% of the precision of an effective single collection run for a wide range of values of N.
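
A toy sketch of the kind of cutoff computation described above: allocate the N documents to be returned across subcollections in proportion to how useful each subcollection was for past queries, then merge the per-collection rankings up to those cutoffs. The proportional rule and names here are simplifying assumptions standing in for the paper's learned fusion techniques.

```python
def fusion_cutoffs(past_relevant_counts, N):
    """Decide how many documents to request from each subcollection so the
    total returned is N, in proportion to past usefulness.

    past_relevant_counts -- dict: collection -> relevant docs found for past queries
    """
    total = sum(past_relevant_counts.values())
    if total == 0:
        even = N // len(past_relevant_counts)
        return {c: even for c in past_relevant_counts}   # no evidence: split evenly
    cutoffs = {c: int(round(N * r / total)) for c, r in past_relevant_counts.items()}
    # Adjust rounding so the cutoffs sum exactly to N.
    diff = N - sum(cutoffs.values())
    best = max(cutoffs, key=cutoffs.get)
    cutoffs[best] += diff
    return cutoffs

def merge(results_per_collection, cutoffs):
    """Concatenate each collection's ranked list up to its cutoff."""
    merged = []
    for coll, docs in results_per_collection.items():
        merged.extend(docs[:cutoffs[coll]])
    return merged

print(fusion_cutoffs({"AP": 120, "WSJ": 60, "FR": 20}, N=1000))
# -> {'AP': 600, 'WSJ': 300, 'FR': 100}
```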

Bibtex
@inproceedings{DBLP:conf/trec/VoorheesGJ94,
    author = {Ellen M. Voorhees and Narendra Kumar Gupta and Ben Johnson{-}Laird},
    editor = {Donna K. Harman},
    title = {The Collection Fusion Problem},
    booktitle = {Proceedings of The Third Text REtrieval Conference, {TREC} 1994, Gaithersburg, Maryland, USA, November 2-4, 1994},
    series = {{NIST} Special Publication},
    volume = {500-225},
    pages = {95--104},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1994},
    url = {http://trec.nist.gov/pubs/trec3/papers/siemens\_paper.ps.gz},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/VoorheesGJ94.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Natural Language Information Retrieval: TREC-3 Report

Tomek Strzalkowski, Jose Perez Carballo, Mihnea Marinescu

Abstract

In this paper we report on the recent developments in NYU's natural language information retrieval system, especially as related to the 3rd Text Retrieval Conference (TREC-3). The main characteristic of this system is the use of advanced natural language processing to enhance the effectiveness of term-based document retrieval. The system is designed around a traditional statistical backbone consisting of the indexer module, which builds inverted index files from pre-processed documents, and a retrieval engine which searches and ranks the documents in response to user queries. Natural language processing is used to (1) preprocess the documents in order to extract content-carrying terms, (2) discover inter-term dependencies and build a conceptual hierarchy specific to the database domain, and (3) process users' natural language requests into effective search queries. For the present TREC-3 effort, a total of 3.3 GBytes of text articles has been processed (Tipster disks 1 through 3), including material from the Wall Street Journal, the Associated Press newswire, the Federal Register, Ziff Communications' Computer Library, Department of Energy abstracts, U.S. Patents and the San Jose Mercury News, totaling more than 500 million words of English. Since the TREC-2 conference, many components of the system have been redesigned to facilitate scalability to ever-increasing amounts of data. In particular, a randomized index-splitting mechanism has been installed which allows the system to create a number of smaller indexes that can be independently and efficiently searched.

Bibtex
@inproceedings{DBLP:conf/trec/StrzalkowskiCM94,
    author = {Tomek Strzalkowski and Jose Perez Carballo and Mihnea Marinescu},
    editor = {Donna K. Harman},
    title = {Natural Language Information Retrieval: {TREC-3} Report},
    booktitle = {Proceedings of The Third Text REtrieval Conference, {TREC} 1994, Gaithersburg, Maryland, USA, November 2-4, 1994},
    series = {{NIST} Special Publication},
    volume = {500-225},
    pages = {39--54},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1994},
    url = {http://trec.nist.gov/pubs/trec3/papers/nyu\_trec3\_paper.ps.gz},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/StrzalkowskiCM94.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Combination of Multiple Searches

Joseph A. Shaw, Edward A. Fox

Abstract

The TREC-3 project at Virginia Tech focused on methods for combining the evidence from multiple retrieval runs and queries to improve retrieval performance over any single retrieval method or query. The largest improvements result from the combination of retrieval paradigms rather than from the use of multiple similar queries.
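
The sketch below shows two simple combination functions of the kind studied in this line of work: summing the (normalised) scores a document receives across runs, and additionally multiplying that sum by the number of runs that retrieved the document. The run data and score normalisation are assumed for the example.

```python
def combine_runs(runs, method="CombSUM"):
    """Combine several retrieval runs into a single ranking.

    runs   -- list of dicts mapping document id -> normalised similarity score
    method -- 'CombSUM' sums the scores; 'CombMNZ' multiplies the sum by the
              number of runs that retrieved the document.
    """
    combined = {}
    for run in runs:
        for doc, score in run.items():
            total, hits = combined.get(doc, (0.0, 0))
            combined[doc] = (total + score, hits + 1)
    if method == "CombSUM":
        fused = {doc: total for doc, (total, hits) in combined.items()}
    elif method == "CombMNZ":
        fused = {doc: total * hits for doc, (total, hits) in combined.items()}
    else:
        raise ValueError(method)
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

# Fuse a vector-space run with a (hypothetical) p-norm Boolean run.
vector_run = {"d1": 0.9, "d2": 0.4, "d3": 0.2}
pnorm_run  = {"d2": 0.8, "d3": 0.6, "d4": 0.5}
print(combine_runs([vector_run, pnorm_run], "CombMNZ"))
```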

Bibtex
@inproceedings{DBLP:conf/trec/ShawF94,
    author = {Joseph A. Shaw and Edward A. Fox},
    editor = {Donna K. Harman},
    title = {Combination of Multiple Searches},
    booktitle = {Proceedings of The Third Text REtrieval Conference, {TREC} 1994, Gaithersburg, Maryland, USA, November 2-4, 1994},
    series = {{NIST} Special Publication},
    volume = {500-225},
    pages = {105--108},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1994},
    url = {http://trec.nist.gov/pubs/trec3/papers/vt.ps.gz},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/ShawF94.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Okapi at TREC-3

Stephen E. Robertson, Steve Walker, Susan Jones, Micheline Hancock-Beaulieu, Mike Gatford

Abstract

The emphasis in TREC-3 has been on: further refinement of term-weighting functions; an investigation of run-time passage determination and searching; expansion of ad hoc queries by terms extracted from the top documents retrieved by a trial search; new methods for choosing query expansion terms after relevance feedback, now split into methods of ranking terms prior to selection and subsequent selection procedures; and the development of a user interface and search procedure within the new TREC interactive search framework. The two successes have been in query expansion and in routing term selection. The modified term-weighting functions and passage retrieval have had small beneficial effects. For TREC-3 there were to be topics without the CONCEPTS fields, which had proved to be by far the most useful source of query terms. Query expansion, passage retrieval and the modified weighting functions, used together, have gone a long way towards compensating for this loss.
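
As context for the term-weighting refinements mentioned above, here is a sketch of one member of the Okapi BM family of weighting functions (the form commonly referred to as BM25); the parameter values and collection statistics used below are illustrative assumptions, not the tuned values from the paper.

```python
import math

def bm25_weight(tf, qtf, df, doc_len, avg_doc_len, N, k1=1.2, b=0.75, k3=7.0):
    """Okapi-style term weight combining an idf component with saturating
    document and query term-frequency components (illustrative parameters)."""
    idf = math.log((N - df + 0.5) / (df + 0.5))
    K = k1 * ((1 - b) + b * doc_len / avg_doc_len)
    doc_part = tf * (k1 + 1) / (K + tf)
    query_part = qtf * (k3 + 1) / (k3 + qtf)
    return idf * doc_part * query_part

def score(query_tf, doc_tf, doc_len, stats):
    """Document score = sum of weights over query terms present in the document."""
    return sum(bm25_weight(doc_tf[t], qtf, stats["df"][t], doc_len,
                           stats["avg_doc_len"], stats["N"])
               for t, qtf in query_tf.items() if t in doc_tf)

# Hypothetical collection statistics for the example.
stats = {"N": 740000, "avg_doc_len": 260.0,
         "df": {"okapi": 120, "weighting": 15000}}
print(score({"okapi": 1, "weighting": 1},
            {"okapi": 3, "weighting": 1}, doc_len=180, stats=stats))
```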

Bibtex
@inproceedings{DBLP:conf/trec/RobertsonWJHG94,
    author = {Stephen E. Robertson and Steve Walker and Susan Jones and Micheline Hancock{-}Beaulieu and Mike Gatford},
    editor = {Donna K. Harman},
    title = {Okapi at {TREC-3}},
    booktitle = {Proceedings of The Third Text REtrieval Conference, {TREC} 1994, Gaithersburg, Maryland, USA, November 2-4, 1994},
    series = {{NIST} Special Publication},
    volume = {500-225},
    pages = {109--126},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1994},
    url = {http://trec.nist.gov/pubs/trec3/papers/city.ps.gz},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/RobertsonWJHG94.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Information Retrieval Systems for Large Document Collections

Alistair Moffat, Justin Zobel

Abstract

Information systems usually rank whole documents to identify which are answers. However, it may in some circumstances be more appropriate to rank fragments to identify which documents are answers. We consider methods of fragmenting and examine the retrieval effectiveness achieved by each method.

Bibtex
@inproceedings{DBLP:conf/trec/MoffatZ94,
    author = {Alistair Moffat and Justin Zobel},
    editor = {Donna K. Harman},
    title = {Information Retrieval Systems for Large Document Collections},
    booktitle = {Proceedings of The Third Text REtrieval Conference, {TREC} 1994, Gaithersburg, Maryland, USA, November 2-4, 1994},
    series = {{NIST} Special Publication},
    volume = {500-225},
    pages = {85--94},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1994},
    url = {http://trec.nist.gov/pubs/trec3/papers/moffat-zobel.ps.gz},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/MoffatZ94.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Improving a Basic Retrieval Method by Links and Passage Level Evidence

Daniel Knaus, Elke Mittendorf, Peter Schäuble

Abstract

A conventional text retrieval method is improved by two additional sources of evidence of relevance: first, by a passage retrieval method that is based on Hidden Markov Models and second, by a so-called link method which was originally developed for hypertext retrieval. The results show that the two additional sources of evidence improve a conventional vector space retrieval method both in a consistent and in a complementary way. The link method has the ability to retrieve relevant documents whose description vectors are not very similar to the query description vector whereas the passage retrieval method has the ability to improve an existing ordering of the documents.

Bibtex
@inproceedings{DBLP:conf/trec/KnausMS94,
    author = {Daniel Knaus and Elke Mittendorf and Peter Sch{\"{a}}uble},
    editor = {Donna K. Harman},
    title = {Improving a Basic Retrieval Method by Links and Passage Level Evidence},
    booktitle = {Proceedings of The Third Text REtrieval Conference, {TREC} 1994, Gaithersburg, Maryland, USA, November 2-4, 1994},
    series = {{NIST} Special Publication},
    volume = {500-225},
    pages = {241--246},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1994},
    url = {http://trec.nist.gov/pubs/trec3/papers/ETHatTREC3.final.USletter.ps.gz},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/KnausMS94.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Decision Level Data Fusion for Routing of Documents in the TREC3 Context: A Base Case Analysis of Worst Case Results

Paul B. Kantor

Abstract

The performance of a simulated test of decision level data fusion in the routing (filtering) task of the Text Retrieval Conference is summarized and analyzed. The relatively poor results of an approach in which a specific fusion rule was selected for each retrieval task are analyzed in terms of a best possible fusion scenario based on a given scheme for quantizing the messages from the systems to be combined. The limitations of that scenario are in turn explored, and possible ways to improve upon it are outlined.
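
A small sketch of decision-level fusion under one possible quantisation scheme: each system's ranked output is reduced to a binary message (in the top k positions or not), and the messages are combined with a Boolean rule chosen per topic. The cutoff value and rule names are assumptions for illustration, not the scheme analysed in the paper.

```python
def quantise(run, cutoff=1000):
    """Reduce a ranked run to a binary 'message': 1 if the document appears
    in the top `cutoff` positions, 0 otherwise."""
    top = set(run[:cutoff])
    return lambda doc: 1 if doc in top else 0

def fuse(doc, messages, rule="OR"):
    """Decision-level fusion: combine quantised messages from several systems
    with a Boolean rule selected for the routing topic."""
    bits = [m(doc) for m in messages]
    if rule == "OR":
        return int(any(bits))
    if rule == "AND":
        return int(all(bits))
    if rule == "MAJORITY":
        return int(sum(bits) * 2 > len(bits))
    raise ValueError(rule)

run_a = ["d1", "d3", "d5"]
run_b = ["d2", "d3", "d4"]
messages = [quantise(run_a, cutoff=2), quantise(run_b, cutoff=2)]
for doc in ["d1", "d2", "d3", "d4", "d5"]:
    print(doc, fuse(doc, messages, "AND"), fuse(doc, messages, "OR"))
```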

Bibtex
@inproceedings{DBLP:conf/trec/Kantor94,
    author = {Paul B. Kantor},
    editor = {Donna K. Harman},
    title = {Decision Level Data Fusion for Routing of Documents in the {TREC3} Context: {A} Base Case Analysis of Worst Case Results},
    booktitle = {Proceedings of The Third Text REtrieval Conference, {TREC} 1994, Gaithersburg, Maryland, USA, November 2-4, 1994},
    series = {{NIST} Special Publication},
    volume = {500-225},
    pages = {319--332},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1994},
    url = {http://trec.nist.gov/pubs/trec3/papers/kantor.ps.gz},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/Kantor94.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Searching For Meaning With The Help Of A PADRE

David Hawking, Paul B. Thistlewaite

Abstract

Full-text scanning offers significant advantages over other methods of document retrieval but is normally too slow for use on large collections. The Fujitsu AP1000 parallel distributed-memory machine has been used to reduce the time penalty for full-text scanning to acceptable interactive levels. The query language for the retrieval software (called PADRE) is described herein and differences between PADRE and traditional systems are highlighted. The advantages of full-text scanning in broader retrieval contexts are outlined. TREC precision-recall results are discussed and timings are reported.

Bibtex
@inproceedings{DBLP:conf/trec/HawkingT94,
    author = {David Hawking and Paul B. Thistlewaite},
    editor = {Donna K. Harman},
    title = {Searching For Meaning With The Help Of {A} {PADRE}},
    booktitle = {Proceedings of The Third Text REtrieval Conference, {TREC} 1994, Gaithersburg, Maryland, USA, November 2-4, 1994},
    series = {{NIST} Special Publication},
    volume = {500-225},
    pages = {257--268},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1994},
    url = {http://trec.nist.gov/pubs/trec3/papers/hawking.paper.ps.gz},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/HawkingT94.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

TREC-3 Ad-Hoc, Routing Retrieval and Thresholding Experiments using PIRCS

K. L. Kwok, Laszlo Grunfeld, David D. Lewis

Abstract

The PIRCS retrieval system has been upgraded in TREC-3 to handle the full English collections of 2 GB in an efficient manner. For ad-hoc retrieval, we use recurrent spreading of activation in our network to implement query learning and expansion based on the best-ranked subdocuments of an initial retrieval. We also augment our standard retrieval algorithm with a soft-Boolean component. For routing, we use learning from signal-rich short documents or subdocument segments. For the optional thresholding experiment, we tried two approaches to transforming retrieval status values (RSV's) so that they could be used to partition documents into retrieved and nonretrieved sets. The first method normalizes RSV's using a query self-retrieval score. The second, which requires training data, uses logistic regression to convert RSV's into estimates of probability of relevance. Overall, our results are highly competitive with those of other participants.
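
A hedged sketch of the two thresholding transforms mentioned in the abstract: normalising retrieval status values (RSVs) by a query self-retrieval score, and mapping RSVs to probability-of-relevance estimates with a logistic function. The query weighting and the logistic coefficients below are invented for the example; in the second approach they would be fitted on training data.

```python
import math

def self_retrieval_score(query_weights):
    """Score of the query against itself, used to normalise RSVs so that a
    single threshold can be applied across queries."""
    return sum(w * w for w in query_weights.values())

def normalised_rsvs(rsvs, query_weights):
    denom = self_retrieval_score(query_weights) or 1.0
    return {doc: rsv / denom for doc, rsv in rsvs.items()}

def probability_from_rsv(rsv, a=-4.0, b=9.0):
    """Logistic transform of an RSV into an estimate of the probability of
    relevance (coefficients here are made up for illustration)."""
    return 1.0 / (1.0 + math.exp(-(a + b * rsv)))

query = {"thresholding": 1.5, "pircs": 2.0}
rsvs = {"d1": 4.2, "d2": 1.1}
norm = normalised_rsvs(rsvs, query)
retrieved = {d for d, r in norm.items() if probability_from_rsv(r) > 0.5}
print(norm, retrieved)
```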

Bibtex
@inproceedings{DBLP:conf/trec/KwokGL94,
    author = {K. L. Kwok and Laszlo Grunfeld and David D. Lewis},
    editor = {Donna K. Harman},
    title = {{TREC-3} Ad-Hoc, Routing Retrieval and Thresholding Experiments using {PIRCS}},
    booktitle = {Proceedings of The Third Text REtrieval Conference, {TREC} 1994, Gaithersburg, Maryland, USA, November 2-4, 1994},
    series = {{NIST} Special Publication},
    volume = {500-225},
    pages = {247--255},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1994},
    url = {http://trec.nist.gov/pubs/trec3/papers/cuny.ps.gz},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/KwokGL94.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

TREC-3 Ad Hoc Retrieval and Routing Experiments using the WIN System

Paul Thompson, Howard R. Turtle, Bokyung Yang, James Flood

Abstract

The WIN retrieval engine is West's implementation of the inference network retrieval model [Tur90]. The inference net model ranks documents based on the combination of different evidence, e.g., text representations such as words, phrases, or paragraphs, in a consistent probabilistic framework [TC91]. WIN is based on the same retrieval model as the INQUERY system that has been used in previous TREC competitions [BCC93, Cro93, CCB94]. The two retrieval engines have common roots but have evolved separately; WIN has focused on the retrieval of legal materials from large (>50 gigabyte) collections in a commercial online environment that supports both Boolean and natural language retrieval [Tur94]. For TREC-3 we decided to run an essentially unmodified version of WIN to see how well a state-of-the-art commercial system compares to state-of-the-art research systems. Some modifications to WIN were required to handle the TREC topics, which bear little resemblance to queries entered by online searchers. In general we used the same query formulation techniques used in the production WIN system, with a preprocessor to select text from the topic in order to formulate a query. WIN was also used for routing experiments. Production versions of WIN do not provide routing or relevance feedback, so we were less constrained by existing practice. However, we decided to limit ourselves to routing techniques that generated normal WIN queries. These routing queries could then be run using the standard search engine. In what follows, we describe the configuration used for the experiments (Section 2) and the experiments that were conducted (Sections 3 and 4).

Bibtex
@inproceedings{DBLP:conf/trec/ThompsonTYF94,
    author = {Paul Thompson and Howard R. Turtle and Bokyung Yang and James Flood},
    editor = {Donna K. Harman},
    title = {{TREC-3} Ad Hoc Retrieval and Routing Experiments using the {WIN} System},
    booktitle = {Proceedings of The Third Text REtrieval Conference, {TREC} 1994, Gaithersburg, Maryland, USA, November 2-4, 1994},
    series = {{NIST} Special Publication},
    volume = {500-225},
    pages = {211--218},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1994},
    url = {http://trec.nist.gov/pubs/trec3/papers/west.ps.gz},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/ThompsonTYF94.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Interactive Document Retrieval Using TOPIC (A report on the TREC-3 experiment)

Richard M. Tong

Abstract

This paper contains a description of the experiments performed by Verity, Inc. as part of the Third Text Retrieval Conference (TREC-3). Verity participated as an Interactive Category A system and performed the full set of routing and adhoc experiments, submitting complete sets of results for both experiments. Section 2 of the paper contains a review of the TOPIC system itself and the data structures it produces. Section 3 of the paper contains a description of our experimental procedure. Section 4 contains an analysis of our official results. Section 5 contains some general comments on overall performance and a brief discussion of possible future directions. The Appendix contains additional details of our procedures, as well as an analysis of topic 122, the interactive 'focus topic'.

Bibtex
@inproceedings{DBLP:conf/trec/Tong94,
    author = {Richard M. Tong},
    editor = {Donna K. Harman},
    title = {Interactive Document Retrieval Using {TOPIC} {(A} report on the {TREC-3} experiment)},
    booktitle = {Proceedings of The Third Text REtrieval Conference, {TREC} 1994, Gaithersburg, Maryland, USA, November 2-4, 1994},
    series = {{NIST} Special Publication},
    volume = {500-225},
    pages = {201--210},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1994},
    url = {http://trec.nist.gov/pubs/trec3/papers/procPaper.ps.gz},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/Tong94.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Routing and Ad-hoc Retrieval with the TREC-3 Collection in a Distributed Loosely Federated Environment

Nikolaus Walczuch, Norbert Fuhr, Michael Pollmann, Birgit Sievers

Abstract

In this paper, we investigate retrieval methods for loosely coupled IR systems. In such an environment, each IR system operates independently on its own document collection. For query processing, an agent takes the query and sends it to the different IR systems. From the answers received from these servers, it forms a single ranking and sends it back to the user. In the work presented here, we examine different retrieval methods for performing routing and ad-hoc queries in such an environment. For the experiments, we use the TREC-3 collection and the SMART retrieval system.

Bibtex
@inproceedings{DBLP:conf/trec/WalczuchFPS94,
    author = {Nikolaus Walczuch and Norbert Fuhr and Michael Pollmann and Birgit Sievers},
    editor = {Donna K. Harman},
    title = {Routing and Ad-hoc Retrieval with the {TREC-3} Collection in a Distributed Loosely Federated Environment},
    booktitle = {Proceedings of The Third Text REtrieval Conference, {TREC} 1994, Gaithersburg, Maryland, USA, November 2-4, 1994},
    series = {{NIST} Special Publication},
    volume = {500-225},
    pages = {135--144},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1994},
    url = {http://trec.nist.gov/pubs/trec3/papers/dort.ps.gz},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/WalczuchFPS94.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Document Retrieval and Routing Using the INQUERY System

John Broglio, James P. Callan, W. Bruce Croft, Daniel W. Nachbar

Abstract

The INQUERY retrieval and routing system, which is based on the Bayesian inference net retrieval model, has been described in a number of papers [5, 4, 10, 11]. In the TREC experiments this year, a number of new techniques were introduced for both the ad-hoc retrieval and routing runs. In addition, experiments with Spanish retrieval were carried out.

Bibtex
@inproceedings{DBLP:conf/trec/BroglioCCN94,
    author = {John Broglio and James P. Callan and W. Bruce Croft and Daniel W. Nachbar},
    editor = {Donna K. Harman},
    title = {Document Retrieval and Routing Using the {INQUERY} System},
    booktitle = {Proceedings of The Third Text REtrieval Conference, {TREC} 1994, Gaithersburg, Maryland, USA, November 2-4, 1994},
    series = {{NIST} Special Publication},
    volume = {500-225},
    pages = {29--38},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1994},
    url = {http://trec.nist.gov/pubs/trec3/papers/umass.revised.ps.gz},
    timestamp = {Wed, 07 Jul 2021 16:44:22 +0200},
    biburl = {https://dblp.org/rec/conf/trec/BroglioCCN94.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Acquaintance: A Novel Vector-Space N-Gram Technique for Document Categorization

Stephen Huffman, Marc Damashek

Bibtex
@inproceedings{DBLP:conf/trec/HuffmanD94,
    author = {Stephen Huffman and Marc Damashek},
    editor = {Donna K. Harman},
    title = {Acquaintance: {A} Novel Vector-Space N-Gram Technique for Document Categorization},
    booktitle = {Proceedings of The Third Text REtrieval Conference, {TREC} 1994, Gaithersburg, Maryland, USA, November 2-4, 1994},
    series = {{NIST} Special Publication},
    volume = {500-225},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1994},
    url = {https://trec.nist.gov/pubs/trec3/t3_proceedings.html},
    timestamp = {Tue, 07 Apr 2015 01:00:00 +0200},
    biburl = {https://dblp.org/rec/conf/trec/HuffmanD94.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Comparison of Fragmentation Schemes for Document Retrieval

Ross Wilkinson, Justin Zobel

Abstract

Information systems usually rank whole documents to identify which are answers. However, it may in some circumstances be more appropriate to rank fragments to identify which documents are answers. We consider methods of fragmenting and examine the retrieval effectiveness achieved by each method.
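
A toy sketch of fragment-based ranking under one assumed fragmentation scheme (fixed-size, overlapping word windows): each document is ranked by the score of its best fragment rather than by a whole-document score. The window sizes and the simple term-count scoring function are illustrative assumptions, not the specific schemes compared in the paper.

```python
def fragment_by_words(text, size=200, overlap=100):
    """One simple fragmentation scheme: fixed-size, overlapping word windows."""
    words = text.split()
    step = max(size - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words), 1), step)]

def score_fragment(fragment, query_terms):
    """Crude fragment score: total count of query terms in the fragment."""
    words = fragment.lower().split()
    return sum(words.count(t) for t in query_terms)

def rank_documents_by_fragments(documents, query_terms):
    """Rank documents by the score of their best fragment rather than by the
    score of the whole document."""
    scored = []
    for doc_id, text in documents.items():
        best = max(score_fragment(f, query_terms) for f in fragment_by_words(text))
        scored.append((doc_id, best))
    return sorted(scored, key=lambda kv: kv[1], reverse=True)

docs = {"d1": "retrieval of long documents where only one passage is relevant " * 20,
        "d2": "a short note about document fragmentation and retrieval"}
print(rank_documents_by_fragments(docs, ["fragmentation", "retrieval"]))
```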

Bibtex
@inproceedings{DBLP:conf/trec/WilkinsonZ94,
    author = {Ross Wilkinson and Justin Zobel},
    editor = {Donna K. Harman},
    title = {Comparison of Fragmentation Schemes for Document Retrieval},
    booktitle = {Proceedings of The Third Text REtrieval Conference, {TREC} 1994, Gaithersburg, Maryland, USA, November 2-4, 1994},
    series = {{NIST} Special Publication},
    volume = {500-225},
    pages = {81--84},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1994},
    url = {http://trec.nist.gov/pubs/trec3/papers/trec3b.ps.gz},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/WilkinsonZ94.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Xerox TREC-3 Report: Combining Exact and Fuzzy Predictors

Hinrich Schütze, Jan O. Pedersen, Marti A. Hearst

Abstract

We employed different processing techniques for the ad hoc and routing tasks, although both explored the combination of exact and fuzzy predictors. For the routing task, we used a neural net classifier to estimate the probability of relevance of a new document to a given topic description. We experimented with a variety of document representations, finally settling on a hybrid feature set that combined a selected number of discriminating terms with a low dimensional local LSI [19] for generalization. In the ad hoc task, we applied some of our content analysis techniques, in particular, automatic thesaurus construction and automatic topical segmentation of full-length texts. Working from the hypothesis that exact term match can be too restrictive, we used thesaurus vectors [13] to compute context vectors for full texts and text segments. We also used thesaurus vectors to decompose the terms contained in the topic descriptions into sets of query factors, where each query factor is intended to represent a semantically distinct component of the topic description. These factors constrain document search by imposing Boolean constraints as will be described below. We used TextTiling [6] to partition long documents into semantically motivated multi-paragraph segments called tiles. We adopted the logistic regression methodology of Cooper et al. [2] and Fuhr et al. [5] to model the probability of relevance given measured attributes of a query-document pair. This allowed us to combine standard predictors (such as number of match terms) with various assessments of tile match, query factor score, and other predictors. For both tasks we first performed standard preprocessing (document parsing, tokenization, stop-list term removal) using the TDB system [3]. Our terms consisted of single words and two-word phrases that occur over five times in the corpus (where phrase is defined as an adjacent word pair, not including stop words). This process produced over 2.5 million terms, which were further processed as described below. In the remainder of this report, Section 2 describes our routing experiments, Section 3 describes our ad hoc experiments, and Section 4 discusses our results and possible future experiments.
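
One small piece of the ad hoc pipeline described above, sketched under assumptions: query factors used as Boolean constraints, where a document is kept only if it matches at least one term from every factor. The factor contents and the topic are hypothetical examples.

```python
def satisfies_query_factors(document_terms, query_factors):
    """Boolean constraint used alongside the fuzzy predictors: keep a document
    only if it matches at least one term from EVERY query factor, where each
    factor is a set of terms covering one semantically distinct component of
    the topic description."""
    doc = set(document_terms)
    return all(doc & set(factor) for factor in query_factors)

# Two hypothetical factors for a topic about oil spills and their cleanup costs.
factors = [{"oil", "petroleum", "crude"},
           {"spill", "leak", "slick"},
           {"cost", "cleanup", "damages"}]
doc_ok = ["crude", "tanker", "slick", "cleanup", "coast"]
doc_bad = ["crude", "tanker", "price", "market"]
print(satisfies_query_factors(doc_ok, factors),
      satisfies_query_factors(doc_bad, factors))
```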

Bibtex
@inproceedings{DBLP:conf/trec/SchutzePH94,
    author = {Hinrich Sch{\"{u}}tze and Jan O. Pedersen and Marti A. Hearst},
    editor = {Donna K. Harman},
    title = {Xerox {TREC-3} Report: Combining Exact and Fuzzy Predictors},
    booktitle = {Proceedings of The Third Text REtrieval Conference, {TREC} 1994, Gaithersburg, Maryland, USA, November 2-4, 1994},
    series = {{NIST} Special Publication},
    volume = {500-225},
    pages = {21--28},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1994},
    url = {http://trec.nist.gov/pubs/trec3/papers/xerox.ps.gz},
    timestamp = {Thu, 08 Oct 2020 01:00:00 +0200},
    biburl = {https://dblp.org/rec/conf/trec/SchutzePH94.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Information Retrieval System for TREC3

Kenji Satoh, Akitoshi Okumura, Kiyoshi Yamabana

Abstract

This is our first participation in TREC. Our team researches natural language processing, and we have developed an English-Japanese and Japanese-English machine translation system. (The code name of the machine translation system is VENUS.) We are now researching a new natural language processing environment, including information retrieval and text understanding. (The environment's name is VIRTUE: VENUS for Information Retrieval and Text Understanding.)

Bibtex
@inproceedings{DBLP:conf/trec/SatohOY94,
    author = {Kenji Satoh and Akitoshi Okumura and Kiyoshi Yamabana},
    editor = {Donna K. Harman},
    title = {Information Retrieval System for {TREC3}},
    booktitle = {Proceedings of The Third Text REtrieval Conference, {TREC} 1994, Gaithersburg, Maryland, USA, November 2-4, 1994},
    series = {{NIST} Special Publication},
    volume = {500-225},
    pages = {311--318},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1994},
    url = {http://trec.nist.gov/pubs/trec3/papers/virtue.ps.gz},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/SatohOY94.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Routing

Automatic Query Expansion Using SMART: TREC 3

Chris Buckley, Gerard Salton, James Allan, Amit Singhal

Abstract

The Smart information retrieval project emphasizes completely automatic approaches to the understanding and retrieval of large quantities of text. We continue our work in TREC 3, performing runs in the routing, ad-hoc, and foreign language environments. Our major focus is massive query expansion: adding from 300 to 530 terms to each query. These terms come from known relevant documents in the case of routing, and from just the top retrieved documents in the case of ad-hoc and Spanish. This approach improves effectiveness from 7% to 25% in the various experiments. Other ad-hoc work extends our investigations into combining global similarities, giving an overall indication of how a document matches a query, with local similarities identifying a smaller part of the document which matches the query. Using an overlapping text window definition of 'local', we achieve a 16% improvement.

Bibtex
@inproceedings{DBLP:conf/trec/BuckleySAS94,
    author = {Chris Buckley and Gerard Salton and James Allan and Amit Singhal},
    editor = {Donna K. Harman},
    title = {Automatic Query Expansion Using {SMART:} {TREC} 3},
    booktitle = {Proceedings of The Third Text REtrieval Conference, {TREC} 1994, Gaithersburg, Maryland, USA, November 2-4, 1994},
    series = {{NIST} Special Publication},
    volume = {500-225},
    pages = {69--80},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1994},
    url = {http://trec.nist.gov/pubs/trec3/papers/cornell.new.ps.gz},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/BuckleySAS94.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Experiments in the Probabilistic Retrieval of Full Text Documents

William S. Cooper, Aitao Chen, Fredric C. Gey

Abstract

The experiments described here constitute a continuation of a research program whose object is to find probabilistically sound, yet simple and powerful, ways of combining search clues in full-text retrieval. The methodology investigated for ad hoc retrieval is that of logistic regression, in which the retrieval rule takes the form of a regression equation fitted to learning data. Most of the variables used in the regression take the form of means rather than the more customary sums, and it is argued that this is logically preferable. Radical manual reformulations of the topics were tried out and found to boost retrieval effectiveness. For routing retrieval, an approach based on the Assumption of Linked Dependence, involving the extraction of relevance-associated stems from feedback documents, is investigated. One characteristic of this approach is that only a very minimal use is made of the original topic formulation.

Bibtex
@inproceedings{DBLP:conf/trec/CooperCG94,
    author = {William S. Cooper and Aitao Chen and Fredric C. Gey},
    editor = {Donna K. Harman},
    title = {Experiments in the Probabilistic Retrieval of Full Text Documents},
    booktitle = {Proceedings of The Third Text REtrieval Conference, {TREC} 1994, Gaithersburg, Maryland, USA, November 2-4, 1994},
    series = {{NIST} Special Publication},
    volume = {500-225},
    pages = {127--134},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1994},
    url = {http://trec.nist.gov/pubs/trec3/papers/berkeley.ps.gz},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/CooperCG94.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Using Database Schemes to Detect Relevant Information

James R. Driscoll, G. Theis, G. Billings

Bibtex
@inproceedings{DBLP:conf/trec/DriscollTB94,
    author = {James R. Driscoll and G. Theis and G. Billings},
    editor = {Donna K. Harman},
    title = {Using Database Schemes to Detect Relevant Information},
    booktitle = {Proceedings of The Third Text REtrieval Conference, {TREC} 1994, Gaithersburg, Maryland, USA, November 2-4, 1994},
    series = {{NIST} Special Publication},
    volume = {500-225},
    pages = {373--384},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1994},
    url = {https://trec.nist.gov/pubs/trec3/t3_proceedings.html},
    timestamp = {Tue, 07 Apr 2015 01:00:00 +0200},
    biburl = {https://dblp.org/rec/conf/trec/DriscollTB94.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Latent Semantic Indexing (LSI): TREC-3 Report

Susan T. Dumais

Abstract

This paper reports on recent developments of the Latent Semantic Indexing (LSI) retrieval method for TREC-3. LSI uses a reduced-dimension vector space to represent words and documents. An important aspect of this representation is that the association between terms is automatically captured, explicitly represented, and used to improve retrieval. We used LSI for both TREC-3 routing and adhoc tasks. For the routing tasks an LSI space was constructed using the training documents. We compared profiles constructed using just the topic words (no training) with profiles constructed using the average of relevant documents (no use of the topic words). Not surprisingly, the centroid of the relevant documents was 30% better than the topic words. This simple feedback method was quite good compared to the routing performance of other systems. Various combinations of information from the topic words and relevant documents provide small additional improvements in performance. For the adhoc task we compared LSI to keyword vector matching (i.e. using no dimension reduction). Small advantages were obtained for LSI even with the long TREC topic statements.

Bibtex
@inproceedings{DBLP:conf/trec/Dumais94,
    author = {Susan T. Dumais},
    editor = {Donna K. Harman},
    title = {Latent Semantic Indexing {(LSI):} {TREC-3} Report},
    booktitle = {Proceedings of The Third Text REtrieval Conference, {TREC} 1994, Gaithersburg, Maryland, USA, November 2-4, 1994},
    series = {{NIST} Special Publication},
    volume = {500-225},
    pages = {219--230},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1994},
    url = {http://trec.nist.gov/pubs/trec3/papers/lsi.ps.gz},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/Dumais94.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Research in Automatic Profile Creation and Relevance Ranking with LMDS

Julian A. Yochum

Abstract

This paper describes the development of a prototype system to generate routing profiles automatically from sets of relevant documents provided by a user, and to assign relevance scores to the documents selected by these profiles. The prototype was developed with the Logicon Message Dissemination System (LMDS) for participation in the Third Text REtrieval Conference (TREC-3). Each generated profile contains two sets of terms: a very small set to select documents, and a much larger set to assign a relevance score to each document selected. The profile generator chooses each term and assigns a weight to it, based on its frequency of occurrence in the set of documents provided by the user, and on its frequency of occurrence in a large representative corpus of documents. The LMDS search engine uses the resulting profiles to select documents, and then passes the documents to the scoring prototype for ranking. The score assigned is a function of the weights of all profile terms found in the document. Performance figures and TREC-3 results are included.
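
A rough sketch in the spirit of the profile generator described above: terms are weighted by their frequency across the user-supplied relevant documents against their frequency in a large representative corpus, a small subset of top terms selects documents, and a larger weighted set scores them. The exact weighting formula, set sizes, and function names here are assumptions, not the LMDS implementation.

```python
import math
from collections import Counter

def build_profile(relevant_docs, corpus_doc_freq, corpus_size,
                  n_select=5, n_score=50):
    """Build a routing profile: a very small set of selection terms plus a
    larger weighted set of scoring terms (illustrative weighting)."""
    df_in_relevant = Counter()
    for doc in relevant_docs:
        df_in_relevant.update(set(doc))

    weights = {}
    for term, rf in df_in_relevant.items():
        corpus_df = corpus_doc_freq.get(term, 0)
        idf = math.log((corpus_size + 1) / (corpus_df + 1))
        weights[term] = (rf / len(relevant_docs)) * idf   # frequent in relevant, rare in corpus

    ranked = sorted(weights, key=weights.get, reverse=True)
    return {"selection_terms": ranked[:n_select],
            "scoring_terms": {t: weights[t] for t in ranked[:n_score]}}

def relevance_score(document, profile):
    """Score = sum of the weights of all profile scoring terms found in the document."""
    terms = set(document)
    return sum(w for t, w in profile["scoring_terms"].items() if t in terms)

profile = build_profile(
    relevant_docs=[["dissemination", "profile", "routing"],
                   ["routing", "profile", "relevance", "ranking"]],
    corpus_doc_freq={"routing": 150, "profile": 900, "relevance": 4000,
                     "ranking": 5000, "dissemination": 40},
    corpus_size=100000)
print(profile["selection_terms"],
      relevance_score(["routing", "profile", "text"], profile))
```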

Bibtex
@inproceedings{DBLP:conf/trec/Yochum94,
    author = {Julian A. Yochum},
    editor = {Donna K. Harman},
    title = {Research in Automatic Profile Creation and Relevance Ranking with {LMDS}},
    booktitle = {Proceedings of The Third Text REtrieval Conference, {TREC} 1994, Gaithersburg, Maryland, USA, November 2-4, 1994},
    series = {{NIST} Special Publication},
    volume = {500-225},
    pages = {289},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1994},
    url = {http://trec.nist.gov/pubs/trec3/papers/logicon.ps.gz},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/Yochum94.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Natural Language Information Retrieval: TREC-3 Report

Tomek Strzalkowski, Jose Perez Carballo, Mihnea Marinescu

Abstract

In this paper we report on the recent developments in NYU's natural language information retrieval system, especially as related to the 3rd Text Retrieval Conference (TREC-3). The main characteristic of this system is the use of advanced natural language processing to enhance the effectiveness of term-based document retrieval. The system is designed around a traditional statistical backbone consisting of the indexer module, which builds inverted index files from pre-processed documents, and a retrieval engine which searches and ranks the documents in response to user queries. Natural language processing is used to (1) preprocess the documents in order to extract content-carrying terms, (2) discover inter-term dependencies and build a conceptual hierarchy specific to the database domain, and (3) process users' natural language requests into effective search queries. For the present TREC-3 effort, a total of 3.3 GBytes of text articles has been processed (Tipster disks 1 through 3), including material from the Wall Street Journal, the Associated Press newswire, the Federal Register, Ziff Communications' Computer Library, Department of Energy abstracts, U.S. Patents and the San Jose Mercury News, totaling more than 500 million words of English. Since the TREC-2 conference, many components of the system have been redesigned to facilitate scalability to ever-increasing amounts of data. In particular, a randomized index-splitting mechanism has been installed which allows the system to create a number of smaller indexes that can be independently and efficiently searched.

Bibtex
@inproceedings{DBLP:conf/trec/StrzalkowskiCM94,
    author = {Tomek Strzalkowski and Jose Perez Carballo and Mihnea Marinescu},
    editor = {Donna K. Harman},
    title = {Natural Language Information Retrieval: {TREC-3} Report},
    booktitle = {Proceedings of The Third Text REtrieval Conference, {TREC} 1994, Gaithersburg, Maryland, USA, November 2-4, 1994},
    series = {{NIST} Special Publication},
    volume = {500-225},
    pages = {39--54},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1994},
    url = {http://trec.nist.gov/pubs/trec3/papers/nyu\_trec3\_paper.ps.gz},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/StrzalkowskiCM94.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Okapi at TREC-3

Stephen E. Robertson, Steve Walker, Susan Jones, Micheline Hancock-Beaulieu, Mike Gatford

Abstract

The emphasis in TREC-3 has been on: further refinement of term-weighting functions; an investigation of run-time passage determination and searching; expansion of ad hoc queries by terms extracted from the top documents retrieved by a trial search; new methods for choosing query expansion terms after relevance feedback, now split into methods of ranking terms prior to selection and subsequent selection procedures; and the development of a user interface and search procedure within the new TREC interactive search framework. The two successes have been in query expansion and in routing term selection. The modified term-weighting functions and passage retrieval have had small beneficial effects. For TREC-3 there were to be topics without the CONCEPTS fields, which had proved to be by far the most useful source of query terms. Query expansion, passage retrieval and the modified weighting functions, used together, have gone a long way towards compensating for this loss.

Bibtex
@inproceedings{DBLP:conf/trec/RobertsonWJHG94,
    author = {Stephen E. Robertson and Steve Walker and Susan Jones and Micheline Hancock{-}Beaulieu and Mike Gatford},
    editor = {Donna K. Harman},
    title = {Okapi at {TREC-3}},
    booktitle = {Proceedings of The Third Text REtrieval Conference, {TREC} 1994, Gaithersburg, Maryland, USA, November 2-4, 1994},
    series = {{NIST} Special Publication},
    volume = {500-225},
    pages = {109--126},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1994},
    url = {http://trec.nist.gov/pubs/trec3/papers/city.ps.gz},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/RobertsonWJHG94.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Improving a Basic Retrieval Method by Links and Passage Level Evidence

Daniel Knaus, Elke Mittendorf, Peter Schäuble

Abstract

A conventional text retrieval method is improved by two additional sources of evidence of relevance: first, by a passage retrieval method that is based on Hidden Markov Models and second, by a so-called link method which was originally developed for hypertext retrieval. The results show that the two additional sources of evidence improve a conventional vector space retrieval method both in a consistent and in a complementary way. The link method has the ability to retrieve relevant documents whose description vectors are not very similar to the query description vector whereas the passage retrieval method has the ability to improve an existing ordering of the documents.

Bibtex
@inproceedings{DBLP:conf/trec/KnausMS94,
    author = {Daniel Knaus and Elke Mittendorf and Peter Sch{\"{a}}uble},
    editor = {Donna K. Harman},
    title = {Improving a Basic Retrieval Method by Links and Passage Level Evidence},
    booktitle = {Proceedings of The Third Text REtrieval Conference, {TREC} 1994, Gaithersburg, Maryland, USA, November 2-4, 1994},
    series = {{NIST} Special Publication},
    volume = {500-225},
    pages = {241--246},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1994},
    url = {http://trec.nist.gov/pubs/trec3/papers/ETHatTREC3.final.USletter.ps.gz},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/KnausMS94.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Decision Level Data Fusion for Routing of Documents in the TREC3 Context: A Base Case Analysis of Worst Case Results

Paul B. Kantor

Abstract

The performance of a simulated test of decision level data fusion in the routing (filtering) task of the Text Retrieval Conference is summarized and analyzed. The relatively poor results of an approach in which a specific fusion rule was selected for each retrieval task are analyzed in terms of a best possible fusion scenario based on a given scheme for quantizing the messages from the systems to be combined. The limitations of that scenario are in turn explored, and possible ways to improve upon it are outlined.

Bibtex
@inproceedings{DBLP:conf/trec/Kantor94,
    author = {Paul B. Kantor},
    editor = {Donna K. Harman},
    title = {Decision Level Data Fusion for Routing of Documents in the {TREC3} Context: {A} Base Case Analysis of Worst Case Results},
    booktitle = {Proceedings of The Third Text REtrieval Conference, {TREC} 1994, Gaithersburg, Maryland, USA, November 2-4, 1994},
    series = {{NIST} Special Publication},
    volume = {500-225},
    pages = {319--332},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1994},
    url = {http://trec.nist.gov/pubs/trec3/papers/kantor.ps.gz},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/Kantor94.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

TREC-3 Ad-Hoc, Routing Retrieval and Thresholding Experiments using PIRCS

K. L. Kwok, Laszlo Grunfeld, David D. Lewis

Abstract

The PIRCS retrieval system has been upgraded in TREC-3 to handle the full English collections of 2 GB in an efficient manner. For ad-hoc retrieval, we use recurrent spreading of activation in our network to implement query learning and expansion based on the best-ranked subdocuments of an initial retrieval. We also augment our standard retrieval algorithm with a soft-Boolean component. For routing, we use learning from signal-rich short documents or subdocument segments. For the optional thresholding experiment, we tried two approaches to transforming retrieval status values (RSV's) so that they could be used to partition documents into retrieved and nonretrieved sets. The first method normalizes RSV's using a query self-retrieval score. The second, which requires training data, uses logistic regression to convert RSV's into estimates of probability of relevance. Overall, our results are highly competitive with those of other participants.

Bibtex
@inproceedings{DBLP:conf/trec/KwokGL94,
    author = {K. L. Kwok and Laszlo Grunfeld and David D. Lewis},
    editor = {Donna K. Harman},
    title = {{TREC-3} Ad-Hoc, Routing Retrieval and Thresholding Experiments using {PIRCS}},
    booktitle = {Proceedings of The Third Text REtrieval Conference, {TREC} 1994, Gaithersburg, Maryland, USA, November 2-4, 1994},
    series = {{NIST} Special Publication},
    volume = {500-225},
    pages = {247--255},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1994},
    url = {http://trec.nist.gov/pubs/trec3/papers/cuny.ps.gz},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/KwokGL94.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

TREC-3 Ad Hoc Retrieval and Routing Experiments using the WIN System

Paul Thompson, Howard R. Turtle, Bokyung Yang, James Flood

Abstract

The WIN retrieval engine is West's implementation of the inference network retrieval model [Tur90]. The inference net model ranks documents based on the combination of different evidence, e.g., text representations such as words, phrases, or paragraphs, in a consistent probabilistic framework [TC91]. WIN is based on the same retrieval model as the INQUERY system that has been used in previous TREC competitions [BCC93, Cro93, CCB94]. The two retrieval engines have common roots but have evolved separately; WIN has focused on the retrieval of legal materials from large (>50 gigabyte) collections in a commercial online environment that supports both Boolean and natural language retrieval [Tur94]. For TREC-3 we decided to run an essentially unmodified version of WIN to see how well a state-of-the-art commercial system compares to state-of-the-art research systems. Some modifications to WIN were required to handle the TREC topics, which bear little resemblance to queries entered by online searchers. In general we used the same query formulation techniques used in the production WIN system, with a preprocessor to select text from the topic in order to formulate a query. WIN was also used for routing experiments. Production versions of WIN do not provide routing or relevance feedback, so we were less constrained by existing practice. However, we decided to limit ourselves to routing techniques that generated normal WIN queries. These routing queries could then be run using the standard search engine. In what follows, we describe the configuration used for the experiments (Section 2) and the experiments that were conducted (Sections 3 and 4).

Bibtex
@inproceedings{DBLP:conf/trec/ThompsonTYF94,
    author = {Paul Thompson and Howard R. Turtle and Bokyung Yang and James Flood},
    editor = {Donna K. Harman},
    title = {{TREC-3} Ad Hoc Retrieval and Routing Experiments using the {WIN} System},
    booktitle = {Proceedings of The Third Text REtrieval Conference, {TREC} 1994, Gaithersburg, Maryland, USA, November 2-4, 1994},
    series = {{NIST} Special Publication},
    volume = {500-225},
    pages = {211--218},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1994},
    url = {http://trec.nist.gov/pubs/trec3/papers/west.ps.gz},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/ThompsonTYF94.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Interactive Document Retrieval Using TOPIC (A report on the TREC-3 experiment)

Richard M. Tong

Abstract

This paper contains a description of the experiments performed by Verity, Inc. as part of the Third Text Retrieval Conference (TREC-3). Verity participated as an Interactive Category A system and performed the full set of routing and adhoc experiments, submitting complete sets of results for both experiments. Section 2 of the paper contains a review of the TOPIC system itself and the data structures it produces. Section 3 of the paper contains a description of our experimental procedure. Section 4 contains an analysis of our official results. Section 5 contains some general comments on overall performance and a brief discussion of possible future directions. The Appendix contains additional details of our procedures, as well as an analysis of topic 122, the interactive 'focus topic'.

Bibtex
@inproceedings{DBLP:conf/trec/Tong94,
    author = {Richard M. Tong},
    editor = {Donna K. Harman},
    title = {Interactive Document Retrieval Using {TOPIC} {(A} report on the {TREC-3} experiment)},
    booktitle = {Proceedings of The Third Text REtrieval Conference, {TREC} 1994, Gaithersburg, Maryland, USA, November 2-4, 1994},
    series = {{NIST} Special Publication},
    volume = {500-225},
    pages = {201--210},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1994},
    url = {http://trec.nist.gov/pubs/trec3/papers/procPaper.ps.gz},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/Tong94.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Routing and Ad-hoc Retrieval with the TREC-3 Collection in a Distributed Loosely Federated Environment

Nikolaus Walczuch, Norbert Fuhr, Michael Pollmann, Birgit Sievers

Abstract

In this paper, we investigate retrieval methods for loosely coupled IR systems. In such an environment, each IR system operates independently on its own document collection. For query processing, an agent takes the query and sends it to the different IR systems. From the answers received from these servers, it forms a single ranking and sends it back to the user. In the work presented here, we examine different retrieval methods for performing routing and ad-hoc queries in such an environment. For the experiments, we use the TREC-3 collection and the SMART retrieval system.
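As a rough illustration of the broker step described above (and not the specific merging methods the paper evaluates), the sketch below shows how an agent might fuse independently produced result lists into a single ranking. The min-max normalisation is just one possible merging rule, and the server names and document IDs are made up.

def merge_rankings(per_server_results, top_k=1000):
    """per_server_results maps a server id to its ranked list of
    (doc_id, score) pairs, sorted by descending score."""
    merged = []
    for server, ranking in per_server_results.items():
        if not ranking:
            continue
        scores = [score for _, score in ranking]
        lo, hi = min(scores), max(scores)
        span = (hi - lo) or 1.0
        # map each server's scores onto a common [0, 1] scale before merging
        merged.extend((doc_id, (score - lo) / span) for doc_id, score in ranking)
    merged.sort(key=lambda pair: pair[1], reverse=True)
    return merged[:top_k]

# example call with two independent servers holding disjoint collections:
# merge_rankings({"server_a": [("WSJ870324-0001", 12.3), ("AP890101-0002", 9.8)],
#                 "server_b": [("FR940104-0-00035", 0.91), ("ZF-017-123", 0.80)]})

Because raw retrieval scores from independent servers are not directly comparable, some normalisation or calibration step of this kind is needed before a single ranking can be returned to the user.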

Bibtex
@inproceedings{DBLP:conf/trec/WalczuchFPS94,
    author = {Nikolaus Walczuch and Norbert Fuhr and Michael Pollmann and Birgit Sievers},
    editor = {Donna K. Harman},
    title = {Routing and Ad-hoc Retrieval with the {TREC-3} Collection in a Distributed Loosely Federated Environment},
    booktitle = {Proceedings of The Third Text REtrieval Conference, {TREC} 1994, Gaithersburg, Maryland, USA, November 2-4, 1994},
    series = {{NIST} Special Publication},
    volume = {500-225},
    pages = {135--144},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1994},
    url = {http://trec.nist.gov/pubs/trec3/papers/dort.ps.gz},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/WalczuchFPS94.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Document Retrieval and Routing Using the INQUERY System

John Broglio, James P. Callan, W. Bruce Croft, Daniel W. Nachbar

Abstract

The INQUERY retrieval and routing system, which is based on the Bayesian inference net retrieval model, has been described in a number of papers [5, 4, 10, 11]. In the TREC experiments this year, a number of new techniques were introduced for both the ad-hoc retrieval and routing runs. In addition, experiments with Spanish retrieval were carried out.

Bibtex
@inproceedings{DBLP:conf/trec/BroglioCCN94,
    author = {John Broglio and James P. Callan and W. Bruce Croft and Daniel W. Nachbar},
    editor = {Donna K. Harman},
    title = {Document Retrieval and Routing Using the {INQUERY} System},
    booktitle = {Proceedings of The Third Text REtrieval Conference, {TREC} 1994, Gaithersburg, Maryland, USA, November 2-4, 1994},
    series = {{NIST} Special Publication},
    volume = {500-225},
    pages = {29--38},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1994},
    url = {http://trec.nist.gov/pubs/trec3/papers/umass.revised.ps.gz},
    timestamp = {Wed, 07 Jul 2021 16:44:22 +0200},
    biburl = {https://dblp.org/rec/conf/trec/BroglioCCN94.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Acquaintance: A Novel Vector-Space N-Gram Technique for Document Categorization

Stephen Huffman, Marc Damashek

Bibtex
@inproceedings{DBLP:conf/trec/HuffmanD94,
    author = {Stephen Huffman and Marc Damashek},
    editor = {Donna K. Harman},
    title = {Acquaintance: {A} Novel Vector-Space N-Gram Technique for Document Categorization},
    booktitle = {Proceedings of The Third Text REtrieval Conference, {TREC} 1994, Gaithersburg, Maryland, USA, November 2-4, 1994},
    series = {{NIST} Special Publication},
    volume = {500-225},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1994},
    url = {https://trec.nist.gov/pubs/trec3/t3_proceedings.html},
    timestamp = {Tue, 07 Apr 2015 01:00:00 +0200},
    biburl = {https://dblp.org/rec/conf/trec/HuffmanD94.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Xerox TREC-3 Report: Combining Exact and Fuzzy Predictors

Hinrich Schütze, Jan O. Pedersen, Marti A. Hearst

Abstract

We employed different processing techniques for the ad hoc and routing tasks, although both explored the combination of exact and fuzzy predictors. For the routing task, we used a neural net classifier to estimate the probability of relevance of a new document to a given topic description. We experimented with a variety of document representations, finally settling on a hybrid feature set that combined a selected number of discriminating terms with a low-dimensional local LSI [19] for generalization. In the ad hoc task, we applied some of our content analysis techniques, in particular automatic thesaurus construction and automatic topical segmentation of full-length texts. Working from the hypothesis that exact term match can be too restrictive, we used thesaurus vectors [13] to compute context vectors for full texts and text segments. We also used thesaurus vectors to decompose the terms contained in the topic descriptions into sets of query factors, where each query factor is intended to represent a semantically distinct component of the topic description. These factors constrain document search by imposing Boolean constraints, as described below. We used TextTiling [6] to partition long documents into semantically motivated multi-paragraph segments called tiles. We adopted the logistic regression methodology of Cooper et al. [2] and Fuhr et al. [5] to model the probability of relevance given measured attributes of a query-document pair. This allowed us to combine standard predictors (such as the number of matching terms) with various assessments of tile match, query factor score, and other predictors. For both tasks we first performed standard preprocessing (document parsing, tokenization, stop-list term removal) using the TDB system [3]. Our terms consisted of single words and two-word phrases that occur more than five times in the corpus (where a phrase is defined as an adjacent word pair, not including stop words). This process produced over 2.5 million terms, which were further processed as described below. In the remainder of this report, Section 2 describes our routing experiments, Section 3 describes our ad hoc experiments, and Section 4 discusses our results and possible future experiments.
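The final combination step, a logistic regression over exact and fuzzy predictors, can be sketched roughly as follows. The predictor names and coefficient values below are invented for illustration; only the functional form P(relevant | x) = 1 / (1 + exp(-(b0 + sum_i b_i x_i))) reflects the methodology the abstract cites from Cooper et al. and Fuhr et al.

import math

# Coefficients would normally be fitted on training (query, document, relevance)
# triples; these numbers are placeholders, not the paper's fitted model.
COEFFICIENTS = {
    "intercept":        -4.0,
    "matching_terms":    0.35,   # exact predictor: query terms present in the document
    "best_tile_cosine":  2.1,    # fuzzy predictor: best TextTiling segment vs. query context vector
    "factors_satisfied": 0.9,    # fuzzy predictor: fraction of query factors matched
}

def probability_of_relevance(features):
    """Combine per-(query, document) predictors into a relevance probability."""
    z = COEFFICIENTS["intercept"]
    for name, value in features.items():
        z += COEFFICIENTS[name] * value
    return 1.0 / (1.0 + math.exp(-z))

# documents are then ranked by this probability, e.g.:
# probability_of_relevance({"matching_terms": 5,
#                           "best_tile_cosine": 0.42,
#                           "factors_satisfied": 1.0})

Ranking by this estimated probability is what lets a standard exact-match count and several fuzzy scores be weighed against each other on a single scale.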

Bibtex
@inproceedings{DBLP:conf/trec/SchutzePH94,
    author = {Hinrich Sch{\"{u}}tze and Jan O. Pedersen and Marti A. Hearst},
    editor = {Donna K. Harman},
    title = {Xerox {TREC-3} Report: Combining Exact and Fuzzy Predictors},
    booktitle = {Proceedings of The Third Text REtrieval Conference, {TREC} 1994, Gaithersburg, Maryland, USA, November 2-4, 1994},
    series = {{NIST} Special Publication},
    volume = {500-225},
    pages = {21--28},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1994},
    url = {http://trec.nist.gov/pubs/trec3/papers/xerox.ps.gz},
    timestamp = {Thu, 08 Oct 2020 01:00:00 +0200},
    biburl = {https://dblp.org/rec/conf/trec/SchutzePH94.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Information Retrieval System for TREC3

Kenji Satoh, Akitoshi Okumura, Kiyoshi Yamabana

Abstract

This is our first participation in TREC. Our team researches natural language processing, and we have developed an English-Japanese and Japanese-English machine translation system. (The code name of the machine translation system is VENUS.) We are now researching a new natural language processing environment that includes information retrieval and text understanding. (The environment is named VIRTUE: VENUS for Information Retrieval and Text Understanding.)

Bibtex
@inproceedings{DBLP:conf/trec/SatohOY94,
    author = {Kenji Satoh and Akitoshi Okumura and Kiyoshi Yamabana},
    editor = {Donna K. Harman},
    title = {Information Retrieval System for {TREC3}},
    booktitle = {Proceedings of The Third Text REtrieval Conference, {TREC} 1994, Gaithersburg, Maryland, USA, November 2-4, 1994},
    series = {{NIST} Special Publication},
    volume = {500-225},
    pages = {311--318},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1994},
    url = {http://trec.nist.gov/pubs/trec3/papers/virtue.ps.gz},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/SatohOY94.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

New Tools and Old Habits: The Interactive Searching Behavior of Expert Online Searchers using INQUERY

Jürgen Koenemann, Richard Quatrain, Colleen Cool, Nicholas J. Belkin

Abstract

We present data that describe the interactive searching behavior of ten searchers using the INQUERY retrieval engine in the context of the TREC-3 routing task. We discuss how these searchers with a strong background in the use of traditional online retrieval mechanisms adapted, after very limited training, to the use of a best-match, ranked-output, full-text retrieval mechanism.

Bibtex
@inproceedings{DBLP:conf/trec/KoenemannQCB94,
    author = {J{\"{u}}rgen Koenemann and Richard Quatrain and Colleen Cool and Nicholas J. Belkin},
    editor = {Donna K. Harman},
    title = {New Tools and Old Habits: The Interactive Searching Behavior of Expert Online Searchers using {INQUERY}},
    booktitle = {Proceedings of The Third Text REtrieval Conference, {TREC} 1994, Gaithersburg, Maryland, USA, November 2-4, 1994},
    series = {{NIST} Special Publication},
    volume = {500-225},
    pages = {145--178},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1994},
    url = {http://trec.nist.gov/pubs/trec3/papers/rutgers\_interact\_paper.ps.gz},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/KoenemannQCB94.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}