Proceedings - Confusion 1995

New Retrieval Approaches Using SMART: TREC 4

Chris Buckley, Amit Singhal, Mandar Mitra

Abstract

The Smart information retrieval project emphasizes completely automatic approaches to the understanding and retrieval of large quantities of text. We continue our work in TREC 4, performing runs in the routing, ad-hoc, confused text, interactive, and foreign language environments.

Bibtex
@inproceedings{DBLP:conf/trec/BuckleySM95,
    author = {Chris Buckley and Amit Singhal and Mandar Mitra},
    editor = {Donna K. Harman},
    title = {New Retrieval Approaches Using {SMART:} {TREC} 4},
    booktitle = {Proceedings of The Fourth Text REtrieval Conference, {TREC} 1995, Gaithersburg, Maryland, USA, November 1-3, 1995},
    series = {{NIST} Special Publication},
    volume = {500-236},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1995},
    url = {http://trec.nist.gov/pubs/trec4/papers/Cornell\_trec4.ps.gz},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/BuckleySM95.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Acquaintance: Language-Independent Document Categorization by N-Grams

Stephen Huffman

Abstract

Acquaintance is the name of a novel vector-space n-gram technique for categorizing documents. The technique is completely language-independent, highly garble-resistant, and computationally simple. An unoptimized version of the algorithm was used to process the TREC database in a very short time.

Bibtex
@inproceedings{DBLP:conf/trec/Huffman95,
    author = {Stephen Huffman},
    editor = {Donna K. Harman},
    title = {Acquaintance: Language-Independent Document Categorization by N-Grams},
    booktitle = {Proceedings of The Fourth Text REtrieval Conference, {TREC} 1995, Gaithersburg, Maryland, USA, November 1-3, 1995},
    series = {{NIST} Special Publication},
    volume = {500-236},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1995},
    url = {http://trec.nist.gov/pubs/trec4/papers/nsa.ps.gz},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/Huffman95.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Two Experiments on Retrieval With Corrupted Data and Clean Queries in the TREC-4 Adhoc Task Environment: Data Fusion and Pattern Scanning

Kwong Bor Ng, Paul B. Kantor

Abstract

We report on several experiments in using data fusion to improve information retrieval, and in approximate text and 5-gram matching methods for retrieval of corrupted text, in the TREC context.

Bibtex
@inproceedings{DBLP:conf/trec/NgK95,
    author = {Kwong Bor Ng and Paul B. Kantor},
    editor = {Donna K. Harman},
    title = {Two Experiments on Retrieval With Corrupted Data and Clean Queries in the {TREC-4} Adhoc Task Environment: Data Fusion and Pattern Scanning},
    booktitle = {Proceedings of The Fourth Text REtrieval Conference, {TREC} 1995, Gaithersburg, Maryland, USA, November 1-3, 1995},
    series = {{NIST} Special Publication},
    volume = {500-236},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1995},
    url = {http://trec.nist.gov/pubs/trec4/papers/kantor.ps.gz},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/NgK95.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Improving Accuracy and Run-Time Performance for TREC-4

David A. Grossman, David O. Holmes, Ophir Frieder, Matthew D. Nguyen, Christopher E. Kingsbury

Abstract

For TREC-4, we enhanced our existing prototype that implements relevance ranking using the AT&T DBC-1012 Model 4 parallel database machine to support the entire document collection. Additionally, we developed a special purpose IR prototype to test a new index compression algorithm and to provide performance comparisons to the relational approach. We submitted official results for both automatic and manual adhoc queries for the entire 2GB English collection and the provided Spanish collection. Additionally, we submitted results using n-grams to process the corrupted data. In addition to implementing the vector-space model, we experimented with query reduction based on term frequency. Query reduction was shown to result in dramatically improved run-time performance and, in many cases, resulted in little or no degradation of precision/recall.

Bibtex
@inproceedings{DBLP:conf/trec/GrossmanHFNK95,
    author = {David A. Grossman and David O. Holmes and Ophir Frieder and Matthew D. Nguyen and Christopher E. Kingsbury},
    editor = {Donna K. Harman},
    title = {Improving Accuracy and Run-Time Performance for {TREC-4}},
    booktitle = {Proceedings of The Fourth Text REtrieval Conference, {TREC} 1995, Gaithersburg, Maryland, USA, November 1-3, 1995},
    series = {{NIST} Special Publication},
    volume = {500-236},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1995},
    url = {http://trec.nist.gov/pubs/trec4/papers/gmu.ps.gz},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/GrossmanHFNK95.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}