Proceedings - Adhoc 1999¶

IRIS at TREC-8¶

Kiduk Yang, Kelly Maglaughlin

Participant: UNC
Paper: 10.6028/NIST.SP.500-246.interactive-UNC
Runs: unc8al32 | unc8al42 | unc8al52

Abstract

We tested two relevance feedback models, an adaptive linear model and a probabilistic model, using massive feedback query expansion in TREC-5 (Sumner & Shaw, 1997), experimented with a three-valued scale of relevance and reduced feedback query expansion in TREC-6 (Sumner, Yang, Akers & Shaw, 1998), and examined the effectiveness of relevance feedback using a subcollection and the effect of system features in an interactive retrieval system called IRIS (Information Retrieval Interactive System) in TREC-7 (Yang, Maglaughlin, Meho & Sumner, 1999). In TREC-8, we continued our exploration of relevance feedback approaches. Based on the result of our TREC-7 interactive experiment, which suggested relevance feedback using user-selected passages to be an effective alternative to conventional document feedback, our TREC-8 interactive experiment compared a passage feedback system and a document feedback system that were identical in all aspects except for the feedback mechanism. For the TREC-8 ad-hoc task, we merged results of pseudo-relevance feedback to subcollections as in TREC-7. Our results were consistent with those of TREC-7. The results of passage feedback, whose system log showed a high level of searcher intervention, were superior to the document feedback results. As in TREC-7, our ad-hoc results showed high precision in the top few documents, but performed poorly overall compared to results using the collection as a whole.

Bibtex

@inproceedings{DBLP:conf/trec/YangM99,
    author = {Kiduk Yang and Kelly Maglaughlin},
    editor = {Ellen M. Voorhees and Donna K. Harman},
    title = {{IRIS} at {TREC-8}},
    booktitle = {Proceedings of The Eighth Text REtrieval Conference, {TREC} 1999, Gaithersburg, Maryland, USA, November 17-19, 1999},
    series = {{NIST} Special Publication},
    volume = {500-246},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1999},
    url = {http://trec.nist.gov/pubs/trec8/papers/unc\_tr8final.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/YangM99.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org},
    doi = {10.6028/NIST.SP.500-246.interactive-UNC}
}

The Mirror DBMS at TREC-8¶

Arjen P. de Vries, Djoerd Hiemstra

Participant: utwente
Paper: 10.6028/NIST.SP.500-246.adhoc-utwente
Runs: UT810 | UT800 | UT803 | UT803b | UT813

Abstract

The database group at University of Twente participates in TREC-8 using the Mirror DBMS, a prototype database system especially designed for multimedia and web retrieval. From a database perspective, the purpose has been to check whether we can get sufficient performance, and to prepare for the very large corpus track in which we plan to participate next year. From an IR perspective, the experiments have been designed to learn more about the effect of the global statistics on the ranking.

Bibtex

@inproceedings{DBLP:conf/trec/VriesH99,
    author = {Arjen P. de Vries and Djoerd Hiemstra},
    editor = {Ellen M. Voorhees and Donna K. Harman},
    title = {The Mirror {DBMS} at {TREC-8}},
    booktitle = {Proceedings of The Eighth Text REtrieval Conference, {TREC} 1999, Gaithersburg, Maryland, USA, November 17-19, 1999},
    series = {{NIST} Special Publication},
    volume = {500-246},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1999},
    url = {http://trec.nist.gov/pubs/trec8/papers/ut.trec8.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/VriesH99.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org},
    doi = {10.6028/NIST.SP.500-246.adhoc-utwente}
}

NTT DATA: Overview of system approach at TREC-8 ad-hoc and question answering¶

Toru Takaki

Participant: ntt
Paper: 10.6028/NIST.SP.500-246.qa-ntt
Runs: nttd8ale | nttd8alx | nttd8al | nttd8ame | nttd8am

Abstract

In TREC-8, NTT Data Corporation participated in the ad-hoc task and question answering track. In this paper, we describe our system approach and discuss the results. The summary of each task of our approach is shown below.

Bibtex

@inproceedings{DBLP:conf/trec/Takaki99,
    author = {Toru Takaki},
    editor = {Ellen M. Voorhees and Donna K. Harman},
    title = {{NTT} {DATA:} Overview of system approach at {TREC-8} ad-hoc and question answering},
    booktitle = {Proceedings of The Eighth Text REtrieval Conference, {TREC} 1999, Gaithersburg, Maryland, USA, November 17-19, 1999},
    series = {{NIST} Special Publication},
    volume = {500-246},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1999},
    url = {http://trec.nist.gov/pubs/trec8/papers/trec8-nttdata.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/Takaki99.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org},
    doi = {10.6028/NIST.SP.500-246.qa-ntt}
}

Natural Language Information Retrieval: TREC-8 Report¶

Tomek Strzalkowski, Jose Perez Carballo, Jussi Karlgren, Anette Hulth, Pasi Tapanainen, Timo Lahtinen

Participant: ge
Paper: 10.6028/NIST.SP.500-246.adhoc-ge
Runs: 8manexT3D1N0 | GE8ATDN1 | GE8ATDN2 | GE8ATD3 | GE8MTD2

Abstract

This report describes the adhoc experiments performed by the GE/Rutgers/SICS/SU/Conexor team in the context of TREC-8. The research efforts went in four directions: 1. As in previous years, we performed a full linguistic analysis of the entire corpus, and used the results of the analysis to provide index terms on a higher level of abstraction than can be provided by stems alone. 2. We made use of two different query expansion techniques, one automatic and one manual, both developed for TREC-8. 3. The various analysis models were combined using a stream model architecture, where each stream represents an alternative text indexing method, and the streams' various overlapping knowledge was merged using a new merging algorithm derived from first principles. 4. The entire text was analyzed for various stylistic items. Due to the distributed approach, this year's research efforts partly canceled each other out. New experiments in every step of the process did not result in an overwhelming overall result. We are able to determine that the manual query expansion technique developed at General Electric performed very well.

Bibtex

@inproceedings{DBLP:conf/trec/StrzalkowskiCKHTL99,
    author = {Tomek Strzalkowski and Jose Perez Carballo and Jussi Karlgren and Anette Hulth and Pasi Tapanainen and Timo Lahtinen},
    editor = {Ellen M. Voorhees and Donna K. Harman},
    title = {Natural Language Information Retrieval: {TREC-8} Report},
    booktitle = {Proceedings of The Eighth Text REtrieval Conference, {TREC} 1999, Gaithersburg, Maryland, USA, November 17-19, 1999},
    series = {{NIST} Special Publication},
    volume = {500-246},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1999},
    url = {http://trec.nist.gov/pubs/trec8/papers/ge8adhoc2.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/StrzalkowskiCKHTL99.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org},
    doi = {10.6028/NIST.SP.500-246.adhoc-ge}
}

SCAI TREC-8 Experiments¶

Dong-Ho Shin, Yu-Hwan Kim, Sun Kim, Jae-Hong Eom, Hyung-Joo Shin, Byoung-Tak Zhang

Participant: seoul
Paper: 10.6028/NIST.SP.500-246.qa-seoul
Runs: Scai8Adhoc

Abstract

This working note reports our experiences with TREC-8 on four tracks: Ad Hoc, Filtering, Web, and QA. The Ad Hoc retrieval engine, SCAIR, has been used for the Web and QA experiments, and the filtering experiments were based on its own engine. As a second entry to TREC, we focused this year on exploring possibilities of applying machine learning techniques to TREC tasks. The Ad Hoc track employed a cluster-based retrieval method where the scoring function used cluster information extracted from a collection of precompiled documents. Filtering was based on naive Bayes learning supported by an EM algorithm. In the Web track, we compared the performance of using link information to that of not using the information. In the QA track, some passage extraction techniques have been tested using the baseline SCAIR retrieval engine.

Bibtex

@inproceedings{DBLP:conf/trec/ShinKKESZ99,
    author = {Dong{-}Ho Shin and Yu{-}Hwan Kim and Sun Kim and Jae{-}Hong Eom and Hyung{-}Joo Shin and Byoung{-}Tak Zhang},
    editor = {Ellen M. Voorhees and Donna K. Harman},
    title = {{SCAI} {TREC-8} Experiments},
    booktitle = {Proceedings of The Eighth Text REtrieval Conference, {TREC} 1999, Gaithersburg, Maryland, USA, November 17-19, 1999},
    series = {{NIST} Special Publication},
    volume = {500-246},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1999},
    url = {http://trec.nist.gov/pubs/trec8/papers/ScaiTrec8.ps},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/ShinKKESZ99.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org},
    doi = {10.6028/NIST.SP.500-246.qa-seoul}
}

Novel Query Expansion Technique using Apriori Algorithm¶

Arnon Rungsawang, Athichart Tangpong, Pawat Laohawee, Tawa Khampachua

Participant: kasetsart
Paper: 10.6028/NIST.SP.500-246.adhoc-kasetsart
Runs: kuadhoc

Abstract

One problem in the query reformulation process is to find an optimal set of terms to add to the old query. In our TREC experiments this year, we propose to use association rule discovery (especially the apriori algorithm) to find good candidate terms to enhance the query. These candidate terms are automatically derived from the collection and added to the original query to build a new one. Experiments conducted on a subset of TREC collections give quite promising results. We achieve a 19% improvement with old TREC7 adhoc queries.
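The idea of mining expansion terms with association rules can be illustrated with a small sketch. It is not the authors' implementation: it covers only the first two levels of Apriori (frequent single terms, then frequent term pairs), treats each document as a set of terms, and uses hypothetical names (`frequent_pairs`, `expand_query`) and arbitrary thresholds (`min_support`, `min_confidence`).

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return per-term document counts and the term pairs that meet min_support.

    Apriori pruning: a pair can only be frequent if both of its terms are frequent,
    so infrequent terms are dropped before pairs are counted.
    """
    n = len(transactions)
    item_counts = Counter(t for tx in transactions for t in set(tx))
    frequent_items = {t for t, c in item_counts.items() if c / n >= min_support}
    pair_counts = Counter()
    for tx in transactions:
        kept = sorted(set(tx) & frequent_items)
        pair_counts.update(combinations(kept, 2))
    pairs = {p: c for p, c in pair_counts.items() if c / n >= min_support}
    return item_counts, pairs


def expand_query(query_terms, transactions, min_support=0.4, min_confidence=0.6):
    """Append terms y for which the rule (query term x -> y) is frequent and confident."""
    item_counts, pairs = frequent_pairs(transactions, min_support)
    added = set()
    for (a, b), count in pairs.items():
        for q, other in ((a, b), (b, a)):
            if q in query_terms and other not in query_terms:
                confidence = count / item_counts[q]  # estimate of P(other | q)
                if confidence >= min_confidence:
                    added.add(other)
    return list(query_terms) + sorted(added)


# Toy collection: each document is reduced to its set of index terms.
docs = [
    {"oil", "spill", "tanker"},
    {"oil", "spill", "cleanup"},
    {"oil", "price"},
    {"oil", "spill", "tanker", "coast"},
    {"stock", "price"},
]
expanded = expand_query(["spill"], docs)
```

In this toy collection the query `spill` picks up `oil` and `tanker`, because both co-occur with it often enough to pass the two thresholds; a query for `price` is left unchanged.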

Bibtex

@inproceedings{DBLP:conf/trec/RungsawangTLK99,
    author = {Arnon Rungsawang and Athichart Tangpong and Pawat Laohawee and Tawa Khampachua},
    editor = {Ellen M. Voorhees and Donna K. Harman},
    title = {Novel Query Expansion Technique using Apriori Algorithm},
    booktitle = {Proceedings of The Eighth Text REtrieval Conference, {TREC} 1999, Gaithersburg, Maryland, USA, November 17-19, 1999},
    series = {{NIST} Special Publication},
    volume = {500-246},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1999},
    url = {http://trec.nist.gov/pubs/trec8/papers/trec8-ku.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/RungsawangTLK99.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org},
    doi = {10.6028/NIST.SP.500-246.adhoc-kasetsart}
}

Automatic Query Feedback using Related Words¶

Stefan M. Rüger

Participant: imperial
Paper: 10.6028/NIST.SP.500-246.adhoc-imperial
Runs: ic99dafb

Abstract

Our experiments for the ad hoc task of TREC 8 were centered around the question of how to create an automatic query feedback from the documents returned by an initial query.

Bibtex

@inproceedings{DBLP:conf/trec/Ruger99,
    author = {Stefan M. R{\"{u}}ger},
    editor = {Ellen M. Voorhees and Donna K. Harman},
    title = {Automatic Query Feedback using Related Words},
    booktitle = {Proceedings of The Eighth Text REtrieval Conference, {TREC} 1999, Gaithersburg, Maryland, USA, November 17-19, 1999},
    series = {{NIST} Special Publication},
    volume = {500-246},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1999},
    url = {http://trec.nist.gov/pubs/trec8/papers/ic99dafb.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/Ruger99.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org},
    doi = {10.6028/NIST.SP.500-246.adhoc-imperial}
}

Retrieval Performance and Visual Dispersion of Query Sets¶

Mark E. Rorvig

Participant: ntexas
Paper: 10.6028/NIST.SP.500-246.adhoc-ntexas
Runs: 1

Abstract

In the course of eight TREC Conferences, retrieval performance of all systems started high and then declined. This was especially true for conference 5. Only in conferences 7 and 8 have performance levels reached those initially achieved. In this paper, scaling of the corpus of 450 TREC topics is performed. It is observed that as the visual dispersion of a topic set increases, the level of retrieval performance across systems declines for that set. Conversely, as the visual dispersion of topics decreases, system performance rises. In common elements of conferences 2, 5, and 8, this relationship appears to hold despite increases in the number of participating systems in TREC. It is proposed that visual dispersion measures should be used to describe topic set difficulty in addition to measures such as 'hardness'.

Bibtex

@inproceedings{DBLP:conf/trec/Rorvig99,
    author = {Mark E. Rorvig},
    editor = {Ellen M. Voorhees and Donna K. Harman},
    title = {Retrieval Performance and Visual Dispersion of Query Sets},
    booktitle = {Proceedings of The Eighth Text REtrieval Conference, {TREC} 1999, Gaithersburg, Maryland, USA, November 17-19, 1999},
    series = {{NIST} Special Publication},
    volume = {500-246},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1999},
    url = {http://trec.nist.gov/pubs/trec8/papers/unt\_rorvig.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/Rorvig99.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org},
    doi = {10.6028/NIST.SP.500-246.adhoc-ntexas}
}

Okapi/Keenbow at TREC-8¶

Stephen E. Robertson, Steve Walker

Participant: microsoft
Paper: 10.6028/NIST.SP.500-246.web-microsoft
Runs: ok8amxc | ok8asxc | ok8alx

Abstract

Automatic ad hoc and web track: Three ad hoc runs were submitted: long (title, description and narrative), medium (title and description) and short (title only). 'Blind' expansion was used for all runs. The queries from the medium ad hoc run were reused for the small web track submission. Most of the negative expressions were removed from the narrative field of the topic statements, and a new expansion term selection procedure was tried. Adaptive filtering: Methods were similar to those we used in TREC-7. Six runs were submitted. VLC track: Two unexpanded ad hoc runs were submitted.

Bibtex

@inproceedings{DBLP:conf/trec/RobertsonW99,
    author = {Stephen E. Robertson and Steve Walker},
    editor = {Ellen M. Voorhees and Donna K. Harman},
    title = {Okapi/Keenbow at {TREC-8}},
    booktitle = {Proceedings of The Eighth Text REtrieval Conference, {TREC} 1999, Gaithersburg, Maryland, USA, November 17-19, 1999},
    series = {{NIST} Special Publication},
    volume = {500-246},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1999},
    url = {http://trec.nist.gov/pubs/trec8/papers/okapi.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/RobertsonW99.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org},
    doi = {10.6028/NIST.SP.500-246.web-microsoft}
}

Structuring and expanding queries in the probabilistic model¶

Yasushi Ogawa, Hiroko Mano, Masumi Narita, Sakiko Honma

Participant: ricoh
Paper: 10.6028/NIST.SP.500-246.adhoc-ricoh
Runs: ric8tpn | ric8dpn | ric8dpx | ric8dnx | ric8tpx

Abstract

This is our first participation in TREC, and five runs were submitted for the ad-hoc main task. Our system is based on our Japanese text retrieval system [4], to which an English tokenizer/stemmer has been added to process English text. Our indexing system stores term positions, thus providing proximity-based search, in which the user can specify the distance between query terms. What our system does is outlined as follows:

1. Query construction: The query constructor accepts each topic, extracts words in each of the appropriate fields and constructs a query to be supplied to the ranking system.
2. Initial retrieval: The constructed query is fed into the ranking system, which then assigns term weights to query terms, scores each document and turns up a set of top-ranking documents assumed to be relevant to the topic (pseudo-relevant documents).
3. Query expansion: Based on the feedback from the pseudo-relevant documents, the query expander collects and ranks the words in the pseudo-relevant documents, and the words ranked the highest are added to the original query, with the words already in the query re-assigned new term weights.
4. Final retrieval: The ranking system performs final retrieval using the modified query.

In what follows, we explain what is done in each of the steps in more detail.
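The four steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Ricoh system: scoring is a plain tf-idf sum rather than the paper's probabilistic ranking, expansion terms are chosen by raw frequency in the pseudo-relevant documents, original query terms keep their weights instead of being re-weighted, and every name and parameter (`rank`, `expand`, `top_docs`, `top_terms`, the 0.5 expansion weight) is hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (stand-in for a real tokenizer/stemmer)."""
    return text.lower().split()


def rank(query_weights, docs):
    """Score documents by a weighted tf-idf sum; return doc indices, best first."""
    n = len(docs)
    term_freqs = [Counter(tokenize(d)) for d in docs]
    doc_freq = Counter(t for tf in term_freqs for t in tf)
    scored = []
    for i, tf in enumerate(term_freqs):
        score = sum(
            w * tf[t] * math.log(1 + n / doc_freq[t])
            for t, w in query_weights.items()
            if t in tf
        )
        scored.append((score, i))
    # Sort by descending score, breaking ties by document index.
    return [i for _, i in sorted(scored, key=lambda pair: (-pair[0], pair[1]))]


def expand(query_weights, docs, ranking, top_docs=2, top_terms=2, new_weight=0.5):
    """Add the most frequent new terms from the top-ranked (pseudo-relevant) docs."""
    pool = Counter()
    for i in ranking[:top_docs]:
        pool.update(tokenize(docs[i]))
    expanded = dict(query_weights)
    candidates = [t for t, _ in pool.most_common() if t not in expanded]
    for term in candidates[:top_terms]:
        expanded[term] = new_weight
    return expanded


docs = [
    "query expansion adds feedback terms to the query",
    "ranking documents with term weights",
    "pseudo relevance feedback assumes top documents are relevant",
    "cooking pasta with tomato sauce",
]
query = {t: 1.0 for t in tokenize("relevance feedback")}  # step 1: query construction
initial = rank(query, docs)                               # step 2: initial retrieval
expanded = expand(query, docs, initial)                   # step 3: query expansion
final = rank(expanded, docs)                              # step 4: final retrieval
```

A realistic expander would also filter stop words and weight candidates by how strongly they discriminate the pseudo-relevant set; the frequency-only choice here keeps the sketch short.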

Bibtex

@inproceedings{DBLP:conf/trec/OgawaMNH99,
    author = {Yasushi Ogawa and Hiroko Mano and Masumi Narita and Sakiko Honma},
    editor = {Ellen M. Voorhees and Donna K. Harman},
    title = {Structuring and expanding queries in the probabilistic model},
    booktitle = {Proceedings of The Eighth Text REtrieval Conference, {TREC} 1999, Gaithersburg, Maryland, USA, November 17-19, 1999},
    series = {{NIST} Special Publication},
    volume = {500-246},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1999},
    url = {http://trec.nist.gov/pubs/trec8/papers/ricoh\_notebook.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/OgawaMNH99.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org},
    doi = {10.6028/NIST.SP.500-246.adhoc-ricoh}
}

A Maximum Likelihood Ratio Information Retrieval Model¶

Kenney Ng

Participant: MIT
Paper: 10.6028/NIST.SP.500-246.adhoc-MIT
Runs: MITSLStdn | MITSLStd

Abstract

In this paper we present a novel probabilistic information retrieval model that scores documents based on the relative change in the document likelihoods, expressed as the ratio of the conditional probability of the document given the query and the prior probability of the document before the query is specified. The document likelihoods are computed using statistical language modeling techniques and the model parameters are estimated automatically and dynamically for each query to optimize well-specified (maximum likelihood) objective functions. We derive the basic retrieval model, describe the details of the model, and present some extensions to the model including a method to perform automatic feedback. Development experiments are performed using the TREC-6 ad hoc text retrieval task and performance is measured using the TREC-7 ad hoc task. Official evaluation results on the 1999 TREC-8 ad hoc task are also reported. The performance results demonstrate that the model is competitive with current state-of-the-art retrieval approaches.
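The ratio described in the abstract can be written out as a short identity. The symbols S, D, and Q are notation assumed here for the score, a document, and the query; they are not taken from the paper.

```latex
S(D, Q) \;=\; \frac{p(D \mid Q)}{p(D)}
        \;=\; \frac{p(Q \mid D)\, p(D)}{p(Q)\, p(D)}
        \;=\; \frac{p(Q \mid D)}{p(Q)}
```

By Bayes' rule the document prior cancels, so the score can be computed from the likelihood of the query under a document language model, normalized by the likelihood of the query before any document is considered.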

Bibtex

@inproceedings{DBLP:conf/trec/Ng99,
    author = {Kenney Ng},
    editor = {Ellen M. Voorhees and Donna K. Harman},
    title = {A Maximum Likelihood Ratio Information Retrieval Model},
    booktitle = {Proceedings of The Eighth Text REtrieval Conference, {TREC} 1999, Gaithersburg, Maryland, USA, November 17-19, 1999},
    series = {{NIST} Special Publication},
    volume = {500-246},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1999},
    url = {http://trec.nist.gov/pubs/trec8/papers/MITSLS\_v2.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/Ng99.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org},
    doi = {10.6028/NIST.SP.500-246.adhoc-MIT}
}

Moving More Quickly toward Full Term Relations in Information Space¶

Gregory B. Newby

Participant: Newby
Paper: 10.6028/NIST.SP.500-246.web-Newby
Runs: isa25 | isa50 | isa25t | isa50t

Abstract

This paper describes the ISpace retrieval system's involvement in TREC8. The main goal for this year's work was to speed up document indexing and query processing compared to previous years. This goal was achieved, but retrieval performance was not as good as for TREC7. System details for the AdHoc task, small Web task, and large Web (VLC) task are presented. The AdHoc task emphasized query expansion, while the large Web track emphasized rapid indexing and retrieval. The paper describes an implementation of a multidimensional tree structure for retrieval from information space based on the kd-tree. The larger setting for ISpace, the TeraScale Retrieval project, is summarized. A concluding section describes plans for ISpace.

Bibtex

@inproceedings{DBLP:conf/trec/Newby99,
    author = {Gregory B. Newby},
    editor = {Ellen M. Voorhees and Donna K. Harman},
    title = {Moving More Quickly toward Full Term Relations in Information Space},
    booktitle = {Proceedings of The Eighth Text REtrieval Conference, {TREC} 1999, Gaithersburg, Maryland, USA, November 17-19, 1999},
    series = {{NIST} Special Publication},
    volume = {500-246},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1999},
    url = {http://trec.nist.gov/pubs/trec8/papers/newby-trec99-proceedings.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/Newby99.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org},
    doi = {10.6028/NIST.SP.500-246.web-Newby}
}

Fujitsu Laboratories TREC8 Report - Ad hoc, Small Web, and Large Web Track¶

Isao Namba, Nobuyuki Igata

Participant: fujitsu
Paper: 10.6028/NIST.SP.500-246.web-fujitsu
Runs: Flab8atdn | Flab8as | Flab8atd2 | Flab8ax | Flab8at

Abstract

This year a Fujitsu Laboratory team participated in three tracks: ad hoc, small web, and large web. As basic techniques, we compared four popular stemmers, and we devised simple techniques for removing stop patterns from TREC queries. For the ad hoc task and the small web track, we used the same techniques. We experimented with area weighting, co-occurrence boosting, bi-gram utilization, and reranking by bi-gram extraction from pilot search. The effect of blind application of those techniques is rather limited, or even uncertain, in the TREC8 experiment. What we can say from the TREC8 result is that blind application of co-occurrence boosting and area weighting may be effective for the small web track. They require query-dependent application. In the large web track, our main interest is efficiency, that is, how much resources are required to process 100GB of web text and 10000 real web queries in practical time. Using a statistics-based language type checker, we can eliminate 23% of non-English text. This leads to speeding up indexing and reducing the index size. The search speed for an inverted file is CPU intensive if the target machine has main memory in excess of 10-25% of the index size. So with simple but effective index compression methods, the throughput of query processing is about 0.54-1.1 query/second even with a single 300MHz UltraSPARC processor.

Bibtex

@inproceedings{DBLP:conf/trec/NambaI99,
    author = {Isao Namba and Nobuyuki Igata},
    editor = {Ellen M. Voorhees and Donna K. Harman},
    title = {Fujitsu Laboratories {TREC8} Report - Ad hoc, Small Web, and Large Web Track},
    booktitle = {Proceedings of The Eighth Text REtrieval Conference, {TREC} 1999, Gaithersburg, Maryland, USA, November 17-19, 1999},
    series = {{NIST} Special Publication},
    volume = {500-246},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1999},
    url = {http://trec.nist.gov/pubs/trec8/papers/flab8\_proceedings\_letter.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/NambaI99.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org},
    doi = {10.6028/NIST.SP.500-246.web-fujitsu}
}

IIT at TREC-8: Improving Baseline Precision¶

M. Catherine McCabe, David O. Holmes, Kenneth L. Alford, Abdur Chowdhury, David A. Grossman, Ophir Frieder

Participant: iit
Paper: 10.6028/NIST.SP.500-246.web-iit
Runs: iit99au1 | iit99ma1 | iit99au2

Abstract

In TREC-8, we participated in the automatic and manual tracks for category A as well as the small web track. This year, we focussed on improving our baseline and then introduced some experimental improvements. Our automatic runs used relevance feedback with a high-precision first pass to select terms and then a high-recall final pass. For manual runs, we used predefined concept lists focussing on phrases and proper nouns in the query. In the small web-track, we submitted one content-only run and two link-plus-content runs. We continued to use the relational model with unchanged SQL for retrieval. Our results show some promise for the use of automatic concepts, expansion within concepts and a high-precision first pass for relevance feedback.

Bibtex

@inproceedings{DBLP:conf/trec/McCabeHACGF99,
    author = {M. Catherine McCabe and David O. Holmes and Kenneth L. Alford and Abdur Chowdhury and David A. Grossman and Ophir Frieder},
    editor = {Ellen M. Voorhees and Donna K. Harman},
    title = {{IIT} at {TREC-8:} Improving Baseline Precision},
    booktitle = {Proceedings of The Eighth Text REtrieval Conference, {TREC} 1999, Gaithersburg, Maryland, USA, November 17-19, 1999},
    series = {{NIST} Special Publication},
    volume = {500-246},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1999},
    url = {http://trec.nist.gov/pubs/trec8/papers/iit99pr.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/McCabeHACGF99.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org},
    doi = {10.6028/NIST.SP.500-246.web-iit}
}

The JHU/APL HAIRCUT System at TREC-8¶

James Mayfield, Paul McNamee, Christine D. Piatko

Participant: jhu
Paper: 10.6028/NIST.SP.500-246.adhoc-jhu
Runs: apl8c221 | apl8n | apl8c621 | apl8p | apl8ctd

Abstract

The Johns Hopkins University Applied Physics Laboratory (JHU/APL) is a second-time entrant in the TREC Category A evaluation. The focus of our information retrieval research this year has been on the relative value of and interaction among multiple term types and multiple similarity metrics. In particular, we are interested in examining words and n-grams as indexing terms, and vector models and hidden Markov models as similarity metrics.
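To make the contrast between words and n-grams as indexing terms concrete, the sketch below extracts both from the same text. It assumes character n-grams over lowercased, whitespace-normalized text; the function names and the choice of n are illustrative and are not taken from the HAIRCUT system.

```python
def word_terms(text):
    """Index terms as lowercased whitespace-delimited words."""
    return text.lower().split()


def char_ngrams(text, n):
    """Index terms as overlapping character n-grams, spanning word boundaries."""
    normalized = " ".join(text.lower().split())
    return [normalized[i:i + n] for i in range(len(normalized) - n + 1)]


words = word_terms("TREC  eight")
grams = char_ngrams("TREC  eight", 4)
```

N-grams that span the space between words capture some word-order information that single-word terms lose, at the cost of a much larger term vocabulary.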

Bibtex

@inproceedings{DBLP:conf/trec/MayfieldMP99,
    author = {James Mayfield and Paul McNamee and Christine D. Piatko},
    editor = {Ellen M. Voorhees and Donna K. Harman},
    title = {The {JHU/APL} {HAIRCUT} System at {TREC-8}},
    booktitle = {Proceedings of The Eighth Text REtrieval Conference, {TREC} 1999, Gaithersburg, Maryland, USA, November 17-19, 1999},
    series = {{NIST} Special Publication},
    volume = {500-246},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1999},
    url = {http://trec.nist.gov/pubs/trec8/papers/JHUAPL.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/MayfieldMP99.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org},
    doi = {10.6028/NIST.SP.500-246.adhoc-jhu}
}

Oracle at Trec8: A Lexical Approach¶

Kavi Mahesh, Jacquelynn Kud, Paul Dixon

Participant: oracle
Paper: 10.6028/NIST.SP.500-246.adhoc-oracle
Runs: orcl99man

Abstract

Oracle's system for Trec8 was the interMedia Text retrieval engine integrated with the Oracle8i database and SQL query language. interMedia Text supports a novel theme-based document retrieval capability using an extensive lexical knowledge base. Trec8 queries constructed by extracting themes from topic titles and descriptions were manually refined. Queries were simple and intuitive. Oracle's results demonstrate that knowledge-based retrieval is a viable and scalable solution for information retrieval and that statistical training and tuning on the document collection is unnecessary for good performance in Trec.

Bibtex

@inproceedings{DBLP:conf/trec/MaheshKD99,
    author = {Kavi Mahesh and Jacquelynn Kud and Paul Dixon},
    editor = {Ellen M. Voorhees and Donna K. Harman},
    title = {Oracle at Trec8: {A} Lexical Approach},
    booktitle = {Proceedings of The Eighth Text REtrieval Conference, {TREC} 1999, Gaithersburg, Maryland, USA, November 17-19, 1999},
    series = {{NIST} Special Publication},
    volume = {500-246},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1999},
    url = {http://trec.nist.gov/pubs/trec8/papers/orcl99man.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/MaheshKD99.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org},
    doi = {10.6028/NIST.SP.500-246.adhoc-oracle}
}

TREC-8 Ad-Hoc, Query and Filtering Track Experiments using PIRCS¶

K. L. Kwok, Laszlo Grunfeld, M. Chan

Participant: cuny
Paper: 10.6028/NIST.SP.500-246.query-cuny
Runs: pir9Attd | pir9Aatd | pir9Atd0 | pir9Aa1 | pir9At0

Abstract

In TREC-8, we participated in automatic ad-hoc retrieval as well as the query and filtering tracks. The theme of our participation is 'retrieval lists combination', and the technique is applied throughout our experiments to various degrees. It is pointed out that our PIRCS system may be considered as a combination of a probabilistic retrieval model and a language model approach. For ad-hoc, three types of experiments were done with short, medium and long queries as before. The general approach is similar to TREC-7, but combinations of retrieval lists from different query types were used to boost effectiveness. For the query track, we submitted one short-query set, and performed retrieval for twenty-one natural language query variants. For the filtering track, experiments for adaptive filtering, batch filtering, and routing were performed. For adaptive filtering, the historical selected document list was used to train profile term weights and dynamically vary the retrieval status value (rsv) threshold for deciding document selection during the course of filtering. For batch filtering, Financial Times FT92 data was used to define 6 retrieval profiles whose results were combined based on coefficients trained via a genetic algorithm. Logistic regression transforms rsv's to probabilities. Routing was similarly done with additional training data obtained from non-FT collections, and two additional profiles were defined and combined.

Bibtex

@inproceedings{DBLP:conf/trec/KwokGC99,
    author = {K. L. Kwok and Laszlo Grunfeld and M. Chan},
    editor = {Ellen M. Voorhees and Donna K. Harman},
    title = {{TREC-8} Ad-Hoc, Query and Filtering Track Experiments using {PIRCS}},
    booktitle = {Proceedings of The Eighth Text REtrieval Conference, {TREC} 1999, Gaithersburg, Maryland, USA, November 17-19, 1999},
    series = {{NIST} Special Publication},
    volume = {500-246},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1999},
    url = {http://trec.nist.gov/pubs/trec8/papers/queenst8.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/KwokGC99.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org},
    doi = {10.6028/NIST.SP.500-246.query-cuny}
}

Twenty-One at TREC-8: using Language Technology for Information Retrieval¶

Wessel Kraaij, Renée Pohlmann, Djoerd Hiemstra

Participant: twentyone
Paper: 10.6028/NIST.SP.500-246.sdr-twentyone
Runs: tno8d4 | tno8d3 | tno8t2

Abstract

This paper describes the official runs of the Twenty-One group for TREC-8. The Twenty-One group participated in the Ad-hoc, CLIR, Adaptive Filtering and SDR tracks. The main focus of our experiments is the development and evaluation of retrieval methods that are motivated by natural language processing techniques. The following new techniques are introduced in this paper. In the Ad-Hoc and CLIR tasks we experimented with automatic sense disambiguation followed by query expansion or translation. We used a combination of thesaurial and corpus information for the disambiguation process. We continued research on CLIR techniques which exploit the target corpus for an implicit disambiguation, by importing the translation probabilities into the probabilistic term-weighting framework. In filtering we extended the use of language models for document ranking with a relevance feedback algorithm for query term reweighting.

Bibtex

@inproceedings{DBLP:conf/trec/KraaijPH99,
    author = {Wessel Kraaij and Ren{\'{e}}e Pohlmann and Djoerd Hiemstra},
    editor = {Ellen M. Voorhees and Donna K. Harman},
    title = {Twenty-One at {TREC-8:} using Language Technology for Information Retrieval},
    booktitle = {Proceedings of The Eighth Text REtrieval Conference, {TREC} 1999, Gaithersburg, Maryland, USA, November 17-19, 1999},
    series = {{NIST} Special Publication},
    volume = {500-246},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1999},
    url = {http://trec.nist.gov/pubs/trec8/papers/twentyone8final.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/KraaijPH99.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org},
    doi = {10.6028/NIST.SP.500-246.sdr-twentyone}
}

ACSys TREC-8 Experiments¶

David Hawking, Peter Bailey, Nick Craswell

Participant: ACSys
Paper: 10.6028/NIST.SP.500-246.query-ACSys
Runs: acsys8alo | acsys8alo2 | acsys8asn | acsys8amn | acsys8aln2

Abstract

Experiments relating to TREC-8 Ad Hoc, Web Track (Large and Small) and Query Track tasks are described and results reported. Due to time constraints, only minimal effort was put into Ad Hoc and Query Track participation. In the Web Track, Google-style PageRanks were calculated for all 18.5 million pages in the VLC2 collection and for the 0.25 million pages in the WT2g collection. Various combinations of content score and PageRank produced no benefit for TREC style ad hoc retrieval. A major goal in the Web Track was to make engineering improvements to permit indexing of the 100 gigabyte collection and subsequent query processing using a single PC. A secondary goal was to achieve last year's performance (obtained with eight DEC Alphas) with less recourse to effectiveness-harming optimisations. The main goal was achieved and indexing times are comparable to last year's. However, effectiveness results were worse relative to last year and query processing times were approximately double.
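
The abstract above mentions combining a content score with a PageRank value but does not say how. The following is a minimal sketch only, assuming a standard power-iteration PageRank and a simple linear mix; the function names, the damping factor of 0.85, and the `weight` parameter are illustrative assumptions, not details taken from the paper.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}.

    Pages with no out-links spread their rank evenly over all pages.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {page: 1.0 / n for page in nodes}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in nodes}
        for page in nodes:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                share = damping * rank[page] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


def combined_score(content_score, page_rank, weight=0.5):
    """Hypothetical linear mix of a content score and a PageRank value."""
    return (1.0 - weight) * content_score + weight * page_rank


# Tiny illustrative link graph (not from the VLC2 or WT2g collections).
toy_links = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]}
toy_ranks = pagerank(toy_links)
```

With `weight=0.0` the combination reduces to the content score alone, which is one way to check whether the link evidence changes a ranking at all.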

Bibtex

@inproceedings{DBLP:conf/trec/HawkingBC99,
    author = {David Hawking and Peter Bailey and Nick Craswell},
    editor = {Ellen M. Voorhees and Donna K. Harman},
    title = {ACSys {TREC-8} Experiments},
    booktitle = {Proceedings of The Eighth Text REtrieval Conference, {TREC} 1999, Gaithersburg, Maryland, USA, November 17-19, 1999},
    series = {{NIST} Special Publication},
    volume = {500-246},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1999},
    url = {http://trec.nist.gov/pubs/trec8/papers/acsys.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/HawkingBC99.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org},
    doi = {10.6028/NIST.SP.500-246.query-ACSys}
}

TREC-8 Experiments at SUNY Buffalo¶

Benjamin Han, Ramya Nagarajan, Rohini K. Srihari, Srikanth Munirathnam

Participant: buffalo
Paper: 10.6028/NIST.SP.500-246.sdr-buffalo
Runs: UB99SW | UB99T

Abstract

For TREC-8, State University of New York at Buffalo (UB) participated in the ad-hoc task and the spoken document retrieval (SDR) track. This is our first year of participation at TREC. We submitted two runs for the Ad-hoc task. The first run was term vector-based using SMART[10]. The second run used the TROVE - Text Retrieval using Object VEctors - system. For the SDR Track, we participated in the IR component of the Quasi-SDR task.

Bibtex

@inproceedings{DBLP:conf/trec/HanNSM99,
    author = {Benjamin Han and Ramya Nagarajan and Rohini K. Srihari and Srikanth Munirathnam},
    editor = {Ellen M. Voorhees and Donna K. Harman},
    title = {{TREC-8} Experiments at {SUNY} Buffalo},
    booktitle = {Proceedings of The Eighth Text REtrieval Conference, {TREC} 1999, Gaithersburg, Maryland, USA, November 17-19, 1999},
    series = {{NIST} Special Publication},
    volume = {500-246},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1999},
    url = {http://trec.nist.gov/pubs/trec8/papers/ub-nbpaper.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/HanNSM99.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org},
    doi = {10.6028/NIST.SP.500-246.sdr-buffalo}
}

The RMIT/CSIRO Ad Hoc, Q&A, Web, Interactive, and Speech Experiments at TREC 8¶

Michael Fuller, Marcin Kaszkiel, Sam Kimberley, Corinna Ng, Ross Wilkinson, Mingfang Wu, Justin Zobel

Participant: rmit
Paper: 10.6028/NIST.SP.500-246.interactive-rmit
Runs: mds08a3 | mds08a2 | mds08a1 | mds08a4 | mds08a5

Abstract

The focus of our work in TREC 8 has again been on the retrieval of documents using arbitrary passages. This year the system has been refined to include variable sized passages and pivot normalisation. Passage based automatic relevance feedback has also been explored, albeit without the use of negative feedback.

Bibtex

@inproceedings{DBLP:conf/trec/FullerKKNWWZ99,
    author = {Michael Fuller and Marcin Kaszkiel and Sam Kimberley and Corinna Ng and Ross Wilkinson and Mingfang Wu and Justin Zobel},
    editor = {Ellen M. Voorhees and Donna K. Harman},
    title = {The {RMIT/CSIRO} Ad Hoc, Q{\&}A, Web, Interactive, and Speech Experiments at {TREC} 8},
    booktitle = {Proceedings of The Eighth Text REtrieval Conference, {TREC} 1999, Gaithersburg, Maryland, USA, November 17-19, 1999},
    series = {{NIST} Special Publication},
    volume = {500-246},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1999},
    url = {http://trec.nist.gov/pubs/trec8/papers/RMIT-CSIRO.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/FullerKKNWWZ99.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org},
    doi = {10.6028/NIST.SP.500-246.interactive-rmit}
}

Ad hoc, Cross-language and Spoken Document Information Retrieval at IBM¶

Martin Franz, J. Scott McCarley, Todd Ward

Participant: ibm-franz
Paper: 10.6028/NIST.SP.500-246.sdr-ibm-franz
Runs: ibms99a | ibms99c | ibms99b

Abstract

The Natural Language Systems group at IBM participated in three tracks at TREC-8: ad hoc, SDR and cross-language. Our SDR and ad hoc participation included experiments involving query expansion and clustering-induced document reranking. Our CLIR participation involved both the French and English queries and included experiments with the merging strategy.

Bibtex

@inproceedings{DBLP:conf/trec/FranzMW99,
    author = {Martin Franz and J. Scott McCarley and Todd Ward},
    editor = {Ellen M. Voorhees and Donna K. Harman},
    title = {Ad hoc, Cross-language and Spoken Document Information Retrieval at {IBM}},
    booktitle = {Proceedings of The Eighth Text REtrieval Conference, {TREC} 1999, Gaithersburg, Maryland, USA, November 17-19, 1999},
    series = {{NIST} Special Publication},
    volume = {500-246},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1999},
    url = {http://trec.nist.gov/pubs/trec8/papers/t8\_ibm\_hlt.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/FranzMW99.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org},
    doi = {10.6028/NIST.SP.500-246.sdr-ibm-franz}
}

CLARIT TREC-8 Manual Ad-Hoc Experiments¶

David A. Evans, Jeffrey Bennett, Xiang Tong, Alison Huettner, ChengXiang Zhai, Emilia Stoica

Participant: claritech
Paper: 10.6028/NIST.SP.500-246.adhoc-claritech
Runs: CL99XT | CL99SD | CL99SDopt1 | CL99SDopt2 | CL99XTopt

Abstract

CLARITECH's submission in TREC-7 demonstrated the utility of document clustering in retrieval. We continued this work in TREC-8, using a clustered document presentation exclusively. We also added significant new functionality to the manual ad hoc user interface, integrating it with an entity extraction subsystem (upgraded and customized for TREC). Extracted entities represent an alternate set of document features. Our experiments suggest that in many cases users might construct more effective queries by moving beyond surface terms and drawing from this more abstract pool of semantic types. Despite the interface enhancements, our focus this year was on system rather than human subject performance, and we simplified the experiment design accordingly. From the users' perspective, there was only one run; the five separate submissions represent variations in post-processing. We spent minimal time preparing the initial queries. Users had 20 (instead of last year's 30) minutes for relevance judgments, and were allowed to modify the query from the start. This year, as well, we reintroduced 'vector-length optimization' in the post-processing of feedback. Recent CLARITECH systems have augmented the manually generated queries with a fixed, arbitrary number of selected terms from top-ranked documents. This year, we experimented with a principled truncation of the candidate term list, and found this had a positive effect on the performance of both of our TREC-7 and TREC-8 final queries. We feel that further performance improvements are likely to be achieved only by developing several complementary techniques and applying them selectively to fine-tune individual queries. User-directed feature selection and vector-length optimization are two such promising techniques.

Bibtex

@inproceedings{DBLP:conf/trec/EvansBTHZS99,
    author = {David A. Evans and Jeffrey Bennett and Xiang Tong and Alison Huettner and ChengXiang Zhai and Emilia Stoica},
    editor = {Ellen M. Voorhees and Donna K. Harman},
    title = {{CLARIT} {TREC-8} Manual Ad-Hoc Experiments},
    booktitle = {Proceedings of The Eighth Text REtrieval Conference, {TREC} 1999, Gaithersburg, Maryland, USA, November 17-19, 1999},
    series = {{NIST} Special Publication},
    volume = {500-246},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1999},
    url = {http://trec.nist.gov/pubs/trec8/papers/CLARIT\_ManualAdHoc.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/EvansBTHZS99.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org},
    doi = {10.6028/NIST.SP.500-246.adhoc-claritech}
}

Fast Automatic Passage Ranking (MultiText Experiments for TREC-8)¶

Gordon V. Cormack, Charles L. A. Clarke, D. I. E. Kisman, Christopher R. Palmer

Participant: waterloo
Paper: 10.6028/NIST.SP.500-246.qa-waterloo
Runs: uwmt8a0 | uwmt8a1 | uwmt8a2

Abstract

TREC-8 represents the fifth year that the MultiText project has participated in TREC [2, 1, 4, 5]. The MultiText project develops and prototypes scalable technologies for parallel information retrieval systems implemented on networks of workstations. Research issues are addressed in the context of this parallel architecture. Issues of concern to the MultiText Project include data distribution, load balancing, fast update, fault tolerance, document structure, relevance ranking, and user interaction. The MultiText system incorporates a unique technique for arbitrary passage retrieval. Since our initial participation in TREC-4 our TREC work has explored variants of this technique. For TREC-8 we focused our efforts on the Web track. In addition, we submitted runs for the Adhoc task (title and title+description) and a run for the Question Answering task.

Bibtex

@inproceedings{DBLP:conf/trec/CormackCKP99,
    author = {Gordon V. Cormack and Charles L. A. Clarke and D. I. E. Kisman and Christopher R. Palmer},
    editor = {Ellen M. Voorhees and Donna K. Harman},
    title = {Fast Automatic Passage Ranking (MultiText Experiments for {TREC-8)}},
    booktitle = {Proceedings of The Eighth Text REtrieval Conference, {TREC} 1999, Gaithersburg, Maryland, USA, November 17-19, 1999},
    series = {{NIST} Special Publication},
    volume = {500-246},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1999},
    url = {http://trec.nist.gov/pubs/trec8/papers/waterloo.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/CormackCKP99.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org},
    doi = {10.6028/NIST.SP.500-246.qa-waterloo}
}

TREC-8 Automatic Ad-Hoc Experiments at Fondazione Ugo Bordoni¶

Claudio Carpineto, Giovanni Romano

Participant: fub
Paper: 10.6028/NIST.SP.500-246.adhoc-fub
Runs: fub99tf | fub99a | fub99td | fub99tt

Abstract

We present further evidence suggesting the feasibility of using information theoretic query expansion for improving the retrieval effectiveness of automatic document ranking. Compared to our participation in TREC-7, in which we applied this technique to an ineffective initial ranking, here we show that information theoretic query expansion may be effective even when the quality of the first pass ranking is high. In TREC-8 our system has been ranked among the best systems for both automatic ad hoc and short automatic ad hoc. These results are even more interesting considering that we used single-word indexing and well known weighting schemes. We also investigate the use of term variance to refine the weighting schemes employed by our system to weight documents and queries.
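
The abstract does not give the expansion formula. As an illustration only, one common information-theoretic choice is to score each candidate term by its contribution to the Kullback-Leibler divergence between the term distribution of the top-ranked (pseudo-relevant) documents and that of the whole collection. The sketch below assumes that choice; the function names and the cut-off `k` are hypothetical, and it may differ from what the paper actually uses.

```python
import math
from collections import Counter


def kl_term_scores(feedback_docs, collection_docs):
    """Score terms by p_fb(t) * log(p_fb(t) / p_coll(t)).

    Both arguments are lists of token lists. The feedback documents are
    assumed to be a subset of the collection, so every feedback term has
    a non-zero collection probability.
    """
    fb_counts = Counter(t for doc in feedback_docs for t in doc)
    coll_counts = Counter(t for doc in collection_docs for t in doc)
    fb_total = sum(fb_counts.values())
    coll_total = sum(coll_counts.values())
    scores = {}
    for term, count in fb_counts.items():
        p_fb = count / fb_total
        p_coll = coll_counts.get(term, 0) / coll_total
        if p_coll > 0:
            scores[term] = p_fb * math.log(p_fb / p_coll)
    return scores


def expansion_terms(feedback_docs, collection_docs, k=10):
    """Return the k highest-scoring candidate expansion terms."""
    scores = kl_term_scores(feedback_docs, collection_docs)
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [term for term, _ in ranked[:k]]


# Toy data for illustration only.
toy_collection = [
    ["oil", "spill", "tanker"],
    ["stock", "market", "the"],
    ["the", "market", "crash"],
    ["the", "oil", "price"],
]
toy_feedback = [toy_collection[0]]
toy_terms = expansion_terms(toy_feedback, toy_collection, k=2)
```

Terms that are frequent in the feedback set but rare in the collection score highest, which is why "spill" and "tanker" outrank "oil" in the toy data.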

Bibtex

@inproceedings{DBLP:conf/trec/CarpinetoR99,
    author = {Claudio Carpineto and Giovanni Romano},
    editor = {Ellen M. Voorhees and Donna K. Harman},
    title = {{TREC-8} Automatic Ad-Hoc Experiments at Fondazione Ugo Bordoni},
    booktitle = {Proceedings of The Eighth Text REtrieval Conference, {TREC} 1999, Gaithersburg, Maryland, USA, November 17-19, 1999},
    series = {{NIST} Special Publication},
    volume = {500-246},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1999},
    url = {http://trec.nist.gov/pubs/trec8/papers/fub99.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/CarpinetoR99.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org},
    doi = {10.6028/NIST.SP.500-246.adhoc-fub}
}

SMART in TREC 8¶

Chris Buckley, Janet A. Walz

Participant: cornell
Paper: 10.6028/NIST.SP.500-246.query-cornell
Runs: Sab8A1 | Sab8A2 | Sab8A3 | Sab8A4

Abstract

This year was a light year for the Smart Information Retrieval Project at SabIR Research and Cornell. We officially participated in only the Ad-hoc Task and the Query Track. In the Ad-hoc Task, we made minor modifications to our document weighting schemes to emphasize high-precision searches on shorter queries. This proved only mildly successful; the top relevant document was retrieved higher, but the rest of the retrieval tended to be hurt. Our Query Track runs are described here, but the much more interesting analysis of these runs is described in the Query Track Overview.

Bibtex

@inproceedings{DBLP:conf/trec/BuckleyW99a,
    author = {Chris Buckley and Janet A. Walz},
    editor = {Ellen M. Voorhees and Donna K. Harman},
    title = {{SMART} in {TREC} 8},
    booktitle = {Proceedings of The Eighth Text REtrieval Conference, {TREC} 1999, Gaithersburg, Maryland, USA, November 17-19, 1999},
    series = {{NIST} Special Publication},
    volume = {500-246},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1999},
    url = {http://trec.nist.gov/pubs/trec8/papers/sabir8.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/BuckleyW99a.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org},
    doi = {10.6028/NIST.SP.500-246.query-cornell}
}

Mercure at trec8: Adhoc, Web, CLIR and Filtering tasks¶

Mohand Boughanem, Christine Julien, Josiane Mothe, Chantal Soulé-Dupuy

Participant: irit
Paper: 10.6028/NIST.SP.500-246.xlingual-irit
Runs: Mer8Adtd1 | Mer8Adtd2 | Mer8Adtnd3 | Mer8Adtd4

Abstract

The tests performed for TREC8 focused on the automatic Adhoc, Web, CLIR and Filtering (batch and routing) tasks. All the submitted runs were based on the Mercure system. Automatic adhoc: Four runs were submitted. All these runs were based on the automatic relevance back-propagation used in the previous TREC, with a slight change for one of these runs (Mer8Adtd3). A strategy based on predicting the relevance of documents using the past relevant documents was tested for this run. More precisely, instead of using the same relevance value for all top retrieved documents, some of them are selected and have their relevance value boosted. Web: Four runs were submitted in this track: 1. content based only, using Mercure simple search. 2. content + link: according to the Mercure architecture, we consider that document nodes are linked to each other by weighted links. The top selected documents resulting from the initial search spread their signals towards the other document nodes. The documents were then sorted according to their activations, and the top 1000 documents were submitted. 3. (2) + pseudo-relevance back-propagation method. 4. reranking of the 40 top documents using the links between them. Cross-language: Three runs were submitted for our first participation in this track. All these runs were based on query translation using an online machine translation. Two of these runs are a comparison between query translation from English to other languages and from French to other languages. Filtering - batch and routing: The profiles were learned using three different strategies: Relevance Back-propagation (RB) and Gradient Back-propagation (GB), used in the previous TREC, and a new strategy based on a Genetic Algorithm (GA). Four runs were submitted: two batch runs based on RB+GB and two routing runs, one based on RB+GB and the other based on GA.

Bibtex

@inproceedings{DBLP:conf/trec/BoughanemJMS99,
    author = {Mohand Boughanem and Christine Julien and Josiane Mothe and Chantal Soul{\'{e}}{-}Dupuy},
    editor = {Ellen M. Voorhees and Donna K. Harman},
    title = {Mercure at trec8: Adhoc, Web, {CLIR} and Filtering tasks},
    booktitle = {Proceedings of The Eighth Text REtrieval Conference, {TREC} 1999, Gaithersburg, Maryland, USA, November 17-19, 1999},
    series = {{NIST} Special Publication},
    volume = {500-246},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1999},
    url = {http://trec.nist.gov/pubs/trec8/papers/irit.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/BoughanemJMS99.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org},
    doi = {10.6028/NIST.SP.500-246.xlingual-irit}
}

The Weaver System for Document Retrieval¶

Adam L. Berger, John D. Lafferty

Participant: cmu
Paper: 10.6028/NIST.SP.500-246.adhoc-cmu
Runs: weaver1 | weaver2

Abstract

This paper introduces Weaver, a probabilistic document retrieval system under development at Carnegie Mellon University, and discusses its performance in the TREC-8 ad hoc evaluation. We begin by describing the architecture and philosophy of the Weaver system, which represents a departure from traditional approaches to retrieval. The central ingredient is a statistical model of how a user might distill or 'translate' a given document into a query. The retrieval-as-translation approach is based on the noisy channel paradigm and statistical language modeling, and has much in common with other recently proposed models [12, 10]. After the initial high-level overview, the bulk of the paper contains a discussion of implementation details and the empirical performance of the Weaver retrieval system.
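
To make the retrieval-as-translation idea concrete, here is a minimal sketch, not the Weaver implementation. It assumes each query word is generated by picking a document word uniformly at random and "translating" it, and it smooths with a background query-word distribution. The translation table, the background probabilities, the mixing weight `alpha`, and the floor value are all hypothetical inputs chosen for illustration.

```python
import math
from collections import Counter


def query_log_likelihood(query, doc, translation, background, alpha=0.5):
    """Log-probability that `doc` is 'translated' into `query`.

    translation[w][q] is an assumed probability of document word w
    producing query word q; background[q] is an assumed smoothing
    probability for q.
    """
    counts = Counter(doc)
    doc_len = len(doc)
    log_likelihood = 0.0
    for q in query:
        # Sum over document words: P(w | doc) * t(q | w).
        p_translate = sum(
            translation.get(w, {}).get(q, 0.0) * c / doc_len
            for w, c in counts.items()
        )
        # Small floor keeps the logarithm defined for unseen query words.
        p = alpha * p_translate + (1.0 - alpha) * background.get(q, 1e-9)
        log_likelihood += math.log(p)
    return log_likelihood


# Toy translation table and background model (illustrative values only).
toy_translation = {
    "automobile": {"car": 0.6, "automobile": 0.4},
    "banana": {"banana": 1.0},
}
toy_background = {"car": 0.01}
score_related = query_log_likelihood(
    ["car"], ["automobile", "dealer"], toy_translation, toy_background
)
score_unrelated = query_log_likelihood(
    ["car"], ["banana"], toy_translation, toy_background
)
```

Documents are then ranked by this log-likelihood, so a document containing "automobile" can match the query "car" even though the two share no surface term.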

Bibtex

@inproceedings{DBLP:conf/trec/BergerL99,
    author = {Adam L. Berger and John D. Lafferty},
    editor = {Ellen M. Voorhees and Donna K. Harman},
    title = {The Weaver System for Document Retrieval},
    booktitle = {Proceedings of The Eighth Text REtrieval Conference, {TREC} 1999, Gaithersburg, Maryland, USA, November 17-19, 1999},
    series = {{NIST} Special Publication},
    volume = {500-246},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1999},
    url = {http://trec.nist.gov/pubs/trec8/papers/weaver.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/BergerL99.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org},
    doi = {10.6028/NIST.SP.500-246.adhoc-cmu}
}

INQUERY and TREC-8¶

James Allan, James P. Callan, Fangfang Feng, Daniella Malin

Participant: umass
Paper: 10.6028/NIST.SP.500-246.sdr-umass
Runs: INQ601 | INQ602 | INQ603 | INQ604

Abstract

This year the Center for Intelligent Information Retrieval (CIIR) at the University of Massachusetts participated in seven of the tracks: ad-hoc, filtering, spoken document retrieval, small web, large web, question and answer, and the query tracks. We spent significant time working on the filtering track, resulting in substantial performance improvement over TREC-7. For all of the other tracks, we used essentially the same system as used in previous years. In the next section, we describe some of the basic processing that was applied across most of the tracks. We then describe the details for each of the tracks and in some cases present some modest analysis of the effectiveness of our results.

Bibtex

@inproceedings{DBLP:conf/trec/AllanCFM99,
    author = {James Allan and James P. Callan and Fangfang Feng and Daniella Malin},
    editor = {Ellen M. Voorhees and Donna K. Harman},
    title = {{INQUERY} and {TREC-8}},
    booktitle = {Proceedings of The Eighth Text REtrieval Conference, {TREC} 1999, Gaithersburg, Maryland, USA, November 17-19, 1999},
    series = {{NIST} Special Publication},
    volume = {500-246},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1999},
    url = {http://trec.nist.gov/pubs/trec8/papers/trec8-umass.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/AllanCFM99.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org},
    doi = {10.6028/NIST.SP.500-246.sdr-umass}
}

University of Surrey Participation in TREC8: Weirdness Indexing for Logical Document Extrapolation and Retrieval (WILDER)¶

Khurshid Ahmad, Lee Gillam, Lena Tostevin

Participant: city-pliers
Paper: 10.6028/NIST.SP.500-246.adhoc-city-pliers
Runs: plt8ah1 | plt8ah2 | plt8ah3 | plt8ah4 | plt8ah5

Abstract

This paper describes the development of a prototype document retrieval system based on frequency calculations and corpora comparison techniques. The prototype, WILDER, generated simple frequency information based on which calculations of document relevance could be made. The prototype was built to allow the University of Surrey to debut in the U.S. Text Retrieval Competition (TREC). User queries as specified by the TREC organisers were converted into simple word-frequency lists and compared against values for the entire corpus. These relative frequency values indicatively produced document relevance. The application of morphological and empirical heuristics enabled WILDER to produce the ranked frequency lists required.
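
The abstract says that word-frequency lists are compared against values for the entire corpus, but not how the comparison yields a document score. The sketch below is one plausible reading under stated assumptions: each query word is scored by the ratio of its relative frequency in a document to its relative frequency in the whole corpus, and the ratios are summed. All function names are hypothetical, and the morphological and empirical heuristics mentioned in the abstract are not modelled.

```python
from collections import Counter


def relative_frequencies(tokens):
    """Map each word to its count divided by the total token count."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def frequency_ratio(term, doc_rel, corpus_rel):
    """Relative frequency in the document over that in the corpus."""
    corpus_freq = corpus_rel.get(term, 0.0)
    if corpus_freq == 0.0:
        return 0.0
    return doc_rel.get(term, 0.0) / corpus_freq


def rank_documents(query_tokens, docs):
    """Rank documents ({doc_id: tokens}) by summed ratios of query words."""
    corpus_tokens = [t for tokens in docs.values() for t in tokens]
    corpus_rel = relative_frequencies(corpus_tokens)
    scores = {}
    for doc_id, tokens in docs.items():
        doc_rel = relative_frequencies(tokens)
        scores[doc_id] = sum(
            frequency_ratio(term, doc_rel, corpus_rel)
            for term in set(query_tokens)
        )
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy corpus for illustration only.
toy_docs = {
    "d1": ["oil", "spill", "oil", "tanker"],
    "d2": ["stock", "market", "oil"],
    "d3": ["stock", "market", "crash"],
}
toy_ranking = rank_documents(["oil", "spill"], toy_docs)
```

A ratio above 1 means the word is more frequent in the document than in the corpus as a whole, which is the signal this sketch treats as evidence of relevance.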

Bibtex

@inproceedings{DBLP:conf/trec/AhmadGT99,
    author = {Khurshid Ahmad and Lee Gillam and Lena Tostevin},
    editor = {Ellen M. Voorhees and Donna K. Harman},
    title = {University of Surrey Participation in {TREC8:} Weirdness Indexing for Logical Document Extrapolation and Retrieval {(WILDER)}},
    booktitle = {Proceedings of The Eighth Text REtrieval Conference, {TREC} 1999, Gaithersburg, Maryland, USA, November 17-19, 1999},
    series = {{NIST} Special Publication},
    volume = {500-246},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1999},
    url = {http://trec.nist.gov/pubs/trec8/papers/surrey2.pdf},
    timestamp = {Tue, 30 Jun 2020 01:00:00 +0200},
    biburl = {https://dblp.org/rec/conf/trec/AhmadGT99.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org},
    doi = {10.6028/NIST.SP.500-246.adhoc-city-pliers}
}

PLIERS at TREC8¶

Andrew MacFarlane, Stephen E. Robertson, Julie A. McCann

Participant: city-pliers
Paper: 10.6028/NIST.SP.500-246.filtering-city-pliers
Runs: plt8ah1 | plt8ah2 | plt8ah3 | plt8ah4 | plt8ah5

Abstract

The use of the PLIERS text retrieval system in TREC8 experiments is described. The tracks entered for are: Ad-Hoc, Filtering (Batch and Routing) and the Web Track (Large only). We describe both retrieval efficiency and effectiveness results for all these tracks. We also describe some preliminary experiments with BM_25 tuning constant variation.
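
For readers unfamiliar with the tuning constants in question, the following sketch shows a widely used form of the Okapi BM25 term weight, in which `k1` controls term-frequency saturation and `b` controls document-length normalisation. The abstract does not say which constants were varied or which BM25 variant PLIERS implements, so the formula and the default values here are assumptions for illustration.

```python
import math


def bm25_term_weight(tf, df, n_docs, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """BM25 weight of one query term in one document.

    tf: term frequency in the document; df: number of documents
    containing the term; n_docs: collection size.
    """
    # Classic Robertson/Sparck Jones idf; negative when df > n_docs / 2.
    idf = math.log((n_docs - df + 0.5) / (df + 0.5))
    length_norm = k1 * ((1.0 - b) + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1.0) / (tf + length_norm)


def bm25_score(query_terms, doc_tf, df, n_docs, doc_len, avg_doc_len,
               k1=1.2, b=0.75):
    """Sum of BM25 term weights over the query terms."""
    return sum(
        bm25_term_weight(doc_tf.get(t, 0), df.get(t, 1), n_docs,
                         doc_len, avg_doc_len, k1, b)
        for t in query_terms
    )


# Same term statistics, different document lengths (illustrative numbers).
weight_short = bm25_term_weight(3, 10, 1000, doc_len=50, avg_doc_len=100)
weight_long = bm25_term_weight(3, 10, 1000, doc_len=200, avg_doc_len=100)
```

Setting `b=0` switches length normalisation off entirely, while raising `k1` lets repeated occurrences of a term keep adding to the weight for longer before it saturates.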

Bibtex

@inproceedings{DBLP:conf/trec/MacFarlaneRM99,
    author = {Andrew MacFarlane and Stephen E. Robertson and Julie A. McCann},
    editor = {Ellen M. Voorhees and Donna K. Harman},
    title = {{PLIERS} at {TREC8}},
    booktitle = {Proceedings of The Eighth Text REtrieval Conference, {TREC} 1999, Gaithersburg, Maryland, USA, November 17-19, 1999},
    series = {{NIST} Special Publication},
    volume = {500-246},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1999},
    url = {http://trec.nist.gov/pubs/trec8/papers/pliers8.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/MacFarlaneRM99.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org},
    doi = {10.6028/NIST.SP.500-246.filtering-city-pliers}
}

High Selectivity and Accuracy with READWARE's Automated System of Knowledge Organization¶

Tom Adi, O. K. Ewell, Patricia Adi

Participant: miti
Paper: 10.6028/NIST.SP.500-246.adhoc-miti
Runs: READWARE2 | READWARE

Abstract

READWARE performs a fully automatic text analysis that implements a system of knowledge organization based on knowledge types. A knowledge type is a set of instructions that identifies a set of knowledge elements in any text. Knowledge types include concepts (word sets), topics (an expandable hierarchical scheme of common knowledge types spanning politics, business, health, and so on), probes (investigative knowledge types), issues (knowledge types used in decisionmaking) and document subjects (traditional classification of documents by themes). An MITi analyst used this system to translate TREC topic specifications into highly selective queries (few hits per query) in two adhoc runs with high relevance rates (2019 / 3060 hits in the READWARE run and 2774 / 5785 hits in the READWARE2 run).

Bibtex

@inproceedings{DBLP:conf/trec/AdiEA99,
    author = {Tom Adi and O. K. Ewell and Patricia Adi},
    editor = {Ellen M. Voorhees and Donna K. Harman},
    title = {High Selectivity and Accuracy with READWARE's Automated System of Knowledge Organization},
    booktitle = {Proceedings of The Eighth Text REtrieval Conference, {TREC} 1999, Gaithersburg, Maryland, USA, November 17-19, 1999},
    series = {{NIST} Special Publication},
    volume = {500-246},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1999},
    url = {http://trec.nist.gov/pubs/trec8/papers/READWARE.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/AdiEA99.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org},
    doi = {10.6028/NIST.SP.500-246.adhoc-miti}
}
