Proceedings - Interactive 1999

TREC-8 Interactive Track Report

William R. Hersh, Paul Over

Abstract

This report is an introduction to the work of the TREC-8 Interactive Track with its goal of investigating interactive information retrieval by examining the process as well as the results. Seven research groups ran a total of 14 interactive information retrieval (IR) system variants on a shared problem: a question-answering task, six statements of information need, and a collection of 210,158 articles from the Financial Times of London 1991-1994. This report summarizes the shared experimental framework, which for TREC-8 was designed to support analysis and comparison of system performance only within sites. The report refers the reader to separate discussions of the experiments performed by each participating group - their hypotheses, experimental systems, and results. The papers from each of the participating groups and the raw and evaluated results are available via the TREC home page (trec.nist.gov).
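
The track's shared task is instance (aspect) oriented: searchers save documents, and performance is scored by how many of the distinct instances known to the assessors those documents cover (instance recall), along with instance precision. The Python sketch below is purely illustrative of that scoring idea and is not taken from the track report; the document identifiers and instance labels are made up.

def instance_recall(saved_docs, doc_to_instances, all_instances):
    """Fraction of the known instances covered by the documents a searcher saved."""
    covered = set()
    for doc in saved_docs:
        covered |= doc_to_instances.get(doc, set())
    return len(covered & all_instances) / len(all_instances)

# Hypothetical example: three saved documents covering four of six known instances.
doc_to_instances = {"FT911-1": {"a", "b"}, "FT921-7": {"b", "c"}, "FT934-2": {"d"}}
print(instance_recall(["FT911-1", "FT921-7", "FT934-2"],
                      doc_to_instances,
                      {"a", "b", "c", "d", "e", "f"}))   # 4/6 = 0.67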

Bibtex
@inproceedings{DBLP:conf/trec/HershO99,
    author = {William R. Hersh and Paul Over},
    editor = {Ellen M. Voorhees and Donna K. Harman},
    title = {{TREC-8} Interactive Track Report},
    booktitle = {Proceedings of The Eighth Text REtrieval Conference, {TREC} 1999, Gaithersburg, Maryland, USA, November 17-19, 1999},
    series = {{NIST} Special Publication},
    volume = {500-246},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1999},
    url = {http://trec.nist.gov/pubs/trec8/papers/t8irep.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/HershO99.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

IRIS at TREC-8

Kiduk Yang, Kelly Maglaughlin

Abstract

We tested two relevance feedback models, an adaptive linear model and a probabilistic model, using massive feedback query expansion in TREC-5 (Sumner & Shaw, 1997), experimented with a three-valued scale of relevance and reduced feedback query expansion in TREC-6 (Sumner, Yang, Akers & Shaw, 1998), and examined the effectiveness of relevance feedback using a subcollection and the effect of system features in an interactive retrieval system called IRIS (Information Retrieval Interactive System) in TREC-7 (Yang, Maglaughlin, Meho & Sumner, 1999). In TREC-8, we continued our exploration of relevance feedback approaches. Based on the results of our TREC-7 interactive experiment, which suggested that relevance feedback using user-selected passages is an effective alternative to conventional document feedback, our TREC-8 interactive experiment compared a passage feedback system and a document feedback system that were identical in all respects except for the feedback mechanism. For the TREC-8 ad-hoc task, we merged the results of applying pseudo-relevance feedback to subcollections, as in TREC-7. Our results were consistent with those of TREC-7. The passage feedback results, whose system logs showed a high level of searcher intervention, were superior to the document feedback results. As in TREC-7, our ad-hoc results showed high precision in the top few documents but performed poorly overall compared to results obtained using the collection as a whole.
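
As a rough illustration of the kind of feedback step being compared (not IRIS's actual adaptive linear or probabilistic model), the sketch below performs a generic Rocchio-style expansion in which the feedback texts may be either user-selected passages or whole documents; the weights and the toy query are assumptions.

from collections import Counter

def rocchio_expand(query_vec, feedback_texts, alpha=1.0, beta=0.75, top_k=20):
    """Add the top_k most frequent feedback terms to the query vector."""
    centroid = Counter()
    for text in feedback_texts:                 # passages or full documents
        for term in text.lower().split():
            centroid[term] += 1
    n = max(len(feedback_texts), 1)
    expanded = {t: alpha * w for t, w in query_vec.items()}
    for term, freq in centroid.most_common(top_k):
        expanded[term] = expanded.get(term, 0.0) + beta * freq / n
    return expanded

query = {"privacy": 1.0, "legislation": 1.0}
passages = ["data privacy bill passes", "privacy legislation debated in europe"]
print(rocchio_expand(query, passages))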

Bibtex
@inproceedings{DBLP:conf/trec/YangM99,
    author = {Kiduk Yang and Kelly Maglaughlin},
    editor = {Ellen M. Voorhees and Donna K. Harman},
    title = {{IRIS} at {TREC-8}},
    booktitle = {Proceedings of The Eighth Text REtrieval Conference, {TREC} 1999, Gaithersburg, Maryland, USA, November 17-19, 1999},
    series = {{NIST} Special Publication},
    volume = {500-246},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1999},
    url = {http://trec.nist.gov/pubs/trec8/papers/unc\_tr8final.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/YangM99.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Berkeley's TREC 8 Interactive Track Entry: Cheshire II and Zprise

Ray R. Larson

Abstract

This paper briefly discusses the UC Berkeley entry in the TREC-8 Interactive Track. In this year’s study, twelve searchers conducted six searches each, half on the Cheshire II system and the other half on the Zprise system, for a total of 72 searches. Questionnaires were administered to each participant to gather information about basic demographics and searching experience, about each search, about each of the systems, and finally about the users’ perceptions of the systems. In this paper I will briefly describe the systems used in the study and how they differ in design goals and implementation. The results of the interactive track evaluations and the information derived from the questionnaires are then discussed, and future improvements to the Cheshire II system are considered.

Bibtex
@inproceedings{DBLP:conf/trec/Larson99,
    author = {Ray R. Larson},
    editor = {Ellen M. Voorhees and Donna K. Harman},
    title = {Berkeley's {TREC} 8 Interactive Track Entry: Cheshire {II} and Zprise},
    booktitle = {Proceedings of The Eighth Text REtrieval Conference, {TREC} 1999, Gaithersburg, Maryland, USA, November 17-19, 1999},
    series = {{NIST} Special Publication},
    volume = {500-246},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1999},
    url = {http://trec.nist.gov/pubs/trec8/papers/berkeley-interactive.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/Larson99.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Do Batch and User Evaluations Give the Same Results? An Analysis from the TREC-8 Interactive Track

William R. Hersh, Andrew Turpin, Susan Price, Dale Kraemer, Benjamin Chan, Lynetta Sacherek, Daniel Olson

Abstract

An unanswered question in information retrieval research is whether improvements in system performance demonstrated by batch evaluations confer the same benefit for real users. We used the TREC-8 Interactive Track to investigate this question. After identifying a weighting scheme that gave maximum improvement over the baseline in batch runs, we used it with real users searching on an instance recall task. Our results showed no significant improvement for real users: although the overall average improvement was comparable to the batch results, it was not statistically significant and was due to the effect of just one of the six queries. Further analysis with more queries is necessary to resolve this question.
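
The sketch below shows the kind of per-topic paired comparison such a conclusion rests on. The recall figures are entirely hypothetical (they are not the paper's data), and the choice of a paired t-test and Wilcoxon signed-rank test is an assumption about the analysis, not a description of it.

from scipy.stats import ttest_rel, wilcoxon

baseline = [0.40, 0.55, 0.35, 0.50, 0.45, 0.30]   # instance recall, baseline weighting (made up)
improved = [0.42, 0.54, 0.36, 0.49, 0.47, 0.55]   # improved weighting; the gain comes from one topic

gains = [i - b for i, b in zip(improved, baseline)]
print("mean gain:", sum(gains) / len(gains))
print("paired t-test:", ttest_rel(improved, baseline))
print("Wilcoxon signed-rank:", wilcoxon(improved, baseline))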

Bibtex
@inproceedings{DBLP:conf/trec/HershTPKCSO99,
    author = {William R. Hersh and Andrew Turpin and Susan Price and Dale Kraemer and Benjamin Chan and Lynetta Sacherek and Daniel Olson},
    editor = {Ellen M. Voorhees and Donna K. Harman},
    title = {Do Batch and User Evaluations Give the Same Results? An Analysis from the {TREC-8} Interactive Track},
    booktitle = {Proceedings of The Eighth Text REtrieval Conference, {TREC} 1999, Gaithersburg, Maryland, USA, November 17-19, 1999},
    series = {{NIST} Special Publication},
    volume = {500-246},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1999},
    url = {http://trec.nist.gov/pubs/trec8/papers/hersh.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/HershTPKCSO99.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Interactive Okapi at Sheffield - TREC-8

Micheline Hancock-Beaulieu, Helene Fowkes, Nega Alemayehu, Mark Sanderson

Abstract

The focus of the study was to examine searching behaviour in relation to three experimental variables: searcher, system, and topic characteristics. Twenty-four subjects searched the six test topics on two versions of the Okapi system, one with relevance feedback and one without. A combination of data collection methods was used, including observations, verbal protocols, transaction logs, questionnaires, and structured post-search interviews. Search analysis indicates that searching behaviour was largely dependent on topic characteristics. Two types of topics and associated search tasks were identified. Overall, best match ranking led to high-precision searches, and searches which included relevance feedback were marginally, but not significantly, better. The study raises methodological questions with regard to the specification of interactive searching tasks and topics.
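
The best match ranking in Okapi is typically the BM25 weighting function. As a reference point, a minimal BM25 scorer is sketched below; the parameter values (k1 = 1.2, b = 0.75) are conventional defaults and are not claimed to be the settings used at Sheffield.

import math

def bm25_score(query_terms, doc_terms, df, N, avgdl, k1=1.2, b=0.75):
    """Sum the classic Robertson/Sparck Jones BM25 weight of each query term in the document."""
    dl = len(doc_terms)
    score = 0.0
    for term in set(query_terms):
        tf = doc_terms.count(term)
        if tf == 0 or term not in df:
            continue
        idf = math.log((N - df[term] + 0.5) / (df[term] + 0.5))
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * dl / avgdl))
    return score

# df maps each term to its document frequency over the N-document collection;
# avgdl is the average document length in tokens.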

Bibtex
@inproceedings{DBLP:conf/trec/Hancock-BeaulieuFAS99,
    author = {Micheline Hancock{-}Beaulieu and Helene Fowkes and Nega Alemayehu and Mark Sanderson},
    editor = {Ellen M. Voorhees and Donna K. Harman},
    title = {Interactive Okapi at Sheffield - {TREC-8}},
    booktitle = {Proceedings of The Eighth Text REtrieval Conference, {TREC} 1999, Gaithersburg, Maryland, USA, November 17-19, 1999},
    series = {{NIST} Special Publication},
    volume = {500-246},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1999},
    url = {http://trec.nist.gov/pubs/trec8/papers/shef8.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/Hancock-BeaulieuFAS99.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

The RMIT/CSIRO Ad Hoc, Q&A, Web, Interactive, and Speech Experiments at TREC 8

Michael Fuller, Marcin Kaszkiel, Sam Kimberley, Corinna Ng, Ross Wilkinson, Mingfang Wu, Justin Zobel

Abstract

In TREC-7, we tested using clustering technology to organize retrieved documents for aspectual retrieval, but did not find a significant gain for the clustering interface over a ranked-list interface. This year, we investigated a question-driven categorization. Unlike the clustering approach, which was data-driven and attempted to discover and present topic relationships existing in a set of retrieved documents without taking users into account, the question-driven approach tries to organize retrieved documents in a way that is close to the users' mental representation of the expected answer. In our approach, the retrieved documents are categorized dynamically into a set of categories derived from the user's question. The user determines which of several possible sets of categories should be used to organize the retrieved documents. Our participation in TREC-8 was to investigate and compare the effectiveness and usability of this question-driven classification with a ranked-list model. In the following sections we present a rationale for the question-driven approach, and then describe an experiment that compares this approach with a more traditional ranked-list presentation. We then report and analyze the results of this experiment. Based on these findings and discussions, we conclude with some recommendations for future improvement.
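
A hypothetical sketch of this question-driven categorization follows: retrieved documents are binned under categories derived from the question, and the user chooses which candidate category set to apply. The category sets, the simple term-matching rule, and the example question are illustrative assumptions, not the RMIT/CSIRO implementation.

def categorise(retrieved_docs, category_terms):
    """Assign each document to every category whose label terms it mentions."""
    bins = {label: [] for label in category_terms}
    bins["other"] = []
    for doc_id, text in retrieved_docs:
        matched = False
        for label, terms in category_terms.items():
            if any(t in text.lower() for t in terms):
                bins[label].append(doc_id)
                matched = True
        if not matched:
            bins["other"].append(doc_id)
    return bins

# For a made-up question like "Which countries import Cuban sugar?" the user
# might choose a country-oriented category set:
docs = [("FT942-1", "Russia agreed to import Cuban sugar"),
        ("FT931-9", "Sugar prices fell in London trading")]
print(categorise(docs, {"Russia": ["russia"], "China": ["china"]}))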

Bibtex
@inproceedings{DBLP:conf/trec/FullerKKNWWZ99,
    author = {Michael Fuller and Marcin Kaszkiel and Sam Kimberley and Corinna Ng and Ross Wilkinson and Mingfang Wu and Justin Zobel},
    editor = {Ellen M. Voorhees and Donna K. Harman},
    title = {The {RMIT/CSIRO} Ad Hoc, Q{\&}A, Web, Interactive, and Speech Experiments at {TREC} 8},
    booktitle = {Proceedings of The Eighth Text REtrieval Conference, {TREC} 1999, Gaithersburg, Maryland, USA, November 17-19, 1999},
    series = {{NIST} Special Publication},
    volume = {500-246},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1999},
    url = {http://trec.nist.gov/pubs/trec8/papers/RMIT-CSIRO.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/FullerKKNWWZ99.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Relevance Feedback versus Local Context Analysis as Term Suggestion Devices: Rutgers' TREC-8 Interactive Track Experience

Nicholas J. Belkin, Colleen Cool, J. Head, J. Jeng, Diane Kelly, Shin-jeng Lin, L. Lobash, Soyeon Park, Pamela A. Savage-Knepshield, C. Sikora

Abstract

Query formulation and reformulation is recognized as one of the most difficult tasks that users of information retrieval systems are asked to perform. This study investigated the use of two different techniques for supporting query reformulation in interactive information retrieval: relevance feedback and Local Context Analysis, both implemented as term-suggestion devices. The former represents techniques which offer user control and understanding of term suggestion; the latter represents techniques which require relatively little user effort. Using the TREC-8 Interactive Track task and experimental protocol, we found that although there were no significant differences between the two systems implementing these techniques in terms of user preference and performance on the task, subjects using the Local Context Analysis system used significantly fewer user-defined query terms than those using the relevance feedback system. We conclude that term suggestion without user guidance/control is the better of the two methods tested for this task, since it required less effort for the same level of performance. We also found that both the number of documents saved and the number of instances identified by subjects were significantly correlated with the criterion measures of instance recall and precision, which suggests that it is not necessary to rely on external evaluators to measure the performance of interactive information retrieval on the instance identification task.
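
For readers unfamiliar with the second device: Local Context Analysis (Xu & Croft) scores candidate expansion terms by their co-occurrence with the query terms in the top-ranked passages. The sketch below is a heavily simplified, illustrative version of that idea only; it omits the idf components and the exact LCA formula, and none of it is taken from the Rutgers systems.

from collections import Counter

def suggest_terms(query_terms, top_passages, n_suggestions=10):
    """Rank non-query terms by how often they co-occur with query terms in top passages."""
    qset = set(query_terms)
    scores = Counter()
    for passage in top_passages:
        tokens = passage.lower().split()
        q_hits = sum(1 for t in tokens if t in qset)
        if q_hits == 0:
            continue
        for t in set(tokens) - qset:
            scores[t] += q_hits
    return [t for t, _ in scores.most_common(n_suggestions)]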

Bibtex
@inproceedings{DBLP:conf/trec/BelkinCHJKLLPSS99,
    author = {Nicholas J. Belkin and Colleen Cool and J. Head and J. Jeng and Diane Kelly and Shin{-}jeng Lin and L. Lobash and Soyeon Park and Pamela A. Savage{-}Knepshield and C. Sikora},
    editor = {Ellen M. Voorhees and Donna K. Harman},
    title = {Relevance Feedback \emph{versus} Local Context Analysis as Term Suggestion Devices: Rutgers' {TREC-8} Interactive Track Experience},
    booktitle = {Proceedings of The Eighth Text REtrieval Conference, {TREC} 1999, Gaithersburg, Maryland, USA, November 17-19, 1999},
    series = {{NIST} Special Publication},
    volume = {500-246},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {1999},
    url = {http://trec.nist.gov/pubs/trec8/papers/ruint.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/BelkinCHJKLLPSS99.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}