
Text REtrieval Conference (TREC) 1996

Adhoc


The ad hoc task investigates the performance of systems that search a static set of documents using new topics. This task is similar to how a researcher might use a library — the collection is known but the questions likely to be asked are not known.
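
To make the setting concrete, the sketch below indexes a tiny static collection once and then ranks it for a previously unseen topic using TF-IDF weighting. It is a toy illustration of the task definition only, not any participant's system; the documents, query, and scoring choices are all invented for the example.

    # Toy illustration of the ad hoc setting: index a static collection
    # once, then rank it for new, previously unseen topics (queries).
    import math
    from collections import Counter

    docs = {
        "d1": "trade agreement between mexico and the united states",
        "d2": "new library opens with a static collection of documents",
        "d3": "researchers search the library collection with new questions",
    }

    # Index the static collection once, up front.
    tokenized = {d: text.split() for d, text in docs.items()}
    df = Counter(term for toks in tokenized.values() for term in set(toks))
    N = len(docs)

    def score(query: str, doc_id: str) -> float:
        """TF-IDF score of one document for a new topic (query)."""
        tf = Counter(tokenized[doc_id])
        return sum(tf[t] * math.log(N / df[t])
                   for t in query.split() if t in df)

    # A "new topic" that was unknown when the collection was indexed.
    query = "library collection search"
    print(sorted(docs, key=lambda d: score(query, d), reverse=True))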

Track coordinator(s):

  • E. Voorhees, National Institute of Standards and Technology (NIST)
  • D. Harman, National Institute of Standards and Technology (NIST)

Track Web Page: https://trec.nist.gov/data/test_coll.html


Database Merging


There are many times when users want to search separate text collections as if they were a single collection. For example, computer networks can provide access to a variety of corpora that are owned and maintained by different entities. Instead of issuing search commands to each of the databases in turn and manually collating the individual results, users prefer a mechanism for performing a single, integrated search. In other cases, reliability and efficiency concerns may dictate that databases that are under the same administrative control should be physically separate. Again, users want to issue a single search request that returns an integrated result. The database merging track investigates methods for combining the results of separate searches into a single, cohesive result.
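
One simple family of merging strategies rescales each collection's scores onto a common range before fusing the result lists. The sketch below uses min-max normalization; the function names, example runs, and the normalization scheme are illustrative assumptions, since participants used many different methods (raw-score merging, round-robin interleaving, and others).

    # Sketch of one simple merging strategy: normalize each collection's
    # scores to [0, 1] (min-max), then interleave by normalized score.

    def normalize(run):
        """Map (doc_id, score) pairs onto [0, 1] within one collection."""
        scores = [s for _, s in run]
        lo, hi = min(scores), max(scores)
        span = (hi - lo) or 1.0  # guard against constant scores
        return [(doc, (s - lo) / span) for doc, s in run]

    def merge(runs, k=10):
        """Fuse per-collection result lists into one ranked list."""
        fused = [pair for run in runs for pair in normalize(run)]
        fused.sort(key=lambda p: p[1], reverse=True)
        return fused[:k]

    run_a = [("A/doc3", 14.2), ("A/doc9", 11.0), ("A/doc1", 2.5)]
    run_b = [("B/doc7", 0.91), ("B/doc2", 0.55)]
    print(merge([run_a, run_b]))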

Track coordinator(s):

  • E. Voorhees, National Institute of Standards and Technology (NIST)

Routing


The routing task in the TREC workshops investigates the performance of systems that use standing queries to search new streams of documents. These searches are similar to those required by news clipping services and library profiling systems. A true routing environment is simulated in TREC by using topics that have known relevant documents and testing on a completely new document set.
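
As one concrete illustration of a standing query, the sketch below builds a term-weight profile from documents already known to be relevant and then scores a stream of completely new documents against it. This Rocchio-style profile is a classic approach chosen for the example, not the track's prescribed method; the documents and threshold are invented.

    # Sketch of routing: a standing profile built from known relevant
    # documents is applied to a stream of completely new documents.
    from collections import Counter

    def build_profile(relevant_docs):
        """Average term weights over the known relevant documents."""
        profile = Counter()
        for text in relevant_docs:
            for term in set(text.split()):
                profile[term] += 1.0 / len(relevant_docs)
        return profile

    def route(profile, stream, threshold=0.5):
        """Score each new document against the standing profile."""
        for doc_id, text in stream:
            toks = text.split()
            s = sum(profile[t] for t in toks) / (len(toks) or 1)
            if s >= threshold:
                yield doc_id, s

    profile = build_profile([
        "oil prices rise after export cuts",
        "crude oil export quota lowered",
    ])
    new_stream = [("n1", "oil export news today"), ("n2", "local weather report")]
    print(list(route(profile, new_stream, threshold=0.2)))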

Track coordinator(s):

  • E. Voorhees, National Institute of Standards and Technology (NIST)
  • D. Harman, National Institute of Standards and Technology (NIST)

Filtering


The TREC-5 filtering track, an evaluation of binary text classification systems, repeated the trial filtering evaluation run for TREC-4, with only the data set and participants changing. Seven sites took part, submitting a total of ten runs. The track overview reviews the nature of the task, the effectiveness measures and evaluation methods used, and briefly discusses the results, examining some deficiencies in the evaluation with an eye toward improving future filtering evaluations.
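
A representative way to score a binary filtering run is a linear utility that rewards relevant documents accepted and penalizes non-relevant ones. The sketch below is only illustrative: the coefficients and the relevance judgments are invented, and the measures actually used in TREC-5 may differ in form and parameters.

    # Sketch of binary filtering evaluation: each accepted document is
    # checked against relevance judgments and the run gets a linear
    # utility score. The coefficients (3, 2) are illustrative only.

    def linear_utility(accepted, judgments, a=3.0, b=2.0):
        """u = a * (relevant accepted) - b * (non-relevant accepted)."""
        tp = sum(1 for d in accepted if judgments.get(d, False))
        fp = sum(1 for d in accepted if not judgments.get(d, False))
        return a * tp - b * fp

    accepted = ["d1", "d4", "d7"]  # documents the filter let through
    qrels = {"d1": True, "d4": False, "d7": True, "d9": True}
    print(linear_utility(accepted, qrels))  # 3*2 - 2*1 = 4.0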

Track coordinator(s):

  • D. D. Lewis, AT&T Labs-Research

Spanish


The TREC-5 conference was the third year in which document retrieval in a language other than English was benchmarked. In TREC-3, four groups participated in an ad hoc retrieval task on a collection of 208 Mbytes of Mexican newspaper text in Spanish. In TREC-4, ten groups participated, once again on the same Mexican newspaper texts but with new topics. For TREC-5, the Spanish ad hoc task received a new document corpus and new topics, and a corpus of documents and topics supporting ad hoc retrieval in the Chinese language was introduced for the first time. Seven groups submitted runs for the Spanish track and ten submitted results for Chinese.

Track coordinator(s):

  • A. Smeaton, Dublin City University
  • R. Wilkinson, Royal Melbourne Institute of Technology (RMIT University)

Chinese


The Chinese track shares its overview with the Spanish track above. For TREC-5, a corpus of documents and topics supporting ad hoc retrieval in the Chinese language was introduced for the first time, and ten groups submitted results for Chinese.

Track coordinator(s):

  • A. Smeaton, Dublin City University
  • R. Wilkinson, Royal Melbourne Institute of Technology (RMIT University)

NLP


The NLP track was initiated to explore whether the natural language processing (NLP) techniques available today are mature enough to have an impact on IR, and specifically whether they can offer an advantage over purely quantitative retrieval methods.

Track coordinator(s):

  • T. Strzalkowski, GE Corporate Research and Development
  • K. Sparck Jones, University of Cambridge

Confusion


For TREC-5, retrieval from corrupted data was studied through the retrieval of single target documents from a corpus that had been corrupted by rendering page images, degrading the bitmaps, and applying OCR to the results. In general, methods that attempted a probabilistic estimation of the original clean text fared better than methods that simply accepted corrupted versions of the query text.
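
The sketch below is a much-simplified stand-in for that idea: it treats corrupted index terms within a small edit distance of each clean query term as likely OCR variants and adds them to the query. Actual track entries modeled character confusions probabilistically; the vocabulary, distance threshold, and function names here are invented for illustration.

    # Simplified stand-in for estimating clean text behind OCR errors:
    # expand each query term with index terms within edit distance 1.

    def edit_distance(a: str, b: str) -> int:
        """Standard Levenshtein distance via dynamic programming."""
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            cur = [i]
            for j, cb in enumerate(b, 1):
                cur.append(min(prev[j] + 1,                 # deletion
                               cur[j - 1] + 1,              # insertion
                               prev[j - 1] + (ca != cb)))   # substitution
            prev = cur
        return prev[-1]

    def expand_query(terms, corrupted_vocab, max_dist=1):
        """Add plausible OCR variants of each clean query term."""
        expanded = set(terms)
        for t in terms:
            expanded.update(v for v in corrupted_vocab
                            if edit_distance(t, v) <= max_dist)
        return expanded

    vocab = {"retrieval", "retr1eval", "rctrieval", "weather"}  # OCR output
    print(expand_query(["retrieval"], vocab))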

Track coordinator(s):

  • P. Kantor, Rutgers University
  • E. Voorhees, National Institute of Standards and Technology (NIST)

Interactive


The high-level goal of the Interactive Track in TREC-5 was the investigation of searching as an interactive task by examining the process as well as the outcome.

Track coordinator(s):

  • P. Over, National Institute of Standards and Technology (NIST)