Text REtrieval Conference (TREC) 2005

Spam

An automatic spam filter classifies a chronological sequence of email messages as SPAM or HAM (non-spam). The filter under test is run on several email sequences, some public and some private, and its performance is measured against gold-standard judgements made by a human assessor.
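
For illustration, the following is a minimal Python sketch of the online evaluation loop described above. It assumes a hypothetical spam_filter object with classify() and train() methods; the actual track defines its own filter interface and toolkit.

    # Minimal sketch of an online filtering run: classify each message in
    # chronological order, then feed back the gold-standard judgement so the
    # filter can learn incrementally. The filter object is hypothetical.
    def run_filter(spam_filter, messages, gold_labels):
        errors = 0
        for message, truth in zip(messages, gold_labels):  # truth is "spam" or "ham"
            verdict = spam_filter.classify(message)          # filter sees one message at a time
            if verdict != truth:
                errors += 1
            spam_filter.train(message, truth)                # assessor's judgement fed back immediately
        return errors / len(gold_labels)                     # overall misclassification rate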

Track coordinator(s):

  • G. Cormack, University of Waterloo
  • T. Lynam, University of Waterloo

Track Web Page: https://plg.uwaterloo.ca/~gvcormac/spam/


Terabyte

The primary goal of the Terabyte Track is to develop an evaluation methodology for terabyte-scale document collections. In addition, we are interested in efficiency and scalability issues, which can be studied more easily in the context of a larger collection. Again this year, we are using a 426GB collection of Web data from the .gov domain for all tasks. While this collection is less than a full terabyte in size, it is considerably larger than the collections used in previous TREC tracks. In future years, we hope to expand the collection using data from other sources.

Track coordinator(s):

  • C.L.A. Clarke, University of Waterloo
  • F. Scholer, Royal Melbourne Institute of Technology (RMIT University)
  • I. Soboroff, National Institute of Standards and Technology (NIST)

Track Web Page: https://plg.uwaterloo.ca/~claclark/TB05.html


Genomics

The TREC 2005 Genomics Track featured two tasks: an ad hoc retrieval task and a text categorization task comprising four subtasks. The ad hoc retrieval task used a 10-year, 4.5-million-document subset of the MEDLINE bibliographic database, with 50 topics conforming to five generic topic types. The categorization task used a full-text document collection with training and test sets of about 6,000 biomedical journal articles each. Participants aimed to triage the documents into categories representing data resources in the Mouse Genome Informatics database, with performance assessed via a utility measure.
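
The sketch below illustrates one common form of such a utility measure, in which true positives are rewarded, false positives are penalized, and the raw score is normalized by that of a perfect triage. The weight u_r shown is an illustrative assumption, not the official track parameter.

    # Illustrative sketch of a normalized linear utility measure for document triage.
    # u_r (the value of a true positive relative to the cost of a false positive) is
    # an assumed example weight, not the official track setting.
    def normalized_utility(tp, fp, fn, u_r=20.0):
        raw = u_r * tp - fp        # reward true positives, penalize false positives
        best = u_r * (tp + fn)     # utility of a perfect triage over the same test set
        return raw / best if best else 0.0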

Track coordinator(s):

  • W. Hersh, Oregon Health & Science University
  • A. Cohen, Oregon Health & Science University
  • J. Yang, Oregon Health & Science University
  • R.T. Bhupatiraju, Oregon Health & Science University
  • P. Roberts, Biogen Idec Corporation
  • M. Hearst, University of California, Berkeley

Track Web Page: https://dmice.ohsu.edu/trec-gen/


HARD

TREC 2005 saw the third year of the High Accuracy Retrieval from Documents (HARD) track. The HARD track explores methods for improving the accuracy of document retrieval systems, with particular attention paid to the start of the ranked list.
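
Emphasis on the top of the ranked list is commonly captured by measures such as precision at a fixed cutoff. The sketch below is a minimal illustration only and is not tied to the track's official evaluation measures.

    # Precision at rank k (e.g. P@10): the fraction of the top k retrieved
    # documents that are relevant.
    def precision_at_k(ranked_docnos, relevant_docnos, k=10):
        return sum(1 for d in ranked_docnos[:k] if d in relevant_docnos) / k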

Track coordinator(s):

  • J. Allan, University of Massachusetts Amherst

Track Web Page: https://web.archive.org/web/20051201144933/https://ciir.cs.umass.edu/research/hard/guidelines.html/


Question Answering

The goal of the TREC QA track is to foster research on systems that retrieve answers rather than documents in response to a question. The focus is on systems that can function in unrestricted domains.

Track coordinator(s):

  • E.M. Voorhees, National Institute of Standards and Technology (NIST)
  • H.T. Dang, National Institute of Standards and Technology (NIST)

Track Web Page: https://trec.nist.gov/data/qa/t2005_qadata.html


Enterprise

The goal of the Enterprise track is to conduct experiments with enterprise data (intranet pages, email archives, document repositories) that reflect the experiences of users in real organisations, so that, for example, an email ranking technique that is effective here would be a good choice for deployment in a real multi-user email search application. This involves both understanding user needs in enterprise search and developing appropriate IR techniques.

Track coordinator(s):

  • N. Craswell, Microsoft Research Cambridge
  • A.P. de Vries, CWI
  • I. Soboroff, National Institute of Standards and Technology (NIST)

Track Web Page: https://trec.nist.gov/data/enterprise.html


Robust

The goal of the Robust track is to improve the consistency of retrieval technology by focusing on poorly performing topics. This year, the track investigates the effectiveness that can be obtained on a new document set for topics known to be difficult on a different document set. For each topic, participants create a query and submit a ranking of the top 1000 documents for that topic.
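
Submitted rankings follow the standard TREC run format, one line per retrieved document giving the topic number, the literal Q0, the document identifier, the rank, the score, and a run tag. The sketch below shows one way to write such a file; the input structure and run tag are illustrative placeholders.

    # Write a ranking in the standard TREC run format
    # ("topic Q0 docno rank score runtag"), keeping at most 1000 documents per topic.
    def write_trec_run(results_by_topic, run_tag, path):
        with open(path, "w") as out:
            for topic, ranked in sorted(results_by_topic.items()):
                for rank, (docno, score) in enumerate(ranked[:1000], start=1):
                    out.write(f"{topic} Q0 {docno} {rank} {score:.4f} {run_tag}\n")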

Track coordinator(s):

  • E.M. Voorhees, National Institute of Standards and Technology (NIST)

Track Web Page: https://trec.nist.gov/data/robust/05/05.guidelines.html