Text REtrieval Conference (TREC) 2007

Million Query

The goal of this track is to run a retrieval task similar to standard ad-hoc retrieval, but to evaluate large numbers of queries incompletely, rather than a small number more completely. Participants will run 10,000 queries and a random 1,000 or so will be evaluated. The corpus is the terabyte track's GOV2 corpus of roughly 25,000,000 .gov web pages, amounting to just under half a terabyte of data.
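The idea of judging only a random subset of the submitted queries can be illustrated with a minimal sketch. It assumes uniform sampling without replacement and uses hypothetical query identifiers 1 through 10,000; the track's actual selection procedure is not described here, and `sample_queries` is an illustrative helper, not part of any track tooling.

```python
import random


def sample_queries(query_ids, k, seed=0):
    """Pick k distinct query ids uniformly at random (assumed scheme)."""
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    return sorted(rng.sample(list(query_ids), k))


# Hypothetical ids: participants run all 10,000 queries ...
all_queries = list(range(1, 10001))
# ... but only about 1,000 of them are selected for evaluation.
evaluated = sample_queries(all_queries, 1000)
print(len(evaluated), "of", len(all_queries), "queries selected")
```

Fixing the seed means every participant could, in principle, reproduce the same evaluated subset from the same query list.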

Track coordinator(s):

J. Allan, University of Massachusetts Amherst
B. Carterette, University of Massachusetts Amherst
B. Dachev, University of Massachusetts Amherst
J. A. Aslam, Northeastern University
V. Pavlu, Northeastern University
E. Kanoulas, Northeastern University

Track Web Page: https://web.archive.org/web/20090311232726/http://ciir.cs.umass.edu/research/million/

Genomics

The TREC 2007 Genomics Track employed an entity-based question-answering task. Runs were required to nominate passages of text from a collection of full-text biomedical journal articles to answer the topic questions. Systems were assessed not only for the relevance of the passages retrieved, but also for how many aspects (entities) of the topic were covered and how many relevant documents were retrieved. We also classified the features of runs to explore which ones were associated with better performance, although the diversity of approaches and the quality of their reporting prevented definitive conclusions from being drawn.

Track coordinator(s):

W. Hersh, Oregon Health & Science University
A. Cohen, Oregon Health & Science University
L. Ruslen, Oregon Health & Science University
P. Roberts, Pfizer Corporation

Track Web Page: https://dmice.ohsu.edu/trec-gen/

Spam

TREC's Spam Track uses a standard testing framework that presents a set of chronologically ordered email messages to a spam filter for classification. In the filtering task, the messages are presented one at a time to the filter, which yields a binary judgment (spam or ham [i.e., non-spam]) that is compared to a human-adjudicated gold standard. The filter also yields a spamminess score, intended to reflect the likelihood that the classified message is spam, which is the subject of post-hoc ROC (Receiver Operating Characteristic) analysis.
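The two outputs described above, a binary judgment and a spamminess score, can be scored against the gold standard roughly as follows. This is a minimal sketch, not the track's evaluation toolkit: the function names, the 0.5 decision threshold, and the five-message data set are all hypothetical. The area under the ROC curve is computed here as the fraction of (spam, ham) pairs in which the spam message receives the higher score, with ties counted as one half.

```python
from typing import Sequence, Tuple


def roc_auc(scores: Sequence[float], is_spam: Sequence[bool]) -> float:
    """Area under the ROC curve via pairwise comparison of scores."""
    spam = [s for s, y in zip(scores, is_spam) if y]
    ham = [s for s, y in zip(scores, is_spam) if not y]
    if not spam or not ham:
        raise ValueError("need at least one spam and one ham message")
    wins = 0.0
    for s in spam:
        for h in ham:
            if s > h:
                wins += 1.0   # spam ranked above ham: correct ordering
            elif s == h:
                wins += 0.5   # tie counts as half
    return wins / (len(spam) * len(ham))


def error_rates(judgments: Sequence[bool],
                is_spam: Sequence[bool]) -> Tuple[float, float]:
    """Return (ham misclassification rate, spam misclassification rate).

    Assumes the gold standard contains at least one ham and one spam message.
    """
    ham_total = sum(1 for y in is_spam if not y)
    spam_total = sum(1 for y in is_spam if y)
    false_pos = sum(1 for j, y in zip(judgments, is_spam) if j and not y)
    false_neg = sum(1 for j, y in zip(judgments, is_spam) if not j and y)
    return false_pos / ham_total, false_neg / spam_total


# Hypothetical gold standard and filter output, in message order.
gold = [True, False, True, False, True]
scores = [0.9, 0.2, 0.6, 0.7, 0.8]
judgments = [s >= 0.5 for s in scores]  # assumed decision threshold

auc = roc_auc(scores, gold)
ham_err, spam_err = error_rates(judgments, gold)
print(f"AUC={auc:.4f} ham_err={ham_err:.2f} spam_err={spam_err:.2f}")
```

The pairwise formulation is quadratic in the number of messages, which is fine for illustration; a rank-based computation would be preferable for large corpora.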

Track coordinator(s):

G. Cormack, University of Waterloo

Track Web Page: https://plg.uwaterloo.ca/~gvcormac/spam/

Legal

TREC 2007 was the second year of the Legal Track, which focuses on evaluation of search technology for discovery of electronically stored information in litigation and regulatory settings. The track included three tasks: Ad Hoc (i.e., single-pass automatic) search, Relevance Feedback (two-pass search in a controlled setting with some relevant and nonrelevant documents manually marked after the first pass) and Interactive (in which real users could iteratively refine their queries and/or engage in multi-pass relevance feedback).

Track coordinator(s):

S. Tomlinson, Open Text Corporation
D. W. Oard, University of Maryland, College Park
J. R. Baron, National Archives and Records Administration
P. Thompson, Dartmouth College

Track Web Page: http://trec-legal.umiacs.umd.edu/

Question Answering

The TREC 2007 question answering (QA) track contained two tasks: the main task, consisting of a series of factoid, list, and “Other” questions organized around a set of targets, and the complex, interactive question answering (ciQA) task. The main task differed from previous years in that the document collection comprised blogs in addition to newswire documents, requiring systems to process diverse genres of unstructured text. The evaluation of factoid and list responses distinguished between answers that were globally correct (with respect to the document collection) and those that were only locally correct (with respect to the supporting document but not to the overall document collection). The ciQA task provided a framework for participants to investigate interaction in the context of complex information needs. Standing in as surrogate users, assessors interacted with QA systems live over the Web; this setup allowed participants to experiment with more complex interfaces but also revealed limitations in the ciQA design for evaluation of interactive systems.

Track coordinator(s):

H.T. Dang, National Institute of Standards and Technology (NIST)
D. Kelly, University of North Carolina
J. Lin, University of Maryland, College Park

Track Web Page: https://trec.nist.gov/data/qa/2007_qadata/qa.07.guidelines.html

Enterprise

The goal of the enterprise track is to conduct experiments with enterprise data that reflect the experiences of users in real organizations. This year, the track introduced a new corpus intended to be more representative of real-world enterprise search, with actual members of the organization, performing their real work tasks, involved in the topic development process.

Track coordinator(s):

P. Bailey, Microsoft, USA
A. P. de Vries, CWI, The Netherlands
N. Craswell, MSR Cambridge, UK
I. Soboroff, National Institute of Standards and Technology (NIST)

Track Web Page: https://trec.nist.gov/data/enterprise.html

Blog

The goal of the Blog track is to explore information-seeking behaviour in the blogosphere. It aims to create the required infrastructure to facilitate research into the blogosphere and to study retrieval from blogs and other related applied tasks. The track was introduced in 2006 with a main opinion-finding task and an open task, which gave participants the opportunity to influence the choice of a suitable second task for 2007 on aspects of blogs other than their opinionated nature.

Track coordinator(s):

C. Macdonald, University of Glasgow
I. Ounis, University of Glasgow
I. Soboroff, National Institute of Standards and Technology (NIST)

Track Web Page: https://www.dcs.gla.ac.uk/wiki/TREC-BLOG