Proceedings - Enterprise 2005

Overview of the TREC 2005 Enterprise Track

Nick Craswell, Arjen P. de Vries, Ian Soboroff

Abstract

The goal of the enterprise track is to conduct experiments with enterprise data — intranet pages, email archives, document repositories — that reflect the experiences of users in real organisations, such that for example, an email ranking technique that is effective here would be a good choice for deployment in a real multi-user email search application. This involves both understanding user needs in enterprise search and development of appropriate IR techniques. The enterprise track began this year as the successor to the web track, and this is reflected in the tasks and measures. While the track takes much of its inspiration from the web track, the foci are on search at the enterprise scale, incorporating non-web data and discovering relationships between entities in the organisation. [...]

Bibtex
@inproceedings{DBLP:conf/trec/CraswellVS05,
    author = {Nick Craswell and Arjen P. de Vries and Ian Soboroff},
    editor = {Ellen M. Voorhees and Lori P. Buckland},
    title = {Overview of the {TREC} 2005 Enterprise Track},
    booktitle = {Proceedings of the Fourteenth Text REtrieval Conference, {TREC} 2005, Gaithersburg, Maryland, USA, November 15-18, 2005},
    series = {{NIST} Special Publication},
    volume = {500-266},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2005},
    url = {http://trec.nist.gov/pubs/trec14/papers/ENTERPRISE.OVERVIEW.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/CraswellVS05.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Melbourne University 2005: Enterprise and Terabyte Tasks

Vo Ngoc Anh, William Webber, Alistair Moffat

Abstract

This report describes the work done at The University of Melbourne for the TREC-2005 Enterprise and Terabyte Tracks. In the Enterprise Track, we participated in the Discussion Task. We tried a number of different methods to make use of special features of mailing lists to improve retrieval effectiveness, and found the use of thread context to be promising. In the Terabyte Track, we continued our work with impact-based ranking and sought to reduce indexing as well as query time. A new retrieval system has been developed for this purpose and has shown several improvements over our system from last year.

Bibtex
@inproceedings{DBLP:conf/trec/AnhWM05,
    author = {Vo Ngoc Anh and William Webber and Alistair Moffat},
    editor = {Ellen M. Voorhees and Lori P. Buckland},
    title = {Melbourne University 2005: Enterprise and Terabyte Tasks},
    booktitle = {Proceedings of the Fourteenth Text REtrieval Conference, {TREC} 2005, Gaithersburg, Maryland, USA, November 15-18, 2005},
    series = {{NIST} Special Publication},
    volume = {500-266},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2005},
    url = {http://trec.nist.gov/pubs/trec14/papers/umelbourne.ent.tera.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/AnhWM05.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Language Modeling Approaches for Enterprise Tasks

Leif Azzopardi, Krisztian Balog, Maarten de Rijke

Abstract

We describe our participation in the TREC 2005 Enterprise track. We provide a detailed account of the ideas underlying our language modeling approaches to these tasks, report on our results, and give a summary of our findings so far.

Bibtex
@inproceedings{DBLP:conf/trec/AzzopardiBR05,
    author = {Leif Azzopardi and Krisztian Balog and Maarten de Rijke},
    editor = {Ellen M. Voorhees and Lori P. Buckland},
    title = {Language Modeling Approaches for Enterprise Tasks},
    booktitle = {Proceedings of the Fourteenth Text REtrieval Conference, {TREC} 2005, Gaithersburg, Maryland, USA, November 15-18, 2005},
    series = {{NIST} Special Publication},
    volume = {500-266},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2005},
    url = {http://trec.nist.gov/pubs/trec14/papers/uamsterdam.ent.balog.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/AzzopardiBR05.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Research on Expert Search at Enterprise Track of TREC 2005

Yunbo Cao, Jingjing Liu, Shenghua Bao, Hang Li

Abstract

We (MSRA team) participated in the Expert Search task at the Enterprise Track of TREC 2005. This document reports our experimental results on expert search. In our research, we have mainly investigated the effectiveness of a new approach to expert search in which we employ what is referred to as two-stage language model. It consists of two parts, relevance model and co-occurrence model. The relevance model represents whether or not a document is relevant to the query. The co-occurrence model represents whether or not the query is associated with a person. Both models are based on statistical language modeling. We have also examined the effectiveness of the use of a number of sub-models in the two-stage model; each sub-model is based on extraction of one type of metadata. In the body-body co-occurrence sub-model, for example, we consider the use of window-based co-occurrence. The co-occurrence is about whether the query and a person appear within the same window of text. In an extreme case the entire document is viewed as a window, and the submodel is referred to as document-based co-occurrence submodel. We also consider using clustering technique in reranking of persons. Persons are clustered according to their co-occurrences with topics and other persons. [...]

Bibtex
@inproceedings{DBLP:conf/trec/CaoLBL05,
    author = {Yunbo Cao and Jingjing Liu and Shenghua Bao and Hang Li},
    editor = {Ellen M. Voorhees and Lori P. Buckland},
    title = {Research on Expert Search at Enterprise Track of {TREC} 2005},
    booktitle = {Proceedings of the Fourteenth Text REtrieval Conference, {TREC} 2005, Gaithersburg, Maryland, USA, November 15-18, 2005},
    series = {{NIST} Special Publication},
    volume = {500-266},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2005},
    url = {http://trec.nist.gov/pubs/trec14/papers/microsoft-asia.ent.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/CaoLBL05.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Microsoft Cambridge at TREC 14: Enterprise Track

Nick Craswell, Hugo Zaragoza, Stephen Robertson

Abstract

A major focus of much work of the group (as it has been since the City University Okapi work) is the development and refinement of basic ranking algorithms. The workhorse remains the BM25 algorithm; recently [3, 4] we introduced a field-weighted version of this, allowing differential treatment of different fields in the original documents, such as title, anchor text, body text. We have also recently [2] been working on ways of analysing the possible contributions of static (query-independent) evidence, and of incorporating them into the scoring/ranking algorithm. Finally, we have been working on ways of tuning the resulting ranking functions, since each elaboration tends to introduce one or more new free parameters which have to be set through tuning. We used all these techniques successfully in our contribution to the Web track in TREC 2004 [4]. This year's relatively modest TREC effort is confined to applying essentially the same techniques to rather different data, in the Enterprise Track's known item (KI) and discussion search (DS) experiments. The main interest is whether we can identify some fields and features that lead to an improvement over a flat-text baseline, and as a side effect to verify that our ranking model can deliver the benefit.

Bibtex
@inproceedings{DBLP:conf/trec/CraswellZR05,
    author = {Nick Craswell and Hugo Zaragoza and Stephen Robertson},
    editor = {Ellen M. Voorhees and Lori P. Buckland},
    title = {Microsoft Cambridge at {TREC} 14: Enterprise Track},
    booktitle = {Proceedings of the Fourteenth Text REtrieval Conference, {TREC} 2005, Gaithersburg, Maryland, USA, November 15-18, 2005},
    series = {{NIST} Special Publication},
    volume = {500-266},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2005},
    url = {http://trec.nist.gov/pubs/trec14/papers/microsoft-cambridge.enterprise.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/CraswellZR05.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Applying the Annotation View on Messages for Discussion Search

Ingo Frommholz

Abstract

In this paper we present our runs for the TREC 2005 Enterprise Track discussion search task. Our approaches are based on the view of replies as annotations of the previous message. Quotations in particular play an important role, since they indicate the target of such an annotation. We examine the role of quotations as a context for the unquoted parts as well as the role of quotations as an indicator of which parts of a message were seen as important enough to react to. Results show that retrieval effectiveness w.r.t. the topicality of email messages can be improved by applying this annotation view on email messages.

Bibtex
@inproceedings{DBLP:conf/trec/Frommholz05,
    author = {Ingo Frommholz},
    editor = {Ellen M. Voorhees and Lori P. Buckland},
    title = {Applying the Annotation View on Messages for Discussion Search},
    booktitle = {Proceedings of the Fourteenth Text REtrieval Conference, {TREC} 2005, Gaithersburg, Maryland, USA, November 15-18, 2005},
    series = {{NIST} Special Publication},
    volume = {500-266},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2005},
    url = {http://trec.nist.gov/pubs/trec14/papers/uduisburg-essen.ent.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/Frommholz05.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

THUIR at TREC 2005: Enterprise Track

Yupeng Fu, Wei Yu, Yize Li, Yiqun Liu, Min Zhang, Shaoping Ma

Abstract

The IR group of Tsinghua University participated in the expert finding task of the TREC 2005 Enterprise track this year. We developed a novel method, called document reorganization, to solve the problem of locating experts for given query topics. This method collects and combines related information from different media formats to organize a document that describes an expert candidate. The method proves both effective and efficient for the expert finding task. Our submitted run (THUENT0505) obtained the best performance among all participants on the MAP evaluation metric. The reorganized documents are also significantly smaller in size than the original corpus.

Bibtex
@inproceedings{DBLP:conf/trec/FuYLLZM05,
    author = {Yupeng Fu and Wei Yu and Yize Li and Yiqun Liu and Min Zhang and Shaoping Ma},
    editor = {Ellen M. Voorhees and Lori P. Buckland},
    title = {{THUIR} at {TREC} 2005: Enterprise Track},
    booktitle = {Proceedings of the Fourteenth Text REtrieval Conference, {TREC} 2005, Gaithersburg, Maryland, USA, November 15-18, 2005},
    series = {{NIST} Special Publication},
    volume = {500-266},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2005},
    url = {http://trec.nist.gov/pubs/trec14/papers/tsinghuau-ma.ent.pdf},
    timestamp = {Wed, 16 Sep 2020 01:00:00 +0200},
    biburl = {https://dblp.org/rec/conf/trec/FuYLLZM05.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

CSUSM at TREC 2005: Genomics and Enterprise Track

Rocio Guillén

Abstract

In this paper we report on the approach, experiments and results for the Enterprise Track and the Genomics Track. We participated in the adhoc and one of the categorization tasks for the Genomics track. For the enterprise track we participated in the known-item search. We ran experiments using Indri [1], which combines inference networks with language modeling, for the adhoc and the known-item search tasks. For the categorization task we filtered the documents in different stages using decision trees.

Bibtex
@inproceedings{DBLP:conf/trec/Guillen05,
    author = {Rocio Guill{\'{e}}n},
    editor = {Ellen M. Voorhees and Lori P. Buckland},
    title = {{CSUSM} at {TREC} 2005: Genomics and Enterprise Track},
    booktitle = {Proceedings of the Fourteenth Text REtrieval Conference, {TREC} 2005, Gaithersburg, Maryland, USA, November 15-18, 2005},
    series = {{NIST} Special Publication},
    volume = {500-266},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2005},
    url = {http://trec.nist.gov/pubs/trec14/papers/calstateu-sanmarcos.geo.ent.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/Guillen05.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Pitt at TREC 2005: HARD and Enterprise

Daqing He, Jae-wook Ahn

Abstract

The University of Pittsburgh team participated in two tracks for TREC 2005: the High Accuracy Retrieval from Documents (HARD) track and the Enterprise Retrieval track. The goal of Pitt's HARD study in TREC 2005 was to examine the effectiveness of applying Self Organizing Maps (SOM) as a visual presentation tool and as a clustering tool in the context of HARD tasks, especially its role in clarification form generation. Our experiment results demonstrate that SOM can be used as a clustering tool to generate terms for query expansion based on interactive relevance feedback. It produced significant improvement over the baseline when measured by R-Prec. However, its effectiveness as a visualization tool for users to give relevance feedback still needs careful examination and further study. Our goal in this year's enterprise search track was to study the effect of query expansion based on an expansion corpus in retrieving emails from an email corpus. The expansion corpus consisted of the WWW, People and ESW sub-collections of the W3C test collection. The results indicate that query expansion based on the expansion corpus can achieve significant improvement over the no-expansion baselines. However, there is no significant difference from the simpler query expansion approach using blind relevance feedback. Interestingly, the terms used in these two query expansion approaches are different, with on average only 6 overlapping terms among 20 possible terms. Further study is needed to examine the effect of combining these two approaches.

Bibtex
@inproceedings{DBLP:conf/trec/HeA05,
    author = {Daqing He and Jae{-}wook Ahn},
    editor = {Ellen M. Voorhees and Lori P. Buckland},
    title = {Pitt at {TREC} 2005: {HARD} and Enterprise},
    booktitle = {Proceedings of the Fourteenth Text REtrieval Conference, {TREC} 2005, Gaithersburg, Maryland, USA, November 15-18, 2005},
    series = {{NIST} Special Publication},
    volume = {500-266},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2005},
    url = {http://trec.nist.gov/pubs/trec14/papers/upittsburgh.hard.ent.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/HeA05.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

A Menagerie of Tracks at Maryland: HARD, Enterprise, QA, and Genomics, Oh My!

Jimmy Lin, Eileen G. Abels, Dina Demner-Fushman, Douglas W. Oard, Philip Fei Wu, Yejun Wu

Abstract

This year, the University of Maryland participated in four separate tracks: HARD, enterprise, question answering, and genomics. Our HARD experiments involved a trained intermediary who searched for documents on behalf of the user, created clarification forms manually, and exploited user responses accordingly. The aim was to better understand the nature of single-iteration clarification dialogs and to develop an “ontology of clarifications” that can be leveraged to guide system development. For the enterprise track, we submitted official runs to the Known Item Search and the Discussion Search tasks. Document transformation to normalize dates and version numbers was found to be helpful, but suppression of text quoted from earlier messages and expansion of the indexed terms for a message based on subject line threading proved to not be. For the QA track, we submitted a manual run of “other” questions in an effort to quantify human performance on the task. Our genomics track participation was in collaboration with the National Library of Medicine, and is primarily reported in NLM's overview paper.

Bibtex
@inproceedings{DBLP:conf/trec/LinADOWW05,
    author = {Jimmy Lin and Eileen G. Abels and Dina Demner{-}Fushman and Douglas W. Oard and Philip Fei Wu and Yejun Wu},
    editor = {Ellen M. Voorhees and Lori P. Buckland},
    title = {A Menagerie of Tracks at Maryland: {HARD}, Enterprise, {QA}, and Genomics, Oh My!},
    booktitle = {Proceedings of the Fourteenth Text REtrieval Conference, {TREC} 2005, Gaithersburg, Maryland, USA, November 15-18, 2005},
    series = {{NIST} Special Publication},
    volume = {500-266},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2005},
    url = {http://trec.nist.gov/pubs/trec14/papers/umaryland-lin.hard.ent.qa.geo.pdf},
    timestamp = {Sat, 29 Jul 2023 01:00:00 +0200},
    biburl = {https://dblp.org/rec/conf/trec/LinADOWW05.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

University of Glasgow at TREC 2005: Experiments in Terabyte and Enterprise Tracks with Terrier

Craig Macdonald, Ben He, Vassilis Plachouras, Iadh Ounis

Abstract

With our participation in TREC 2005, we continue experiments using Terrier, a modular and scalable Information Retrieval (IR) framework, in 4 tasks from the Terabyte and Enterprise tracks. In the Terabyte track, we investigate new Divergence From Randomness weighting models, and a novel query expansion approach that can take into account various document fields, namely content, title and anchor text. In addition, we test a new selective query expansion mechanism which determines the appropriateness of using query expansion on a per-query basis, using statistical information from a low-cost query performance predictor. In the Enterprise track, we investigate combining document fields evidence with other information occurring in an Enterprise setting. In the email known item task, we also investigate temporal and thread priors suitable for email search. In the expert search task, for each candidate, we generate profiles of expertise evidence from the W3C collection. Moreover, we propose a model for ranking these candidate profiles in response to a query.

Bibtex
@inproceedings{DBLP:conf/trec/MacdonaldHPO05,
    author = {Craig Macdonald and Ben He and Vassilis Plachouras and Iadh Ounis},
    editor = {Ellen M. Voorhees and Lori P. Buckland},
    title = {University of Glasgow at {TREC} 2005: Experiments in Terabyte and Enterprise Tracks with Terrier},
    booktitle = {Proceedings of the Fourteenth Text REtrieval Conference, {TREC} 2005, Gaithersburg, Maryland, USA, November 15-18, 2005},
    series = {{NIST} Special Publication},
    volume = {500-266},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2005},
    url = {http://trec.nist.gov/pubs/trec14/papers/uglasgow.tera.ent.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/MacdonaldHPO05.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

WIM at TREC 2005

Junyu Niu, Lin Sun, Luqun Lou, Fang Deng, Chen Lin, Haiqing Zheng, Xuanjing Huang

Abstract

This paper describes the three TREC tasks we participated in this year: the Genomics track's categorization task and ad hoc task, and the Enterprise track's known item search task. For the categorization task, we adopt a domain-specific term extraction method and an ontology-based method for feature selection. An SVM classifier and a Rocchio-based two-stage classifier were also used in this experiment. For the ad hoc task, we used the BM25 algorithm, a probabilistic model and query expansion. For the Enterprise track, a language model was adopted, and entity recognition was also implemented in our experiment.

Bibtex
@inproceedings{DBLP:conf/trec/NiuSLDLZH05,
    author = {Junyu Niu and Lin Sun and Luqun Lou and Fang Deng and Chen Lin and Haiqing Zheng and Xuanjing Huang},
    editor = {Ellen M. Voorhees and Lori P. Buckland},
    title = {{WIM} at {TREC} 2005},
    booktitle = {Proceedings of the Fourteenth Text REtrieval Conference, {TREC} 2005, Gaithersburg, Maryland, USA, November 15-18, 2005},
    series = {{NIST} Special Publication},
    volume = {500-266},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2005},
    url = {http://trec.nist.gov/pubs/trec14/papers/fudanu-sun.geo.ent.pdf},
    timestamp = {Tue, 20 Jul 2021 01:00:00 +0200},
    biburl = {https://dblp.org/rec/conf/trec/NiuSLDLZH05.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Experiments with Language Models for Known-Item Finding of E-mail Messages

Paul Ogilvie, Jamie Callan

Abstract

We present experiments using language models to rank e-mail messages for the Known-Item Finding task of the Enterprise track. We combine evidence from the text of the message, its subject, the text of the thread in which the message occurs, and the text of messages that are in reply to the message. We find that the only statistically significant differences suggest that in addition to the text of the message, the subject is a very important piece of evidence. We also explore the use of a depth-based prior, where emphasis is placed on messages near the root of the thread structure, which has mixed results.

Bibtex
@inproceedings{DBLP:conf/trec/OgilvieC05,
    author = {Paul Ogilvie and Jamie Callan},
    editor = {Ellen M. Voorhees and Lori P. Buckland},
    title = {Experiments with Language Models for Known-Item Finding of E-mail Messages},
    booktitle = {Proceedings of the Fourteenth Text REtrieval Conference, {TREC} 2005, Gaithersburg, Maryland, USA, November 15-18, 2005},
    series = {{NIST} Special Publication},
    volume = {500-266},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2005},
    url = {http://trec.nist.gov/pubs/trec14/papers/carnegie-mu-callan.ent.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/OgilvieC05.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

The Lowlands' TREC Experiments 2005

Henning Rode, Djoerd Hiemstra, Georgina Ramírez, Thijs Westerveld, Arjen P. de Vries

Abstract

This paper describes our participation in the TREC HARD track (High Accuracy Retrieval of Documents) and the TREC Enterprise track. The main goal of our HARD participation is the development and evaluation of so-called query profiles: short summaries of the retrieved results that enable the user to perform more focused search, for instance by zooming in on a particular time period. The main goal of our Enterprise track participation is to investigate the potential of the structural information for this type of retrieval task. In particular, we study the use of the thread information and the subject and header fields of the email documents. As a secondary and long-standing research goal, we aim at developing an information retrieval framework that supports many diverse retrieval applications by means of one simple yet powerful query language (similar to SQL or relational algebra) that hides the implementation details of retrieval approaches from the application developer, while still giving the application developer control over the ranking process. Both the HARD system and the Enterprise system (as well as our TRECVID video retrieval system [14]) are based on MonetDB, an open source database system developed at CWI [1]. The paper is organised as follows. First, we discuss our participation in the HARD track. We define query profiles and discuss the way we generate them in Section 2. Section 3 describes the clarification forms used and Section 4 explains how we refine the queries and rank the results. We end this part by analysing our experimental results in Section 5 and giving some conclusions for this track in Section 6. The second part of the paper discusses our participation in the enterprise track. We start by describing the system and experimental setup in Section 7. Section 8 discusses the approaches taken for each of the subtasks and Section 9 analyses the results. We end by giving some conclusions and future work for this track in Section 10. The final part of the paper describes our future plans for building a so-called parameterised search engine within the Dutch National project MultimediaN.

Bibtex
@inproceedings{DBLP:conf/trec/RodeHRWV05,
    author = {Henning Rode and Djoerd Hiemstra and Georgina Ram{\'{\i}}rez and Thijs Westerveld and Arjen P. de Vries},
    editor = {Ellen M. Voorhees and Lori P. Buckland},
    title = {The Lowlands' {TREC} Experiments 2005},
    booktitle = {Proceedings of the Fourteenth Text REtrieval Conference, {TREC} 2005, Gaithersburg, Maryland, USA, November 15-18, 2005},
    series = {{NIST} Special Publication},
    volume = {500-266},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2005},
    url = {http://trec.nist.gov/pubs/trec14/papers/lowlands-team.hard.ent.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/RodeHRWV05.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

The QMUL Team with Probabilistic SQL at Enterprise Track

Thomas Roelleke, Elham Ashoori, Hengzhi Wu, Zhen Cai

Abstract

The enterprise track caught our attention, since the task is similar to a project we carried out for the BBC. Our motivation for participation has been twofold: On one hand, there is the usual challenge to design and test the quality of retrieval strategies. On the other hand, and for us very important, the TREC participation has been an opportunity to investigate the resource effort it requires to deliver a TREC result. Our main findings from this TREC participation are: 1. Through the consistent use of our probabilistic variant of SQL, we could describe retrieval strategies within a few lines of code. 2. The processing time proved sufficient to deal with the collection. 3. The abstraction-oriented data modelling layers of our HySpirit framework enable relatively junior researchers to explore a TREC collection and submit runs. 4. For the less complex retrieval tasks (discussion search, known-item search), minimal resources lead to acceptable results, whereas for the more complex retrieval tasks (expert search), inclusion and combination of all available evidence appear to significantly improve retrieval quality.

Bibtex
@inproceedings{DBLP:conf/trec/RoellekeAWC05,
    author = {Thomas Roelleke and Elham Ashoori and Hengzhi Wu and Zhen Cai},
    editor = {Ellen M. Voorhees and Lori P. Buckland},
    title = {The {QMUL} Team with Probabilistic {SQL} at Enterprise Track},
    booktitle = {Proceedings of the Fourteenth Text REtrieval Conference, {TREC} 2005, Gaithersburg, Maryland, USA, November 15-18, 2005},
    series = {{NIST} Special Publication},
    volume = {500-266},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2005},
    url = {http://trec.nist.gov/pubs/trec14/papers/queen-mary-ulondon.ent.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/RoellekeAWC05.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

TREC 2005 Enterprise Track Experiments at BUPT

Zhao Ru, Yuehua Chen, Weiran Xu, Jun Guo

Abstract

This paper introduces and analyzes some experiments to find valid methods and features in enterprise search. For this purpose, two main experiments have been done. One is to retrieve emails which contain the required information from all the emails of an enterprise, and the other is to find experts who are helpful in a particular field. Some features of the intranet dataset, such as the subject, the author, the date and the thread, prove useful when searching for an email. A new two-stage ranking method, which differs from traditional IR, is introduced for expert search.

Bibtex
@inproceedings{DBLP:conf/trec/RuCXG05,
    author = {Zhao Ru and Yuehua Chen and Weiran Xu and Jun Guo},
    editor = {Ellen M. Voorhees and Lori P. Buckland},
    title = {{TREC} 2005 Enterprise Track Experiments at {BUPT}},
    booktitle = {Proceedings of the Fourteenth Text REtrieval Conference, {TREC} 2005, Gaithersburg, Maryland, USA, November 15-18, 2005},
    series = {{NIST} Special Publication},
    volume = {500-266},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2005},
    url = {http://trec.nist.gov/pubs/trec14/papers/beijingu-of-pt.ent.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/RuCXG05.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Experiments for HARD and Enterprise Tracks

Olga Vechtomova, Maheedhar Kolla, Murat Karamuftuoglu

Abstract

The main theme in our participation in this year's HARD track was experimentation with the effect of lexical cohesion on document retrieval. Lexical cohesion is a major characteristic of natural language texts, which is achieved through semantic connectedness between words in text, and expresses continuity between the parts of text [7]. Segments of text which are about the same or similar subjects (topics) have higher lexical cohesion, i.e. share a larger number of words than unrelated segments. We have experimented with two approaches to the selection of query expansion terms based on lexical cohesion: (1) by selecting query expansion terms that form lexical links between the distinct original query terms in the document (section 1.1); and (2) by identifying lexical chains in the document and selecting query expansion terms from the strongest lexical chains (section 1.2).

Bibtex
@inproceedings{DBLP:conf/trec/VechtomovaKK05,
    author = {Olga Vechtomova and Maheedhar Kolla and Murat Karamuftuoglu},
    editor = {Ellen M. Voorhees and Lori P. Buckland},
    title = {Experiments for {HARD} and Enterprise Tracks},
    booktitle = {Proceedings of the Fourteenth Text REtrieval Conference, {TREC} 2005, Gaithersburg, Maryland, USA, November 15-18, 2005},
    series = {{NIST} Special Publication},
    volume = {500-266},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2005},
    url = {http://trec.nist.gov/pubs/trec14/papers/uwaterloo-vechtomova.hard.ent.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/VechtomovaKK05.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

TREC 14 Enterprise Track at CSIRO and ANU

Mingfang Wu, David Hawking, Paul Thomas

Abstract

The primary goals of the CSIRO and ANU team's participation in the enterprise track were two-fold: 1) to investigate how well our search engine PADRE responds to the new collection and the new tasks, and 2) to explore if document structure specific to an email collection can be used to improve system performance. By the time of submission deadline, we completed two tasks: known-item search and discussion search. For both tasks, we used the PADRE retrieval system [1], in which the Okapi BM25 relevance function was implemented. Each message in the collection was treated as an independent document, so both topic distillation scoring and same site suppression mechanism were turned off (i.e. -nocool and -SSS0 respectively). During the indexing, stemming and stopword elimination were not applied and sequences of letters and/or digits were considered as indexable words. We parsed the HTML pages in the original collection into an XML format (the DTD is shown in the appendix), and removed non-email pages. Our parsed collection includes 174,311 email messages, and we used this collection for our experiments.

Bibtex
@inproceedings{DBLP:conf/trec/WuHT05,
    author = {Mingfang Wu and David Hawking and Paul Thomas},
    editor = {Ellen M. Voorhees and Lori P. Buckland},
    title = {{TREC} 14 Enterprise Track at {CSIRO} and {ANU}},
    booktitle = {Proceedings of the Fourteenth Text REtrieval Conference, {TREC} 2005, Gaithersburg, Maryland, USA, November 15-18, 2005},
    series = {{NIST} Special Publication},
    volume = {500-266},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2005},
    url = {http://trec.nist.gov/pubs/trec14/papers/csiro.ent.pdf},
    timestamp = {Wed, 07 Apr 2021 01:00:00 +0200},
    biburl = {https://dblp.org/rec/conf/trec/WuHT05.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

CNDS Expert Finding System for TREC 2005

Conglei Yao, Bo Peng, Jing He, Zhifeng Yang

Abstract

This paper describes our system developed for the Expert Finding task of the Enterprise Track at TREC 2005. The system employs three methods (a traditional IR method, an email clustering method, and an entry page finding method) to find experts related to a specific topic in the W3C corpus. Experiments indicate that the traditional IR method is useful for expert finding if the query is well generated, the email clustering method is helpful when the mailing list is relevant to a unique work group or committee, and the entry page finding method is valuable when the topic is the theme of a special group. We use linear synthesis as the result aggregation method to combine the results generated by the three methods. Of our 5 runs submitted for the Expert Finding task, the best run is the one generated by linear synthesis, with a MAP score of 0.2174 (Bpref of 0.4299 and P@10 of 0.3460).
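
One common reading of "linear synthesis" is a weighted sum of per-method scores for each candidate expert. The Python sketch below illustrates that reading only; the abstract does not give the authors' normalisation or weights, so the min-max normalisation, the function names, the candidate names, and the weights 0.5/0.3/0.2 are all hypothetical.

```python
def min_max_normalize(scores: dict[str, float]) -> dict[str, float]:
    """Rescale one method's scores to [0, 1] so methods are comparable.

    Assumption: min-max normalisation; the abstract does not specify one.
    """
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {candidate: 1.0 for candidate in scores}
    return {c: (s - lo) / (hi - lo) for c, s in scores.items()}


def linear_combine(runs: list[dict[str, float]],
                   weights: list[float]) -> list[tuple[str, float]]:
    """Weighted sum of normalised scores; a missing candidate contributes 0."""
    combined: dict[str, float] = {}
    for scores, weight in zip(runs, weights):
        for candidate, score in min_max_normalize(scores).items():
            combined[candidate] = combined.get(candidate, 0.0) + weight * score
    # Highest combined score first; ties broken by candidate name.
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Hypothetical scores from the three methods named in the abstract.
    ir_run = {"alice": 2.0, "bob": 1.0}
    cluster_run = {"bob": 5.0, "carol": 3.0}
    entry_page_run = {"alice": 0.9, "carol": 0.1}
    ranked = linear_combine([ir_run, cluster_run, entry_page_run],
                            [0.5, 0.3, 0.2])
    for candidate, score in ranked:
        print(f"{candidate}\t{score:.2f}")
```

Normalising before summing keeps a method with a large raw score range from dominating the combination; other fusion schemes would be equally consistent with the abstract.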

Bibtex
@inproceedings{DBLP:conf/trec/Yang05,
    author = {Conglei Yao and Bo Peng and Jing He and Zhifeng Yang},
    editor = {Ellen M. Voorhees and Lori P. Buckland},
    title = {{CNDS} Expert Finding System for {TREC} 2005},
    booktitle = {Proceedings of the Fourteenth Text REtrieval Conference, {TREC} 2005, Gaithersburg, Maryland, USA, November 15-18, 2005},
    series = {{NIST} Special Publication},
    volume = {500-266},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2005},
    url = {http://trec.nist.gov/pubs/trec14/papers/pekingu-he.ent.pdf},
    timestamp = {Mon, 15 May 2023 01:00:00 +0200},
    biburl = {https://dblp.org/rec/conf/trec/Yang05.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

TREC 2005 Enterprise Track Results from Drexel

Weizhong Zhu, Robert B. Allen, Min Song

Abstract

The primary goal of Discussion Search is to identify a discussion about a topic. A secondary goal is to determine whether a given message expresses pro or con arguments with respect to the discussion. We employed a combination of POS-driven query expansion and a text-classification technique from [6]. The results of those previous experiments indicated that the technique performed best in extracting protein-protein interaction pairs from MEDLINE. The original email corpus was extremely heterogeneous. We first applied the Tidy HTML parser to strip tags and to identify data such as the sender, thread history, and subject of the messages. We then linked messages into threads in two ways. The corpus provides thread index files for email communications. These thread indexes are composed of hierarchically structured multiple discussion threads and single threads. For multiple discussion threads, we unified them into a thread document. We also combined single documents when they had the same subject.
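
The last step in the abstract above, combining single messages that share a subject, can be sketched in Python as follows. This is an illustration, not the authors' code: the `(message_id, subject, body)` tuple layout, the function names, and the stripping of "Re:"/"Fwd:" prefixes and case differences before comparing subjects are assumptions.

```python
import re
from collections import defaultdict

# Assumption: leading reply/forward markers are ignored when comparing subjects.
REPLY_PREFIX = re.compile(r"^\s*((re|fwd?)\s*:\s*)+", re.IGNORECASE)


def normalize_subject(subject: str) -> str:
    """Strip reply/forward prefixes, surrounding whitespace, and case."""
    return REPLY_PREFIX.sub("", subject).strip().lower()


def group_by_subject(messages: list[tuple[str, str, str]]) -> dict[str, str]:
    """Merge message bodies that share a normalised subject.

    Each message is a hypothetical (message_id, subject, body) tuple.
    Returns one concatenated "thread document" per subject.
    """
    threads: dict[str, list[str]] = defaultdict(list)
    for _message_id, subject, body in messages:
        threads[normalize_subject(subject)].append(body)
    return {subject: "\n".join(bodies) for subject, bodies in threads.items()}


if __name__ == "__main__":
    sample = [
        ("m1", "XML Schema question", "Is this element valid?"),
        ("m2", "Re: XML Schema question", "Yes, per the spec."),
        ("m3", "Meeting minutes", "Attached."),
    ]
    for subject, document in group_by_subject(sample).items():
        print(subject, "->", repr(document))
```

Bodies are concatenated in input order, so messages should be supplied in chronological order if the thread document is meant to read as a conversation.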

Bibtex
@inproceedings{DBLP:conf/trec/ZhuAS05,
    author = {Weizhong Zhu and Robert B. Allen and Min Song},
    editor = {Ellen M. Voorhees and Lori P. Buckland},
    title = {{TREC} 2005 Enterprise Track Results from Drexel},
    booktitle = {Proceedings of the Fourteenth Text REtrieval Conference, {TREC} 2005, Gaithersburg, Maryland, USA, November 15-18, 2005},
    series = {{NIST} Special Publication},
    volume = {500-266},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2005},
    url = {http://trec.nist.gov/pubs/trec14/papers/drexelu.ent.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/ZhuAS05.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}