
Proceedings - Legal 2007

Overview of the TREC 2007 Legal Track

Stephen Tomlinson, Douglas W. Oard, Jason R. Baron, Paul Thompson

Abstract

TREC 2007 was the second year of the Legal Track, which focuses on evaluation of search technology for discovery of electronically stored information in litigation and regulatory settings. The track included three tasks: Ad Hoc (i.e., single-pass automatic) search, Relevance Feedback (two-pass search in a controlled setting, with some relevant and nonrelevant documents manually marked after the first pass), and Interactive (in which real users could iteratively refine their queries and/or engage in multi-pass relevance feedback). This paper describes the design of the three tasks and analyzes the results.

Bibtex
@inproceedings{DBLP:conf/trec/TomlinsonOBT07,
    author = {Stephen Tomlinson and Douglas W. Oard and Jason R. Baron and Paul Thompson},
    editor = {Ellen M. Voorhees and Lori P. Buckland},
    title = {Overview of the {TREC} 2007 Legal Track},
    booktitle = {Proceedings of The Sixteenth Text REtrieval Conference, {TREC} 2007, Gaithersburg, Maryland, USA, November 5-9, 2007},
    series = {{NIST} Special Publication},
    volume = {500-274},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2007},
    url = {http://trec.nist.gov/pubs/trec16/papers/LEGAL.OVERVIEW16.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/TomlinsonOBT07.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Exploring the Legal Discovery and Enterprise Tracks at the University of Iowa

Brian Almquist, Viet Ha-Thuc, Aditya Kumar Sehgal, Robert J. Arens, Padmini Srinivasan

Abstract

The University of Iowa Team, under coordinating professor Padmini Srinivasan, participated in the legal discovery and enterprise tracks of TREC-2007.

Bibtex
@inproceedings{DBLP:conf/trec/AlmquistHSAS07,
    author = {Brian Almquist and Viet Ha{-}Thuc and Aditya Kumar Sehgal and Robert J. Arens and Padmini Srinivasan},
    editor = {Ellen M. Voorhees and Lori P. Buckland},
    title = {Exploring the Legal Discovery and Enterprise Tracks at the University of Iowa},
    booktitle = {Proceedings of The Sixteenth Text REtrieval Conference, {TREC} 2007, Gaithersburg, Maryland, USA, November 5-9, 2007},
    series = {{NIST} Special Publication},
    volume = {500-274},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2007},
    url = {http://trec.nist.gov/pubs/trec16/papers/uiowa.legal.ent.final.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/AlmquistHSAS07.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Access to Legal Documents: Exact Match, Best Match, and Combinations

Avi Arampatzis, Jaap Kamps, Martijn Kooken, Nir Nussbaum

Abstract

In this paper, we document our efforts in participating in the TREC 2007 Legal track. We had multiple aims: first, to experiment with different query formulations, trying to exploit the verbose topic statements; second, to analyse how ranked retrieval methods can be fruitfully combined with traditional Boolean queries. Our main findings can be summarized as follows. First, we got mixed results when combining the original search request with terms extracted from the verbose topic statement. Second, combining the Boolean reference run with our ranked retrieval run allows us to retain the high recall of Boolean retrieval, while precision scores improve over both the Boolean and the ranked retrieval runs. Third, we found that if we treat the Boolean query as free text, with varying degrees of interpretation of the original operators, we get competitive results. Moreover, the two types of queries seem to capture different relevant documents, and the combination of the request text and the Boolean query leads to substantial gains in precision and recall.
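
As a rough illustration of the combination described above (assuming hypothetical run formats and function names, not the authors' actual implementation), a Boolean reference run can supply a high-recall candidate set that the ranked run then orders, with the remaining ranked documents appended afterwards:

def combine_boolean_and_ranked(boolean_docs, ranked_scores, depth=25000):
    # boolean_docs: set of docids matched by the Boolean reference run.
    # ranked_scores: dict of docid -> score from the ranked retrieval run.
    ranked_order = sorted(ranked_scores, key=ranked_scores.get, reverse=True)
    boolean_first = [d for d in ranked_order if d in boolean_docs]
    # Boolean hits absent from the ranked run are kept (for recall) at the
    # end of the Boolean segment, in arbitrary order.
    boolean_first += [d for d in boolean_docs if d not in ranked_scores]
    rest = [d for d in ranked_order if d not in boolean_docs]
    return (boolean_first + rest)[:depth]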

Bibtex
@inproceedings{DBLP:conf/trec/ArampatzisKKN07,
    author = {Avi Arampatzis and Jaap Kamps and Martijn Kooken and Nir Nussbaum},
    editor = {Ellen M. Voorhees and Lori P. Buckland},
    title = {Access to Legal Documents: Exact Match, Best Match, and Combinations},
    booktitle = {Proceedings of The Sixteenth Text REtrieval Conference, {TREC} 2007, Gaithersburg, Maryland, USA, November 5-9, 2007},
    series = {{NIST} Special Publication},
    volume = {500-274},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2007},
    url = {http://trec.nist.gov/pubs/trec16/papers/uamsterdam-derijke.legal.final.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/ArampatzisKKN07.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Examining Overfitting in Relevance Feedback: Sabir Research at TREC 2007

Chris Buckley

Abstract

Sabir Research participated in TREC 2007 in the Million Query and Legal tracks. This writeup focuses on the Legal track, and in particular on the relevance feedback and interactive tasks within the Legal track. The information retrieval software used was the research version of SMART 16.0. SMART was originally developed in the early 1960s by Gerard Salton and has since continued to be a leading research information retrieval tool. It uses a statistical vector-space model, with stemming, stop words, term weighting, an inner-product similarity function, and ranked retrieval.
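
As a generic illustration of the statistical vector-space model named above (not SMART 16.0 itself, and only one of many possible weighting variants), a log-tf x idf weighting with an inner-product similarity can be sketched as:

import math
from collections import Counter

def tfidf_vector(tokens, idf):
    # Log-tf x idf weights for a bag of (already stemmed and stopped) tokens.
    tf = Counter(tokens)
    return {t: (1.0 + math.log(c)) * idf.get(t, 0.0) for t, c in tf.items()}

def inner_product(query_vec, doc_vec):
    # Inner-product similarity used to rank documents against the query.
    return sum(w * doc_vec.get(t, 0.0) for t, w in query_vec.items())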

Bibtex
@inproceedings{DBLP:conf/trec/Buckley07,
    author = {Chris Buckley},
    editor = {Ellen M. Voorhees and Lori P. Buckland},
    title = {Examining Overfitting in Relevance Feedback: Sabir Research at {TREC} 2007},
    booktitle = {Proceedings of The Sixteenth Text REtrieval Conference, {TREC} 2007, Gaithersburg, Maryland, USA, November 5-9, 2007},
    series = {{NIST} Special Publication},
    volume = {500-274},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2007},
    url = {http://trec.nist.gov/pubs/trec16/papers/sabir.legal.final.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/Buckley07.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

MultiText Legal Experiments at TREC 2007

Stefan Büttcher, Charles L. A. Clarke, Gordon V. Cormack, Thomas R. Lynam, David R. Cheriton

Abstract

For the Legal track we used the Wumpus search engine and investigated several methods that have proven successful in other domains, including cover density ranking and Okapi BM25 ranking. In addition to the traditional bag-of-words model, we used Boolean terms and character 4-grams. Pseudo-relevance feedback was implemented using logistic regression on character 4-grams. Some runs specifically excluded documents returned by the Boolean query so as to increase the number of such documents in the pool. While our runs were all marked as manual, this was only because the process was not fully automated and several tuning parameters were set after we viewed the data; no data-specific tuning was performed in configuring the system for our runs. Our best-performing runs used a combination of all of the above-mentioned techniques.
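
For reference, the Okapi BM25 ranking mentioned above can be sketched as follows; the parameter values are common defaults and the choice of tokens (words or character 4-grams) is left to the caller, so this is an illustration rather than the Wumpus configuration:

import math

def bm25(query_terms, doc_tf, doc_len, avg_doc_len, df, num_docs, k1=1.2, b=0.75):
    # Standard Okapi BM25 score of one document against a bag-of-words query.
    # doc_tf: term -> frequency in the document; df: term -> document frequency.
    score = 0.0
    for t in query_terms:
        if t not in doc_tf or df.get(t, 0) == 0:
            continue
        idf = math.log(1.0 + (num_docs - df[t] + 0.5) / (df[t] + 0.5))
        tf = doc_tf[t]
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
    return score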

Bibtex
@inproceedings{DBLP:conf/trec/ButtcherCCLC07,
    author = {Stefan B{\"{u}}ttcher and Charles L. A. Clarke and Gordon V. Cormack and Thomas R. Lynam and David R. Cheriton},
    editor = {Ellen M. Voorhees and Lori P. Buckland},
    title = {MultiText Legal Experiments at {TREC} 2007},
    booktitle = {Proceedings of The Sixteenth Text REtrieval Conference, {TREC} 2007, Gaithersburg, Maryland, USA, November 5-9, 2007},
    series = {{NIST} Special Publication},
    volume = {500-274},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2007},
    url = {http://trec.nist.gov/pubs/trec16/papers/uwaterloo-clarke.legal.final.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/ButtcherCCLC07.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Dartmouth College at TREC 2007 Legal Track

Wei-Ming Chen, Paul Thompson

Abstract

This report describes Dartmouth College's approach and results for the 2007 TREC Legal Track. Our original plan was to use the Combination of Expert Opinion (CEO) algorithm [1] to combine the search results from several search engines. However, we did not have enough time to build the indexes for more than one search engine before the submission deadline for official runs. The official results described here are therefore based only on the Lemur/Indri [2] search engine.

Bibtex
@inproceedings{DBLP:conf/trec/ChenT07,
    author = {Wei{-}Ming Chen and Paul Thompson},
    editor = {Ellen M. Voorhees and Lori P. Buckland},
    title = {Dartmouth College at {TREC} 2007 Legal Track},
    booktitle = {Proceedings of The Sixteenth Text REtrieval Conference, {TREC} 2007, Gaithersburg, Maryland, USA, November 5-9, 2007},
    series = {{NIST} Special Publication},
    volume = {500-274},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2007},
    url = {http://trec.nist.gov/pubs/trec16/papers/dartmouth.legal.final.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/ChenT07.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

TREC 2007 Legal Track Interactive Task: A Report from the LIU Team

Heting Chu, Irene Crisci, Estella Cisco-Dalrymple, Tricia Daley, Lori Hoeffner, Trudy Katz, Susan Shebar, Carol Sullivan, Sarah Swammy, Maureen Weicher, Galia Yemini-Halevi

Abstract

The team from Long Island University (LIU) participated for the first time in the TREC 2007 Legal Track Interactive Task. We received the call for participation in mid-March 2007, while a doctoral seminar titled Information Retrieval was in session. All nine students, evenly divided into three groups, performed this task until early May, when the semester ended. Each group worked on one topic, taken from the first three on the priority list:

Priority 1: All documents that refer or relate to pigeon deaths during the course of animal studies. (Group LIU1)
Priority 2: All documents referencing or regarding lawsuits involving claims related to memory loss. (Group LIU2)
Priority 3: All documents discussing, referencing, or relating to company guidelines, strategies, or internal approval for placement of tobacco products in movies that are mentioned as G-rated. (Group LIU3)

Each group worked independently on its chosen topic, while members within a group collaborated at every stage, from search strategy formulation to evaluation of retrieved results. As requested, each group submitted its ranked top 100 retrieved results, and each participant filled out the TREC 2007 Questionnaire. The three sets of top 100 retrieved results were then evaluated by the doctoral seminar instructor's graduate assistant (GA). Documents judged relevant in this round were submitted as our team's final results.

Bibtex
@inproceedings{DBLP:conf/trec/ChuCCDHKSSSWY07,
    author = {Heting Chu and Irene Crisci and Estella Cisco{-}Dalrymple and Tricia Daley and Lori Hoeffner and Trudy Katz and Susan Shebar and Carol Sullivan and Sarah Swammy and Maureen Weicher and Galia Yemini{-}Halevi},
    editor = {Ellen M. Voorhees and Lori P. Buckland},
    title = {{TREC} 2007 Legal Track Interactive Task: {A} Report from the {LIU} Team},
    booktitle = {Proceedings of The Sixteenth Text REtrieval Conference, {TREC} 2007, Gaithersburg, Maryland, USA, November 5-9, 2007},
    series = {{NIST} Special Publication},
    volume = {500-274},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2007},
    url = {http://trec.nist.gov/pubs/trec16/papers/longislandu.legal.final.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/ChuCCDHKSSSWY07.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

University of Washington (UW) at Legal TREC Interactive 2007

Efthimis N. Efthimiadis, Mary A. Hotchkiss

Abstract

The TREC 2007 Legal Track Interactive Task involved five hypothetical legal “complaints” based on some facet of tobacco litigation. Each complaint included a request to produce relevant documents. These document production requests were broadly worded to force the opposing party to provide a maximum number of responsive documents during discovery. The resources for document production were two databases containing the tobacco litigation documents released under the terms of the Master Settlement Agreement (MSA) between the Attorneys General of several states and seven U.S. tobacco organizations. These two databases, the Legacy Tobacco Documents Library (LTDL) and Tobacco Documents Online (TDO), contain around 7,000,000 documents. The majority of these documents are not legal publications such as cases, statutes, or regulations; the databases include scientific studies, corporate correspondence, periodical articles, news stories, and a mix of litigation documents. Finding relevant documents in large databases is easier said than done. Studies have shown that researchers tend to overestimate the effectiveness of online retrieval: in a 1985 study on retrieval effectiveness, attorneys who were confident they had located at least 75% of the relevant documents actually had a success rate of about 20% (Blair and Maron, 1985). Their research findings had a major impact on information retrieval evaluation, especially of operational systems. In a follow-up article, Blair (1996) reflected on the impact of their study. Dabney (1986), Bing (1987), and Schweighofer (1999) provide in-depth reviews of the problems of full-text searching for legal information and suggest solutions to those problems. In the past twenty years the functionality of full-text document-retrieval systems has improved, but more evaluation of information retrieval effectiveness is needed. Attorneys and their support staff must recognize that effective information retrieval in today's complex litigation requires a variety of tools and approaches, including a combination of automated searches, sampling of large databases, and team-based review of the results.

Bibtex
@inproceedings{DBLP:conf/trec/EfthimiadisH07,
    author = {Efthimis N. Efthimiadis and Mary A. Hotchkiss},
    editor = {Ellen M. Voorhees and Lori P. Buckland},
    title = {University of Washington {(UW)} at Legal {TREC} Interactive 2007},
    booktitle = {Proceedings of The Sixteenth Text REtrieval Conference, {TREC} 2007, Gaithersburg, Maryland, USA, November 5-9, 2007},
    series = {{NIST} Special Publication},
    volume = {500-274},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2007},
    url = {http://trec.nist.gov/pubs/trec16/papers/uwashington.legal.final.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/EfthimiadisH07.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

On Retrieving Legal Files: Shortening Documents and Weeding Out Garbage

Scott Kulp, April Kontostathis

Abstract

This paper describes our participation in the TREC 2007 Legal experiments. We applied novel normalization techniques designed to slightly favor longer documents instead of assuming that all documents should have equal weight. We also developed a new method for reformulating query text when background information is provided with an information request, and we experimented with enhanced OCR error detection to reduce the size of the term list and remove noise from the data. In this article we discuss the impact of these techniques on the TREC 2007 data sets and show that simple length normalization methods significantly outperform cosine normalization in the legal domain.
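
A toy contrast between cosine normalization and a milder, pivoted-style length normalization that slightly favors longer documents is sketched below; the exact scheme is assumed for illustration and may differ from the paper's actual normalization:

import math

def cosine_normalize(weights):
    # Classic cosine normalization: every document vector is scaled to unit length.
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm else weights

def mild_length_normalize(weights, doc_len, avg_doc_len, slope=0.75):
    # Pivoted-style normalization: long documents are penalized less than under
    # cosine normalization, so longer documents are slightly favored.
    denom = (1.0 - slope) + slope * (doc_len / avg_doc_len)
    return {t: w / denom for t, w in weights.items()}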

Bibtex
@inproceedings{DBLP:conf/trec/KulpK07,
    author = {Scott Kulp and April Kontostathis},
    editor = {Ellen M. Voorhees and Lori P. Buckland},
    title = {On Retrieving Legal Files: Shortening Documents and Weeding Out Garbage},
    booktitle = {Proceedings of The Sixteenth Text REtrieval Conference, {TREC} 2007, Gaithersburg, Maryland, USA, November 5-9, 2007},
    series = {{NIST} Special Publication},
    volume = {500-274},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2007},
    url = {http://trec.nist.gov/pubs/trec16/papers/ursinus.legal.final.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/KulpK07.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Experiments with the Negotiated Boolean Queries of the TREC 2007 Legal Discovery Track

Stephen Tomlinson

Abstract

We analyze the results of several experimental runs submitted for the TREC 2007 Legal Track (also sometimes known as the Legal Discovery Track). We submitted four Boolean query runs (the initial proposal by the defendant, the rejoinder by the plaintiff, the final negotiated query, and a variation of the final query with all proximity distances doubled) and two vector query runs (one based on the keywords of the final negotiated query, and another based on the natural-language request text). We also submitted a blind feedback run based on the final negotiated Boolean query and, finally, a fusion run of the final Boolean, request-text, and final vector runs. We found that none of the runs had a higher mean estimated Recall@B than the original final negotiated Boolean query.
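
One simple way such a fusion run could be realized is CombSUM over min-max normalized scores, sketched below; this is an illustrative assumption, not necessarily the fusion method used for the submitted run:

def combsum_fuse(runs, depth=25000):
    # Fuse several runs (each a dict of docid -> score) by summing min-max
    # normalized scores, then rank documents by the fused score.
    fused = {}
    for run in runs:
        if not run:
            continue
        lo, hi = min(run.values()), max(run.values())
        span = (hi - lo) or 1.0
        for doc, s in run.items():
            fused[doc] = fused.get(doc, 0.0) + (s - lo) / span
    return sorted(fused, key=fused.get, reverse=True)[:depth]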

Bibtex
@inproceedings{DBLP:conf/trec/Tomlinson07,
    author = {Stephen Tomlinson},
    editor = {Ellen M. Voorhees and Lori P. Buckland},
    title = {Experiments with the Negotiated Boolean Queries of the {TREC} 2007 Legal Discovery Track},
    booktitle = {Proceedings of The Sixteenth Text REtrieval Conference, {TREC} 2007, Gaithersburg, Maryland, USA, November 5-9, 2007},
    series = {{NIST} Special Publication},
    volume = {500-274},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2007},
    url = {http://trec.nist.gov/pubs/trec16/papers/open-text.legal.final.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/Tomlinson07.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

CIIR Experiments for TREC Legal 2007

Howard R. Turtle, Donald Metzler

Abstract

Four baseline experiments using standard Indri retrieval facilities and simple query formulation techniques and two experiments using more advanced formulations (dependence models and pseudo-relevance feedback) are described. All of the experiments perform substantially better than the median performance of automatic runs but exhibit lower estimated precision and recall at B than the reference Boolean run.
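
To make the two kinds of formulation concrete, the sketch below shows hypothetical Indri query strings for a simple request; the operators (#combine, #weight, #1, #uw8) are standard Indri syntax, but the weights, window sizes, and terms are assumptions rather than the parameters of the submitted runs:

# Hypothetical Indri queries for a request about pigeon deaths in animal studies.
baseline_query = "#combine( pigeon deaths animal studies )"

# Dependence-model-style formulation: unigrams, exact phrases (#1), and
# unordered windows (#uw8), mixed with fixed weights.
dependence_query = (
    "#weight( 0.80 #combine( pigeon deaths animal studies ) "
    "0.10 #combine( #1( pigeon deaths ) #1( animal studies ) ) "
    "0.10 #combine( #uw8( pigeon deaths ) #uw8( animal studies ) ) )"
)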

Bibtex
@inproceedings{DBLP:conf/trec/TurtleM07,
    author = {Howard R. Turtle and Donald Metzler},
    editor = {Ellen M. Voorhees and Lori P. Buckland},
    title = {{CIIR} Experiments for {TREC} Legal 2007},
    booktitle = {Proceedings of The Sixteenth Text REtrieval Conference, {TREC} 2007, Gaithersburg, Maryland, USA, November 5-9, 2007},
    series = {{NIST} Special Publication},
    volume = {500-274},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2007},
    url = {http://trec.nist.gov/pubs/trec16/papers/umass.legal.final.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/TurtleM07.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

WIM at TREC 2007

Jun Xu, Jing Yao, Jiaqian Zheng, Qi Sun, Junyu Niu

Abstract

This paper describes the four tracks in which the WIM Lab of Fudan University participated at TREC 2007. For the spam track, a multi-centre model was proposed to reflect the characteristics of spam mail, in contrast to the traditional two-class classification methodology, and incremental clustering and closeness-based classification methods were applied this year. For the enterprise track, our research focused mainly on ranking functions for experts and on selecting correct supporting documents for a given topic. For the legal track, we mainly evaluated the effects of a word distribution model in query expansion and of various corpus pre-processing methods. For the genomics track, three scoring methods were proposed to find the text snippets most relevant to a given topic. This paper gives an overview of the methods employed for each subtask and compares the results for each track.

Bibtex
@inproceedings{DBLP:conf/trec/XuYZSN07,
    author = {Jun Xu and Jing Yao and Jiaqian Zheng and Qi Sun and Junyu Niu},
    editor = {Ellen M. Voorhees and Lori P. Buckland},
    title = {{WIM} at {TREC} 2007},
    booktitle = {Proceedings of The Sixteenth Text REtrieval Conference, {TREC} 2007, Gaithersburg, Maryland, USA, November 5-9, 2007},
    series = {{NIST} Special Publication},
    volume = {500-274},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2007},
    url = {http://trec.nist.gov/pubs/trec16/papers/fudan-niu.spam.ent.legal.geo.final.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/XuYZSN07.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Evaluation of Query Formulations in the Negotiated Query Refinement Process of Legal e-Discovery: UMKC at TREC 2007 Legal Track

Feng C. Zhao, Yugyung Lee, Deep Medhi

Abstract

UMKC participated in the 2007 Legal Track. Our experiments focused mainly on evaluating the different query formulations produced in the negotiated query refinement process of legal e-discovery. For our study, we considered three sets of paired runs, in a vector space model and a language model respectively. Our experiments indicated that although the Boolean query negotiation process was successful for the standard Boolean retrieval model, it did not yield statistically significant query improvements in our ranked systems. This result provided us with insight into the query negotiation process and a new direction for refining queries.

Bibtex
@inproceedings{DBLP:conf/trec/ZhaoLM07,
    author = {Feng C. Zhao and Yugyung Lee and Deep Medhi},
    editor = {Ellen M. Voorhees and Lori P. Buckland},
    title = {Evaluation of Query Formulations in the Negotiated Query Refinement Process of Legal e-Discovery: {UMKC} at {TREC} 2007 Legal Track},
    booktitle = {Proceedings of The Sixteenth Text REtrieval Conference, {TREC} 2007, Gaithersburg, Maryland, USA, November 5-9, 2007},
    series = {{NIST} Special Publication},
    volume = {500-274},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2007},
    url = {http://trec.nist.gov/pubs/trec16/papers/umissouri-kc.legal.final.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/ZhaoLM07.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Structured Queries for Legal Search

Yangbo Zhu, Le Zhao, Jamie Callan, Jaime G. Carbonell

Abstract

This paper reports our experiments using Indri for the main and routing (relevance feedback) tasks in the TREC 2007 Legal Track. For the main task, we analyze ranking algorithms using different fields, Boolean constraints, and structured operators. Evaluation results show that structured queries outperform bag-of-words queries, and that Boolean constraints improve both precision and recall. For the routing task, we train a linear SVM classifier for each topic; the terms with the largest weights are selected to form new queries. Both keywords and simple structured features (term.field) were investigated. Named-entity tags, the LingPipe sentence breaker, and metadata fields of the original documents were used to generate the field information. Results show that structured features and weighted queries improve retrieval, but only marginally. We also show which structures are more useful; it turns out that metadata fields are not as important as we thought.
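
A minimal sketch of the routing idea described above (a per-topic linear SVM whose largest-weight terms become a new query), using scikit-learn; the feature construction and parameter choices are assumptions, not the authors' exact pipeline:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

def routing_query_terms(judged_texts, labels, top_k=20):
    # Train a linear SVM on judged documents (labels: 1 = relevant, 0 = not)
    # and return the top_k highest-weight terms as an expanded routing query.
    vectorizer = TfidfVectorizer(stop_words="english")
    X = vectorizer.fit_transform(judged_texts)
    clf = LinearSVC(C=1.0).fit(X, labels)
    terms = vectorizer.get_feature_names_out()
    weights = clf.coef_.ravel()
    top = weights.argsort()[::-1][:top_k]
    return [terms[i] for i in top]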

Bibtex
@inproceedings{DBLP:conf/trec/ZhuZCC07,
    author = {Yangbo Zhu and Le Zhao and Jamie Callan and Jaime G. Carbonell},
    editor = {Ellen M. Voorhees and Lori P. Buckland},
    title = {Stuctured Queries for Legal Search},
    booktitle = {Proceedings of The Sixteenth Text REtrieval Conference, {TREC} 2007, Gaithersburg, Maryland, USA, November 5-9, 2007},
    series = {{NIST} Special Publication},
    volume = {500-274},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2007},
    url = {http://trec.nist.gov/pubs/trec16/papers/cmu-callan.legal.final.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/ZhuZCC07.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}