Runs - Legal 2009
ADI2009Topic204
Results | Participants | Input | Appendix
- Run ID: ADI2009Topic204
- Participant: ADI2009
- Track: Legal
- Year: 2009
- Submission: 9/16/2009
- Task: interactive
- MD5: 73ce9033b09a9a402168416717161f11
- Run description: Topic 204: we used the ORA e-discovery tool with FAST search, running key-term and Boolean searches to pull results and document sampling to test and refine. The results file contains 80,114 document IDs.
buffalo
Results | Participants | Proceedings | Input | Appendix
- Run ID: buffalo
- Participant: SUNY_Buffalo
- Track: Legal
- Year: 2009
- Submission: 9/16/2009
- Task: interactive
- MD5: abbd9de463be40926670c27439bf5cb5
- Run description: We combined the results of about 15 very specific queries with the results of one generic query.
CGSHBCK
Results | Participants | Proceedings | Input | Appendix
- Run ID: CGSHBCK
- Participant: Cleary_Backstop
- Track: Legal
- Year: 2009
- Submission: 9/16/2009
- Task: interactive
- MD5: 3b533d93ce88aaf22cbce0c16be9f5a9
- Run description: Cleary - Backstop team results for topics 201, 204, 206, and 207. Topic 206 was designed to identify a small pool including key documents, with a very high threshold of responsiveness and minimal manpower. Topics 201, 204, and 207 are the same as for CGSHBCK1 and CGSHBCK2.
CGSHBCK1
Results | Participants | Proceedings | Input | Appendix
- Run ID: CGSHBCK1
- Participant: Cleary_Backstop
- Track: Legal
- Year: 2009
- Submission: 9/17/2009
- Task: interactive
- MD5: d3d24bc63300b7cf70eec453778dae18
- Run description: Cleary - Backstop team results for topics 201, 204, 206, and 207. Topic 206 was designed to identify a small pool including key documents, with a high threshold of responsiveness (though not as high as in CGSHBCK) and minimal manpower. Topics 201, 204, and 207 are the same as for CGSHBCK and CGSHBCK2.
CGSHBCK2
Results | Participants | Proceedings | Input | Appendix
- Run ID: CGSHBCK2
- Participant: Cleary_Backstop
- Track: Legal
- Year: 2009
- Submission: 9/17/2009
- Task: interactive
- MD5: cae4b25bd68b654eb781d05ebe788088
- Run description: Cleary - Backstop team results for topics 201, 204, 206, and 207. Topic 206 was designed to reduce the pool of potentially responsive documents with minimal manpower. Topics 201, 204, and 207 are the same as for CGSHBCK and CGSHBCK1.
clearwell01
Results | Participants | Proceedings | Input | Appendix
- Run ID: clearwell01
- Participant: Clearwell09
- Track: Legal
- Year: 2009
- Submission: 9/30/2009
- Task: interactive
- MD5: 8bfba4ccd7adf120e3392eb6926df8ff
- Run description: The Clearwell E-Discovery platform v4.5 was used for the topic 205 request for production.
Clearwell09i
Results | Participants | Proceedings | Input | Appendix
- Run ID: Clearwell09i
- Participant: Clearwell09
- Track: Legal
- Year: 2009
- Submission: 9/18/2009
- Task: interactive
- MD5: 602e60a19b05aec0e05589dd27be8917
- Run description: The Clearwell E-Discovery platform v4.5 was used to execute the Legal track interactive task for topics 201 and 202.
CompCustIT09
Results | Participants | Proceedings | Input | Appendix
- Run ID: CompCustIT09
- Participant: ZLTech
- Track: Legal
- Year: 2009
- Submission: 9/17/2009
- Task: interactive
- MD5: 1fdb6c40d02aa983c5c759be7bb307c5
- Run description: In this submission, the emails were reduplicated to approximately 3 million emails and distributed across 104 custodian mailboxes. The reduplicated emails combined the text extraction for the message body with the native files for the attachments to create IETF RFC 2822 MIME emails. The custodians were further associated with title and department information, and the team identified and prioritized the custodians most likely to hold relevant information based on title and department. Once the custodians had been identified, all of their email was pulled and made available for review. A couple of other custodians were identified through an enterprise subject-based search, without using title and department, to see whether the custodian-identification method could miss important custodians with substantial volumes of relevant email. A variety of search and analytics techniques were used in conjunction with guidance from the Topic Authority. This submission includes the top 4 custodians prioritized by title and department. The emails use an ID we term the "TREC Parent ID", which is the DocID after removing TREC-identified duplicates; this is the broadest ID used in this study. The DocID itself is termed the "TREC ID". Since the DocID is not unique in a fully reduplicated set, a further ID, the "JWID", was created as a unique identifier to assist in the analysis. Although the top 4 custodians are presented, more custodians were analyzed and additional information can be submitted. Four of the 104 possible custodians is 3.8% of that user population, while the 104 custodians in turn represent 0.48% of the approximately 22,000 Enron employees.
CompEntrIT09
Results | Participants | Proceedings | Input | Appendix
- Run ID: CompEntrIT09
- Participant: ZLTech
- Track: Legal
- Year: 2009
- Submission: 9/17/2009
- Task: interactive
- MD5: 992033c26c216d8d0b2ed3c595ec50a4
- Run description: In this submission, the emails were reduplicated to approximately 3 million emails and distributed across 104 custodian mailboxes. The reduplicated emails combined the text extraction for the message body with the native files for the attachments to create IETF RFC 2822 MIME emails; this produced a scenario similar to an eDiscovery collection before processing. The emails were ingested into the ZL eDiscovery review platform, where a variety of search and analytics capabilities were applied to the data, including full-text search, wildcard search, auto-classification, concept search, and faceted search. These techniques were used in conjunction with interactive guidance from the Topic Authority.
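The reconstruction step both ZL runs describe, pairing an extracted text body with native attachment files to form a complete RFC 2822 MIME message, can be illustrated with Python's standard email package. This is a minimal sketch, not ZL's pipeline; the header values, body text, and file name are hypothetical.

```python
# Minimal sketch of rebuilding an RFC 2822 MIME message from an extracted
# text body plus a native attachment, as the run description outlines.
# All header values and file contents here are hypothetical.
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.mime.application import MIMEApplication

def build_mime_email(body_text: str, attachment_bytes: bytes,
                     attachment_name: str, sender: str, recipient: str,
                     subject: str) -> str:
    msg = MIMEMultipart()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    # The extracted message body becomes the text/plain part.
    msg.attach(MIMEText(body_text, "plain"))
    # The native file travels as an opaque application/octet-stream part.
    part = MIMEApplication(attachment_bytes, Name=attachment_name)
    part["Content-Disposition"] = f'attachment; filename="{attachment_name}"'
    msg.attach(part)
    return msg.as_string()

raw = build_mime_email("Quarterly numbers attached.", b"%PDF-1.4 ...",
                       "report.pdf", "a@example.com", "b@example.com",
                       "Q3 report")
print(raw[:200])
```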
EmcRun1
Results | Participants | Input | Summary (eval) | Summary (evalH) | Appendix
- Run ID: EmcRun1
- Participant: EMC_CMA_RD
- Track: Legal
- Year: 2009
- Submission: 8/4/2009
- Type: manual
- Task: batch
- MD5: f61f6e6d341f78f79180bd432fda346b
- Run description: For various reasons, the time available for this experiment was much shorter than desirable, so the experiment had to be kept as simple as possible. Noteworthy points about our run: relevance calculation was based on TF/IDF; a list of ~500 stop words was filtered out during indexing; indexing and searching were case-insensitive; and the final Boolean queries were expanded manually after consulting the request and the complaint.
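As a reference point for the setup just described, TF/IDF relevance with stop-word filtering and case-insensitive indexing, here is a minimal sketch using scikit-learn. The library choice and toy documents are assumptions, since the run does not name its implementation (and its ~500-word stop list differs from scikit-learn's built-in English list).

```python
# Minimal TF-IDF sketch: case-insensitive indexing, English stop-word
# filtering, and cosine relevance scoring against a query.
# scikit-learn is an assumption; the original system is unspecified.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "Memo regarding oil futures trading strategy",
    "Lunch schedule for the week",
    "Trading desk report on futures positions",
]
vectorizer = TfidfVectorizer(lowercase=True, stop_words="english")
doc_matrix = vectorizer.fit_transform(docs)            # index the collection
query_vec = vectorizer.transform(["futures trading"])  # same vocabulary
scores = cosine_similarity(query_vec, doc_matrix).ravel()
for rank, i in enumerate(scores.argsort()[::-1], 1):
    print(rank, round(float(scores[i]), 3), docs[i])
```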
Equivio205R1
Results | Participants | Proceedings | Input | Appendix
- Run ID: Equivio205R1
- Participant: Equivio
- Track: Legal
- Year: 2009
- Submission: 9/30/2009
- Task: interactive
- MD5: e651fd38adc60ef5ccda35f01d3b6e1f
- Run description: The Equivio run used Equivio>Relevance, an expert-guided system for assessing document relevance. The system feeds statistically selected samples of documents to an expert (an attorney familiar with the case), who marks each sample as relevant or not. The expert's decisions are used to train the software to estimate document relevance. Using a statistical model to determine when the training process has converged, the system then calculates graduated relevance scores for each document in the collection.
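The workflow described, sample, have the expert mark, retrain, and stop when a statistical criterion says training has converged, has the general shape of an active-learning loop. The sketch below is a schematic reading of that loop, not Equivio>Relevance itself: the classifier, the sampler, and the stopping rule are all assumptions.

```python
# Schematic expert-in-the-loop sketch: feed sampled documents to an expert,
# retrain on the accumulated judgments, and stop once the relevance scores
# stabilize. Classifier, sampler, and stopping rule are assumptions;
# Equivio>Relevance's actual internals are proprietary.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def expert_guided_scores(texts, ask_expert, batch=50, rounds=20, tol=0.005):
    X = TfidfVectorizer().fit_transform(texts)
    rng = np.random.default_rng(0)
    labeled, labels = [], []
    scores = np.zeros(len(texts))
    prev = None
    for _ in range(rounds):
        seen = set(labeled)
        pool = [i for i in range(len(texts)) if i not in seen]
        if not pool:
            break
        for i in rng.choice(pool, size=min(batch, len(pool)), replace=False):
            labeled.append(int(i))
            labels.append(ask_expert(texts[i]))  # expert: 1 = relevant, 0 = not
        if len(set(labels)) < 2:
            continue                             # need both classes to train
        model = LogisticRegression(max_iter=1000).fit(X[labeled], labels)
        scores = model.predict_proba(X)[:, 1]    # graduated relevance scores
        if prev is not None and np.abs(scores - prev).mean() < tol:
            break                                # training has stabilized
        prev = scores
    return scores
```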
Equivio207R1
Results | Participants | Proceedings | Input | Appendix
- Run ID: Equivio207R1
- Participant: Equivio
- Track: Legal
- Year: 2009
- Submission: 9/16/2009
- Task: interactive
- MD5: 2d0b6bc45927e51c38efef1374fdf9e9
- Run description: The Equivio run used Equivio>Relevance, an expert-guided system for assessing document relevance. The system feeds statistically selected samples of documents to an expert (an attorney familiar with the case), who marks each sample as relevant or not. The expert's decisions are used to train the software to estimate document relevance. Using a statistical model to determine when the training process has converged, the system then calculates graduated relevance scores for each document in the collection.
H52009
Results | Participants | Input | Appendix
- Run ID: H52009
- Participant: H5_2009
- Track: Legal
- Year: 2009
- Submission: 9/16/2009
- Task: interactive
- MD5: 2b7f48bbd450819c1ab9b8cd3b44ac2e
- Run description: H5 submitted an assessment of the documents our system identified as responsive to topic 204 of TREC's Legal Track Interactive Task; 2,994 documents were identified as responsive. The H5 system combined human expertise with advanced search and information retrieval technologies to assess the totality of the corpus under investigation.
IntegreonB
Results | Participants | Input | Appendix
- Run ID: IntegreonB
- Participant: IDS_TREC
- Track: Legal
- Year: 2009
- Submission: 9/29/2009
- Task: interactive
- MD5: aaa1393eb4fd2b21536021c1ad662c08
- Run description: We considered an entire message unit responsive if any part of the unit was responsive, and we listed all items from every responsive message unit: all emails and attachments from each "family". We are submitting for topic 205 only.
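The family rule stated here, produce the whole message unit when any member is responsive, amounts to a simple set expansion. A minimal sketch, with a hypothetical document-to-family mapping:

```python
# Minimal sketch of family propagation: if any item in a message unit is
# responsive, every item in that unit (email plus attachments) is produced.
# The family map and document IDs below are hypothetical.
from collections import defaultdict

def expand_to_families(responsive_ids, family_of):
    """family_of maps each document ID to its message-unit (family) ID."""
    members = defaultdict(set)
    for doc, fam in family_of.items():
        members[fam].add(doc)
    hit_families = {family_of[d] for d in responsive_ids if d in family_of}
    produced = set()
    for fam in hit_families:
        produced |= members[fam]     # all emails and attachments in the unit
    return produced

family_of = {"msg1": "F1", "att1a": "F1", "att1b": "F1", "msg2": "F2"}
print(sorted(expand_to_families({"att1a"}, family_of)))
# -> ['att1a', 'att1b', 'msg1']: the whole family F1 is produced
```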
LogikIT09t
Results | Participants | Input | Appendix
- Run ID: LogikIT09t
- Participant: Logik
- Track: Legal
- Year: 2009
- Submission: 9/16/2009
- Task: interactive
- MD5: 70afb120e099c816ebc3674477cff4bf
- Run description: Documents were classified with a Naive Bayes classifier trained on a set of internally tagged documents.
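A minimal sketch of that approach, train a Naive Bayes text classifier on a tagged seed set, then classify the rest of the collection. scikit-learn and the toy labels are assumptions; the run does not name its implementation.

```python
# Minimal Naive Bayes classification sketch: train on internally tagged
# documents, then score untagged ones. Library choice and toy data are
# assumptions, not Logik's actual setup.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

tagged_docs = ["contract draft for derivatives desk",
               "office picnic signup sheet",
               "derivatives trade confirmation"]
tags = [1, 0, 1]                     # 1 = responsive, 0 = not responsive

vec = CountVectorizer(lowercase=True)
clf = MultinomialNB().fit(vec.fit_transform(tagged_docs), tags)

untagged = ["confirmation of swap trade", "parking garage notice"]
probs = clf.predict_proba(vec.transform(untagged))[:, 1]
for doc, p in zip(untagged, probs):
    print(f"{p:.2f}  {doc}")
```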
otL09F
Results | Participants | Proceedings | Input | Summary (eval) | Summary (evalH) | Appendix
- Run ID: otL09F
- Participant: ot
- Track: Legal
- Year: 2009
- Submission: 8/3/2009
- Type: manual
- Task: batch
- MD5: b56113a2fbd451cf4307896b09245156
- Run description: Pure relevance-feedback run based on forming a query from a random sample of the known relevant documents of size less than 10,000 bytes; no topic fields were used. The K value was set to the greater of the retrospective optimal K value for F1 and the estRelL09.append K value, plus 10 percent; the Kh values were taken directly from estRelL09.append.
otL09frwF
Results | Participants | Proceedings | Input | Summary (eval) | Summary (evalH) | Appendix
- Run ID: otL09frwF
- Participant: ot
- Track: Legal
- Year: 2009
- Submission: 8/4/2009
- Type: manual
- Task: batch
- MD5: 1f887b974c9b6ea2233b4d102e883432
- Run description: Reciprocal rank fusion (RRF) of the feedback run (otL09F, weight 3), the ranked final Boolean (weight 3), the request-text vector (otL09rvl, weight 2), and a vector of the final Boolean terms (weight 1). The K value was set to the greater of the retrospective optimal K value for F1 and the estRelL09.append K value, plus 10 percent; the Kh values were taken directly from estRelL09.append.
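For reference, reciprocal rank fusion scores each document as a weighted sum of 1/(k + rank) over the input rankings. A minimal sketch using this run's stated weights (3, 3, 2, 1); the smoothing constant k = 60 follows the original RRF paper and is an assumption here, as are the toy document IDs.

```python
# Minimal weighted reciprocal-rank-fusion sketch. Each input is a ranked
# list of doc IDs; a document's fused score is the sum over lists of
# weight / (k + rank). k = 60 is the value from Cormack et al.'s RRF paper
# and an assumption; the run's actual constant is not stated.
from collections import defaultdict

def rrf(ranked_lists, weights, k=60):
    scores = defaultdict(float)
    for ranking, w in zip(ranked_lists, weights):
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += w / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

feedback = ["d3", "d1", "d7"]   # e.g. otL09F, weight 3
boolean  = ["d1", "d3", "d9"]   # ranked final Boolean, weight 3
request  = ["d7", "d1"]         # request-text vector, weight 2
terms    = ["d9", "d3"]         # vector of Boolean terms, weight 1
print(rrf([feedback, boolean, request, terms], [3, 3, 2, 1]))
```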
otL09rvl
Results | Participants | Proceedings | Input | Summary (eval) | Summary (evalH) | Appendix
- Run ID: otL09rvl
- Participant: ot
- Track: Legal
- Year: 2009
- Submission: 8/3/2009
- Type: manual
- Task: batch
- MD5: 549525f606cf4b68e9d27c7b6fe91f67
- Run description: Baseline run (no feedback): a vector run based on the request text terms. English inflections were matched; common instruction words (e.g. "please", "produce", "documents") were manually removed. K includes documents with an rsv of 200 or more; Kh includes those with an rsv of 225 or more.
pittsis09
Results | Participants | Proceedings | Input | Appendix
- Run ID: pittsis09
- Participant: pitt_sis
- Track: Legal
- Year: 2009
- Submission: 9/16/2009
- Task: interactive
- MD5: 79056aa82bed83758e2edbf4841d5184
- Run description: We designed an experiment to investigate the information-seeking behavior of users conducting an e-discovery task, with a focus on collaboration among searchers. We observed an expert with a legal background and an information-retrieval expert working collaboratively on topic 201, examining how they collaborated to complete the task and what characterized their collaborative information behavior (CIB).
ucedlsi
Results | Participants | Proceedings | Input | Summary (eval) | Summary (evalH) | Appendix
- Run ID: ucedlsi
- Participant: URSINUS
- Track: Legal
- Year: 2009
- Submission: 7/31/2009
- Type: manual
- Task: batch
- MD5: 70b8f7c7c3f624e5cdec60d470a9c0ab
- Run description: This run is distributed EDLSI. The indexed dataset is divided into 81 pieces, and Essential Dimensions of Latent Semantic Indexing (EDLSI), a weighted sum of the results from Latent Semantic Indexing (LSI) and vector-space information retrieval (IR), is applied to each. The scores are then compiled and sorted. The K values are chosen to approximate the number of documents at which the number of common documents between the three runs is maximized. K_h is chosen at the approximate point where the document scores drop off.
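EDLSI's core step is the weighted sum named above: final score = x * (LSI score) + (1 - x) * (vector-space score). A minimal single-piece sketch follows; TruncatedSVD stands in for the SVD behind LSI, x = 0.2 is an assumption in line with the small weights reported in EDLSI papers, and the 81-way distribution is omitted.

```python
# Minimal EDLSI sketch: fused score = x * LSI + (1 - x) * vector-space.
# TruncatedSVD stands in for LSI's SVD; x = 0.2 and the toy documents are
# assumptions, not the run's actual parameters.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

docs = ["power plant maintenance schedule",
        "energy trading risk memo",
        "holiday party invitation",
        "risk assessment for plant operations"]
query = ["plant risk"]

vec = TfidfVectorizer()
X = vec.fit_transform(docs)               # term-document weights
q = vec.transform(query)

vsm = cosine_similarity(q, X).ravel()     # plain vector-space scores

svd = TruncatedSVD(n_components=2, random_state=0).fit(X)
lsi = cosine_similarity(svd.transform(q), svd.transform(X)).ravel()

x = 0.2                                   # EDLSI weighting parameter
edlsi = x * lsi + (1 - x) * vsm
print(np.argsort(edlsi)[::-1])            # fused ranking
```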
uclsi
Results | Participants | Proceedings | Input | Summary (eval) | Summary (evalH) | Appendix
- Run ID: uclsi
- Participant: URSINUS
- Track: Legal
- Year: 2009
- Submission: 7/31/2009
- Type: manual
- Task: batch
- MD5: d4403e59e9490171509bbbe75c044cfd
- Run description: This run is LSI with folding-in. The indexed dataset is divided into 81 pieces, and Latent Semantic Indexing (LSI) is performed on the first piece. The remaining 80 pieces are folded in, and the scores are computed and sorted. The K values are chosen to approximate the number of documents at which the number of common documents between the three runs is maximized. K_h is chosen at the approximate point where the document scores drop off.
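Folding-in avoids recomputing the SVD: with the first piece factored as X ~ U_k S_k V_k^T (documents as rows), a new document's term vector d is projected into the LSI space as d_hat = (d V_k) / s_k. A minimal numpy sketch of that projection, with toy matrices and the 81-piece split reduced to a single fold-in step:

```python
# Minimal folding-in sketch: compute the SVD on the first piece, then
# project ("fold in") the remaining documents via d_hat = (d @ V_k) / s_k.
# Matrices are random toy data, not the actual collection.
import numpy as np

rng = np.random.default_rng(0)
first_piece = rng.random((100, 20))   # toy: 100 docs x 20 terms
rest = rng.random((400, 20))          # documents to fold in later

k = 5
U, s, Vt = np.linalg.svd(first_piece, full_matrices=False)
Vk, sk = Vt[:k].T, s[:k]              # term-space basis, singular values

docs_first = first_piece @ Vk / sk    # equals U_k: first piece's coordinates
docs_rest = rest @ Vk / sk            # folded-in document coordinates
all_docs = np.vstack([docs_first, docs_rest])

query = rng.random(20)
q_hat = query @ Vk / sk               # queries fold in the same way
sims = all_docs @ q_hat / (np.linalg.norm(all_docs, axis=1)
                           * np.linalg.norm(q_hat))
print(np.argsort(sims)[::-1][:5])     # top-5 document indices
```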
ucscra
Results | Participants | Proceedings | Input | Summary (eval) | Summary (evalH) | Appendix
- Run ID: ucscra
- Participant: URSINUS
- Track: Legal
- Year: 2009
- Submission: 7/31/2009
- Type: manual
- Task: batch
- MD5: 410c29f24931dee1e3d349ebd6e445e2
- Run description: This run is SCRA-based distributed LSI. The indexed dataset is divided into 81 pieces, and each piece is further divided into 40 pieces. Latent Semantic Indexing (LSI) is applied to each piece, using the Sparse Column-Row Approximation (SCRA) instead of the traditionally used Singular Value Decomposition (SVD). The scores are then compiled and sorted. The K values are chosen to approximate the number of documents at which the number of common documents between the three runs is maximized. K_h is chosen at the approximate point where the document scores drop off.
watlint
Results | Participants | Proceedings | Input | Appendix
- Run ID: watlint
- Participant: Waterloo
- Track: Legal
- Year: 2009
- Submission: 9/15/2009
- Task: interactive
- MD5: bb4ffeeb0b23d2626a9db5a67372e563
- Run description: Interactive search and judging, followed by active machine learning with a human reviewer in the loop. Recall was estimated by fitting a censored normal distribution to the machine-learning scores of responsive documents, factoring in an estimate of 90% inter-assessor agreement on non-relevant documents (derived from past experience). Precision was estimated assuming 70% as an upper bound on human agreement (based on past experience), reduced for topic 203 due to the poor fit of the learning model coupled with uncertainty in the review. Topic 207 estimates are confounded (upwards) by the fact that about 2/3 of the responsive documents are vacuous ".URL" attachments, which were handled as a special case.
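The recall estimate described, fit a normal to the scores of responsive documents observed only above a review cutoff, then read recall off the fitted tail mass at that cutoff, can be sketched with a truncated-normal maximum-likelihood fit. The data, the cutoff, and the use of scipy are assumptions (a truncated fit stands in for the run's censored fit), and the 90% inter-assessor adjustment is omitted.

```python
# Sketch of the recall estimate: assume responsive documents' scores are
# normal, observe only scores above cutoff c, fit (mu, sigma) by
# truncated-normal MLE, and take recall = P(score >= c). Toy data; the
# run's 90% agreement adjustment is omitted.
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(1)
true_scores = rng.normal(1.0, 1.5, size=5000)  # hypothetical responsive docs
c = 0.5                                        # review cutoff
observed = true_scores[true_scores >= c]       # what the reviewers saw

def neg_log_lik(params):
    mu, log_sigma = params
    sigma = np.exp(log_sigma)                  # keep sigma positive
    z = (observed - mu) / sigma
    tail = stats.norm.sf((c - mu) / sigma)     # P(score >= c) under the model
    return -(stats.norm.logpdf(z).sum()
             - len(observed) * np.log(sigma)
             - len(observed) * np.log(tail))

res = optimize.minimize(neg_log_lik, x0=[float(observed.mean()), 0.0])
mu_hat, sigma_hat = res.x[0], float(np.exp(res.x[1]))
recall_hat = stats.norm.sf((c - mu_hat) / sigma_hat)
print(f"estimated recall ~ {recall_hat:.2f}")  # true value here is about 0.63
```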
watlogistic
Results | Participants | Proceedings | Input | Summary (eval) | Summary (evalH) | Appendix
- Run ID: watlogistic
- Participant: Waterloo
- Track: Legal
- Year: 2009
- Submission: 7/27/2009
- Type: automatic
- Task: batch
- MD5: 4e92883f7a58ffddaa00304efb189ef8
- Run description: Logistic regression using all training examples, as described for watrrf; no cross-validation. K values were estimated from a separate run (not submitted) using the same method with 2-fold cross-validation.
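A sketch of that K-estimation recipe: run the same method under 2-fold cross-validation, find the ranking depth that maximizes F1 on each held-out fold, and scale it to the full collection, while the submitted model itself trains on all examples. scikit-learn, the synthetic data, and F1 as the target measure are assumptions.

```python
# Sketch of picking K (number of documents to return) via 2-fold CV:
# train on one half, score the other, and take the depth maximizing F1 on
# held-out labels; the final model uses all examples. Synthetic data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X, y = make_classification(n_samples=2000, n_features=50, random_state=0)

def best_depth(scores, labels):
    """Depth K (count of top-ranked docs) maximizing F1 on these labels."""
    order = np.argsort(scores)[::-1]
    hits = np.cumsum(labels[order])            # relevant docs found by depth
    depths = np.arange(1, len(order) + 1)
    precision = hits / depths
    recall = hits / max(labels.sum(), 1)
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    return int(depths[np.argmax(f1)])

fractions = []
for train_idx, test_idx in KFold(n_splits=2, shuffle=True,
                                 random_state=0).split(X):
    model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    s = model.predict_proba(X[test_idx])[:, 1]
    fractions.append(best_depth(s, y[test_idx]) / len(test_idx))

final_model = LogisticRegression(max_iter=1000).fit(X, y)  # all examples
K = int(np.mean(fractions) * len(X))           # cutoff scaled to full set
print("estimated K =", K)
```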
watrrf
Results | Participants | Proceedings | Input | Summary (eval) | Summary (evalH) | Appendix
- Run ID: watrrf
- Participant: Waterloo
- Track: Legal
- Year: 2009
- Submission: 7/27/2009
- Type: automatic
- Task: batch
- MD5: 1ee6dedb83c95b181bae81d07c7a1355
- Run description: Reciprocal rank fusion of several ranking methods: BM25 relevance feedback, Naive Bayes, online logistic regression, and batch logistic regression. 2-fold cross-validation (splitting the examples into equal test and validation sets) was used to determine K. The topic text was not used at all. Training examples and features were as for watlogistic.
watstack
Results | Participants | Proceedings | Input | Summary (eval) | Summary (evalH) | Appendix
- Run ID: watstack
- Participant: Waterloo
- Track: Legal
- Year: 2009
- Submission: 7/27/2009
- Type: automatic
- Task: batch
- MD5: 64094f4c7dc9b2dd996145adac10e0b7
- Run description: Same as watrrf, but the classifiers were stacked using logistic regression and 2-fold cross-validation.
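Stacking replaces rank fusion with a learned combiner: each base classifier's score becomes a feature for a logistic regression trained on out-of-fold predictions. A minimal sketch with scikit-learn's StackingClassifier; the base learners below are stand-ins for the run's actual rankers (its BM25 relevance-feedback component has no direct scikit-learn analogue), and the data is synthetic.

```python
# Minimal stacking sketch: base classifiers' out-of-fold scores feed a
# logistic-regression combiner, mirroring "classifiers were stacked using
# logistic regression and 2-fold cross validation". Base learners are
# stand-ins, not Waterloo's actual components.
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=1000, n_features=30, random_state=0)

stack = StackingClassifier(
    estimators=[
        ("nb", GaussianNB()),                             # stand-in: Naive Bayes
        ("online_lr", SGDClassifier(loss="log_loss")),    # stand-in: online LR
        ("batch_lr", LogisticRegression(max_iter=1000)),  # stand-in: batch LR
    ],
    final_estimator=LogisticRegression(),  # the stacking combiner
    cv=2,                                  # 2-fold CV for out-of-fold features
)
stack.fit(X, y)
print(stack.predict_proba(X[:5])[:, 1])    # fused relevance scores
```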