Runs - Legal 2010

BC34Fam

Participants

  • Run ID: BC34Fam
  • Participant: BCK_CGSH
  • Track: Legal
  • Year: 2010
  • Submission: 9/16/2010
  • Task: interactive
  • MD5: 69efd2df09b44887fe2bd67a4207653b
  • Run description: Incorporated familial documents into assessments. Tracked the TA's relatively narrow view of privilege on topic 304.

BC34NoFam

Participants

  • Run ID: BC34NoFam
  • Participant: BCK_CGSH
  • Track: Legal
  • Year: 2010
  • Submission: 9/16/2010
  • Task: interactive
  • MD5: 4bcce5bcda5eb4d0ccd83643a25d6264
  • Run description: Did not incorporate familial documents into assessments. Tracked the TA's relatively narrow view of privilege on topic 304.

BC4BFam

Participants

  • Run ID: BC4BFam
  • Participant: BCK_CGSH
  • Track: Legal
  • Year: 2010
  • Submission: 9/16/2010
  • Task: interactive
  • MD5: cd3b7a55fda68c25593a76fdb651e04a
  • Run description: Incorporated familial documents into assessments. Adopted a more expansive view of privilege than TA guidance.

BC4BNoFam

Participants

  • Run ID: BC4BNoFam
  • Participant: BCK_CGSH
  • Track: Legal
  • Year: 2010
  • Submission: 9/16/2010
  • Task: interactive
  • MD5: cb1657a562d5fca56e780e385f42d058
  • Run description: Did not incorporate familial documents into assessments. Adopted a more expansive view of privilege than TA guidance.

BckBigA

Participants | Appendix

  • Run ID: BckBigA
  • Participant: BCK_CGSH
  • Track: Legal
  • Year: 2010
  • Submission: 8/31/2010
  • Type: automatic
  • Task: learning
  • MD5: 646ce0844b57a1ed7f8b2a0389924089
  • Run description: Training data flattened across identical hashes. Training data not used to overwrite predictions.

BckExtA

Participants | Appendix

  • Run ID: BckExtA
  • Participant: BCK_CGSH
  • Track: Legal
  • Year: 2010
  • Submission: 8/31/2010
  • Type: automatic
  • Task: learning
  • MD5: ad8ae3ff12980366e3b58d45e99decf0
  • Run description: Training data flattened across identical hashes. Training data used to overwrite predictions.

BckLitA

Participants | Appendix

  • Run ID: BckLitA
  • Participant: BCK_CGSH
  • Track: Legal
  • Year: 2010
  • Submission: 8/31/2010
  • Type: automatic
  • Task: learning
  • MD5: 2c93fe320d632da4f27bdc2350f35d1b
  • Run description: Training data not flattened across identical hashes and not used to overwrite predictions.

Clearwell10

Participants

  • Run ID: Clearwell10
  • Participant: Clearwell
  • Track: Legal
  • Year: 2010
  • Submission: 9/16/2010
  • Task: interactive
  • MD5: b1ffe7f3e91881dc43a1e55ac189f6ff
  • Run description: Clearwell used its electronic discovery platform to process, analyze, search, and review responsive documents.

DUTHlrgA

Participants | Proceedings | Appendix

  • Run ID: DUTHlrgA
  • Participant: EE.DUTH.GR
  • Track: Legal
  • Year: 2010
  • Submission: 8/25/2010
  • Type: automatic
  • Task: learning
  • MD5: 548ddd699634158d16fe82352aa109ee
  • Run description: .

DUTHsdeA

Participants | Proceedings | Appendix

  • Run ID: DUTHsdeA
  • Participant: EE.DUTH.GR
  • Track: Legal
  • Year: 2010
  • Submission: 8/24/2010
  • Type: automatic
  • Task: learning
  • MD5: 216d0e8ffa67fca69f5f458a37eb73d9
  • Run description: .

DUTHsdtA

Participants | Proceedings | Appendix

  • Run ID: DUTHsdtA
  • Participant: EE.DUTH.GR
  • Track: Legal
  • Year: 2010
  • Submission: 8/24/2010
  • Type: automatic
  • Task: learning
  • MD5: ca6426c0010ee7e1f2eb0b231fe1d521
  • Run description: .

Equivio303R1

Participants

  • Run ID: Equivio303R1
  • Participant: Equivio
  • Track: Legal
  • Year: 2010
  • Submission: 9/16/2010
  • Task: interactive
  • MD5: 522e0d4909a106cbb9dbfbb32ab808cf
  • Run description: The Equivio run used Equivio>Relevance, an expert-guided system for assessing document relevance. The system feeds statistically selected samples of documents to an expert (an attorney familiar with the case), who marks each as relevant or not. The expert's decisions are used to train the software to estimate document relevance. Using a statistical model to determine when the training process has been optimized, the system then calculates graduated relevance scores for each document in the collection.
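
The description outlines a classic expert-in-the-loop training cycle. A minimal sketch in Python, assuming placeholder callables (`expert_label`, `train`, `score_all`), a random sampling step, and a score-stability stopping rule; none of this is Equivio's actual API.

```python
# Minimal sketch of an expert-guided relevance-feedback loop. The callables
# (expert_label, train, score_all), batch size, and the score-stability
# stopping rule are assumptions, not Equivio's actual API.
import random

def expert_guided_training(collection, expert_label, train, score_all,
                           batch_size=40, stability_eps=0.01):
    labeled = {}                      # doc_id -> True/False expert decisions
    prev = None
    while True:
        unlabeled = [d for d in collection if d not in labeled]
        if not unlabeled:
            break
        # Feed a sample of unjudged documents to the expert for marking.
        batch = random.sample(unlabeled, min(batch_size, len(unlabeled)))
        labeled.update({d: expert_label(d) for d in batch})
        model = train(labeled)                   # retrain on all decisions so far
        scores = score_all(model, collection)    # doc_id -> relevance score
        # Stop once scores stabilize between rounds; a stand-in for the
        # statistical optimization test the description mentions.
        if prev is not None:
            drift = max(abs(scores[d] - prev[d]) for d in collection)
            if drift < stability_eps:
                break
        prev = scores
    return scores                     # graduated relevance score per document
```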

EwaLanlKvm

Participants

  • Run ID: EwaLanlKvm
  • Participant: LANL
  • Track: Legal
  • Year: 2010
  • Submission: 9/17/2010
  • Task: interactive
  • MD5: 3d1b57dc3f106ae5b9baf30ef3959654
  • Run description: A collaborative effort between LANL, EWA, and Kayvium to test some newly developed tools. The query models are modular: at least one element of each of several building blocks must appear in close proximity to trigger an allowed match. At this point the model is overly constrained, but it can easily be loosened. Our goal is to start with a high-precision model and then both expand and relax it in a semi-automated way. We also seek to calibrate our scoring system.
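
A minimal sketch of the "building blocks in close proximity" matching rule described above, assuming a fixed token window and hypothetical blocks; the actual query models and window sizes are not given in the description.

```python
# Illustrative proximity-match rule: a document matches only if at least one
# element of every block occurs within a fixed-size token window. Block
# contents and the window size are hypothetical examples.
def proximity_match(tokens, blocks, window=20):
    positions = {}                                   # term -> token offsets
    for i, tok in enumerate(tokens):
        positions.setdefault(tok.lower(), []).append(i)
    # Collect, per block, every position where some element of it occurs.
    block_hits = []
    for block in blocks:
        hits = sorted(p for term in block for p in positions.get(term, []))
        if not hits:
            return False                             # a block never appears
        block_hits.append(hits)
    # Slide a window anchored on each hit of the first block.
    for anchor in block_hits[0]:
        if all(any(abs(p - anchor) <= window for p in hits)
               for hits in block_hits[1:]):
            return True
    return False

# Example: both blocks must be represented within 20 tokens of each other.
doc = "the trading desk discussed the swap agreement with counsel".split()
print(proximity_match(doc, [{"swap", "derivative"}, {"counsel", "attorney"}]))
```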

Integreon302

Participants

  • Run ID: Integreon302
  • Participant: Integreon
  • Track: Legal
  • Year: 2010
  • Submission: 9/17/2010
  • Task: interactive
  • MD5: 495d7d179a312f19ce115063343e16b9
  • Run description: Integreon data analytics for topic 302.

Integreon304

Participants

  • Run ID: Integreon304
  • Participant: Integreon
  • Track: Legal
  • Year: 2010
  • Submission: 9/17/2010
  • Task: interactive
  • MD5: ea50fefae51a100664fa70fd675bc377
  • Run description: Integreon data analytics for topic 304.

IRISICAL1

Participants | Proceedings

  • Run ID: IRISICAL1
  • Participant: IRISICAL
  • Track: Legal
  • Year: 2010
  • Submission: 9/17/2010
  • Task: interactive
  • MD5: 5e851d6895230fd79411eeb15c8e9dbc
  • Run description: Our process uses manual and automatic relevance feedback. The effort started with Boolean query formulation using anticipated keywords, followed by Boolean retrieval using Lemur 4.11, ranking of the retrieved documents using Terrier 3.0 (DFR-BM25 model), and selection of the top 10 documents for TA assessment. Discussions with the TAs led to a clearer understanding of the notion of relevance/responsiveness and to the discovery of new keywords. We also applied Rocchio relevance feedback on the manually judged documents in search of keywords. These keywords were used in Boolean retrieval, which yielded a set of documents. At this stage we clustered the retrieved documents on the basis of normalized cosine similarity: we formed connected components of a graph in which each document is a vertex and two vertices are joined by an edge if the cosine similarity between them exceeds a chosen threshold (between 0.3 and 0.5). We assumed that the documents in a cluster are highly likely to be relevant if the cluster contains at least one judged relevant document; further interactions with the TAs showed this assumption to be correct in many cases. So, starting from a few relevant documents, we succeeded in discovering clusters of relevant documents. Clusters containing no judged relevant document also contributed: we sent a few of their members to the TAs for judgment and kept the clusters whose sampled documents were deemed relevant. Through regular interaction with the TAs and their feedback, we grew our collection of relevant documents and observed significant improvement in performance.
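
The clustering step the description spells out lends itself to a short sketch: documents are vertices, an edge joins pairs whose cosine similarity exceeds the threshold, and clusters are the connected components. The bag-of-words vector representation here is a stand-in for whatever the team actually used.

```python
# Sketch of the clustering step: an edge joins two documents whose normalized
# cosine similarity exceeds a threshold in [0.3, 0.5]; clusters are the
# connected components of that graph.
import math
from collections import defaultdict

def cosine(u, v):
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def similarity_clusters(doc_vectors, threshold=0.4):
    ids = list(doc_vectors)
    adj = defaultdict(set)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if cosine(doc_vectors[a], doc_vectors[b]) > threshold:
                adj[a].add(b)
                adj[b].add(a)
    # Depth-first search to extract connected components.
    seen, clusters = set(), []
    for d in ids:
        if d in seen:
            continue
        stack, comp = [d], set()
        while stack:
            x = stack.pop()
            if x in comp:
                continue
            comp.add(x)
            stack.extend(adj[x] - comp)
        seen |= comp
        clusters.append(comp)
    return clusters
```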

ITCOMRUN0

Participants | Proceedings

  • Run ID: ITCOMRUN0
  • Participant: IT.com
  • Track: Legal
  • Year: 2010
  • Submission: 9/16/2010
  • Task: interactive
  • MD5: f1d9c5fc268d34b8da93ba9aec3f4e6d
  • Run description: Review was machine-assisted, using one iteration of active learning.

ITD

Participants | Proceedings | Appendix

  • Run ID: ITD
  • Participant: IT.com
  • Track: Legal
  • Year: 2010
  • Submission: 8/24/2010
  • Type: automatic
  • Task: learning
  • Run description: .

MailMeter

Participants

  • Run ID: MailMeter
  • Participant: mailmeter
  • Track: Legal
  • Year: 2010
  • Submission: 9/16/2010
  • Task: interactive
  • MD5: 55594b8190bee714b8b462c3131d24a9
  • Run description: Note: only message doc IDs were submitted for responsive messages, with the parent message doc ID submitted for responsive attachment(s). Title: eDiscovery Using Adaptive Search Criteria and Successive Tagging. An iterative technique using computerized search in conjunction with quick manual review of the resulting message titles, facilitated by tagging messages in the application UI.

melbit10

Participants | Proceedings

  • Run ID: melbit10
  • Participant: RMIT
  • Track: Legal
  • Year: 2010
  • Submission: 9/16/2010
  • Task: interactive
  • MD5: fcb1dcf65a00fb2a66a28462f253187c
  • Run description: All submitted documents were manually reviewed. Documents for review were found by a combination of manual search, using a commercial review tool, and machine classification. An SVM classifier with seven features was used. The core feature was the similarity score produced by Indri, using true-relevance feedback over all documents we had marked as relevant. Classification and review were performed iteratively.
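
A rough sketch of the iterative classify-and-review loop, with scikit-learn's SVC standing in for the actual classifier; the feature and review callables are placeholders, and the remaining six features are not enumerated in the description.

```python
# Sketch of iterative classify-and-review: an SVM over seven per-document
# features, with an Indri relevance-feedback similarity score as the core
# feature. scikit-learn and the placeholder callables are assumptions.
from sklearn.svm import SVC

def review_loop(features, reviewed, review_batch, rounds=5, batch=50):
    """features: doc_id -> 7-dim vector (feature 0 = Indri similarity).
    reviewed: doc_id -> bool, seeded from the initial manual search.
    review_batch: sends doc_ids to reviewers, returns doc_id -> bool."""
    for _ in range(rounds):
        X = [features[d] for d in reviewed]
        y = [reviewed[d] for d in reviewed]
        clf = SVC(kernel="linear", probability=True).fit(X, y)
        pool = [d for d in features if d not in reviewed]
        if not pool:
            break
        # Rank unreviewed documents by predicted probability of relevance.
        pool.sort(key=lambda d: clf.predict_proba([features[d]])[0][1],
                  reverse=True)
        reviewed.update(review_batch(pool[:batch]))   # human review, retrain
    return reviewed
```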

otL10bT

Participants | Proceedings | Appendix

  • Run ID: otL10bT
  • Participant: ot
  • Track: Legal
  • Year: 2010
  • Submission: 8/25/2010
  • Type: techassist
  • Task: learning
  • MD5: 94505d3c894c39b7854c7cd0b5f6560e
  • Run description: Boolean-based run.

otL10FT

Participants | Proceedings | Appendix

  • Run ID: otL10FT
  • Participant: ot
  • Track: Legal
  • Year: 2010
  • Submission: 8/23/2010
  • Type: techassist
  • Task: learning
  • MD5: 0cd0ac1078ee8ff0968d55674acbc65a
  • Run description: Pure relevance feedback run (no use of topic statements).

otL10rvlT

Participants | Proceedings | Appendix

  • Run ID: otL10rvlT
  • Participant: ot
  • Track: Legal
  • Year: 2010
  • Submission: 8/23/2010
  • Type: techassist
  • Task: learning
  • MD5: 856962be9d12c6a9fd39bfa472a7b553
  • Run description: Baseline run just based on topic statement (no feedback).

rmitindA

Participants | Proceedings | Appendix

  • Run ID: rmitindA
  • Participant: RMIT
  • Track: Legal
  • Year: 2010
  • Submission: 8/26/2010
  • Type: automatic
  • Task: learning
  • MD5: 8790bb944e911ce190a77bfc4842689d
  • Run description: We apply the Okapi ranking model and use all relevant seed documents for relevance feedback. We estimate probabilities through a simple linear transformation of the documents' relevance scores.
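
A one-function sketch of the probability estimate described; the min-max form is an assumption about what "simple linear transformation" means here.

```python
# Min-max rescaling of Okapi relevance scores onto [0, 1]; the min-max form
# is an assumed reading of "simple linear transformation".
def scores_to_probabilities(scores):
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0           # guard against identical scores
    return {doc: (s - lo) / span for doc, s in scores.items()}
```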

rmitmlfT

Participants | Proceedings | Appendix

  • Run ID: rmitmlfT
  • Participant: RMIT
  • Track: Legal
  • Year: 2010
  • Submission: 8/25/2010
  • Type: techassist
  • Task: learning
  • MD5: 3203ac5b2e1dd7a905a3614621ed5514
  • Run description: This is a support-vector-machine-trained run using all seven features. rmitmlsT used the subset of features that tenfold cross-validation showed to be best for each topic; rmitindA used true-relevance feedback under Indri, which was one of the features in the SVM runs.

rmitmlsT

Participants | Proceedings | Appendix

  • Run ID: rmitmlsT
  • Participant: RMIT
  • Track: Legal
  • Year: 2010
  • Submission: 8/25/2010
  • Type: techassist
  • Task: learning
  • MD5: 3279e524febc37589a4908914df07523
  • Run description: This is a support-vector-machine-trained run using the seemingly best feature subset for each topic. rmitmlfT used all features for each topic; rmitindA used true-relevance feedback under Indri, which was one of the features in the SVM runs.
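
A sketch of per-topic feature-subset selection by tenfold cross-validation, as the description indicates; the exhaustive search over subsets of the seven features and the scikit-learn tooling are assumptions, not the actual RMIT pipeline.

```python
# Per-topic feature-subset selection with tenfold cross-validation.
# Exhaustively scores every non-empty subset of the seven features.
from itertools import combinations

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def best_feature_subset(X, y, n_features=7):
    """Return the feature-index subset with the best 10-fold CV accuracy."""
    X = np.asarray(X)
    best_subset, best_score = None, -1.0
    for k in range(1, n_features + 1):
        for subset in combinations(range(n_features), k):
            cols = list(subset)
            score = cross_val_score(SVC(kernel="linear"), X[:, cols], y,
                                    cv=10).mean()
            if score > best_score:
                best_subset, best_score = cols, score
    return best_subset
```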

tcd1

Participants | Proceedings | Appendix

  • Run ID: tcd1
  • Participant: TCDI
  • Track: Legal
  • Year: 2010
  • Submission: 8/26/2010
  • Type: automatic
  • Task: learning
  • MD5: 0da155a25d207f5bc817d5b09982dbff
  • Run description: 1) Take the RFD topics and use NLP to break them down into simple terms. 2) Index the TREC data using linguistic technology. 3) Search the linguistic database using the terms from step 1, creating an 'expert database' of meaning breakdowns. 4) Index the TREC data using LSI technology. 5) Index the seeds as 1/-1 for each topic (deliberately coarse; not counting 0 or -2). 6) Compute Bayes estimates using step 2 as P(H) and step 3 as a similarity matrix over the seeds as P(D|H) and P(D|H'). I set out to create a bottom baseline, fully automatic with no human intervention, using multiple technologies. There was no review, only a first pass using very limited information (including seeds). Also, can you please run the perl program on the output, as I couldn't get it to exec.

UBlegal2010

Participants | Proceedings

  • Run ID: UBlegal2010
  • Participant: UB
  • Track: Legal
  • Year: 2010
  • Submission: 9/16/2010
  • Task: interactive
  • MD5: 50655ac8ad19fa275f252d1e4473c488
  • Run description: The Indri search engine is used. The topic is approached through the various facets involved: Enron employees, lobbyists, government officers, government organizations, and lobbying events. Web resources are used to collect source information for formulating queries.

URSK35T

Participants | Proceedings | Appendix

  • Run ID: URSK35T
  • Participant: ursinus
  • Track: Legal
  • Year: 2010
  • Submission: 8/25/2010
  • Type: techassist
  • Task: learning
  • MD5: 7dc28aaff5f3e23f56fb2cf44c4eeeae
  • Run description: We used EDLSI (essential dimensions of latent semantic indexing) as outlined in "Essential Dimensions of Latent Semantic Indexing" (Kontostathis, 2007). It uses the term-document matrix and a linear combination of the scores given by vector-space analysis and standard LSI, with a focus on extracting the useful information from LSI while retaining the computational simplicity and raw power of vector-space retrieval. We queried the database (using text queries approximating the TREC production topics) using EDLSI with multiple selections of k (a parameter in the LSI part). Each topic/k-selection combination was then scored based on the seed documents for that topic. This set of results was calculated with k = 35.
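
EDLSI's score is a weighted combination of a low-rank LSI score and a plain vector-space score. A compact NumPy sketch follows, with an assumed weight w; the same function with k = 35, k = 70, or a per-topic best k corresponds to the URSK35T, URSK70T, and URSLSIT runs respectively.

```python
# Sketch of EDLSI scoring: a weighted combination of a rank-k LSI score and a
# plain vector-space score, computed with NumPy's SVD. The weight w (the EDLSI
# work favors small values) and the matrix setup are illustrative.
import numpy as np

def edlsi_scores(td_matrix, query_vec, k=35, w=0.2):
    """td_matrix: terms x docs array; query_vec: terms vector."""
    U, S, Vt = np.linalg.svd(td_matrix, full_matrices=False)
    q_k = U[:, :k].T @ query_vec           # query folded into LSI space
    docs_k = np.diag(S[:k]) @ Vt[:k, :]    # documents in rank-k LSI space
    lsi_score = q_k @ docs_k               # LSI similarity per document
    vsm_score = query_vec @ td_matrix      # plain vector-space similarity
    return w * lsi_score + (1.0 - w) * vsm_score
```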

URSK70T

Participants | Proceedings | Appendix

  • Run ID: URSK70T
  • Participant: ursinus
  • Track: Legal
  • Year: 2010
  • Submission: 8/25/2010
  • Type: techassist
  • Task: learning
  • MD5: 4556d37f09bf2b19c0696e3456bd1ed4
  • Run description: We used EDLSI (essential dimensions of latent semantic indexing) as outlined in "Essential Dimensions of Latent Semantic Indexing" (Kontostathis, 2007). It uses the term-document matrix and a linear combination of the scores given by vector-space analysis and standard LSI, with a focus on extracting the useful information from LSI while retaining the computational simplicity and raw power of vector-space retrieval. We queried the database (using text queries approximating the TREC production topics) using EDLSI with multiple selections of k (a parameter in the LSI part). Each topic/k-selection combination was then scored based on the seed documents for that topic. This set of results was calculated with k = 70.

URSLSIT

Participants | Proceedings | Appendix

  • Run ID: URSLSIT
  • Participant: ursinus
  • Track: Legal
  • Year: 2010
  • Submission: 8/25/2010
  • Type: techassist
  • Task: learning
  • MD5: 362565605ca3947000cfe2ad139aa15c
  • Run description: We used EDLSI (essential dimensions of latent semantic indexing) as outlined in "Essential Dimensions of Latent Semantic Indexing" (Kontostathis, 2007). It uses the term-document matrix and a linear combination of the scores given by vector-space analysis and standard LSI, with a focus on extracting the useful information from LSI while retaining the computational simplicity and raw power of vector-space retrieval. We queried the database (using text queries approximating the TREC production topics) using EDLSI with multiple selections of k (a parameter in the LSI part). Each topic/k-selection combination was then scored based on the seed documents for that topic. Each topic's results in this run were taken from the best-scoring selection of k for that topic.

USFISDS

Participants | Proceedings

  • Run ID: USFISDS
  • Participant: USF_ISDS
  • Track: Legal
  • Year: 2010
  • Submission: 9/16/2010
  • Task: interactive
  • MD5: 3290d68325697c5c420ad66d4ab3bc8a
  • Run description: We apply a bag-of-words (BOW) approach, using multiple-level passes and standard-deviation threshold cutoffs. We built our tool with MS Visual Studio and ran the algorithm on the .NET 4.0 Framework.
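
A minimal sketch of the standard-deviation cutoff passes described, written in Python for consistency with the other sketches here rather than in .NET; the scoring callable, the pass count, and the 1.0-sigma cutoff are assumptions.

```python
# Multi-pass standard-deviation cutoffs on BOW scores: each pass keeps only
# documents scoring above the mean by a chosen number of standard deviations.
import statistics

def stddev_cutoff_passes(scores_for, docs, passes=2, sigmas=1.0):
    kept = list(docs)
    for _ in range(passes):
        if len(kept) < 2:
            break                     # nothing left to threshold
        scores = {d: scores_for(d) for d in kept}
        mu = statistics.mean(scores.values())
        sd = statistics.pstdev(scores.values())
        kept = [d for d in kept if scores[d] >= mu + sigmas * sd]
    return kept
```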

watlint10

Participants | Proceedings

  • Run ID: watlint10
  • Participant: uwaterlooclarke
  • Track: Legal
  • Year: 2010
  • Submission: 9/16/2010
  • Task: interactive
  • MD5: f6d1b52ed5855f7e148473acc937e014
  • Run description: Interactive search and judging followed by active learning.

xrceCalA

Participants | Appendix

  • Run ID: xrceCalA
  • Participant: XEROX
  • Track: Legal
  • Year: 2010
  • Submission: 8/25/2010
  • Type: automatic
  • Task: learning
  • MD5: 52a5a68b6b3d2d9dc48bb02597342ce0
  • Run description: Relative to the previous run, the only difference lies in the probability estimates, which use a transform assuming that the training-sample selection bias depends only on the labels and not on the features (this is not true in practice, but we wanted to assess whether this assumption degrades the probability estimates).
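
The transform described matches the standard label-shift (class-prior) correction: if selection bias depends only on the label, a probability trained under one class prior can be re-weighted to another. A sketch under that reading; identifying it with what the XEROX run actually computed is an inference.

```python
# Prior-correction under the "bias depends only on labels" assumption: a
# classifier's probability from a biased training prior is re-weighted to the
# collection's prior. Treating this as the run's actual transform is an
# assumption.
def correct_for_label_bias(p_train, prior_train, prior_true):
    """p_train: P(relevant | doc) under the biased training prior."""
    pos = p_train * (prior_true / prior_train)
    neg = (1.0 - p_train) * ((1.0 - prior_true) / (1.0 - prior_train))
    return pos / (pos + neg)

# Example: a 0.9 estimate from a 50/50 training sample, when only 5% of the
# collection is relevant, drops to roughly 0.32.
print(correct_for_label_bias(0.9, 0.5, 0.05))
```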

xrceLogA

Participants | Appendix

  • Run ID: xrceLogA
  • Participant: XEROX
  • Track: Legal
  • Year: 2010
  • Submission: 8/25/2010
  • Type: automatic
  • Task: learning
  • MD5: 56945251cfd94e0efeeb78627f4466f5
  • Run description: Logistic regression with no post-calibration. Also uses a special weighting scheme, and uses all annotated documents from Legal TREC Interactive 2009.

xrceNoRA

Participants | Appendix

  • Run ID: xrceNoRA
  • Participant: XEROX
  • Track: Legal
  • Year: 2010
  • Submission: 8/25/2010
  • Type: automatic
  • Task: learning
  • MD5: c9c6b9e1f3b480c6658f23ef9965612f
  • Run description: Uses only the seeds and no other resources. Logistic regression with post-calibration. Also uses a special weighting scheme.
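
A sketch of one common form of post-calibration, Platt-style sigmoid fitting on held-out scores; the description does not say which calibration method the run used, so this is illustrative only.

```python
# Post-calibration via Platt-style sigmoid fitting: fit the classifier on the
# seeds, then fit a second logistic model on held-out decision scores to
# recalibrate the probabilities. Platt scaling is a stand-in choice.
from sklearn.linear_model import LogisticRegression

def train_with_post_calibration(X_train, y_train, X_holdout, y_holdout):
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    # Regress held-out labels on the first model's decision scores.
    s = clf.decision_function(X_holdout).reshape(-1, 1)
    calibrator = LogisticRegression().fit(s, y_holdout)

    def predict_proba(X):
        scores = clf.decision_function(X).reshape(-1, 1)
        return calibrator.predict_proba(scores)[:, 1]

    return predict_proba
```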