Runs - Legal 2010

BC34Fam

Participants

  • Run ID: BC34Fam
  • Participant: BCK_CGSH
  • Track: Legal
  • Year: 2010
  • Submission: 9/16/2010
  • Task: interactive
  • MD5: 69efd2df09b44887fe2bd67a4207653b
  • Run description: Incorporated familial documents into assessments. Tracked the TA's relatively narrow view of privilege on topic 304.

BC34NoFam

Participants

  • Run ID: BC34NoFam
  • Participant: BCK_CGSH
  • Track: Legal
  • Year: 2010
  • Submission: 9/16/2010
  • Task: interactive
  • MD5: 4bcce5bcda5eb4d0ccd83643a25d6264
  • Run description: Did not incorporate familial documents into assessments. Tracked the TA's relatively narrow view of privilege on topic 304.

BC4BFam

Participants

  • Run ID: BC4BFam
  • Participant: BCK_CGSH
  • Track: Legal
  • Year: 2010
  • Submission: 9/16/2010
  • Task: interactive
  • MD5: cd3b7a55fda68c25593a76fdb651e04a
  • Run description: Incorporated familial documents into assessments. Adopted a more expansive view of privilege than TA guidance.

BC4BNoFam

Participants

  • Run ID: BC4BNoFam
  • Participant: BCK_CGSH
  • Track: Legal
  • Year: 2010
  • Submission: 9/16/2010
  • Task: interactive
  • MD5: cb1657a562d5fca56e780e385f42d058
  • Run description: Did not incorporate familial documents into assessments. Adopted a more expansive view of privilege than TA guidance.

BckBigA

Participants | Appendix

  • Run ID: BckBigA
  • Participant: BCK_CGSH
  • Track: Legal
  • Year: 2010
  • Submission: 8/31/2010
  • Type: automatic
  • Task: learning
  • MD5: 646ce0844b57a1ed7f8b2a0389924089
  • Run description: Training data flattened across identical hashes. Training data not used to overwrite predictions.

BckExtA

Participants | Appendix

  • Run ID: BckExtA
  • Participant: BCK_CGSH
  • Track: Legal
  • Year: 2010
  • Submission: 8/31/2010
  • Type: automatic
  • Task: learning
  • MD5: ad8ae3ff12980366e3b58d45e99decf0
  • Run description: Training data flattened across identical hashes. Training data used to overwrite predictions.

BckLitA

Participants | Appendix

  • Run ID: BckLitA
  • Participant: BCK_CGSH
  • Track: Legal
  • Year: 2010
  • Submission: 8/31/2010
  • Type: automatic
  • Task: learning
  • MD5: 2c93fe320d632da4f27bdc2350f35d1b
  • Run description: Training data not flattened across identical hashes and not used to overwrite predictions.

Clearwell10

Participants

  • Run ID: Clearwell10
  • Participant: Clearwell
  • Track: Legal
  • Year: 2010
  • Submission: 9/16/2010
  • Task: interactive
  • MD5: b1ffe7f3e91881dc43a1e55ac189f6ff
  • Run description: Clearwell used its electronic discovery platform to process, analyze, search, and review responsive documents.

DUTHlrgA

Participants | Proceedings | Appendix

  • Run ID: DUTHlrgA
  • Participant: EE.DUTH.GR
  • Track: Legal
  • Year: 2010
  • Submission: 8/25/2010
  • Type: automatic
  • Task: learning
  • MD5: 548ddd699634158d16fe82352aa109ee
  • Run description: .

DUTHsdeA

Participants | Proceedings | Appendix

  • Run ID: DUTHsdeA
  • Participant: EE.DUTH.GR
  • Track: Legal
  • Year: 2010
  • Submission: 8/24/2010
  • Type: automatic
  • Task: learning
  • MD5: 216d0e8ffa67fca69f5f458a37eb73d9
  • Run description: .

DUTHsdtA

Participants | Proceedings | Appendix

  • Run ID: DUTHsdtA
  • Participant: EE.DUTH.GR
  • Track: Legal
  • Year: 2010
  • Submission: 8/24/2010
  • Type: automatic
  • Task: learning
  • MD5: ca6426c0010ee7e1f2eb0b231fe1d521
  • Run description: .

Equivio303R1

Participants

  • Run ID: Equivio303R1
  • Participant: Equivio
  • Track: Legal
  • Year: 2010
  • Submission: 9/16/2010
  • Task: interactive
  • MD5: 522e0d4909a106cbb9dbfbb32ab808cf
  • Run description: The Equivio run used Equivio>Relevance, an expert-guided system for assessing document relevance. The system feeds statistically selected samples of documents to an expert (an attorney familiar with the case), who marks each as relevant or not. The expert's decisions are used to train the software to estimate document relevance. Using a statistical model to determine when the training process has been optimized, the system then calculates graduated relevance scores for each document in the collection.
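
The description outlines a classic expert-in-the-loop training cycle. A minimal sketch in Python, assuming placeholder callables (`expert_label`, `train`, `score_all`), a random sampling step, and a score-stability stopping rule; none of this is Equivio's actual API.

```python
# Minimal sketch of an expert-guided relevance-feedback loop. The callables
# (expert_label, train, score_all), batch size, and the score-stability
# stopping rule are assumptions, not Equivio's actual API.
import random

def expert_guided_training(collection, expert_label, train, score_all,
                           batch_size=40, stability_eps=0.01):
    labeled = {}                      # doc_id -> True/False expert decisions
    prev = None
    while True:
        unlabeled = [d for d in collection if d not in labeled]
        if not unlabeled:
            break
        # Feed a sample of unjudged documents to the expert for marking.
        batch = random.sample(unlabeled, min(batch_size, len(unlabeled)))
        labeled.update({d: expert_label(d) for d in batch})
        model = train(labeled)                   # retrain on all decisions so far
        scores = score_all(model, collection)    # doc_id -> relevance score
        # Stop once scores stabilize between rounds; a stand-in for the
        # statistical optimization test the description mentions.
        if prev is not None:
            drift = max(abs(scores[d] - prev[d]) for d in collection)
            if drift < stability_eps:
                break
        prev = scores
    return scores                     # graduated relevance score per document
```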

EwaLanlKvm

Participants

  • Run ID: EwaLanlKvm
  • Participant: LANL
  • Track: Legal
  • Year: 2010
  • Submission: 9/17/2010
  • Task: interactive
  • MD5: 3d1b57dc3f106ae5b9baf30ef3959654
  • Run description: A collaborative effort between LANL, EWA, and Kayvium to test some newly developed tools. The query models are modular: at least one element of each of several building blocks must appear in close proximity to trigger an allowed match. At this point the model is overly constrained, but it can easily be loosened. Our goal is to start with a high-precision model and then both expand and relax it in a semi-automated way. We also seek to calibrate our scoring system.
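
A minimal sketch of the "building blocks in close proximity" matching rule described above, assuming a fixed token window and hypothetical blocks; the actual query models and window sizes are not given in the description.

```python
# Illustrative proximity-match rule: a document matches only if at least one
# element of every block occurs within a fixed-size token window. Block
# contents and the window size are hypothetical examples.
def proximity_match(tokens, blocks, window=20):
    positions = {}                                   # term -> token offsets
    for i, tok in enumerate(tokens):
        positions.setdefault(tok.lower(), []).append(i)
    # Collect, per block, every position where some element of it occurs.
    block_hits = []
    for block in blocks:
        hits = sorted(p for term in block for p in positions.get(term, []))
        if not hits:
            return False                             # a block never appears
        block_hits.append(hits)
    # Slide a window anchored on each hit of the first block.
    for anchor in block_hits[0]:
        if all(any(abs(p - anchor) <= window for p in hits)
               for hits in block_hits[1:]):
            return True
    return False

# Example: both blocks must be represented within 20 tokens of each other.
doc = "the trading desk discussed the swap agreement with counsel".split()
print(proximity_match(doc, [{"swap", "derivative"}, {"counsel", "attorney"}]))
```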

Integreon302

Participants

  • Run ID: Integreon302
  • Participant: Integreon
  • Track: Legal
  • Year: 2010
  • Submission: 9/17/2010
  • Task: interactive
  • MD5: 495d7d179a312f19ce115063343e16b9
  • Run description: Integreon data analytics for topic 302.

Integreon304

Participants

  • Run ID: Integreon304
  • Participant: Integreon
  • Track: Legal
  • Year: 2010
  • Submission: 9/17/2010
  • Task: interactive
  • MD5: ea50fefae51a100664fa70fd675bc377
  • Run description: Integreon data analytics for topic 304.

IRISICAL1

Participants | Proceedings

  • Run ID: IRISICAL1
  • Participant: IRISICAL
  • Track: Legal
  • Year: 2010
  • Submission: 9/17/2010
  • Task: interactive
  • MD5: 5e851d6895230fd79411eeb15c8e9dbc
  • Run description: Our process uses manual and automatic relevance feedback. The effort started with Boolean query formulation using anticipated keywords, followed by Boolean retrieval using Lemur 4.11, ranking of the retrieved documents using Terrier 3.0 (DFR-BM25 model), and selection of the top 10 documents for TA assessment. Discussions with the TAs led to a clearer understanding of the notion of relevance/responsiveness and to the discovery of new keywords. We also applied Rocchio relevance feedback on the manually judged documents in search of keywords. These keywords were used in Boolean retrieval, which yielded a set of documents. At this stage we clustered the retrieved documents on the basis of normalized cosine similarity: we formed connected components of a graph in which each document is a vertex and two vertices are joined by an edge if the cosine similarity between them exceeds a chosen threshold (between 0.3 and 0.5). We assumed that the documents in a cluster are highly likely to be relevant if the cluster contains at least one judged relevant document; further interactions with the TAs showed this assumption to be correct in many cases. So, starting from a few relevant documents, we succeeded in discovering clusters of relevant documents. Clusters containing no judged relevant document also contributed: we sent a few of their members to the TAs for judgment and kept the clusters whose sampled documents were deemed relevant. Through regular interaction with the TAs and their feedback, we grew our collection of relevant documents and observed significant improvement in performance.
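
The clustering step the description spells out lends itself to a short sketch: documents are vertices, an edge joins pairs whose cosine similarity exceeds the threshold, and clusters are the connected components. The bag-of-words vector representation here is a stand-in for whatever the team actually used.

```python
# Sketch of the clustering step: an edge joins two documents whose normalized
# cosine similarity exceeds a threshold in [0.3, 0.5]; clusters are the
# connected components of that graph.
import math
from collections import defaultdict

def cosine(u, v):
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def similarity_clusters(doc_vectors, threshold=0.4):
    ids = list(doc_vectors)
    adj = defaultdict(set)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if cosine(doc_vectors[a], doc_vectors[b]) > threshold:
                adj[a].add(b)
                adj[b].add(a)
    # Depth-first search to extract connected components.
    seen, clusters = set(), []
    for d in ids:
        if d in seen:
            continue
        stack, comp = [d], set()
        while stack:
            x = stack.pop()
            if x in comp:
                continue
            comp.add(x)
            stack.extend(adj[x] - comp)
        seen |= comp
        clusters.append(comp)
    return clusters
```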

ITCOMRUN0

Participants | Proceedings

  • Run ID: ITCOMRUN0
  • Participant: IT.com
  • Track: Legal
  • Year: 2010
  • Submission: 9/16/2010
  • Task: interactive
  • MD5: f1d9c5fc268d34b8da93ba9aec3f4e6d
  • Run description: Review was machine-assisted, using one iteration of active learning.

ITD

Participants | Proceedings | Appendix

  • Run ID: ITD
  • Participant: IT.com
  • Track: Legal
  • Year: 2010
  • Submission: 8/24/2010
  • Type: automatic
  • Task: learning
  • Run description: .

MailMeter

Participants

  • Run ID: MailMeter
  • Participant: mailmeter
  • Track: Legal
  • Year: 2010
  • Submission: 9/16/2010
  • Task: interactive
  • MD5: 55594b8190bee714b8b462c3131d24a9
  • Run description: Note: only message doc IDs were submitted for responsive messages, with the parent message doc ID submitted for responsive attachment(s). Title: eDiscovery Using Adaptive Search Criteria and Successive Tagging. An iterative technique using computerized search in conjunction with quick manual review of the resulting message titles, facilitated by tagging messages in the application UI.

melbit10

Participants | Proceedings

  • Run ID: melbit10
  • Participant: RMIT
  • Track: Legal
  • Year: 2010
  • Submission: 9/16/2010
  • Task: interactive
  • MD5: fcb1dcf65a00fb2a66a28462f253187c
  • Run description: All submitted documents were manually reviewed. Documents for review were found by a combination of manual search, using a commercial review tool, and machine classification. An SVM classifier with seven features was used. The core feature was the similarity score produced by Indri, using true-relevance feedback over all documents we had marked as relevant. Classification and review were performed iteratively.
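
A rough sketch of the iterative classify-and-review loop, with scikit-learn's SVC standing in for the actual classifier; the feature and review callables are placeholders, and the remaining six features are not enumerated in the description.

```python
# Sketch of iterative classify-and-review: an SVM over seven per-document
# features, with an Indri relevance-feedback similarity score as the core
# feature. scikit-learn and the placeholder callables are assumptions.
from sklearn.svm import SVC

def review_loop(features, reviewed, review_batch, rounds=5, batch=50):
    """features: doc_id -> 7-dim vector (feature 0 = Indri similarity).
    reviewed: doc_id -> bool, seeded from the initial manual search.
    review_batch: sends doc_ids to reviewers, returns doc_id -> bool."""
    for _ in range(rounds):
        X = [features[d] for d in reviewed]
        y = [reviewed[d] for d in reviewed]
        clf = SVC(kernel="linear", probability=True).fit(X, y)
        pool = [d for d in features if d not in reviewed]
        if not pool:
            break
        # Rank unreviewed documents by predicted probability of relevance.
        pool.sort(key=lambda d: clf.predict_proba([features[d]])[0][1],
                  reverse=True)
        reviewed.update(review_batch(pool[:batch]))   # human review, retrain
    return reviewed
```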

otL10bT

Participants | Proceedings | Appendix

  • Run ID: otL10bT
  • Participant: ot
  • Track: Legal
  • Year: 2010
  • Submission: 8/25/2010
  • Type: techassist
  • Task: learning
  • MD5: 94505d3c894c39b7854c7cd0b5f6560e
  • Run description: Boolean-based run.

otL10FT

Participants | Proceedings | Appendix

  • Run ID: otL10FT
  • Participant: ot
  • Track: Legal
  • Year: 2010
  • Submission: 8/23/2010
  • Type: techassist
  • Task: learning
  • MD5: 0cd0ac1078ee8ff0968d55674acbc65a
  • Run description: Pure relevance feedback run (no use of topic statements).

otL10rvlT

Participants | Proceedings | Appendix

  • Run ID: otL10rvlT
  • Participant: ot
  • Track: Legal
  • Year: 2010
  • Submission: 8/23/2010
  • Type: techassist
  • Task: learning
  • MD5: 856962be9d12c6a9fd39bfa472a7b553
  • Run description: Baseline run just based on topic statement (no feedback).

rmitindA

Participants | Proceedings | Appendix

  • Run ID: rmitindA
  • Participant: RMIT
  • Track: Legal
  • Year: 2010
  • Submission: 8/26/2010
  • Type: automatic
  • Task: learning
  • MD5: 8790bb944e911ce190a77bfc4842689d
  • Run description: We apply the Okapi ranking model and use all relevant seed documents for relevance feedback. We estimate probabilities through a simple linear transformation of the documents' relevance scores.
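
A one-function sketch of the probability estimate described; the min-max form is an assumption about what "simple linear transformation" means here.

```python
# Min-max rescaling of Okapi relevance scores onto [0, 1]; the min-max form
# is an assumed reading of "simple linear transformation".
def scores_to_probabilities(scores):
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0           # guard against identical scores
    return {doc: (s - lo) / span for doc, s in scores.items()}
```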

rmitmlfT

Participants | Proceedings | Appendix

  • Run ID: rmitmlfT
  • Participant: RMIT
  • Track: Legal
  • Year: 2010
  • Submission: 8/25/2010
  • Type: techassist
  • Task: learning
  • MD5: 3203ac5b2e1dd7a905a3614621ed5514
  • Run description: This is a support-vector-machine-trained run using all seven features. rmitmlsT used the subset of features that tenfold cross-validation showed to be best for each topic; rmitindA used true-relevance feedback under Indri, which was one of the features in the SVM runs.

rmitmlsT

Participants | Proceedings | Appendix

  • Run ID: rmitmlsT
  • Participant: RMIT
  • Track: Legal
  • Year: 2010
  • Submission: 8/25/2010
  • Type: techassist
  • Task: learning
  • MD5: 3279e524febc37589a4908914df07523
  • Run description: This is a support-vector-machine-trained run using the seemingly best feature subset for each topic. rmitmlfT used all features for each topic; rmitindA used true-relevance feedback under Indri, which was one of the features in the SVM runs.
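
A sketch of per-topic feature-subset selection by tenfold cross-validation, as the description indicates; the exhaustive search over subsets of the seven features and the scikit-learn tooling are assumptions, not the actual RMIT pipeline.

```python
# Per-topic feature-subset selection with tenfold cross-validation.
# Exhaustively scores every non-empty subset of the seven features.
from itertools import combinations

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def best_feature_subset(X, y, n_features=7):
    """Return the feature-index subset with the best 10-fold CV accuracy."""
    X = np.asarray(X)
    best_subset, best_score = None, -1.0
    for k in range(1, n_features + 1):
        for subset in combinations(range(n_features), k):
            cols = list(subset)
            score = cross_val_score(SVC(kernel="linear"), X[:, cols], y,
                                    cv=10).mean()
            if score > best_score:
                best_subset, best_score = cols, score
    return best_subset
```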

tcd1

Participants | Proceedings | Appendix

  • Run ID: tcd1
  • Participant: TCDI
  • Track: Legal
  • Year: 2010
  • Submission: 8/26/2010
  • Type: automatic
  • Task: learning
  • MD5: 0da155a25d207f5bc817d5b09982dbff
  • Run description: 1) Take the RFD topics and use NLP to break them down into simple terms. 2) Index the TREC data using linguistic technology. 3) Search the linguistic database using the terms from step 1, creating an 'expert database' of meaning breakdowns. 4) Index the TREC data using LSI technology. 5) Index the seeds as 1/-1 for each topic (deliberately coarse; not counting 0 or -2). 6) Compute Bayes estimates using step 2 as P(H) and step 3 as a similarity matrix over the seeds as P(D|H) and P(D|H'). I set out to create a bottom baseline, fully automatic with no human intervention, using multiple technologies. There was no review, only a first pass using very limited information (including seeds). Also, can you please run the perl program on the output, as I couldn't get it to exec.

UBlegal2010

Participants | Proceedings

  • Run ID: UBlegal2010
  • Participant: UB
  • Track: Legal
  • Year: 2010
  • Submission: 9/16/2010
  • Task: interactive
  • MD5: 50655ac8ad19fa275f252d1e4473c488
  • Run description: The Indri search engine is used. The topic is approached through the various facets involved: Enron employees, lobbyists, government officers, government organizations, and lobbying events. Web resources are used to collect source information for formulating queries.

URSK35T

Participants | Proceedings | Appendix

  • Run ID: URSK35T
  • Participant: ursinus
  • Track: Legal
  • Year: 2010
  • Submission: 8/25/2010
  • Type: techassist
  • Task: learning
  • MD5: 7dc28aaff5f3e23f56fb2cf44c4eeeae
  • Run description: We used EDLSI (essential dimensions of latent semantic indexing) as outlined in "Essential Dimensions of Latent Semantic Indexing" (Kontostathis, 2007). It uses the term-document matrix and a linear combination of the scores given by vector-space analysis and standard LSI, with a focus on extracting the useful information from LSI while retaining the computational simplicity and raw power of vector-space retrieval. We queried the database (using text queries approximating the TREC production topics) using EDLSI with multiple selections of k (a parameter in the LSI part). Each topic/k-selection combination was then scored based on the seed documents for that topic. This set of results was calculated with k = 35.
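
EDLSI's score is a weighted combination of a low-rank LSI score and a plain vector-space score. A compact NumPy sketch follows, with an assumed weight w; the same function with k = 35, k = 70, or a per-topic best k corresponds to the URSK35T, URSK70T, and URSLSIT runs respectively.

```python
# Sketch of EDLSI scoring: a weighted combination of a rank-k LSI score and a
# plain vector-space score, computed with NumPy's SVD. The weight w (the EDLSI
# work favors small values) and the matrix setup are illustrative.
import numpy as np

def edlsi_scores(td_matrix, query_vec, k=35, w=0.2):
    """td_matrix: terms x docs array; query_vec: terms vector."""
    U, S, Vt = np.linalg.svd(td_matrix, full_matrices=False)
    q_k = U[:, :k].T @ query_vec           # query folded into LSI space
    docs_k = np.diag(S[:k]) @ Vt[:k, :]    # documents in rank-k LSI space
    lsi_score = q_k @ docs_k               # LSI similarity per document
    vsm_score = query_vec @ td_matrix      # plain vector-space similarity
    return w * lsi_score + (1.0 - w) * vsm_score
```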

URSK70T

Participants | Proceedings | Appendix

  • Run ID: URSK70T
  • Participant: ursinus
  • Track: Legal
  • Year: 2010
  • Submission: 8/25/2010
  • Type: techassist
  • Task: learning
  • MD5: 4556d37f09bf2b19c0696e3456bd1ed4
  • Run description: We used EDLSI (essential dimensions of latent semantic indexing) as outlined in "Essential Dimensions of Latent Semantic Indexing" (Kontostathis, 2007). It uses the term-document matrix and a linear combination of the scores given by vector-space analysis and standard LSI, with a focus on extracting the useful information from LSI while retaining the computational simplicity and raw power of vector-space retrieval. We queried the database (using text queries approximating the TREC production topics) using EDLSI with multiple selections of k (a parameter in the LSI part). Each topic/k-selection combination was then scored based on the seed documents for that topic. This set of results was calculated with k = 70.

URSLSIT

Participants | Proceedings | Appendix

  • Run ID: URSLSIT
  • Participant: ursinus
  • Track: Legal
  • Year: 2010
  • Submission: 8/25/2010
  • Type: techassist
  • Task: learning
  • MD5: 362565605ca3947000cfe2ad139aa15c
  • Run description: We used EDLSI (essential dimensions of latent semantic indexing) as outlined in "Essential Dimensions of Latent Semantic Indexing" (Kontostathis, 2007). It uses the term-document matrix and a linear combination of the scores given by vector-space analysis and standard LSI, with a focus on extracting the useful information from LSI while retaining the computational simplicity and raw power of vector-space retrieval. We queried the database (using text queries approximating the TREC production topics) using EDLSI with multiple selections of k (a parameter in the LSI part). Each topic/k-selection combination was then scored based on the seed documents for that topic. Each topic's results in this run were taken from the best-scoring selection of k for that topic.

USFISDS

Participants | Proceedings

  • Run ID: USFISDS
  • Participant: USF_ISDS
  • Track: Legal
  • Year: 2010
  • Submission: 9/16/2010
  • Task: interactive
  • MD5: 3290d68325697c5c420ad66d4ab3bc8a
  • Run description: We apply a bag-of-words (BOW) approach, using multiple-level passes and standard-deviation threshold cutoffs. We built our tool with MS Visual Studio and ran the algorithm on the .NET 4.0 Framework.
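
A minimal sketch of the standard-deviation cutoff passes described, written in Python for consistency with the other sketches here rather than in .NET; the scoring callable, the pass count, and the 1.0-sigma cutoff are assumptions.

```python
# Multi-pass standard-deviation cutoffs on BOW scores: each pass keeps only
# documents scoring above the mean by a chosen number of standard deviations.
import statistics

def stddev_cutoff_passes(scores_for, docs, passes=2, sigmas=1.0):
    kept = list(docs)
    for _ in range(passes):
        if len(kept) < 2:
            break                     # nothing left to threshold
        scores = {d: scores_for(d) for d in kept}
        mu = statistics.mean(scores.values())
        sd = statistics.pstdev(scores.values())
        kept = [d for d in kept if scores[d] >= mu + sigmas * sd]
    return kept
```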

watlint10

Participants | Proceedings

  • Run ID: watlint10
  • Participant: uwaterlooclarke
  • Track: Legal
  • Year: 2010
  • Submission: 9/16/2010
  • Task: interactive
  • MD5: f6d1b52ed5855f7e148473acc937e014
  • Run description: Interactive search and judging followed by active learning.

xrceCalA

Participants | Appendix

  • Run ID: xrceCalA
  • Participant: XEROX
  • Track: Legal
  • Year: 2010
  • Submission: 8/25/2010
  • Type: automatic
  • Task: learning
  • MD5: 52a5a68b6b3d2d9dc48bb02597342ce0
  • Run description: Relative to the previous run, the only difference lies in the probability estimates, which use a transform assuming that the training-sample selection bias depends only on the labels and not on the features (this is not true in practice, but we wanted to assess whether this assumption degrades the probability estimates).
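
The transform described matches the standard label-shift (class-prior) correction: if selection bias depends only on the label, a probability trained under one class prior can be re-weighted to another. A sketch under that reading; identifying it with what the XEROX run actually computed is an inference.

```python
# Prior-correction under the "bias depends only on labels" assumption: a
# classifier's probability from a biased training prior is re-weighted to the
# collection's prior. Treating this as the run's actual transform is an
# assumption.
def correct_for_label_bias(p_train, prior_train, prior_true):
    """p_train: P(relevant | doc) under the biased training prior."""
    pos = p_train * (prior_true / prior_train)
    neg = (1.0 - p_train) * ((1.0 - prior_true) / (1.0 - prior_train))
    return pos / (pos + neg)

# Example: a 0.9 estimate from a 50/50 training sample, when only 5% of the
# collection is relevant, drops to roughly 0.32.
print(correct_for_label_bias(0.9, 0.5, 0.05))
```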

xrceLogA

Participants | Appendix

  • Run ID: xrceLogA
  • Participant: XEROX
  • Track: Legal
  • Year: 2010
  • Submission: 8/25/2010
  • Type: automatic
  • Task: learning
  • MD5: 56945251cfd94e0efeeb78627f4466f5
  • Run description: Logistic regression with no post-calibration. Also uses a special weighting scheme, and uses all annotated documents from Legal TREC Interactive 2009.

xrceNoRA

Participants | Appendix

  • Run ID: xrceNoRA
  • Participant: XEROX
  • Track: Legal
  • Year: 2010
  • Submission: 8/25/2010
  • Type: automatic
  • Task: learning
  • MD5: c9c6b9e1f3b480c6658f23ef9965612f
  • Run description: Uses only the seeds and no other resources. Logistic regression with post-calibration. Also uses a special weighting scheme.
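
A sketch of one common form of post-calibration, Platt-style sigmoid fitting on held-out scores; the description does not say which calibration method the run used, so this is illustrative only.

```python
# Post-calibration via Platt-style sigmoid fitting: fit the classifier on the
# seeds, then fit a second logistic model on held-out decision scores to
# recalibrate the probabilities. Platt scaling is a stand-in choice.
from sklearn.linear_model import LogisticRegression

def train_with_post_calibration(X_train, y_train, X_holdout, y_holdout):
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    # Regress held-out labels on the first model's decision scores.
    s = clf.decision_function(X_holdout).reshape(-1, 1)
    calibrator = LogisticRegression().fit(s, y_holdout)

    def predict_proba(X):
        scores = clf.decision_function(X).reshape(-1, 1)
        return calibrator.predict_proba(scores)[:, 1]

    return predict_proba
```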