Runs - Legal 2006
humL06dvo
- Run ID: humL06dvo
- Participant: hummingbird.tomlinson
- Track: Legal
- Year: 2006
- Submission: 7/31/2006
- Type: automatic
- Run description: Same as humL06tvo except that the terms were taken from the production request instead of from the final boolean. (Stemming was still not applied.)
humL06t
- Run ID: humL06t
- Participant: hummingbird.tomlinson
- Track: Legal
- Year: 2006
- Submission: 7/31/2006
- Type: manual
- Run description: The final boolean query was used, respecting the boolean operators such as AND, phrase, proximity, NOT, etc. Full wildcard matching was supported. A relevance-ranking algorithm was applied to the matching rows. Some hand-editing was needed to convert the queries to our syntax, but the run is automatic in spirit because it just implements the final boolean query intended by the negotiators.
humL06t0
- Run ID: humL06t0
- Participant: hummingbird.tomlinson
- Track: Legal
- Year: 2006
- Submission: 7/31/2006
- Type: manual
- Run description: Same as humL06t except that the defendant boolean was used instead of the final boolean.
humL06tv
- Run ID: humL06tv
- Participant: hummingbird.tomlinson
- Track: Legal
- Year: 2006
- Submission: 7/31/2006
- Type: automatic
- Run description: Vectorized use of the final boolean query. Operators such as AND, phrase, proximity were dropped. Punctuation was dropped. Full wildcarding was still respected.
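"Vectorizing" a boolean query of this kind can be sketched with a few lines of Python. The tokenizer below is a hypothetical illustration, not Hummingbird's actual code: it drops boolean operators and punctuation but keeps `*` wildcards intact, so the retrieval engine can still expand them.

```python
import re

# Boolean connectives to drop when vectorizing (illustrative list).
OPERATORS = {"AND", "OR", "NOT"}

def vectorize_boolean_query(query: str) -> list[str]:
    """Turn a boolean query into a bag of terms: drop operators,
    parentheses, and punctuation, but keep * wildcards."""
    # Keep word characters and '*'; everything else is a separator.
    tokens = re.findall(r"[\w*]+", query)
    return [t for t in tokens if t.upper() not in OPERATORS]

print(vectorize_boolean_query('("memo*" OR report*) AND NOT draft'))
# -> ['memo*', 'report*', 'draft']
```

The surviving terms would then be fed to the ranker as an ordinary weighted-term (vector) query.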
humL06tvc
- Run ID: humL06tvc
- Participant: hummingbird.tomlinson
- Track: Legal
- Year: 2006
- Submission: 8/1/2006
- Type: automatic
- Run description: Same as humL06tv except that a duplicate filtering heuristic was applied.
humL06tve
- Run ID: humL06tve
- Participant: hummingbird.tomlinson
- Track: Legal
- Year: 2006
- Submission: 7/31/2006
- Type: automatic
- Run description: Blind feedback using top-2 rows of humL06tv.
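Blind (pseudo-relevance) feedback of this kind can be sketched as follows. This is a generic illustration, not the run's actual algorithm: the top-k ranked documents are assumed relevant, and their most frequent terms not already in the query are appended to it.

```python
from collections import Counter

def blind_feedback(query_terms, ranked_docs, k=2, n_new_terms=2):
    """Expand a query with frequent terms from the top-k retrieved
    documents (pseudo-relevance feedback).
    ranked_docs: ranked list of documents, each a list of tokens."""
    counts = Counter()
    for doc in ranked_docs[:k]:
        counts.update(doc)
    # Skip terms already in the query; take the most frequent remainder.
    new = [t for t, _ in counts.most_common() if t not in query_terms]
    return list(query_terms) + new[:n_new_terms]

query = ["legal", "memo"]
ranked = [["memo", "tobacco", "tobacco", "lawsuit"],
          ["tobacco", "advertising"],
          ["unrelated"]]
print(blind_feedback(query, ranked))
```

Here k=2 mirrors the description's use of the top-2 rows of humL06tv; the expanded query would then be rerun against the collection.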
humL06tvo
- Run ID: humL06tvo
- Participant: hummingbird.tomlinson
- Track: Legal
- Year: 2006
- Submission: 7/31/2006
- Type: automatic
- Run description: Same as humL06tv except that metadata was not indexed (plus some other minor indexing differences, such as applying a stopword list and skipping records without a docid tag).
humL06tvz
- Run ID: humL06tvz
- Participant: hummingbird.tomlinson
- Track: Legal
- Year: 2006
- Submission: 7/31/2006
- Type: automatic
- Run description: One percent subset of first 9000 rows of humL06tv (rows 1, 101, 201, 301, ..., 8901) plus next 1000 rows of humL06tv (rows 9001-10000).
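The row selection described above is simple index arithmetic (1-based ranks, as in the description):

```python
def humL06tvz_rows():
    """Ranks kept from humL06tv: every 100th rank among the first 9000
    (1, 101, 201, ..., 8901), then all of ranks 9001-10000."""
    one_percent = list(range(1, 9001, 100))   # 90 rows
    tail = list(range(9001, 10001))           # 1000 rows
    return one_percent + tail

rows = humL06tvz_rows()
print(len(rows))  # -> 1090
```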
NUSCHUA1
- Run ID: NUSCHUA1
- Participant: nus.kor
- Track: Legal
- Year: 2006
- Submission: 7/29/2006
- Type: automatic
- Run description: The IITCDIP corpus was indexed with the Lucene open-source indexer. No attempt was made to correct OCR errors. We used version 1.2 of the track topics, as provided by Jianqiang Wang, to avoid the boolean-query errors in the original version. Other than this change, the system is fully automatic. We use a two-phase approach to find the best documents for each topic; details of the full system are provided in the NUSCHUA2 run description. For this run, NUSCHUA1, we ran only phase 1 of our system to obtain a high-recall document set and did not run phase 2. This run thus serves as a baseline for comparing our phase-2 high-precision reranking algorithm.
NUSCHUA2
- Run ID: NUSCHUA2
- Participant: nus.kor
- Track: Legal
- Year: 2006
- Submission: 7/29/2006
- Type: automatic
- Run description: The IITCDIP corpus was indexed with the Lucene open-source indexer. No attempt was made to correct OCR errors. We used version 1.2 of the track topics, as provided by Jianqiang Wang, to avoid the boolean-query errors in the original version. Other than this change, the system is fully automatic. We use a two-phase approach to find the best documents for each topic. In phase 1, we aim to produce a high-recall document set by merging terms from the boolean query and the production-request text into a high-recall query. First, wildcard terms in the final boolean query are expanded. All ORed boolean-query terms are then grouped together, forming groups of terms. Next, the production-request sentences are parsed with Minipar to extract verb and noun phrases. To filter out junk words introduced by the wildcard expansion, we use WordNet to remove any wildcard term that is not synonymous with at least one production-request phrase word. We then identify intersections between the filtered boolean-query group terms and the words in the production-request phrases. These intersecting terms form the central query that all documents must match; all other terms are matched optionally. Given the resulting set of retrieved documents, phase 2 focuses on improving the precision of the results by reranking the documents so that relevant documents receive higher scores than irrelevant ones. We use an information-theoretic method to cluster the retrieved documents based on their content words (bigrams), how often these words overlap, and co-occurrence information. We then boost the scores of documents in relevant clusters and downweight the scores of documents in irrelevant clusters.
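The intersection step of phase 1 can be sketched with plain set operations. The data below is invented for illustration, and the wildcard-expansion, Minipar, and WordNet stages are assumed to have already produced the inputs:

```python
def build_central_query(boolean_groups, request_phrase_words):
    """Terms appearing both in an (expanded, filtered) boolean-query group
    and in a production-request phrase form the mandatory 'central' query;
    all remaining terms are matched optionally."""
    boolean_terms = set().union(*boolean_groups)
    central = boolean_terms & set(request_phrase_words)
    optional = (boolean_terms | set(request_phrase_words)) - central
    return central, optional

# Toy inputs: groups of ORed terms after wildcard expansion, and
# noun/verb-phrase words parsed from the production request.
groups = [{"memo", "memorandum"}, {"advertise", "advertising"}]
phrase_words = {"advertising", "campaign", "memo"}
central, optional = build_central_query(groups, phrase_words)
print(central)   # -> {'memo', 'advertising'}
```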
SabLeg06aa1
- Run ID: SabLeg06aa1
- Participant: sabir.buckley
- Track: Legal
- Year: 2006
- Submission: 8/1/2006
- Type: automatic
- Run description: Production Request plus words of Negotiatedboolean and Complaint (heavily downweighted); no OCR expansion, no explicit use of metadata, no boolean operators.
SabLeg06ab1
- Run ID: SabLeg06ab1
- Participant: sabir.buckley
- Track: Legal
- Year: 2006
- Submission: 8/1/2006
- Type: automatic
- Run description: Production Request plus words of Negotiatedboolean; no OCR expansion, no explicit use of metadata, no boolean operators. Blind feedback based on the top 30 docs.
sableg06ao1
- Run ID: sableg06ao1
- Participant: sabir.buckley
- Track: Legal
- Year: 2006
- Submission: 8/1/2006
- Type: automatic
- Run description: Production Request plus words of Negotiatedboolean. OCR expansion of request words only (sharing a common prefix); no explicit use of metadata, no boolean operators.
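Prefix-based OCR expansion of this kind can be sketched as follows. The prefix-length threshold is an invented parameter, and the function is a generic illustration, not Sabir's actual implementation: each request word is expanded to all indexed terms sharing a sufficiently long common prefix, which tends to catch OCR-corrupted variants.

```python
def ocr_expand(word, vocabulary, min_prefix=4):
    """Expand a request word to vocabulary terms that share its first
    min_prefix characters -- a crude way to catch OCR variants."""
    if len(word) < min_prefix:
        return [word]
    prefix = word[:min_prefix]
    variants = [t for t in vocabulary if t.startswith(prefix)]
    return variants or [word]

# Toy index vocabulary containing two OCR-garbled spellings.
vocab = ["tobacco", "tobacc0", "tobaeco", "report"]
print(ocr_expand("tobacco", vocab))
# -> ['tobacco', 'tobacc0', 'tobaeco']
```

The expanded variants would typically be downweighted relative to the original request word.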
SabLeg06ao2
- Run ID: SabLeg06ao2
- Participant: sabir.buckley
- Track: Legal
- Year: 2006
- Submission: 8/2/2006
- Type: automatic
- Run description: Production Request plus words of Negotiatedboolean. OCR expansion of request words only (sharing a common prefix or suffix; massive expansion); no explicit use of metadata, no boolean operators.
SabLeg06ar1
- Run ID: SabLeg06ar1
- Participant: sabir.buckley
- Track: Legal
- Year: 2006
- Submission: 8/1/2006
- Type: automatic
SabLeg06arb1
- Run ID: SabLeg06arb1
- Participant: sabir.buckley
- Track: Legal
- Year: 2006
- Submission: 8/1/2006
- Type: automatic
- Run description: Production Request plus words of Finalboolean; no OCR expansion, no explicit use of metadata, no boolean operators.
SabLeg06arn1
- Run ID: SabLeg06arn1
- Participant: sabir.buckley
- Track: Legal
- Year: 2006
- Submission: 8/1/2006
- Type: automatic
- Run description: Production Request plus words of negotiatedboolean; no OCR expansion, no explicit use of metadata, no boolean operators.
UmdBase
- Run ID: UmdBase
- Participant: umaryland.oard
- Track: Legal
- Year: 2006
- Submission: 8/1/2006
- Type: automatic
- Run description: we used all words in the field.
UmdBool
- Run ID: UmdBool
- Participant: umaryland.oard
- Track: Legal
- Year: 2006
- Submission: 8/1/2006
- Type: manual
- Run description: we automatically converted the Boolean queries into Indri Boolean queries and then manually fixed some syntax errors.
UmdBoolAuto
- Run ID: UmdBoolAuto
- Participant: umaryland.oard
- Track: Legal
- Year: 2006
- Submission: 8/1/2006
- Type: automatic
- Run description: we used all words contained in the Boolean and simply ignored the Boolean syntax.
UmdComb
- Run ID: UmdComb
- Participant: umaryland.oard
- Track: Legal
- Year: 2006
- Submission: 8/1/2006
- Type: automatic
- Run description: we used all words in the field plus all words contained in the Boolean, and simply ignored the Boolean syntax.
UMKCB
- Run ID: UMKCB
- Participant: umkc.zhao
- Track: Legal
- Year: 2006
- Submission: 7/29/2006
- Type: automatic
- Run description: Basic Ranked Boolean Query from FinalQuery
UMKCB2
- Run ID: UMKCB2
- Participant: umkc.zhao
- Track: Legal
- Year: 2006
- Submission: 7/31/2006
- Type: automatic
- Run description: OCR boolean auto w2and
UMKCBQE10
- Run ID: UMKCBQE10
- Participant: umkc.zhao
- Track: Legal
- Year: 2006
- Submission: 7/31/2006
- Type: automatic
- Run description: OCR boolean auto query expansion for individual terms, to 10
UMKCBQE5
- Run ID: UMKCBQE5
- Participant: umkc.zhao
- Track: Legal
- Year: 2006
- Submission: 7/31/2006
- Type: automatic
- Run description: OCR boolean auto query expansion for individual terms, to 5
UMKCQE100
- Run ID: UMKCQE100
- Participant: umkc.zhao
- Track: Legal
- Year: 2006
- Submission: 7/29/2006
- Type: automatic
- Run description: Automatic Query Expansion from FinalQuery
UMKCQE25
- Run ID: UMKCQE25
- Participant: umkc.zhao
- Track: Legal
- Year: 2006
- Submission: 7/29/2006
- Type: automatic
- Run description: Automatic Query Expansion from FinalQuery
UMKCSN
- Run ID: UMKCSN
- Participant: umkc.zhao
- Track: Legal
- Year: 2006
- Submission: 7/29/2006
- Type: automatic
- Run description: Basic Ranked Surround Boolean Query Without Order from FinalQuery
UMKCSW
- Run ID: UMKCSW
- Participant: umkc.zhao
- Track: Legal
- Year: 2006
- Submission: 7/29/2006
- Type: automatic
- Run description: Basic Ranked Surround Boolean Query With Order from FinalQuery
york06la01
- Run ID: york06la01
- Participant: yorku.huang
- Track: Legal
- Year: 2006
- Submission: 8/1/2006
- Type: automatic
- Run description: 1. Use Okapi BM25 for weighting and retrieval. 2. No relevance feedback is used.
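Okapi BM25 scores a document by summing, over query terms, an IDF weight times a saturated term frequency. A minimal sketch with the common k1/b parameterization follows; the run's actual parameter settings are not stated, so these values and the toy corpus are assumptions:

```python
import math

def bm25_score(query, doc, docs, k1=1.2, b=0.75):
    """Okapi BM25 score of one document for a bag-of-words query.
    docs is the whole collection, each document a list of tokens."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    score = 0.0
    for term in query:
        df = sum(1 for d in docs if term in d)   # document frequency
        if df == 0:
            continue
        # Non-negative IDF variant (as used by e.g. Lucene).
        idf = math.log((N - df + 0.5) / (df + 0.5) + 1)
        tf = doc.count(term)
        # Term-frequency saturation with document-length normalization.
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avgdl))
    return score

docs = [["legal", "memo", "memo"], ["report", "draft"], ["legal", "report"]]
print(bm25_score(["memo"], docs[0], docs))
```

Ranking the collection by this score for each topic's terms gives a no-feedback baseline run of the kind described.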
york06la02
- Run ID: york06la02
- Participant: yorku.huang
- Track: Legal
- Year: 2006
- Submission: 8/1/2006
- Type: automatic
- Run description: 1. Use Okapi BM25 for weighting and retrieval. 2. Use the beta-approximation term-weighting method for new-term selection to perform automatic relevance feedback.