
Runs - Total Recall 2015

1NB

Participants | Proceedings

  • Run ID: 1NB
  • Participant: TUW
  • Track: Total Recall
  • Year: 2015
  • Submission: 8/29/2015
  • Type: automatic
  • Task: limited
  • Run description: A modification of the baseline: 1 learner, No stop words, BM25 modified to eliminate the need for the b parameter (see the sketch below).
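
For context, here is the standard BM25 term weight with the length-normalization parameter b made explicit. This is the textbook Robertson form, shown only to illustrate what the modification removes; the listing does not spell out TUW's b-free variant, so none is attempted here.

    import math

    def bm25_weight(tf, df, N, doc_len, avg_doc_len, k1=1.2, b=0.75):
        # Inverse document frequency (standard BM25 form).
        idf = math.log((N - df + 0.5) / (df + 0.5) + 1.0)
        # b blends full document-length normalization (b = 1) with none (b = 0);
        # this is the tuning parameter the TUW runs modify BM25 to do without.
        norm = k1 * ((1.0 - b) + b * doc_len / avg_doc_len)
        return idf * tf * (k1 + 1.0) / (tf + norm)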

1NB_sandbox

Participants | Proceedings

  • Run ID: 1NB_sandbox
  • Participant: TUW
  • Track: Total Recall
  • Year: 2015
  • Submission: 8/29/2015
  • Type: automatic
  • Task: sandbox
  • Run description: A modification of the baseline: 1 learner, No stop words, BM25 modified to eliminate the need for the b parameter.

1SB

Participants | Proceedings

  • Run ID: 1SB
  • Participant: TUW
  • Track: Total Recall
  • Year: 2015
  • Submission: 8/29/2015
  • Type: automatic
  • Task: limited
  • Run description: A modification of the baseline: 1 learner, Stop words, BM25 modified to eliminate the need for the b parameter.

1SB_sandbox

Participants | Proceedings

  • Run ID: 1SB_sandbox
  • Participant: TUW
  • Track: Total Recall
  • Year: 2015
  • Submission: 8/29/2015
  • Type: automatic
  • Task: sandbox
  • Run description: A modification of the baseline: 1 learner, Stop words, BM25 modified to eliminate the need for the b parameter.

1ST

Participants | Proceedings

  • Run ID: 1ST
  • Participant: TUW
  • Track: Total Recall
  • Year: 2015
  • Submission: 8/29/2015
  • Type: automatic
  • Task: limited
  • Run description: A modification of the baseline: 1 learner, Stop words, tf-idf as in the baseline (see the sketch below).
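
For reference, a minimal sketch of tf-idf weighting in its common log-idf form; the track's baseline implementation is not reproduced in this listing, so the exact variant is an assumption.

    import math

    def tf_idf_vector(term_counts, df, N):
        # term_counts: {term: frequency} for one document;
        # df: {term: document frequency}; N: number of documents in the corpus.
        return {t: tf * math.log(N / df[t]) for t, tf in term_counts.items()}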

1ST_sandbox

Participants | Proceedings

  • Run ID: 1ST_sandbox
  • Participant: TUW
  • Track: Total Recall
  • Year: 2015
  • Submission: 8/29/2015
  • Type: automatic
  • Task: sandbox
  • Run description: A modification of the baseline: 1 learner, Stop words, tf-idf as in the baseline.

6NB

Participants | Proceedings

  • Run ID: 6NB
  • Participant: TUW
  • Track: Total Recall
  • Year: 2015
  • Submission: 8/29/2015
  • Type: automatic
  • Task: limited
  • Run description: A modification of the baseline: 6 learners, No stop words, BM25 modified to eliminate the need for the b parameter.

6NB_sandbox

Participants | Proceedings

  • Run ID: 6NB_sandbox
  • Participant: TUW
  • Track: Total Recall
  • Year: 2015
  • Submission: 8/29/2015
  • Type: automatic
  • Task: sandbox
  • Run description: A modification of the baseline: 6 learners, No stop words, BM25 modified to eliminate the need for the b parameter.

6SB

Participants | Proceedings

  • Run ID: 6SB
  • Participant: TUW
  • Track: Total Recall
  • Year: 2015
  • Submission: 8/29/2015
  • Type: automatic
  • Task: limited
  • Run description: A modification of the baseline: 6 learners, Stop words, BM25 modified to eliminate the need for the b parameter.

6SB_sandbox

Participants | Proceedings

  • Run ID: 6SB_sandbox
  • Participant: TUW
  • Track: Total Recall
  • Year: 2015
  • Submission: 8/29/2015
  • Type: automatic
  • Task: sandbox
  • Run description: A modification of the baseline: 6 learners, Stop words, BM25 modified to eliminate the need for the b parameter.

6ST

Participants | Proceedings

  • Run ID: 6ST
  • Participant: TUW
  • Track: Total Recall
  • Year: 2015
  • Submission: 8/29/2015
  • Type: automatic
  • Task: limited
  • Run description: A modification of the baseline: 6 learners, Stop words, tf-idf as in the baseline.

6ST_sandbox

Participants | Proceedings

  • Run ID: 6ST_sandbox
  • Participant: TUW
  • Track: Total Recall
  • Year: 2015
  • Submission: 8/29/2015
  • Type: automatic
  • Task: sandbox
  • Run description: A modification of the baseline: 6 learners, Stop words, tf-idf as in the baseline.

Athome1

Participants

  • Run ID: Athome1
  • Participant: NINJA
  • Track: Total Recall
  • Year: 2015
  • Submission: 9/1/2015
  • Type: manual
  • Task: full
  • Run description: Theory: documents with query hits over file names would be the most responsive and would give our predictive coding a rich training set. Method: queries over filenames, then predictive coding in conjunction with search terms / straight predictive coding.

Athome2

Participants

  • Run ID: Athome2
  • Participant: NINJA
  • Track: Total Recall
  • Year: 2015
  • Submission: 9/1/2015
  • Type: manual
  • Task: full
  • Run description: Theory: developing search queries from hacker websites, to learn the common usage of terms, would provide a rich training set for predictive coding. Method: narrowly constructed queries over text, then predictive coding in conjunction with search terms / straight predictive coding.

Athome3

Participants

  • Run ID: Athome3
  • Participant: NINJA
  • Track: Total Recall
  • Year: 2015
  • Submission: 9/1/2015
  • Type: manual
  • Task: full
  • Run description: Theory: computer-generated concept clusters would reduce the predictive-coding problems associated with misleading non-relevant data. Method: search queries in conjunction with concept clusters; predictive coding with manually picked confidence levels; and search queries over predictive coding.

attemptone

Participants | Proceedings

  • Run ID: attemptone
  • Participant: catres
  • Track: Total Recall
  • Year: 2015
  • Submission: 8/31/2015
  • Type: manual
  • Task: limited
  • Run description: Each topic was run in the following manner. There was a team of three people; two had little to no experience working with our existing query-formulation tools and interface, and the third did. All three worked for one hour each per topic, for a total of three person-hours per topic. The work during that hour was split however each team member personally decided to split it, among (1) researching the topic, (2) searching for relevant information, and (3) reading and marking documents as relevant or not. Some team members spent more time researching, others more time marking, but each person had only a single hour per topic for all activities. All three worked independently of each other, at different times and in different geographic locations. A flag was set on each docid so that each searcher did not duplicate the effort of another who had already worked on that topic, but otherwise no information about the topics or topic-related searches was communicated between the three team members; each had to learn each topic from scratch. Two of the team members (coincidentally, the two without much experience using the search tool) did no outside investigation on any topic other than Topic 109. The third spent about half of his allotted hour researching the topic before spending the other half searching and marking.

Although our team had registered for TREC early, the decision to carry through with participation was not made until about three weeks before the deadline, and by the time all the systems and data were in place to allow the experiment to run, a little less than a week remained. Given this approach, Team CATRES had no formal hypothesis as such. Rather, the project was primarily an evaluation of the extent to which a continuous active learning tool can effectively assimilate and adjust to the disparate skills and knowledge of multiple independent, time-pressured reviewers tasked solely with expeditiously locating potential seed documents to commence ranking. In that sense, the working hypothesis was essentially that a continuous active learning tool, combined with an initial seeding carried out under tight deadlines, with limited knowledge and experience and potentially inconsistent perspectives, will still produce a reasonable result. The manual seeding effort was, as described above, intentionally limited and relatively cursory: three users, one hour apiece, each individually covering topic familiarization, querying, and marking, with one user well-versed in the search tool and the other two essentially brand new to it, and all three averaging between limited and no knowledge of the topics. After this very limited work was complete, the system was set to continuous learning (iterative) mode with no additional human intervention other than the official judgments obtained via the TREC server.

Judgments were fed to the TREC server in batches, and the remaining unjudged documents in the collection were continuously re-ranked. Given the time constraints, batch sizes were increased over time to expedite the process: they started small (100 documents per iteration) to enhance continuous active learning, then grew gradually (250, 500, 1000, 2000, and 5000) as iterative richness dropped. The final batches were submitted just minutes before the deadline. In short, once Team CATRES made the final decision to participate in the TREC Total Recall Track, it chose to implement a real-world scenario of a rush project with minimal resources and to test the implications of that scenario for the ultimate effectiveness of a continuous active learning tool. (A sketch of the batching loop follows below.)
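
A self-contained sketch of the growing-batch loop described above, assuming a scikit-learn logistic-regression ranker as a stand-in for the team's actual tool. The `judge` callable stands in for the official TREC judgment server, and the 5% richness trigger for growing the batch is an assumption; the description only says batches grew as richness dropped.

    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    def cal_growing_batches(docs, seeds, judge,
                            batch_sizes=(100, 250, 500, 1000, 2000, 5000)):
        # docs: list of document texts; seeds: {doc index: 0/1} from the one-hour
        # manual seeding (assumed to contain both labels); judge: doc index -> 0/1.
        X = TfidfVectorizer().fit_transform(docs)
        judged = dict(seeds)
        size_idx = 0
        while len(judged) < len(docs):
            idx = list(judged)
            model = LogisticRegression(max_iter=1000).fit(
                X[idx], [judged[i] for i in idx])
            unjudged = [i for i in range(len(docs)) if i not in judged]
            scores = model.predict_proba(X[unjudged])[:, 1]
            batch = [unjudged[i] for i in np.argsort(-scores)[:batch_sizes[size_idx]]]
            results = {i: judge(i) for i in batch}            # submit for judgment
            judged.update(results)
            if sum(results.values()) / len(results) < 0.05:   # richness dropped:
                size_idx = min(size_idx + 1, len(batch_sizes) - 1)  # grow the batch
        return judged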

Baseline

Participants | Proceedings

  • Run ID: Baseline
  • Participant: Webis
  • Track: Total Recall
  • Year: 2015
  • Submission: 8/31/2015
  • Type: automatic
  • Task: full
  • Run description: The Baseline experiment uses a basic, naive approach to retrieving as many relevant documents as possible, and serves as the basis for comparing the other experiments. Method: (1) perform an ad hoc search using the given topic; (2) train the classifier using the results of the ad hoc search; (3) send results for judgment; (4) use the results from the judgment API to retrain the classifier. (The bootstrap step is sketched below.)
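
A sketch of the bootstrap step that lets this loop start with no human-judged training data. The pseudo-labeling scheme here (top ad hoc matches as positives, random documents as negatives) is an assumption about how "train the classifier using the results of the ad hoc search" might work, not a description of Webis's actual code.

    import random
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    def bootstrap_classifier(docs, topic, k=100):
        vec = TfidfVectorizer()
        X = vec.fit_transform(docs + [topic])           # last row = the topic
        scores = (X[:-1] @ X[-1].T).toarray().ravel()   # cosine similarity to topic
        pos = list(scores.argsort()[::-1][:k])          # pseudo-positives
        neg = random.sample([i for i in range(len(docs)) if i not in set(pos)], k)
        y = [1] * len(pos) + [0] * len(neg)
        clf = LogisticRegression(max_iter=1000).fit(X[pos + neg], y)
        return clf  # then: rank, send top documents for judgment, retrain on labels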

BASELINE

Participants | Proceedings

  • Run ID: BASELINE
  • Participant: UvA.ILPS
  • Track: Total Recall
  • Year: 2015
  • Submission: 8/31/2015
  • Type: automatic
  • Task: full
  • Run description: These were the runs called 'bl1' and 'bl2'. We rebuilt the track's baseline, but adjusted the sample batch size based on the number of relevant documents retrieved, and stopped after a threshold once a batch contained no relevant documents. (An assumed form of the rule is sketched below.)
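
A hypothetical reconstruction of that rule; the listing gives neither the adjustment formula nor the threshold, so every constant below is an assumption.

    def adjust_or_stop(batch_relevant, batch_size, empty_streak,
                       patience=3, min_size=10, max_size=1000):
        # Scale the batch with the number of relevant documents it returned,
        # and stop once `patience` consecutive batches contain none.
        if batch_relevant == 0:
            empty_streak += 1
            if empty_streak >= patience:
                return None, empty_streak                    # signal: stop the run
            return max(min_size, batch_size // 2), empty_streak
        return min(max_size, batch_size + batch_relevant), 0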

BASELINE2

Participants | Proceedings

  • Run ID: BASELINE2
  • Participant: UvA.ILPS
  • Track: Total Recall
  • Year: 2015
  • Submission: 8/31/2015
  • Type: automatic
  • Task: full
  • Run description: This was the run called 'RUN3'. Similar to 'BASELINE', but it used RandomForest instead of LogisticRegression (sketched below).
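
The swap itself, in scikit-learn terms (which the class names suggest); the hyperparameters are illustrative, not the team's.

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression

    baseline_clf = LogisticRegression(max_iter=1000)          # BASELINE
    baseline2_clf = RandomForestClassifier(n_estimators=100)  # BASELINE2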

BASELINE2_SANDBOX

Participants | Proceedings

  • Run ID: BASELINE2_SANDBOX
  • Participant: UvA.ILPS
  • Track: Total Recall
  • Year: 2015
  • Submission: 8/31/2015
  • Type: automatic
  • Task: sandbox
  • Run description: Same as for Athome

BASELINE_SANDBOX

Participants | Proceedings

  • Run ID: BASELINE_SANDBOX
  • Participant: UvA.ILPS
  • Track: Total Recall
  • Year: 2015
  • Submission: 8/31/2015
  • Type: automatic
  • Task: sandbox
  • Run description: Same as for Athome

CCRi-athome

Participants | Proceedings

  • Run ID: CCRi-athome
  • Participant: CCRi
  • Track: Total Recall
  • Year: 2015
  • Submission: 8/31/2015
  • Type: automatic
  • Task: full
  • Run description: Our system makes minimal assumptions about the language, format, and type of the text input. As such, we do not import any external language resources, and we perform minimal data cleaning. We represent words from the input corpus as vectors using a neural network model and represent documents as a TF-IDF weighted sum of their word vectors. This model is designed to produce a compact version of TF-IDF vectors while incorporating information about synonyms. For each query topic, we attach a neural network classifier to the output of the base model. Each classifier is updated dynamically with respect to the given relevance assessments. (The document representation is sketched below.)
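
A minimal sketch of the document representation described above: a TF-IDF weighted sum of word vectors. The weighting details and names are assumptions; only the overall scheme comes from the description.

    import math
    from collections import Counter
    import numpy as np

    def doc_embedding(tokens, word_vecs, df, N):
        # tokens: the document's words; word_vecs: {word: np.ndarray} from a
        # neural model; df: {word: document frequency}; N: corpus size.
        dim = len(next(iter(word_vecs.values())))
        vec = np.zeros(dim)
        for w, tf in Counter(tokens).items():
            if w in word_vecs and w in df:
                vec += tf * math.log(N / df[w]) * word_vecs[w]
        return vec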

iterative_expansion

Participants | Proceedings

  • Run ID: iterative_expansion
  • Participant: WHU_IRGroup
  • Track: Total Recall
  • Year: 2015
  • Submission: 8/31/2015
  • Type: automatic
  • Task: limited
  • Run description: Our hypothesis is that the various aspects of relevant information are reflected in the terms of relevant documents. We extracted terms from relevant documents for query expansion and iteratively repeated the process to uncover as many relevant documents as possible. (One expansion round is sketched below.)
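
A sketch of one expansion round under an assumed term-selection criterion (document frequency within the relevant set); the listing does not say how WHU_IRGroup actually scored expansion terms.

    from collections import Counter

    def expand_query(query_terms, relevant_docs, top_k=10):
        # relevant_docs: each a list of tokens from a document judged relevant.
        counts = Counter()
        for doc in relevant_docs:
            counts.update(set(doc))        # count each term once per document
        fresh = [t for t, _ in counts.most_common() if t not in set(query_terms)]
        # Repeat: search with the expanded query, judge, expand again.
        return list(query_terms) + fresh[:top_k]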

Keyphrase

Participants | Proceedings

  • Run ID: Keyphrase
  • Participant: Webis
  • Track: Total Recall
  • Year: 2015
  • Submission: 8/31/2015
  • Type: automatic
  • Task: full
  • Run description: The Keyphrase experiment builds on the Baseline system by intelligently extracting a list of phrases from the documents judged relevant by the API and using them as a new topic for the ad hoc search. This is done when a comparatively small number of documents have been judged relevant by the API. Method: (1) perform an ad hoc search using the given topic; (2) train the classifier using the results of the ad hoc search; (3) send results for judgment; (4) if the ratio of relevant to non-relevant results is less than 0.3, extract keyphrases from the documents judged relevant by the API; (5) use the results from the judgment API to retrain the classifier. (The trigger condition is sketched below.)
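
The trigger condition, taken directly from the description; the keyphrase extraction itself is not specified and is left abstract here.

    def should_rephrase_topic(num_relevant, num_nonrelevant, threshold=0.3):
        # When relevant/non-relevant judgments fall below 0.3, mine keyphrases
        # from the relevant documents and use them as a new ad hoc topic.
        return num_nonrelevant > 0 and num_relevant / num_nonrelevant < threshold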

Multimodal

Participants | Proceedings

  • Run ID: Multimodal
  • Participant: eDiscoveryTeam
  • Track: Total Recall
  • Year: 2015
  • Submission: 8/31/2015
  • Type: manual
  • Task: full
  • Run description: This experiment used hybrid multimodal search methods with a modified continuous active learning approach. We hypothesize that a multimodal method using all available search tools, including active machine learning and document ranking, will yield a far superior result to that of any single search method alone.

SandboxBaseline

Participants | Proceedings

  • Run ID: SandboxBaseline
  • Participant: Webis
  • Track: Total Recall
  • Year: 2015
  • Submission: 8/31/2015
  • Type: automatic
  • Task: sandbox
  • Run description: The Baseline experiment uses a basic, naive approach to retrieving as many relevant documents as possible, and serves as the basis for comparing the other experiments. Method: (1) perform an ad hoc search using the given topic; (2) train the classifier using the results of the ad hoc search; (3) send results for judgment; (4) use the results from the judgment API to retrain the classifier.

SandboxKeyphrase

Participants | Proceedings

  • Run ID: SandboxKeyphrase
  • Participant: Webis
  • Track: Total Recall
  • Year: 2015
  • Submission: 8/31/2015
  • Type: automatic
  • Task: sandbox
  • Run description: The Keyphrase experiment builds on the Baseline system by intelligently extracting a list of phrases from the documents judged relevant by the API and using them as a new topic for the ad hoc search. This is done when a comparatively small number of documents have been judged relevant by the API. Method: (1) perform an ad hoc search using the given topic; (2) train the classifier using the results of the ad hoc search; (3) send results for judgment; (4) if the ratio of relevant to non-relevant results is less than 0.3, extract keyphrases from the documents judged relevant by the API; (5) use the results from the judgment API to retrain the classifier.

stop3299

Participants | Proceedings

  • Run ID: stop3299
  • Participant: WaterlooCormack
  • Track: Total Recall
  • Year: 2015
  • Submission: 8/31/2015
  • Type: automatic
  • Task: full
  • Run description: We used the baseline model implementation without modification, except for stopping criteria. We tested 3 elementary stopping criteria: 70recall -- stop when 2,399 non-relevant documents have been submitted; 80recall -- stop when 2,399 + N/10 non-relevant documents have been submitted, where N is the number of relevant documents that have been submitted; reasonable -- stop when 2,399 + N/5 non-relevant documents have been submitted, where N is defined as above. The number 2,399 was chosen because it is a commonly used "control set" size in electronic discovery; the control set is a set of random documents that serves no purpose other than to track the progress of the review. Our hypothesis is that the effort to review 2,399 documents would be more productively spent -- both in terms of achieving a good result and of having confidence in having achieved it -- reviewing likely relevant rather than random documents. (The three rules are sketched below.)
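
The three rules, written out as they are stated:

    def stop_70recall(nonrel):
        # Stop once 2,399 non-relevant documents have been submitted.
        return nonrel >= 2399

    def stop_80recall(nonrel, rel):
        # Stop once 2,399 + N/10 non-relevant documents have been submitted,
        # where N (rel) is the number of relevant documents submitted so far.
        return nonrel >= 2399 + rel / 10

    def stop_reasonable(nonrel, rel):
        # Stop once 2,399 + N/5 non-relevant documents have been submitted.
        return nonrel >= 2399 + rel / 5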

UWGVCknee100

Participants | Proceedings

  • Run ID: UWGVCknee100
  • Participant: WaterlooCormack
  • Track: Total Recall
  • Year: 2015
  • Submission: 8/29/2015
  • Type: automatic
  • Task: full
  • Run description: The purpose of this experiment is to test the effectiveness of "knee detection" in the recall-effort curve as a stopping criterion, with a minimum review effort of 100 documents. We used the "baseline model implementation" available to all participants. Our only modification was to add knee-detection code to "call your shot" for 70recall, 80recall, and reasonable.
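
A generic slope-ratio sketch of knee detection on a gain (recall vs. effort) curve. The detector these runs actually used is not reproduced in this listing; the slope-ratio rule and its parameters below are assumptions.

    def detect_knee(gain, min_effort=100, slope_ratio=6.0):
        # gain[i] = relevant documents found after reviewing i + 1 documents.
        # Flag a knee at i when the average slope before i is at least
        # slope_ratio times the average slope after i.
        n = len(gain)
        for i in range(min_effort, n - 1):
            before = gain[i] / (i + 1)
            after = (gain[n - 1] - gain[i]) / (n - 1 - i)
            if before >= slope_ratio * max(after, 1e-9):
                return i          # propose stopping the review here
        return None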

UWGVCknee1000

Participants | Proceedings

  • Run ID: UWGVCknee1000
  • Participant: WaterlooCormack
  • Track: Total Recall
  • Year: 2015
  • Submission: 8/29/2015
  • Type: automatic
  • Task: full
  • Run description: The purpose of this experiment is to test the effectiveness of "knee detection" in the recall-effort curve as a stopping criterion, with a minimum review effort of 1000 documents. We used the "baseline model implementation" available to all participants. Our only modification was to add knee-detection code to "call your shot" for 70recall, 80recall, and reasonable. The only difference between this run and UWGVCknee100 is that we required a minimum review effort of 1000 documents.

UWPAH

Participants | Proceedings

  • Run ID: UWPAH
  • Participant: WaterlooClarke
  • Track: Total Recall
  • Year: 2015
  • Submission: 8/28/2015
  • Type: automatic
  • Task: sandbox
  • Run description: 1. We improved seed selection by applying clustering. 2. We also applied feature engineering to extract more features from documents. 3. Query expansion is utilized during active learning.

UWPAH_

Participants | Proceedings

  • Run ID: UWPAH_
  • Participant: WaterlooClarke
  • Track: Total Recall
  • Year: 2015
  • Submission: 8/29/2015
  • Type: automatic
  • Task: full
  • Run description: 1. Improve seed selection by using clustering. 2. Extend 1-gram features to n-gram features. 3. Apply query expansion during each iteration and fuse its results with the classification results. (A sketch follows below.)
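
The n-gram extension and score fusion, sketched in scikit-learn terms; the n-gram range, vectorizer settings, and fusion weight are illustrative assumptions.

    from sklearn.feature_extraction.text import TfidfVectorizer

    unigram_vec = TfidfVectorizer(ngram_range=(1, 1))  # the original 1-gram features
    ngram_vec = TfidfVectorizer(ngram_range=(1, 3))    # extended to uni-/bi-/tri-grams

    def fuse(classifier_score, expansion_score, alpha=0.5):
        # Linear fusion of the classifier score with the query-expansion
        # retrieval score; the fusion method itself is an assumption.
        return alpha * classifier_score + (1 - alpha) * expansion_score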