Run description: Theory: Documents that had query hits over file names would be the most responsive and would provide a rich training set for our predictive coding. Method: queries over filenames, followed by predictive coding in conjunction with search terms / straight predictive coding.
Run description: Theory: Developing search queries by searching hacker websites to learn common usage of terms would provide a rich training set for predictive coding. Method: narrowly constructed queries over text, followed by predictive coding in conjunction with search terms / straight predictive coding.
Run description: Theory: Use of computer-generated concept clusters would reduce the trouble predictive coding has with misleading non-relevant data. Method: search queries in conjunction with concept clusters; predictive coding with manually picked confidence levels; and search queries over the predictive coding results.
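A minimal sketch of how computer-generated concept clusters of this kind might be produced, assuming k-means over TF-IDF vectors; the actual clustering tool behind this run is not specified:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

def concept_clusters(documents, n_clusters=20, terms_per_cluster=10):
    """Group documents into concept clusters and label each cluster with its
    most characteristic terms, for use alongside search queries."""
    vectorizer = TfidfVectorizer(stop_words="english", max_features=50000)
    X = vectorizer.fit_transform(documents)
    km = KMeans(n_clusters=n_clusters, random_state=0, n_init=10).fit(X)
    terms = np.array(vectorizer.get_feature_names_out())
    cluster_labels = {}
    for c in range(n_clusters):
        # The highest-weighted centroid terms act as the cluster's "concept" label.
        top = np.argsort(km.cluster_centers_[c])[::-1][:terms_per_cluster]
        cluster_labels[c] = terms[top].tolist()
    return km.labels_, cluster_labels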
Run description: Each topic was run in the following manner. There was a team of three people; two of them had little to no experience working with our existing query-formulation tools and interface, and the third did. All three worked for one hour each, so there were a total of three person-hours per topic. The work during that hour was split however the team members personally decided to split it, with activities including (1) researching the topic, (2) searching for relevant information, and (3) reading and marking documents as relevant or not. Some team members spent more time researching, others spent more time marking, but each person had only a single hour, per topic, for all activities. All three team members worked independently of each other, at different times and in different geographic locations. A flag was set on each docid so that each independent searcher did not duplicate the effort of another who had already worked on that topic, but otherwise no information about the topics or topic-related searches was communicated among the three team members. Each team member had to learn each topic from scratch. Two of the team members (coincidentally, the two without much experience using the search tool) did no outside investigation on any topic other than Topic 109. The other team member spent about half of his allotted hour researching the topic before spending the other half searching and marking. Again, in all three cases, each team member spent one hour total in manual engagement: researching, searching, and marking.

Although our team had registered for TREC early, the decision to carry through with participation was not made until about three weeks before the deadline. By the time all the systems and data were in place to allow the experiment to run, a little less than a week remained. Given the manner in which Team CATRES approached this project, there was no formal hypothesis as such. Rather, the project was primarily an evaluation of the extent to which a continuous active learning tool can effectively assimilate and adjust to the disparate skills and knowledge of multiple independent, time-pressured reviewers tasked solely with expeditiously locating potential seeds to commence ranking. In that sense, the working hypothesis was essentially that a continuous active learning tool, when combined with an initial seeding effort carried out under tight deadlines, with limited knowledge and experience, and with potentially inconsistent perspectives, will still produce a reasonable result.

The manual seeding effort itself was intentionally very limited and relatively cursory. As planned, three users each worked for no more than one hour apiece to locate potential seed documents. Within that hour, each had to (individually and separately) carry out all three aspects of the task: (1) familiarize themselves with the topic, (2) issue queries to find relevant information, and (3) read and mark that information for relevance. One of the three users was well versed in the search tool and its capabilities, query operators, and syntax, but the other two were essentially brand new to the system. All three users had limited to no prior knowledge of the topics. After this very limited work was complete, the system was set to continuous learning (iterative) mode with no additional human intervention other than the official judgments obtained via the TREC server.
Accordingly, documents were submitted to the TREC server for judgment in batches, and the remaining unjudged documents in the collection were then continuously re-ranked. Given the time constraints, batch sizes were increased over time to expedite the process. Batch sizes started small to enhance continuous active learning (100 documents per iteration) and were then gradually increased (250, 500, 1000, 2000, and 5000) as iterative richness gradually dropped. The final batches were submitted just minutes before the deadline. Once Team CATRES made the final decision to participate in the TREC Total Recall Track, the decision was made to implement a real-world scenario of a rush project with minimal resources and to test the implications of that scenario on the ultimate effectiveness of a continuous active learning tool.
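A minimal sketch of the batch-size escalation described above; the schedule values come from the run description, while the richness test that triggers each increase is an assumption:

def next_batch_size(current_size, recent_relevant, recent_submitted,
                    schedule=(100, 250, 500, 1000, 2000, 5000),
                    richness_floor=0.2):
    """Advance to the next batch size in the schedule once the fraction of
    relevant documents in recently submitted batches drops below a floor."""
    richness = recent_relevant / max(recent_submitted, 1)
    if richness < richness_floor and current_size != schedule[-1]:
        return schedule[schedule.index(current_size) + 1]
    return current_size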
Run description: The Baseline experiment uses a basic, naive approach to retrieving as many relevant documents as possible. It serves as the basis for comparing the other experiments. Method: (1) perform an ad hoc search using the given topic; (2) train the classifier using the results of the ad hoc search; (3) send the results for judgment; (4) use the results from the judgment API to retrain the classifier.
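A minimal sketch of a baseline loop of this kind, assuming a TF-IDF representation, a logistic-regression classifier, and hypothetical adhoc_search and judge functions standing in for the team's search engine and the track's judgment API:

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def baseline_run(topic, corpus, adhoc_search, judge, batch_size=100, iterations=50):
    """corpus: {docid: text}; adhoc_search(topic) -> seed docids;
    judge(docids) -> {docid: True/False} from the assessment API."""
    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform(corpus.values())
    ids = list(corpus.keys())
    row = {d: i for i, d in enumerate(ids)}

    judged = judge(adhoc_search(topic))            # seed the classifier from the ad hoc search
    for _ in range(iterations):
        labels = [judged[d] for d in judged]
        if len(set(labels)) < 2:
            break                                  # need both classes to train
        clf = LogisticRegression(max_iter=1000)
        clf.fit(X[[row[d] for d in judged]], labels)
        remaining = [d for d in ids if d not in judged]
        if not remaining:
            break
        scores = clf.predict_proba(X[[row[d] for d in remaining]])[:, 1]
        batch = [remaining[i] for i in np.argsort(-scores)[:batch_size]]
        judged.update(judge(batch))                # retrain on the new judgments next round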
Run description: These were the runs called 'bl1' and 'bl2'. We rebuilt the track's baseline, but adjusted the sample batch size based on the number of relevant documents retrieved, and stopped, once past a threshold, if a batch contained no relevant documents.
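A minimal sketch of an adjusted batch size plus a stop rule of the kind described; the scaling rule and threshold value are assumptions, as the exact settings of bl1 and bl2 are not given:

def adapt_and_check_stop(batch_relevant, batch_size,
                         total_submitted, threshold=5000,
                         min_batch=100, max_batch=2000):
    """Grow the batch when it is rich in relevant documents, shrink it when it
    is not, and stop once past a submission threshold with an empty batch."""
    if batch_relevant == 0 and total_submitted >= threshold:
        return None, True                      # signal: stop the run
    if batch_relevant > batch_size // 2:
        new_size = min(batch_size * 2, max_batch)
    elif batch_relevant == 0:
        new_size = max(batch_size // 2, min_batch)
    else:
        new_size = batch_size
    return new_size, False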
Run description: Our system makes minimal assumptions about the language, format, and type of the text input. As such, we do not import any external language resources and perform only minimal data cleaning. We represent words from the input corpus as vectors using a neural network model and represent documents as a TF-IDF-weighted sum of their word vectors. This model is designed to produce a compact version of TF-IDF vectors while incorporating information about synonyms. For each query topic, we attach a neural network classifier to the output of the base model. Each classifier is updated dynamically with respect to the given relevance assessments.
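A minimal sketch of this document representation, assuming word2vec-style embeddings trained on the corpus itself and IDF weights from scikit-learn; the team's actual network architecture and hyperparameters are not stated:

import numpy as np
from gensim.models import Word2Vec
from sklearn.feature_extraction.text import TfidfVectorizer

def build_doc_vectors(documents, dim=100):
    """Represent each document as the TF-IDF-weighted sum of its word vectors."""
    tokenized = [doc.lower().split() for doc in documents]
    w2v = Word2Vec(tokenized, vector_size=dim, window=5, min_count=2, workers=4)
    tfidf = TfidfVectorizer(lowercase=True, token_pattern=r"\S+")
    tfidf.fit(documents)
    idf = dict(zip(tfidf.get_feature_names_out(), tfidf.idf_))

    doc_vectors = np.zeros((len(documents), dim))
    for i, tokens in enumerate(tokenized):
        for tok in tokens:
            if tok in w2v.wv and tok in idf:
                doc_vectors[i] += idf[tok] * w2v.wv[tok]   # each occurrence adds idf * vector, giving tf-idf weighting
        norm = np.linalg.norm(doc_vectors[i])
        if norm > 0:
            doc_vectors[i] /= norm
    return doc_vectors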
Run description: Our hypothesis is that the various aspects of relevant information are reflected by the terms in relevant documents. We extracted terms from relevant documents for query expansion and iteratively repeated the process to uncover as many relevant documents as possible.
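A minimal sketch of one way to perform this kind of term extraction for query expansion, assuming aggregate TF-IDF weight over the judged-relevant documents as the selection criterion:

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def expand_query(query_terms, relevant_texts, n_terms=10):
    """Add the highest-weighted TF-IDF terms from judged-relevant documents to the query."""
    vectorizer = TfidfVectorizer(stop_words="english")
    X = vectorizer.fit_transform(relevant_texts)
    scores = np.asarray(X.sum(axis=0)).ravel()          # aggregate term weight across relevant docs
    vocab = vectorizer.get_feature_names_out()
    ranked = [vocab[i] for i in np.argsort(-scores)]
    new_terms = [t for t in ranked if t not in query_terms][:n_terms]
    return list(query_terms) + new_terms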
Run description: The Keyphrase experiment builds on the baseline system by intelligently extracting a list of phrases from documents judged relevant by the API and using them as a new topic for the ad hoc search. This is done when only a comparatively small number of documents have been judged relevant by the API. Method: (1) perform an ad hoc search using the given topic; (2) train the classifier using the results of the ad hoc search; (3) send the results for judgment; (4) if the ratio of relevant to non-relevant documents is less than 0.3, extract keyphrases from the documents judged relevant by the API; (5) use the results from the judgment API to retrain the classifier.
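A minimal sketch of the keyphrase step, assuming n-gram TF-IDF scores as the phrase-ranking heuristic; the team's actual keyphrase extractor is not specified:

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def keyphrases_if_sparse(relevant_texts, n_relevant, n_nonrelevant,
                         ratio_cutoff=0.3, n_phrases=15):
    """When relevant/non-relevant < 0.3, return top-scoring 2-3 word phrases
    from the relevant documents to use as a new ad hoc search topic."""
    if n_nonrelevant == 0 or n_relevant / n_nonrelevant >= ratio_cutoff:
        return None                                   # enough relevant docs; keep the original topic
    vectorizer = TfidfVectorizer(ngram_range=(2, 3), stop_words="english")
    X = vectorizer.fit_transform(relevant_texts)
    scores = np.asarray(X.sum(axis=0)).ravel()
    phrases = vectorizer.get_feature_names_out()
    return [phrases[i] for i in np.argsort(-scores)[:n_phrases]]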
Run description: This experiment used hybrid multimodal search methods with a modified continuous active learning approach. We hypothesize that a multimodal method using all available search tools, including active machine learning and document ranking, will yield far better results than any single search method alone.
Run description: The Baseline experiment uses a basic, naive approach to retrieving as many relevant documents as possible. It serves as the basis for comparing the other experiments. Method: (1) perform an ad hoc search using the given topic; (2) train the classifier using the results of the ad hoc search; (3) send the results for judgment; (4) use the results from the judgment API to retrain the classifier.
Run description: The Keyphrase experiment builds on the baseline system by intelligently extracting a list of phrases from documents judged relevant by the API and using them as a new topic for the ad hoc search. This is done when only a comparatively small number of documents have been judged relevant by the API. Method: (1) perform an ad hoc search using the given topic; (2) train the classifier using the results of the ad hoc search; (3) send the results for judgment; (4) if the ratio of relevant to non-relevant documents is less than 0.3, extract keyphrases from the documents judged relevant by the API; (5) use the results from the judgment API to retrain the classifier.
Run description: We used the baseline model implementation without modification, except for the stopping criteria. We tested three elementary stopping criteria: 70recall -- stop when 2,399 non-relevant documents have been submitted; 80recall -- stop when 2,399 + N/10 non-relevant documents have been submitted, where N is the number of relevant documents that have been submitted; reasonable -- stop when 2,399 + N/5 non-relevant documents have been submitted, where N is as above. The number 2,399 was chosen because it is a commonly used "control set" size in electronic discovery. The control set is a set of random documents that serves no purpose other than to track the progress of the review. Our hypothesis is that the effort to review 2,399 documents would be more productively spent -- both in terms of achieving, and having confidence in having achieved, a good result -- reviewing likely relevant rather than random documents.
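The three criteria reduce to simple counting rules over the submitted documents; a direct sketch:

def should_stop(criterion, n_relevant, n_nonrelevant):
    """Return True when the chosen stopping criterion has been met."""
    if criterion == "70recall":
        return n_nonrelevant >= 2399
    if criterion == "80recall":
        return n_nonrelevant >= 2399 + n_relevant / 10
    if criterion == "reasonable":
        return n_nonrelevant >= 2399 + n_relevant / 5
    raise ValueError(f"unknown criterion: {criterion}")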
Run description: The purpose of this experiment is to test the effectiveness of "knee detection" in the recall-effort curve as a stopping criterion, with a minimum review effort of 100 documents. We used the "baseline model implementation" available to all participants. Our only modification was to add knee-detection code to "call your shot" for 70recall, 80recall, and reasonable.
Run description: The purpose of this experiment is to test the effectiveness of "knee detection" in the recall-effort curve as a stopping criterion, with a minimum review effort of 1000 documents. We used the "baseline model implementation" available to all participants. Our only modification was to add knee-detection code to "call your shot" for 70recall, 80recall, and reasonable. The only difference between this run and UWGVCkne100 is that we required a minimum review effort of 1000 documents.
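A minimal sketch of one common knee-detection heuristic applied to the recall-effort (gain) curve: pick the point of maximum distance from the chord joining the curve's endpoints, and only "call the shot" once the minimum review effort has been reached. The detector actually used in these runs may differ in its details.

import numpy as np

def detect_knee(relevant_found):
    """relevant_found[i] = cumulative relevant documents after reviewing i+1 docs.
    Returns the review effort (1-based index) at the knee of the gain curve."""
    y = np.asarray(relevant_found, dtype=float)
    x = np.arange(1, len(y) + 1, dtype=float)
    # Distance of each point from the chord joining the first and last points.
    x0, y0, x1, y1 = x[0], y[0], x[-1], y[-1]
    num = np.abs((y1 - y0) * x - (x1 - x0) * y + x1 * y0 - y1 * x0)
    den = np.hypot(y1 - y0, x1 - x0)
    return int(x[np.argmax(num / den)])

def call_your_shot(relevant_found, min_effort=1000):
    """Only call the knee after a minimum review effort (1000 here, as in UWGVCkne1000)."""
    if len(relevant_found) < min_effort:
        return None
    return detect_knee(relevant_found)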
Run description: 1. We improve seed selection by applying clustering. 2. We also apply feature engineering to obtain more features from the documents. 3. Query expansion is utilized during active learning.
Run description: 1. Improve seed selection by using clustering. 2. Extend 1-gram features to n-gram features. 3. Query expansion is applied during each iteration and fused with the classification results.
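A minimal sketch of cluster-based seed selection with n-gram features, assuming k-means over TF-IDF vectors and seeds taken nearest each centroid; the clustering algorithm, number of clusters, and n-gram range used in these runs are not specified:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import pairwise_distances_argmin_min

def select_seeds(documents, n_seeds=10, ngram_range=(1, 3)):
    """Cluster the corpus and return the document nearest each centroid as a seed,
    so the seed set covers diverse regions of the collection."""
    vectorizer = TfidfVectorizer(ngram_range=ngram_range, stop_words="english",
                                 max_features=100000)
    X = vectorizer.fit_transform(documents)
    km = KMeans(n_clusters=n_seeds, random_state=0, n_init=10).fit(X)
    seed_indices, _ = pairwise_distances_argmin_min(km.cluster_centers_, X)
    return sorted(set(int(i) for i in seed_indices))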