Runs - Knowledge Base Acceleration 2012¶
CWI-DISAMBIGUATOR¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: CWI-DISAMBIGUATOR
- Participant: CWI
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/5/2012
- Type: automatic
- Task: main
- MD5:
2c9087cabc1b867fee83c5983a400276
- Run description: This method uses the words on the DBpedia page of each entity to disambiguate ambiguous entities. In essence, a document is considered central if it contains the label of the DBpedia entity and at least one word that occurs on the DBpedia page of that entity.
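A minimal sketch of this centrality rule, assuming the entity label and the DBpedia page text are available as plain strings; the tokenizer and the exclusion of the label's own words are illustrative assumptions, not the actual CWI implementation:

```python
import re

def tokenize(text):
    """Lowercase a string and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def is_central(doc_text, entity_label, dbpedia_page_text):
    """Central iff the document contains the entity label and at least one
    word from the entity's DBpedia page (label words excluded: an assumption)."""
    if entity_label.lower() not in doc_text.lower():
        return False
    dbpedia_words = tokenize(dbpedia_page_text) - tokenize(entity_label)
    return bool(tokenize(doc_text) & dbpedia_words)
```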
CWI-google_dic_1¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: CWI-google_dic_1
- Participant: CWI
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/13/2012
- Type: automatic
- Task: main
- MD5:
7b1362681c4f6bc7f8a8a1f8e6bb2110
- Run description: This system uses the Google cross-lingual dictionary's strings and probabilities to represent the entities and searches the documents for a match. It neither strips punctuation from nor lowercases the strings and documents. The Google cross-lingual dictionary provides two probabilities, P(entity|string) and P(string|entity). The system multiplies the two to score each document-entity pair. The score is then normalized by the highest score per entity and multiplied by 1000. However, entities whose maximum score is below 0.01 are normalized by 0.01 to discourage them from being equally competitive with other entities. The system's strength is in relevant+central.
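A hedged sketch of the scoring arithmetic described above. The dictionary layout (entity mapped to a list of strings with the two probabilities) is an assumption for illustration; the multiplication, the per-entity max-normalization with a 0.01 floor, and the scaling to 1000 follow the description:

```python
def raw_score(doc_text, entity, dictionary):
    """dictionary: {entity: [(string, p_entity_given_string, p_string_given_entity), ...]}.
    Returns P(entity|string) * P(string|entity) for the best matching string, or 0.0."""
    best = 0.0
    for string, p_e_s, p_s_e in dictionary.get(entity, []):
        if string in doc_text:            # exact match: no stripping, no lowercasing
            best = max(best, p_e_s * p_s_e)
    return best

def normalize(raw_scores, floor=0.01):
    """raw_scores: {doc_id: raw score} for one entity. Normalize by the entity's
    maximum score (never less than the 0.01 floor) and scale to 0-1000."""
    denom = max(max(raw_scores.values(), default=0.0), floor)
    return {doc_id: 1000.0 * s / denom for doc_id, s in raw_scores.items()}
```

The google_strip runs below apply the same arithmetic after stripping punctuation and lowercasing, taking the highest score when several strings match.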
CWI-google_dic_2¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: CWI-google_dic_2
- Participant: CWI
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/13/2012
- Type: automatic
- Task: main
- MD5:
23f06f2f91bf34c167140c5a476fa75f
- Run description: This system uses the Google cross-lingual dictionary's strings and probabilities to represent the entities and searches the documents for a match. It neither strips punctuation from nor lowercases the strings and documents. The Google cross-lingual dictionary provides two probabilities, P(entity|string) and P(string|entity). The system multiplies the two to score each document-entity pair. The score is then normalized by the highest score per entity and multiplied by 1000. The system's strength is in relevant+central.
CWI-google_dic_3¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: CWI-google_dic_3
- Participant: CWI
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/13/2012
- Type: automatic
- Task: main
- MD5:
71df0a95d3e6b889491aa0f5dc4a86d7
- Run description: This system uses the Google cross-lingual dictionary's strings and probabilities to represent the entities and searches the documents for a match. It neither strips punctuation from nor lowercases the strings and documents. The Google cross-lingual dictionary provides two probabilities, P(entity|string) and P(string|entity). The system multiplies the two to score each document-entity pair. The score is then normalized by the highest score per entity and multiplied by 1000. The system's strength is in relevant+central.
CWI-google_strip_1¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: CWI-google_strip_1
- Participant: CWI
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/13/2012
- Type: automatic
- Task: main
- MD5:
268c5a7698522b90e40fb3599b2c9c58
- Run description: This system uses the Google cross-lingual dictionary's strings and probabilities to represent the entities and searches the documents for a match. It strips punctuation from and lowercases the strings and documents. The Google cross-lingual dictionary provides two probabilities, P(entity|string) and P(string|entity). The system multiplies the two to score each document-entity pair. If several strings match, the highest score is chosen as the score for the document-entity pair. The score is then normalized by the highest score per entity and multiplied by 1000. Entities whose highest score is below 0.01 are normalized by 0.01 to discourage them from being equally competitive with other entities.
CWI-google_strip_2¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: CWI-google_strip_2
- Participant: CWI
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/13/2012
- Type: automatic
- Task: main
- MD5:
d960cedea6a30e2ded896b6f7ea0cf19
- Run description: This system uses the Google cross-lingual dictionary's strings and probabilities to represent the entities and searches the documents for a match. It strips punctuation from and lowercases the strings and documents. The Google cross-lingual dictionary provides two probabilities, P(entity|string) and P(string|entity). The system multiplies the two to score each document-entity pair. If several strings match, the highest score is chosen as the score for the document-entity pair. The score is then normalized by the highest score per entity and multiplied by 1000.
CWI-LANGUAGEMODEL¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: CWI-LANGUAGEMODEL
- Participant: CWI
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/12/2012
- Type: automatic
- Task: main
- MD5:
5577f8963df9a6bd4957a5fb501f1cbd
- Run description: CENTRAL ONLY. The system builds a language model for each entity using only the central documents. These language models are then used to rank the test documents, comparing each document with the perplexity measure. The scores are normalized between 0 and 1000. This method is intended to detect only central documents.
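A minimal sketch of perplexity-based ranking under an add-one-smoothed unigram language model built from the central training documents; the smoothing choice, tokenizer, and score rescaling are assumptions rather than the CWI configuration:

```python
import math
import re
from collections import Counter

def tokens(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def build_lm(central_docs):
    """Unigram counts pooled over one entity's central training documents."""
    counts = Counter()
    for doc in central_docs:
        counts.update(tokens(doc))
    return counts, sum(counts.values()), len(counts) + 1   # counts, total, vocab size

def perplexity(doc_text, lm):
    """Perplexity of a (non-empty) document under the add-one-smoothed model."""
    counts, total, vocab = lm
    words = tokens(doc_text)
    log_prob = sum(math.log((counts[w] + 1) / (total + vocab)) for w in words)
    return math.exp(-log_prob / len(words))

def rank(docs, lm):
    """Lower perplexity means a higher score; rescale scores into 0-1000."""
    ppl = {doc_id: perplexity(text, lm) for doc_id, text in docs.items()}
    lo, hi = min(ppl.values()), max(ppl.values())
    span = (hi - lo) or 1.0
    return {doc_id: 1000.0 * (hi - p) / span for doc_id, p in ppl.items()}
```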
CWI-LEARNING16000¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: CWI-LEARNING16000
- Participant: CWI
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 8/31/2012
- Type: automatic
- Task: main
- MD5:
1d639a9420cb65a85e879c8e9b4a09d4
- Run description: Finds only central documents using a supervised approach. It uses a list of query strings learned from the training data; the documents retrieved are those that exactly match a string in this list. For each entity, a list of strings is used as a query. Each string is the exact match of the entity's Wikipedia label plus the 10 characters before and after its occurrence in the training documents.
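A sketch of how such query strings could be harvested, assuming the entity's Wikipedia label and the raw training text are available; filtering then reduces to an exact-match membership test:

```python
def learn_query_strings(entity_label, training_docs, window=10):
    """Collect the entity label plus the 10 characters before and after each
    of its occurrences in the training documents."""
    strings = set()
    for text in training_docs:
        start = 0
        while (idx := text.find(entity_label, start)) != -1:
            strings.add(text[max(0, idx - window): idx + len(entity_label) + window])
            start = idx + 1
    return strings

def is_central(doc_text, query_strings):
    """A document is retrieved (central) if it contains any learned string exactly."""
    return any(q in doc_text for q in query_strings)
```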
helsinki-disgraph¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: helsinki-disgraph
- Participant: helsinki
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/14/2012
- Type: automatic
- Task: main
- MD5:
a96ad7377a819fa6f2763b19a95c1b1d
- Run description: Relation to named entities is detected by examining the overlap between the named-entity graphs and the document's word collocation graph.
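The description is terse, so the sketch below is only one possible reading: represent both the entity context and the document as collocation edge sets and measure their overlap with a Jaccard coefficient. The graph construction (sliding-window collocations over tokens) is entirely an assumption:

```python
def collocation_edges(tokens, window=2):
    """Build an undirected collocation edge set from a token list with a sliding window."""
    edges = set()
    for i, w in enumerate(tokens):
        for v in tokens[i + 1: i + 1 + window]:
            if v != w:
                edges.add(frozenset((w, v)))
    return edges

def graph_overlap(entity_edges, doc_edges):
    """Jaccard coefficient of two edge sets: one possible overlap measure."""
    union = entity_edges | doc_edges
    return len(entity_edges & doc_edges) / len(union) if union else 0.0
```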
helsinki-disgraph2¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: helsinki-disgraph2
- Participant: helsinki
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/17/2012
- Type: automatic
- Task: main
- MD5:
c2adb664a5f7c60e8e83d311a9ea4ffa
- Run description: Relation to named entities is detected by examining the overlap between the named-entity graphs and the document's word collocation graph.
hltcoe-wordNER¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: hltcoe-wordNER
- Participant: hltcoe
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/12/2012
- Type: automatic
- Task: main
- MD5:
304afc6ddd8f64cd1d08806f5b7e30e6
- Run description: Support vector machine using tokenized words and named entities as features.
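A generic sketch of such a classifier using scikit-learn; the feature encoding (named entities appended as prefixed tokens), the TF-IDF weighting, and the linear kernel are assumptions rather than the hltcoe configuration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

def encode(doc_tokens, doc_entities):
    """Combine word tokens with named-entity strings marked by a NER_ prefix."""
    return " ".join(doc_tokens + ["NER_" + e.replace(" ", "_") for e in doc_entities])

def train(train_docs, train_labels):
    """train_docs: list of (tokens, entities) pairs; train_labels: e.g. central vs. not."""
    texts = [encode(toks, ents) for toks, ents in train_docs]
    model = make_pipeline(TfidfVectorizer(), LinearSVC())
    model.fit(texts, train_labels)
    return model
```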
hltcoe-wordNER500¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: hltcoe-wordNER500
- Participant: hltcoe
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/12/2012
- Type: automatic
- Task: main
- MD5:
b27df2df9708c21008c6cfaef0c62ef6
- Run description: Support vector machine using tokenized words and named entities as features.
igpi2012-ner50_tuned¶
Participants
| Input
| Appendix
- Run ID: igpi2012-ner50_tuned
- Participant: igpi2012
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/12/2012
- Type: automatic
- Task: main
- MD5:
2a605f481ad7ed8c9f379c8c5eaa2b48
- Run description: Uses the top 50 most frequent named entities to compute the Jaccard coefficient between the entity list of each document and the entity list of the positive documents from the annotated set. Each topic has its own threshold, tuned on the annotated dataset.
igpi2012-ner_jaccard¶
Participants
| Input
| Appendix
- Run ID: igpi2012-ner_jaccard
- Participant: igpi2012
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/12/2012
- Type: automatic
- Task: main
- MD5:
613947cf008a3c169b954ce9d789d434
- Run description: Computes the Jaccard coefficient between the entity list of each document and the entity list of the positive documents from the annotated set, then applies an arbitrary threshold. This naive method serves as a baseline for comparison with the other runs.
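A minimal sketch of this baseline, assuming the named-entity sets have already been extracted; the threshold value shown is illustrative, standing in for the arbitrary choice mentioned above:

```python
def jaccard(a, b):
    """Jaccard coefficient between two sets of named entities."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def is_relevant(doc_entities, positive_entities, threshold=0.1):
    """doc_entities: NE set of the stream document; positive_entities: NE set pooled
    from the annotated positive documents. The threshold value is illustrative."""
    return jaccard(set(doc_entities), set(positive_entities)) >= threshold
```

The igpi2012-ner50_tuned run above restricts the sets to the 50 most frequent named entities and tunes the threshold per topic on the annotated data.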
LSIS-lsisRFAll¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: LSIS-lsisRFAll
- Participant: LSIS
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/17/2012
- Type: automatic
- Task: main
- MD5:
793b0fef8a02c83ebf7e9adefb8d91a0
- Run description: With the help of the Wikipedia web pages, variant names have been found for each topic, which in some cases helps disambiguate the topic itself (e.g., Basic Element the company vs. the music group). Each hour of the corpus has been indexed separately in order to base document retrieval on a real stream. The following steps are performed for each topic. As the process goes through the stream, each index is queried with the topic's url_name as well as the variants. A queue is built with day granularity and can contain up to seven days. Then, for each document, statistics are computed based on what can be found in the document (e.g., the number of related entities coming from the Wikipedia page), what can be found in the current day (the number of documents found that particular day), and what has been seen on the previous days currently in the queue (the number of mentions in titles, in previous documents, etc.). These statistics are used to train a RandomCommittee classifier, which uses multiple RandomForest classifiers as a committee to build deep trees that consider all features in all possible cases. Two classifications are performed: one separates garbage from relevant and central (c1) and gives a score (s1); the other (c2) chooses between relevant and central and gives a score (s2). For this run, the final score is s1 if s2 < 0.5, or ((s1*s2)/2) + 0.5 if s2 >= 0.5.
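A sketch of the score combination used in this run, assuming s1 and s2 are the two classifier scores in [0, 1]; only the piecewise formula itself comes from the description:

```python
def combine_scores(s1: float, s2: float) -> float:
    """lsisRFAll / lsisSRFAll: keep s1 when the relevant-vs-central classifier is
    unsure (s2 < 0.5); otherwise lift the combined score into the upper half."""
    if s2 < 0.5:
        return s1
    return (s1 * s2) / 2 + 0.5
```

The lsisRFYes, lsisSRFYes, and lsisSys2 runs below use s1*s2 instead, and lsisSys1 uses ((s1*s2)/2) + 0.5 unconditionally.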
LSIS-lsisRFYes¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: LSIS-lsisRFYes
- Participant: LSIS
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/17/2012
- Type: automatic
- Task: main
- MD5:
cec886b91247beccede1f9571ec9dc43
- Run description: With the help of the Wikipedia web pages, variant names have been found for each topic, which in some cases helps disambiguate the topic itself (e.g., Basic Element the company vs. the music group). Each hour of the corpus has been indexed separately in order to base document retrieval on a real stream. The following steps are performed for each topic. As the process goes through the stream, each index is queried with the topic's url_name as well as the variants. A queue is built with day granularity and can contain up to seven days. Then, for each document, statistics are computed based on what can be found in the document (e.g., the number of related entities coming from the Wikipedia page), what can be found in the current day (the number of documents found that particular day), and what has been seen on the previous days currently in the queue (the number of mentions in titles, in previous documents, etc.). These statistics are used to train a RandomCommittee classifier, which uses multiple RandomForest classifiers as a committee to build deep trees that consider all features in all possible cases. Two classifications are performed: one separates garbage from relevant and central (c1) and gives a score (s1); the other (c2) chooses between relevant and central and gives a score (s2). For this run, the final score is s1*s2.
LSIS-lsisSRFAll¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: LSIS-lsisSRFAll
- Participant: LSIS
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/17/2012
- Type: automatic
- Task: main
- MD5:
ef5bc9b175d3e6456cf75bae8688dee0
- Run description: With the help of the Wikipedia web pages, variant names have been found for each topic, which in some cases helps disambiguate the topic itself (e.g., Basic Element the company vs. the music group). Each hour of the corpus has been indexed separately in order to base document retrieval on a real stream. The following steps are performed for each topic. As the process goes through the stream, each index is queried with the topic's url_name as well as the variants. A queue is built with day granularity and can contain up to seven days. Then, for each document, statistics are computed based on what can be found in the document (e.g., the number of related entities coming from the Wikipedia page), what can be found in the current day (the number of documents found that particular day), and what has been seen on the previous days currently in the queue (the number of mentions in titles, in previous documents, etc.). These statistics are used to train a RandomCommittee classifier, which uses multiple RandomForest classifiers as a committee to build deep trees that consider all features in all possible cases. Two classifications are performed: one separates garbage from relevant and central (c1) and gives a score (s1); the other (c2) chooses between relevant and central and gives a score (s2). For this run, the final score is s1 if s2 < 0.5, or ((s1*s2)/2) + 0.5 if s2 >= 0.5.
LSIS-lsisSRFYes¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: LSIS-lsisSRFYes
- Participant: LSIS
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/17/2012
- Type: automatic
- Task: main
- MD5:
c551fcad28c1708a2f559c7e35e34675
- Run description: With the help of the Wikipedia web pages, variant names have been found for each topic, which in some cases helps disambiguate the topic itself (e.g., Basic Element the company vs. the music group). Each hour of the corpus has been indexed separately in order to base document retrieval on a real stream. The following steps are performed for each topic. As the process goes through the stream, each index is queried with the topic's url_name as well as the variants. A queue is built with day granularity and can contain up to seven days. Then, for each document, statistics are computed based on what can be found in the document (e.g., the number of related entities coming from the Wikipedia page), what can be found in the current day (the number of documents found that particular day), and what has been seen on the previous days currently in the queue (the number of mentions in titles, in previous documents, etc.). These statistics are used to train a RandomCommittee classifier, which uses multiple RandomForest classifiers as a committee to build deep trees that consider all features in all possible cases. Two classifications are performed: one separates garbage from relevant and central (c1) and gives a score (s1); the other (c2) chooses between relevant and central and gives a score (s2). For this run, the final score is s1*s2.
LSIS-lsisSys1¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: LSIS-lsisSys1
- Participant: LSIS
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/16/2012
- Type: automatic
- Task: main
- MD5:
32f9601710d97179f17cde69c77c6479
- Run description: With the help of the Wikipedia web pages, variant names have been found for each topic, which in some cases helps disambiguate the topic itself (e.g., Basic Element the company vs. the music group). Each hour of the corpus has been indexed separately in order to base document retrieval on a real stream. The following steps are performed for each topic. As the process goes through the stream, each index is queried with the topic's url_name as well as the variants. A queue is built with day granularity and can contain up to seven days. Then, for each document, statistics are computed based on what can be found in the document (e.g., the number of related entities coming from the Wikipedia page), what can be found in the current day (the number of documents found that particular day), and what has been seen on the previous days currently in the queue (the number of mentions in titles, in previous documents, etc.). These statistics are used to train a RandomCommittee classifier, which uses multiple RandomForest classifiers as a committee to build deep trees that consider all features in all possible cases. Two classifications are performed: one separates garbage from relevant and central (c1) and gives a score (s1); the other (c2) chooses between relevant and central and gives a score (s2). For this run, the final score is ((s1*s2)/2) + 0.5.
LSIS-lsisSys2¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: LSIS-lsisSys2
- Participant: LSIS
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/16/2012
- Type: automatic
- Task: main
- MD5:
d5b7e1fcf62fceb27f90ff41c4f518b8
- Run description: With the help of the Wikipedia web pages, variant names have been found for each topic, which in some cases helps disambiguate the topic itself (e.g., Basic Element the company vs. the music group). Each hour of the corpus has been indexed separately in order to base document retrieval on a real stream. The following steps are performed for each topic. As the process goes through the stream, each index is queried with the topic's url_name as well as the variants. A queue is built with day granularity and can contain up to seven days. Then, for each document, statistics are computed based on what can be found in the document (e.g., the number of related entities coming from the Wikipedia page), what can be found in the current day (the number of documents found that particular day), and what has been seen on the previous days currently in the queue (the number of mentions in titles, in previous documents, etc.). These statistics are used to train a RandomCommittee classifier, which uses multiple RandomForest classifiers as a committee to build deep trees that consider all features in all possible cases. Two classifications are performed: one separates garbage from relevant and central (c1) and gives a score (s1); the other (c2) chooses between relevant and central and gives a score (s2). For this run, the final score is s1*s2.
PRIS-PRIS_Run_1¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: PRIS-PRIS_Run_1
- Participant: PRIS
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/12/2012
- Type: automatic
- Task: main
- MD5:
e6fbaf124ec7146852e64cb20d4bfc1f
- Run description: Relevance feedback is first applied to the system according to the annotation data. A Jaccard-coefficient weighting scheme is then used to calculate the relevance between streaming documents and the topic entities.
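The description is brief, so the sketch below is only a hedged reading: pool the terms of the annotated positive documents into a topic profile (the relevance-feedback step) and score each stream document by its Jaccard coefficient against that profile. The profile construction and tokenizer are assumptions:

```python
import re

def terms(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def build_profile(positive_docs):
    """Relevance feedback: pool the terms of the annotated positive documents."""
    profile = set()
    for doc in positive_docs:
        profile |= terms(doc)
    return profile

def relevance(doc_text, profile):
    """Jaccard coefficient between the document's terms and the topic profile."""
    doc_terms = terms(doc_text)
    union = doc_terms | profile
    return len(doc_terms & profile) / len(union) if union else 0.0
```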
PRIS-PRIS_Run_400¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: PRIS-PRIS_Run_400
- Participant: PRIS
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/13/2012
- Type: automatic
- Task: main
- MD5:
7e7e7f0a60f8df1ae397a4497e9db5f4
- Run description: Relevance feedback is first applied to the system according to the annotation data. A Jaccard-coefficient weighting scheme is then used to calculate the relevance between streaming documents and the topic entities.
PRIS-PRIS_Run_500¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: PRIS-PRIS_Run_500
- Participant: PRIS
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/13/2012
- Type: automatic
- Task: main
- MD5:
ceb487611bbfd4f01b5dc2d6bbf4ab55
- Run description: Relevance feedback is first applied to the system according to the annotation data. A Jaccard-coefficient weighting scheme is then used to calculate the relevance between streaming documents and the topic entities.
PRIS-PRIS_Run_600¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: PRIS-PRIS_Run_600
- Participant: PRIS
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/13/2012
- Type: automatic
- Task: main
- MD5:
832e1fe45b71316e487128be19201e50
- Run description: Relevance feedback is first applied to the system according to the annotation data. A Jaccard-coefficient weighting scheme is then used to calculate the relevance between streaming documents and the topic entities.
PRIS-PRIS_Run_700¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: PRIS-PRIS_Run_700
- Participant: PRIS
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/12/2012
- Type: automatic
- Task: main
- MD5:
69ac6a15e62c757d2d560b58da84622d
- Run description: Relevance feedback is first applied to the system according to the annotation data. A Jaccard-coefficient weighting scheme is then used to calculate the relevance between streaming documents and the topic entities.
PRIS-PRIS_Run_800¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: PRIS-PRIS_Run_800
- Participant: PRIS
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/12/2012
- Type: automatic
- Task: main
- MD5:
2d3bfa799cab67bd112b50d7b2641e38
- Run description: Relevance feedback is first applied to the system according to the annotation data. A Jaccard-coefficient weighting scheme is then used to calculate the relevance between streaming documents and the topic entities.
PRIS-PRIS_Run_900¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: PRIS-PRIS_Run_900
- Participant: PRIS
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/12/2012
- Type: automatic
- Task: main
- MD5:
f9bc188ca4324a415768657af80bb22f
- Run description: Relevance feedback is first applied to the system according to the annotation data. A Jaccard-coefficient weighting scheme is then used to calculate the relevance between streaming documents and the topic entities.
SCIAITeam-B1¶
Participants
| Input
| Appendix
- Run ID: SCIAITeam-B1
- Participant: SCIAITeam
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/15/2012
- Type: automatic
- Task: main
- MD5:
ad30eb0a33bd751b6a745379aa6cd02a
- Run description: test
SCIAITeam-L2¶
Participants
| Input
| Appendix
- Run ID: SCIAITeam-L2
- Participant: SCIAITeam
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/14/2012
- Type: automatic
- Task: main
- MD5:
c4a47116b4d2bc6c62f60a47503ee9ae
- Run description: test
SCIAITeam-L3¶
Participants
| Input
| Appendix
- Run ID: SCIAITeam-L3
- Participant: SCIAITeam
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/14/2012
- Type: automatic
- Task: main
- MD5:
da93635e8807c98ae384dd2705ac305a
- Run description: test
SCIAITeam-W1¶
Participants
| Input
| Appendix
- Run ID: SCIAITeam-W1
- Participant: SCIAITeam
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/16/2012
- Type: manual
- Task: main
- MD5:
a99355adecdab74306014d383cbdce58
- Run description: Wikipedia Query Expansion applied
udel_fang-UDInfoKBA_EX¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: udel_fang-UDInfoKBA_EX
- Participant: udel_fang
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/12/2012
- Type: automatic
- Task: main
- MD5:
5d7320d5d5e9dd40aeb8d1fbe456495d
- Run description: If a document has an exact match with the query entity, the ranking score will be 1000. In other cases, the ranking score will be 0.
udel_fang-UDInfoKBA_WIKI1¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: udel_fang-UDInfoKBA_WIKI1
- Participant: udel_fang
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/12/2012
- Type: automatic
- Task: main
- MD5:
abfa20cd6065d2c41c08b3df702c3e18
- Run description: We first filter the streaming documents down to those that have an exact match with the query entity. For each query entity, we extract the entities from the internal links on its associated Wikipedia page, which means these entities also have Wikipedia pages; they are considered the entities related to the query entity. For each filtered document, we then count the occurrences of the related entities from Wikipedia, and the score reflects these occurrences. Streaming documents are then filtered based on a pre-defined threshold on the score.
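A sketch of this filter-and-count step, assuming the related entities (titles linked from the query entity's Wikipedia page) are given as a list of strings; the threshold is the pre-defined cutoff mentioned above and its value here is illustrative:

```python
def score_document(doc_text, query_entity, related_entities):
    """Return None if the document does not mention the query entity exactly;
    otherwise the number of occurrences of the related Wikipedia entities."""
    if query_entity not in doc_text:
        return None
    return sum(doc_text.count(rel) for rel in related_entities)

def keep(doc_text, query_entity, related_entities, threshold=1):
    """Keep a stream document only if it passes the exact-match filter and its
    related-entity count reaches the (illustrative) threshold."""
    score = score_document(doc_text, query_entity, related_entities)
    return score is not None and score >= threshold
```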
udel_fang-UDInfoKBA_WIKI2¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: udel_fang-UDInfoKBA_WIKI2
- Participant: udel_fang
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/12/2012
- Type: automatic
- Task: main
- MD5:
a646f55135a7ba0e1460b25fd4fded98
- Run description: We first filter the streaming documents down to those that have an exact match with the query entity. For each query entity, we extract the entities from the internal links on its associated Wikipedia page, which means these entities also have Wikipedia pages; they are considered the entities related to the query entity. For each filtered document, we then count the occurrences of the related entities from Wikipedia, and the score reflects these occurrences. Streaming documents are then filtered based on a pre-defined threshold on the score.
udel_fang-UDInfoKBA_WIKI3¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: udel_fang-UDInfoKBA_WIKI3
- Participant: udel_fang
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/12/2012
- Type: automatic
- Task: main
- MD5:
eaa6f4381d488b839e82f8199ea60166
- Run description: We first filter the streaming documents down to those that have an exact match with the query entity. For each query entity, we extract the entities from the internal links on its associated Wikipedia page, which means these entities also have Wikipedia pages; they are considered the entities related to the query entity. For each filtered document, we then count the occurrences of the related entities from Wikipedia, and the score reflects these occurrences. Streaming documents are then filtered based on a pre-defined threshold on the score.
uiucGSLIS-gslis_adaptive¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: uiucGSLIS-gslis_adaptive
- Participant: uiucGSLIS
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/10/2012
- Type: automatic
- Task: main
- MD5:
9e935f7ecde49f7605b11ce447a8713a
- Run description: Initial queries consist of wikitext extracted from each entity's history. We impose a document prior favoring documents with a high in-link count. Only English documents with a near-exact name match on the entities are ranked. The query is updated monthly, at which point the weights of the features (but not the features themselves) are recalculated based on previously retrieved documents.
uiucGSLIS-gslis_mult¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: uiucGSLIS-gslis_mult
- Participant: uiucGSLIS
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/11/2012
- Type: automatic
- Task: main
- MD5:
5a1ac23bdb36053613571e7879e146b8
- Run description: Initial queries consist of wikitext extracted from each entity's history. We impose a document prior favoring documents with a high in-link count. Only English documents with a near-exact name match on the entities are ranked. The query is updated monthly, at which point the weights of the features (but not the features themselves) are recalculated based on previously retrieved documents. Updates are multiplied (independent probability model).
UMass_CIIR-FS_NV_6000¶
Participants
| Input
| Appendix
- Run ID: UMass_CIIR-FS_NV_6000
- Participant: UMass_CIIR
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/17/2012
- Task: main
- MD5:
3d9986d84ad22e2ed4891d94db393956
- Run description: This run performs retrieval over the entire collection, without regard to time, using the entity name and simple variants. Galago sequential-dependence retrieval over the entire document stream; Dirichlet smoothing with mu=2000; sequential-dependence parameters uniw=0.29, odw=0.21, uww=0.50. Statistics are taken from the complete collection. Retrieved 6000 documents per topic across all hours. Combines the original topic name with name variants from Wikipedia redirects and Wikipedia anchor text. Indri query: combine( seqdep(topic_name) combine(seqdep(name_variant_1) ... seqdep(name_variant_n)) )
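A sketch of how the query string described above could be assembled from the topic name and its variants; the seqdep/combine operator names are taken verbatim from the description, so treat this as string templating under assumptions rather than exact Galago/Indri syntax:

```python
def seqdep(phrase: str) -> str:
    return f"seqdep({phrase})"

def build_query(topic_name: str, name_variants: list) -> str:
    """combine( seqdep(topic_name) combine(seqdep(variant_1) ... seqdep(variant_n)) )"""
    variants = " ".join(seqdep(v) for v in name_variants)
    return f"combine( {seqdep(topic_name)} combine( {variants} ) )"

# Hypothetical example:
# build_query("Basic Element", ["Basic Element (company)", "Basic Element (band)"])
```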
UMass_CIIR-PC_NV_1500¶
Participants
| Input
| Appendix
- Run ID: UMass_CIIR-PC_NV_1500
- Participant: UMass_CIIR
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/16/2012
- Type: automatic
- Task: main
- MD5:
518276880eb0e39dda717a34e83cb48c
- Run description: This run represents a state-of-the-art retrieval approach using only the entity name and simple variants. Galago sequential-dependence retrieval over post-cutoff documents; Dirichlet smoothing with mu=2000; sequential-dependence parameters uniw=0.29, odw=0.21, uww=0.50. Background statistics come only from pre-cutoff documents. Retrieved 1500 documents per topic across all hours. Combines the original topic name with name variants from Wikipedia redirects and Wikipedia anchor text. Indri query: combine( seqdep(topic_name) combine(seqdep(name_variant_1) ... seqdep(name_variant_n)) )
UMass_CIIR-PC_RM10_1500¶
Participants
| Input
| Appendix
- Run ID: UMass_CIIR-PC_RM10_1500
- Participant: UMass_CIIR
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/16/2012
- Type: automatic
- Task: main
- MD5:
70cb07bd3a0a71c4b70401f798350de2
- Run description: This run represents a state-of-the-art retrieval approach using relevance feedback from the training documents. Galago RM3 entity expansion model over post-cutoff documents; the initial query is the original query from PC_NV_1500. RM weights: original query = 0.6, expanded unigrams = 0.2, expanded entities = 0.2. The RM incorporates entity concepts from extracted NER tags, [10-20] entities: the 10 top-weighted plus up to 10 entities from the Wikipedia link neighborhood (incoming and outgoing topic names). Indri query: combine 0=0.6 1=0.2 2=0.2( combine( seqdep(topic_name) combine(seqdep(name_variant_1) ... seqdep(name_variant_n)) ) combine unigram_rmweights( unigram_1 ... unigram_n ) combine entity_rmweights( seqdep(entity_1) ... seqdep(entity_n) ) )
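A sketch of the three-way weighted combination described above (0.6 original query, 0.2 expansion unigrams, 0.2 expansion entities); the operator spelling mirrors the query in the description and the per-term RM weights are omitted, so this is string templating under assumptions rather than exact Galago/Indri syntax:

```python
def rm3_query(original_query: str, expansion_unigrams: list,
              expansion_entities: list, weights=(0.6, 0.2, 0.2)) -> str:
    """Weighted combination of the original seqdep query, the RM expansion
    unigrams, and the RM expansion entities (per-term weights omitted)."""
    w0, w1, w2 = weights
    unigram_part = "combine( " + " ".join(expansion_unigrams) + " )"
    entity_part = "combine( " + " ".join(f"seqdep({e})" for e in expansion_entities) + " )"
    return f"combine 0={w0} 1={w1} 2={w2}( {original_query} {unigram_part} {entity_part} )"
```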
UMass_CIIR-PC_RM10_TACRL¶
Participants
| Input
| Appendix
- Run ID: UMass_CIIR-PC_RM10_TACRL
- Participant: UMass_CIIR
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/17/2012
- Type: automatic
- Task: main
- MD5:
cb9fed4a5d065c70363c4d60fe06d64d
- Run description: This run applies a TAC entity linking approach to filter the stream of documents. For this approach, all documents returned from PC_RM10_1500 are converted into TAC EL queries. A supervised TAC EL ranker is applied with the topic entity as the candidate set. KBA documents are re-ranked by their linker score to the topic entity. The ranking model is a linear model optimized with Coordinate Ascent incorporating dozens of features including surface form and document similarity functions.
UMass_CIIR-PC_RM20_1500¶
Participants
| Input
| Appendix
- Run ID: UMass_CIIR-PC_RM20_1500
- Participant: UMass_CIIR
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/16/2012
- Type: automatic
- Task: main
- MD5:
7350770fa43a2782a90d4d0b084a8f6e
- Run description: An attempt to improve recall over PC_RM10_1500 by including more expansion terms. Same as PC_RM10_1500 with 20 instead of 10 expansion terms.
UvA-UvAbaseline¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: UvA-UvAbaseline
- Participant: UvA
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/13/2012
- Type: automatic
- Task: main
- MD5:
74d72791b326b1e12c27f5772a8a8b22
- Run description: baseline 2012 run
UvA-UvAIncLearnHigh¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: UvA-UvAIncLearnHigh
- Participant: UvA
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/14/2012
- Type: automatic
- Task: main
- MD5:
33ef7545e21c0f9d89c9fe5a49a7e1f1
- Run description: Learning-to-rerank run with incremental learning and a high threshold.
UvA-UvAIncLearnLow¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: UvA-UvAIncLearnLow
- Participant: UvA
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/14/2012
- Type: automatic
- Task: main
- MD5:
fdb9e15b5e531ee9f41d69fe5a183428
- Run description: Learning-to-rerank run with incremental learning and a low threshold.
UvA-UvAIncLearnT25¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: UvA-UvAIncLearnT25
- Participant: UvA
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/14/2012
- Type: automatic
- Task: main
- MD5:
9d5fe0f1db5c342aa1cb42121203edaf
- Run description: Learning-to-rerank run with incremental learning on the top 25 instances.
UvA-UvAIncLearnT50¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: UvA-UvAIncLearnT50
- Participant: UvA
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/14/2012
- Type: automatic
- Task: main
- MD5:
5a268c4223a0482ea1c65793909aa7e3
- Run description: Learning-to-rerank run with incremental learning on the top 50 instances.
UvA-UvALearning¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: UvA-UvALearning
- Participant: UvA
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/14/2012
- Type: automatic
- Task: main
- MD5:
cae7dfb9722113d7f5d7cb7959ad562a
- Run description: UvA learning-to-rerank run.