Runs - Knowledge Base Acceleration 2012¶
CWI-DISAMBIGUATOR¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: CWI-DISAMBIGUATOR
- Participant: CWI
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/5/2012
- Type: automatic
- Task: main
- MD5:
2c9087cabc1b867fee83c5983a400276
- Run description: This method uses the words on the DBpedia page of each entity to disambiguate ambiguous entities. In essence, a document is considered central if it contains the label of the DBpedia entity and at least one word that occurs on the DBpedia page of that entity.
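A minimal sketch of this centrality rule, assuming the entity label and the DBpedia page text are available as plain strings; the tokenizer and the exclusion of the label's own words are illustrative assumptions, not the actual CWI implementation:

```python
import re

def tokenize(text):
    """Lowercase a string and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def is_central(doc_text, entity_label, dbpedia_page_text):
    """Central iff the document contains the entity label and at least one
    word from the entity's DBpedia page (label words excluded: an assumption)."""
    if entity_label.lower() not in doc_text.lower():
        return False
    dbpedia_words = tokenize(dbpedia_page_text) - tokenize(entity_label)
    return bool(tokenize(doc_text) & dbpedia_words)
```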
CWI-google_dic_1¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: CWI-google_dic_1
- Participant: CWI
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/13/2012
- Type: automatic
- Task: main
- MD5:
7b1362681c4f6bc7f8a8a1f8e6bb2110
- Run description: This system uses the Google cross-lingual dictionary's strings and probabilities to represent the entities and searches the documents for a match. It neither strips punctuation from nor lowercases the strings and documents. The Google cross-lingual dictionary provides two probabilities, P(entity|string) and P(string|entity). The system multiplies the two to score each document-entity pair. The score is then normalized by the highest score per entity and multiplied by 1000. However, entities whose maximum score is below 0.01 are normalized by 0.01 to discourage them from being equally competitive with other entities. The system's strength is in relevant+central.
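A hedged sketch of the scoring arithmetic described above. The dictionary layout (entity mapped to a list of strings with the two probabilities) is an assumption for illustration; the multiplication, the per-entity max-normalization with a 0.01 floor, and the scaling to 1000 follow the description:

```python
def raw_score(doc_text, entity, dictionary):
    """dictionary: {entity: [(string, p_entity_given_string, p_string_given_entity), ...]}.
    Returns P(entity|string) * P(string|entity) for the best matching string, or 0.0."""
    best = 0.0
    for string, p_e_s, p_s_e in dictionary.get(entity, []):
        if string in doc_text:            # exact match: no stripping, no lowercasing
            best = max(best, p_e_s * p_s_e)
    return best

def normalize(raw_scores, floor=0.01):
    """raw_scores: {doc_id: raw score} for one entity. Normalize by the entity's
    maximum score (never less than the 0.01 floor) and scale to 0-1000."""
    denom = max(max(raw_scores.values(), default=0.0), floor)
    return {doc_id: 1000.0 * s / denom for doc_id, s in raw_scores.items()}
```

The google_strip runs below apply the same arithmetic after stripping punctuation and lowercasing, taking the highest score when several strings match.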
CWI-google_dic_2¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: CWI-google_dic_2
- Participant: CWI
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/13/2012
- Type: automatic
- Task: main
- MD5:
23f06f2f91bf34c167140c5a476fa75f
- Run description: This system uses the Google cross-lingual dictionary's strings and probabilities to represent the entities and searches the documents for a match. It neither strips punctuation from nor lowercases the strings and documents. The Google cross-lingual dictionary provides two probabilities, P(entity|string) and P(string|entity). The system multiplies the two to score each document-entity pair. The score is then normalized by the highest score per entity and multiplied by 1000. The system's strength is in relevant+central.
CWI-google_dic_3¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: CWI-google_dic_3
- Participant: CWI
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/13/2012
- Type: automatic
- Task: main
- MD5:
71df0a95d3e6b889491aa0f5dc4a86d7
- Run description: This system uses the Google cross-lingual dictionary's strings and probabilities to represent the entities and searches the documents for a match. It neither strips punctuation from nor lowercases the strings and documents. The Google cross-lingual dictionary provides two probabilities, P(entity|string) and P(string|entity). The system multiplies the two to score each document-entity pair. The score is then normalized by the highest score per entity and multiplied by 1000. The system's strength is in relevant+central.
CWI-google_strip_1¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: CWI-google_strip_1
- Participant: CWI
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/13/2012
- Type: automatic
- Task: main
- MD5:
268c5a7698522b90e40fb3599b2c9c58
- Run description: This system uses the Google cross-lingual dictionary's strings and probabilities to represent the entities and searches the documents for a match. It strips punctuation from and lowercases the strings and documents. The Google cross-lingual dictionary provides two probabilities, P(entity|string) and P(string|entity). The system multiplies the two to score each document-entity pair. If several strings match, the highest score is chosen as the score for the document-entity pair. The score is then normalized by the highest score per entity and multiplied by 1000. Entities whose highest score is below 0.01 are normalized by 0.01 to discourage them from being equally competitive with other entities.
CWI-google_strip_2¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: CWI-google_strip_2
- Participant: CWI
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/13/2012
- Type: automatic
- Task: main
- MD5:
d960cedea6a30e2ded896b6f7ea0cf19
- Run description: This system uses the Google cross-lingual dictionary's strings and probabilities to represent the entities and searches the documents for a match. It strips punctuation from and lowercases the strings and documents. The Google cross-lingual dictionary provides two probabilities, P(entity|string) and P(string|entity). The system multiplies the two to score each document-entity pair. If several strings match, the highest score is chosen as the score for the document-entity pair. The score is then normalized by the highest score per entity and multiplied by 1000.
CWI-LANGUAGEMODEL¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: CWI-LANGUAGEMODEL
- Participant: CWI
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/12/2012
- Type: automatic
- Task: main
- MD5:
5577f8963df9a6bd4957a5fb501f1cbd
- Run description: CENTRAL ONLY. The system builds a language model for each entity using only the central documents. These language models are then used to rank the test documents, comparing each document with the perplexity measure. The scores are normalized between 0 and 1000. This method is intended to detect only central documents.
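A minimal sketch of perplexity-based ranking under an add-one-smoothed unigram language model built from the central training documents; the smoothing choice, tokenizer, and score rescaling are assumptions rather than the CWI configuration:

```python
import math
import re
from collections import Counter

def tokens(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def build_lm(central_docs):
    """Unigram counts pooled over one entity's central training documents."""
    counts = Counter()
    for doc in central_docs:
        counts.update(tokens(doc))
    return counts, sum(counts.values()), len(counts) + 1   # counts, total, vocab size

def perplexity(doc_text, lm):
    """Perplexity of a (non-empty) document under the add-one-smoothed model."""
    counts, total, vocab = lm
    words = tokens(doc_text)
    log_prob = sum(math.log((counts[w] + 1) / (total + vocab)) for w in words)
    return math.exp(-log_prob / len(words))

def rank(docs, lm):
    """Lower perplexity means a higher score; rescale scores into 0-1000."""
    ppl = {doc_id: perplexity(text, lm) for doc_id, text in docs.items()}
    lo, hi = min(ppl.values()), max(ppl.values())
    span = (hi - lo) or 1.0
    return {doc_id: 1000.0 * (hi - p) / span for doc_id, p in ppl.items()}
```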
CWI-LEARNING16000¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: CWI-LEARNING16000
- Participant: CWI
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 8/31/2012
- Type: automatic
- Task: main
- MD5:
1d639a9420cb65a85e879c8e9b4a09d4
- Run description: Finds only central documents using a supervised approach. It uses a list of query strings learned from the training data; the documents retrieved are those that exactly match a string in this list. For each entity, a list of strings is used as a query. Each string is the exact match of the entity's Wikipedia label plus the 10 characters before and after its occurrence in the training documents.
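A sketch of how such query strings could be harvested, assuming the entity's Wikipedia label and the raw training text are available; filtering then reduces to an exact-match membership test:

```python
def learn_query_strings(entity_label, training_docs, window=10):
    """Collect the entity label plus the 10 characters before and after each
    of its occurrences in the training documents."""
    strings = set()
    for text in training_docs:
        start = 0
        while (idx := text.find(entity_label, start)) != -1:
            strings.add(text[max(0, idx - window): idx + len(entity_label) + window])
            start = idx + 1
    return strings

def is_central(doc_text, query_strings):
    """A document is retrieved (central) if it contains any learned string exactly."""
    return any(q in doc_text for q in query_strings)
```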
helsinki-disgraph¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: helsinki-disgraph
- Participant: helsinki
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/14/2012
- Type: automatic
- Task: main
- MD5:
a96ad7377a819fa6f2763b19a95c1b1d
- Run description: Relation to named entities is detected by examining the overlap between the named-entity graphs and the document's word collocation graph.
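The description is terse, so the sketch below is only one possible reading: represent both the entity context and the document as collocation edge sets and measure their overlap with a Jaccard coefficient. The graph construction (sliding-window collocations over tokens) is entirely an assumption:

```python
def collocation_edges(tokens, window=2):
    """Build an undirected collocation edge set from a token list with a sliding window."""
    edges = set()
    for i, w in enumerate(tokens):
        for v in tokens[i + 1: i + 1 + window]:
            if v != w:
                edges.add(frozenset((w, v)))
    return edges

def graph_overlap(entity_edges, doc_edges):
    """Jaccard coefficient of two edge sets: one possible overlap measure."""
    union = entity_edges | doc_edges
    return len(entity_edges & doc_edges) / len(union) if union else 0.0
```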
helsinki-disgraph2¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: helsinki-disgraph2
- Participant: helsinki
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/17/2012
- Type: automatic
- Task: main
- MD5:
c2adb664a5f7c60e8e83d311a9ea4ffa
- Run description: Relation to named entities is detected by examining the overlap between the named-entity graphs and the document's word collocation graph.
hltcoe-wordNER¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: hltcoe-wordNER
- Participant: hltcoe
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/12/2012
- Type: automatic
- Task: main
- MD5:
304afc6ddd8f64cd1d08806f5b7e30e6
- Run description: Support vector machine using tokenized words and named entities as features.
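A generic sketch of such a classifier using scikit-learn; the feature encoding (named entities appended as prefixed tokens), the TF-IDF weighting, and the linear kernel are assumptions rather than the hltcoe configuration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

def encode(doc_tokens, doc_entities):
    """Combine word tokens with named-entity strings marked by a NER_ prefix."""
    return " ".join(doc_tokens + ["NER_" + e.replace(" ", "_") for e in doc_entities])

def train(train_docs, train_labels):
    """train_docs: list of (tokens, entities) pairs; train_labels: e.g. central vs. not."""
    texts = [encode(toks, ents) for toks, ents in train_docs]
    model = make_pipeline(TfidfVectorizer(), LinearSVC())
    model.fit(texts, train_labels)
    return model
```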
hltcoe-wordNER500¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: hltcoe-wordNER500
- Participant: hltcoe
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/12/2012
- Type: automatic
- Task: main
- MD5:
b27df2df9708c21008c6cfaef0c62ef6
- Run description: Support vector machine using tokenized words and named entities as features.
igpi2012-ner50_tuned¶
Participants
| Input
| Appendix
- Run ID: igpi2012-ner50_tuned
- Participant: igpi2012
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/12/2012
- Type: automatic
- Task: main
- MD5:
2a605f481ad7ed8c9f379c8c5eaa2b48
- Run description: Uses the top 50 most frequent named entities to compute the Jaccard coefficient between the entity list of each document and the entity list of the positive documents from the annotated set. Each topic has its own threshold, tuned on the annotated dataset.
igpi2012-ner_jaccard¶
Participants
| Input
| Appendix
- Run ID: igpi2012-ner_jaccard
- Participant: igpi2012
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/12/2012
- Type: automatic
- Task: main
- MD5:
613947cf008a3c169b954ce9d789d434
- Run description: Computes the Jaccard coefficient between the entity list of each document and the entity list of the positive documents from the annotated set, then applies an arbitrary threshold. This naive method serves as a baseline for comparison with the other runs.
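A minimal sketch of this baseline, assuming the named-entity sets have already been extracted; the threshold value shown is illustrative, standing in for the arbitrary choice mentioned above:

```python
def jaccard(a, b):
    """Jaccard coefficient between two sets of named entities."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def is_relevant(doc_entities, positive_entities, threshold=0.1):
    """doc_entities: NE set of the stream document; positive_entities: NE set pooled
    from the annotated positive documents. The threshold value is illustrative."""
    return jaccard(set(doc_entities), set(positive_entities)) >= threshold
```

The igpi2012-ner50_tuned run above restricts the sets to the 50 most frequent named entities and tunes the threshold per topic on the annotated data.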
LSIS-lsisRFAll¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: LSIS-lsisRFAll
- Participant: LSIS
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/17/2012
- Type: automatic
- Task: main
- MD5:
793b0fef8a02c83ebf7e9adefb8d91a0
- Run description: With the help of the Wikipedia web pages, variant names have been found for each topic, which in some cases helps disambiguate the topic itself (e.g., Basic Element the company vs. the music group). Each hour of the corpus has been indexed separately in order to base document retrieval on a real stream. The following steps are performed for each topic. As the process goes through the stream, each index is queried with the topic's url_name as well as the variants. A queue is built with day granularity and can contain up to seven days. Then, for each document, statistics are computed based on what can be found in the document (e.g., the number of related entities coming from the Wikipedia page), what can be found in the current day (the number of documents found that particular day), and what has been seen on the previous days currently in the queue (the number of mentions in titles, in previous documents, etc.). These statistics are used to train a RandomCommittee classifier, which uses multiple RandomForest classifiers as a committee to build deep trees that consider all features in all possible cases. Two classifications are performed: one separates garbage from relevant and central (c1) and gives a score (s1); the other (c2) chooses between relevant and central and gives a score (s2). For this run, the final score is s1 if s2 < 0.5, or ((s1*s2)/2) + 0.5 if s2 >= 0.5.
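A sketch of the score combination used in this run, assuming s1 and s2 are the two classifier scores in [0, 1]; only the piecewise formula itself comes from the description:

```python
def combine_scores(s1: float, s2: float) -> float:
    """lsisRFAll / lsisSRFAll: keep s1 when the relevant-vs-central classifier is
    unsure (s2 < 0.5); otherwise lift the combined score into the upper half."""
    if s2 < 0.5:
        return s1
    return (s1 * s2) / 2 + 0.5
```

The lsisRFYes, lsisSRFYes, and lsisSys2 runs below use s1*s2 instead, and lsisSys1 uses ((s1*s2)/2) + 0.5 unconditionally.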
LSIS-lsisRFYes¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: LSIS-lsisRFYes
- Participant: LSIS
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/17/2012
- Type: automatic
- Task: main
- MD5:
cec886b91247beccede1f9571ec9dc43
- Run description: With the help of the Wikipedia web pages, variant names have been found for each topic, which in some cases helps disambiguate the topic itself (e.g., Basic Element the company vs. the music group). Each hour of the corpus has been indexed separately in order to base document retrieval on a real stream. The following steps are performed for each topic. As the process goes through the stream, each index is queried with the topic's url_name as well as the variants. A queue is built with day granularity and can contain up to seven days. Then, for each document, statistics are computed based on what can be found in the document (e.g., the number of related entities coming from the Wikipedia page), what can be found in the current day (the number of documents found that particular day), and what has been seen on the previous days currently in the queue (the number of mentions in titles, in previous documents, etc.). These statistics are used to train a RandomCommittee classifier, which uses multiple RandomForest classifiers as a committee to build deep trees that consider all features in all possible cases. Two classifications are performed: one separates garbage from relevant and central (c1) and gives a score (s1); the other (c2) chooses between relevant and central and gives a score (s2). For this run, the final score is s1*s2.
LSIS-lsisSRFAll¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: LSIS-lsisSRFAll
- Participant: LSIS
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/17/2012
- Type: automatic
- Task: main
- MD5:
ef5bc9b175d3e6456cf75bae8688dee0
- Run description: With the help of the Wikipedia web pages, variant names have been found for each topic, which in some cases helps disambiguate the topic itself (e.g., Basic Element the company vs. the music group). Each hour of the corpus has been indexed separately in order to base document retrieval on a real stream. The following steps are performed for each topic. As the process goes through the stream, each index is queried with the topic's url_name as well as the variants. A queue is built with day granularity and can contain up to seven days. Then, for each document, statistics are computed based on what can be found in the document (e.g., the number of related entities coming from the Wikipedia page), what can be found in the current day (the number of documents found that particular day), and what has been seen on the previous days currently in the queue (the number of mentions in titles, in previous documents, etc.). These statistics are used to train a RandomCommittee classifier, which uses multiple RandomForest classifiers as a committee to build deep trees that consider all features in all possible cases. Two classifications are performed: one separates garbage from relevant and central (c1) and gives a score (s1); the other (c2) chooses between relevant and central and gives a score (s2). For this run, the final score is s1 if s2 < 0.5, or ((s1*s2)/2) + 0.5 if s2 >= 0.5.
LSIS-lsisSRFYes¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: LSIS-lsisSRFYes
- Participant: LSIS
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/17/2012
- Type: automatic
- Task: main
- MD5:
c551fcad28c1708a2f559c7e35e34675
- Run description: With the help of the Wikipedia web pages, variant names have been found for each topic, which in some cases helps disambiguate the topic itself (e.g., Basic Element the company vs. the music group). Each hour of the corpus has been indexed separately in order to base document retrieval on a real stream. The following steps are performed for each topic. As the process goes through the stream, each index is queried with the topic's url_name as well as the variants. A queue is built with day granularity and can contain up to seven days. Then, for each document, statistics are computed based on what can be found in the document (e.g., the number of related entities coming from the Wikipedia page), what can be found in the current day (the number of documents found that particular day), and what has been seen on the previous days currently in the queue (the number of mentions in titles, in previous documents, etc.). These statistics are used to train a RandomCommittee classifier, which uses multiple RandomForest classifiers as a committee to build deep trees that consider all features in all possible cases. Two classifications are performed: one separates garbage from relevant and central (c1) and gives a score (s1); the other (c2) chooses between relevant and central and gives a score (s2). For this run, the final score is s1*s2.
LSIS-lsisSys1¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: LSIS-lsisSys1
- Participant: LSIS
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/16/2012
- Type: automatic
- Task: main
- MD5:
32f9601710d97179f17cde69c77c6479
- Run description: With the help of the Wikipedia web pages, variant names have been found for each topic, which in some cases helps disambiguate the topic itself (e.g., Basic Element the company vs. the music group). Each hour of the corpus has been indexed separately in order to base document retrieval on a real stream. The following steps are performed for each topic. As the process goes through the stream, each index is queried with the topic's url_name as well as the variants. A queue is built with day granularity and can contain up to seven days. Then, for each document, statistics are computed based on what can be found in the document (e.g., the number of related entities coming from the Wikipedia page), what can be found in the current day (the number of documents found that particular day), and what has been seen on the previous days currently in the queue (the number of mentions in titles, in previous documents, etc.). These statistics are used to train a RandomCommittee classifier, which uses multiple RandomForest classifiers as a committee to build deep trees that consider all features in all possible cases. Two classifications are performed: one separates garbage from relevant and central (c1) and gives a score (s1); the other (c2) chooses between relevant and central and gives a score (s2). For this run, the final score is ((s1*s2)/2) + 0.5.
LSIS-lsisSys2¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: LSIS-lsisSys2
- Participant: LSIS
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/16/2012
- Type: automatic
- Task: main
- MD5:
d5b7e1fcf62fceb27f90ff41c4f518b8
- Run description: With the help of the Wikipedia web pages, variant names have been found for each topic, which in some cases helps disambiguate the topic itself (e.g., Basic Element the company vs. the music group). Each hour of the corpus has been indexed separately in order to base document retrieval on a real stream. The following steps are performed for each topic. As the process goes through the stream, each index is queried with the topic's url_name as well as the variants. A queue is built with day granularity and can contain up to seven days. Then, for each document, statistics are computed based on what can be found in the document (e.g., the number of related entities coming from the Wikipedia page), what can be found in the current day (the number of documents found that particular day), and what has been seen on the previous days currently in the queue (the number of mentions in titles, in previous documents, etc.). These statistics are used to train a RandomCommittee classifier, which uses multiple RandomForest classifiers as a committee to build deep trees that consider all features in all possible cases. Two classifications are performed: one separates garbage from relevant and central (c1) and gives a score (s1); the other (c2) chooses between relevant and central and gives a score (s2). For this run, the final score is s1*s2.
PRIS-PRIS_Run_1¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: PRIS-PRIS_Run_1
- Participant: PRIS
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/12/2012
- Type: automatic
- Task: main
- MD5:
e6fbaf124ec7146852e64cb20d4bfc1f
- Run description: Relevance feedback is first applied to the system according to the annotation data. A Jaccard-coefficient weighting scheme is then used to calculate the relevance between streaming documents and the topic entities.
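The description is brief, so the sketch below is only a hedged reading: pool the terms of the annotated positive documents into a topic profile (the relevance-feedback step) and score each stream document by its Jaccard coefficient against that profile. The profile construction and tokenizer are assumptions:

```python
import re

def terms(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def build_profile(positive_docs):
    """Relevance feedback: pool the terms of the annotated positive documents."""
    profile = set()
    for doc in positive_docs:
        profile |= terms(doc)
    return profile

def relevance(doc_text, profile):
    """Jaccard coefficient between the document's terms and the topic profile."""
    doc_terms = terms(doc_text)
    union = doc_terms | profile
    return len(doc_terms & profile) / len(union) if union else 0.0
```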
PRIS-PRIS_Run_400¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: PRIS-PRIS_Run_400
- Participant: PRIS
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/13/2012
- Type: automatic
- Task: main
- MD5:
7e7e7f0a60f8df1ae397a4497e9db5f4
- Run description: Relevance feedback is first applied to the system according to the annotation data. A Jaccard-coefficient weighting scheme is then used to calculate the relevance between streaming documents and the topic entities.
PRIS-PRIS_Run_500¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: PRIS-PRIS_Run_500
- Participant: PRIS
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/13/2012
- Type: automatic
- Task: main
- MD5:
ceb487611bbfd4f01b5dc2d6bbf4ab55
- Run description: Relevance feedback is first applied to the system according to the annotation data. A Jaccard-coefficient weighting scheme is then used to calculate the relevance between streaming documents and the topic entities.
PRIS-PRIS_Run_600¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: PRIS-PRIS_Run_600
- Participant: PRIS
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/13/2012
- Type: automatic
- Task: main
- MD5:
832e1fe45b71316e487128be19201e50
- Run description: Relevance feedback is first applied to the system according to the annotation data. A Jaccard-coefficient weighting scheme is then used to calculate the relevance between streaming documents and the topic entities.
PRIS-PRIS_Run_700¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: PRIS-PRIS_Run_700
- Participant: PRIS
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/12/2012
- Type: automatic
- Task: main
- MD5:
69ac6a15e62c757d2d560b58da84622d
- Run description: Relevance feedback is first applied to the system according to the annotation data. A Jaccard-coefficient weighting scheme is then used to calculate the relevance between streaming documents and the topic entities.
PRIS-PRIS_Run_800¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: PRIS-PRIS_Run_800
- Participant: PRIS
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/12/2012
- Type: automatic
- Task: main
- MD5:
2d3bfa799cab67bd112b50d7b2641e38
- Run description: Relevance feedback is first applied to the system according to the annotation data. A Jaccard-coefficient weighting scheme is then used to calculate the relevance between streaming documents and the topic entities.
PRIS-PRIS_Run_900¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: PRIS-PRIS_Run_900
- Participant: PRIS
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/12/2012
- Type: automatic
- Task: main
- MD5:
f9bc188ca4324a415768657af80bb22f
- Run description: Relevance feedback is first applied to the system according to the annotation data. A Jaccard-coefficient weighting scheme is then used to calculate the relevance between streaming documents and the topic entities.
SCIAITeam-B1¶
Participants
| Input
| Appendix
- Run ID: SCIAITeam-B1
- Participant: SCIAITeam
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/15/2012
- Type: automatic
- Task: main
- MD5:
ad30eb0a33bd751b6a745379aa6cd02a
- Run description: test
SCIAITeam-L2¶
Participants
| Input
| Appendix
- Run ID: SCIAITeam-L2
- Participant: SCIAITeam
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/14/2012
- Type: automatic
- Task: main
- MD5:
c4a47116b4d2bc6c62f60a47503ee9ae
- Run description: test
SCIAITeam-L3¶
Participants
| Input
| Appendix
- Run ID: SCIAITeam-L3
- Participant: SCIAITeam
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/14/2012
- Type: automatic
- Task: main
- MD5:
da93635e8807c98ae384dd2705ac305a
- Run description: test
SCIAITeam-W1¶
Participants
| Input
| Appendix
- Run ID: SCIAITeam-W1
- Participant: SCIAITeam
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/16/2012
- Type: manual
- Task: main
- MD5:
a99355adecdab74306014d383cbdce58
- Run description: Wikipedia Query Expansion applied
udel_fang-UDInfoKBA_EX¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: udel_fang-UDInfoKBA_EX
- Participant: udel_fang
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/12/2012
- Type: automatic
- Task: main
- MD5:
5d7320d5d5e9dd40aeb8d1fbe456495d
- Run description: If a document has an exact match with the query entity, the ranking score will be 1000. In other cases, the ranking score will be 0.
udel_fang-UDInfoKBA_WIKI1¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: udel_fang-UDInfoKBA_WIKI1
- Participant: udel_fang
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/12/2012
- Type: automatic
- Task: main
- MD5:
abfa20cd6065d2c41c08b3df702c3e18
- Run description: We first filter the streaming documents down to those that have an exact match with the query entity. For each query entity, we extract the entities from the internal links on its associated Wikipedia page, which means these entities also have Wikipedia pages; they are considered the entities related to the query entity. For each filtered document, we then count the occurrences of the related entities from Wikipedia, and the score reflects these occurrences. Streaming documents are then filtered based on a pre-defined threshold on the score.
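A sketch of this filter-and-count step, assuming the related entities (titles linked from the query entity's Wikipedia page) are given as a list of strings; the threshold is the pre-defined cutoff mentioned above and its value here is illustrative:

```python
def score_document(doc_text, query_entity, related_entities):
    """Return None if the document does not mention the query entity exactly;
    otherwise the number of occurrences of the related Wikipedia entities."""
    if query_entity not in doc_text:
        return None
    return sum(doc_text.count(rel) for rel in related_entities)

def keep(doc_text, query_entity, related_entities, threshold=1):
    """Keep a stream document only if it passes the exact-match filter and its
    related-entity count reaches the (illustrative) threshold."""
    score = score_document(doc_text, query_entity, related_entities)
    return score is not None and score >= threshold
```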
udel_fang-UDInfoKBA_WIKI2¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: udel_fang-UDInfoKBA_WIKI2
- Participant: udel_fang
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/12/2012
- Type: automatic
- Task: main
- MD5:
a646f55135a7ba0e1460b25fd4fded98
- Run description: We first filter the streaming documents down to those that have an exact match with the query entity. For each query entity, we extract the entities from the internal links on its associated Wikipedia page, which means these entities also have Wikipedia pages; they are considered the entities related to the query entity. For each filtered document, we then count the occurrences of the related entities from Wikipedia, and the score reflects these occurrences. Streaming documents are then filtered based on a pre-defined threshold on the score.
udel_fang-UDInfoKBA_WIKI3¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: udel_fang-UDInfoKBA_WIKI3
- Participant: udel_fang
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/12/2012
- Type: automatic
- Task: main
- MD5:
eaa6f4381d488b839e82f8199ea60166
- Run description: We first filter the streaming documents down to those that have an exact match with the query entity. For each query entity, we extract the entities from the internal links on its associated Wikipedia page, which means these entities also have Wikipedia pages; they are considered the entities related to the query entity. For each filtered document, we then count the occurrences of the related entities from Wikipedia, and the score reflects these occurrences. Streaming documents are then filtered based on a pre-defined threshold on the score.
uiucGSLIS-gslis_adaptive¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: uiucGSLIS-gslis_adaptive
- Participant: uiucGSLIS
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/10/2012
- Type: automatic
- Task: main
- MD5:
9e935f7ecde49f7605b11ce447a8713a
- Run description: Initial queries consist of wikitext extracted from each entity's history. We impose a document prior favoring documents with a high in-link count. Only English documents with a near-exact name match on the entities are ranked. The query is updated monthly, at which point the weights of the features (but not the features themselves) are recalculated based on previously retrieved documents.
uiucGSLIS-gslis_mult¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: uiucGSLIS-gslis_mult
- Participant: uiucGSLIS
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/11/2012
- Type: automatic
- Task: main
- MD5:
5a1ac23bdb36053613571e7879e146b8
- Run description: Initial queries consist of wikitext extracted from each entity's history. We impose a document prior favoring documents with a high in-link count. Only English documents with a near-exact name match on the entities are ranked. The query is updated monthly, at which point the weights of the features (but not the features themselves) are recalculated based on previously retrieved documents. Updates are multiplied (independent probability model).
UMass_CIIR-FS_NV_6000¶
Participants
| Input
| Appendix
- Run ID: UMass_CIIR-FS_NV_6000
- Participant: UMass_CIIR
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/17/2012
- Task: main
- MD5:
3d9986d84ad22e2ed4891d94db393956
- Run description: This run performs retrieval over the entire collection, without regard to time, using the entity name and simple variants. Galago sequential-dependence retrieval over the entire document stream; Dirichlet smoothing with mu=2000; sequential-dependence parameters uniw=0.29, odw=0.21, uww=0.50. Statistics are taken from the complete collection. Retrieved 6000 documents per topic across all hours. Combines the original topic name with name variants from Wikipedia redirects and Wikipedia anchor text. Indri query: combine( seqdep(topic_name) combine(seqdep(name_variant_1) ... seqdep(name_variant_n)) )
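A sketch of how the query string described above could be assembled from the topic name and its variants; the seqdep/combine operator names are taken verbatim from the description, so treat this as string templating under assumptions rather than exact Galago/Indri syntax:

```python
def seqdep(phrase: str) -> str:
    return f"seqdep({phrase})"

def build_query(topic_name: str, name_variants: list) -> str:
    """combine( seqdep(topic_name) combine(seqdep(variant_1) ... seqdep(variant_n)) )"""
    variants = " ".join(seqdep(v) for v in name_variants)
    return f"combine( {seqdep(topic_name)} combine( {variants} ) )"

# Hypothetical example:
# build_query("Basic Element", ["Basic Element (company)", "Basic Element (band)"])
```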
UMass_CIIR-PC_NV_1500¶
Participants
| Input
| Appendix
- Run ID: UMass_CIIR-PC_NV_1500
- Participant: UMass_CIIR
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/16/2012
- Type: automatic
- Task: main
- MD5:
518276880eb0e39dda717a34e83cb48c
- Run description: This run represents a state-of-the-art retrieval approach using only the entity name and simple variants. Galago sequential-dependence retrieval over post-cutoff documents; Dirichlet smoothing with mu=2000; sequential-dependence parameters uniw=0.29, odw=0.21, uww=0.50. Background statistics come only from pre-cutoff documents. Retrieved 1500 documents per topic across all hours. Combines the original topic name with name variants from Wikipedia redirects and Wikipedia anchor text. Indri query: combine( seqdep(topic_name) combine(seqdep(name_variant_1) ... seqdep(name_variant_n)) )
UMass_CIIR-PC_RM10_1500¶
Participants
| Input
| Appendix
- Run ID: UMass_CIIR-PC_RM10_1500
- Participant: UMass_CIIR
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/16/2012
- Type: automatic
- Task: main
- MD5:
70cb07bd3a0a71c4b70401f798350de2
- Run description: This run represents a state-of-the-art retrieval approach using relevance feedback from the training documents. Galago RM3 entity expansion model over post-cutoff documents; the initial query is the original query from PC_NV_1500. RM weights: original query = 0.6, expanded unigrams = 0.2, expanded entities = 0.2. The RM incorporates entity concepts from extracted NER tags, [10-20] entities: the 10 top-weighted plus up to 10 entities from the Wikipedia link neighborhood (incoming and outgoing topic names). Indri query: combine 0=0.6 1=0.2 2=0.2( combine( seqdep(topic_name) combine(seqdep(name_variant_1) ... seqdep(name_variant_n)) ) combine unigram_rmweights( unigram_1 ... unigram_n ) combine entity_rmweights( seqdep(entity_1) ... seqdep(entity_n) ) )
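A sketch of the three-way weighted combination described above (0.6 original query, 0.2 expansion unigrams, 0.2 expansion entities); the operator spelling mirrors the query in the description and the per-term RM weights are omitted, so this is string templating under assumptions rather than exact Galago/Indri syntax:

```python
def rm3_query(original_query: str, expansion_unigrams: list,
              expansion_entities: list, weights=(0.6, 0.2, 0.2)) -> str:
    """Weighted combination of the original seqdep query, the RM expansion
    unigrams, and the RM expansion entities (per-term weights omitted)."""
    w0, w1, w2 = weights
    unigram_part = "combine( " + " ".join(expansion_unigrams) + " )"
    entity_part = "combine( " + " ".join(f"seqdep({e})" for e in expansion_entities) + " )"
    return f"combine 0={w0} 1={w1} 2={w2}( {original_query} {unigram_part} {entity_part} )"
```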
UMass_CIIR-PC_RM10_TACRL¶
Participants
| Input
| Appendix
- Run ID: UMass_CIIR-PC_RM10_TACRL
- Participant: UMass_CIIR
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/17/2012
- Type: automatic
- Task: main
- MD5:
cb9fed4a5d065c70363c4d60fe06d64d
- Run description: This run applies a TAC entity linking approach to filter the stream of documents. For this approach, all documents returned from PC_RM10_1500 are converted into TAC EL queries. A supervised TAC EL ranker is applied with the topic entity as the candidate set. KBA documents are re-ranked by their linker score to the topic entity. The ranking model is a linear model optimized with Coordinate Ascent incorporating dozens of features including surface form and document similarity functions.
UMass_CIIR-PC_RM20_1500¶
Participants
| Input
| Appendix
- Run ID: UMass_CIIR-PC_RM20_1500
- Participant: UMass_CIIR
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/16/2012
- Type: automatic
- Task: main
- MD5:
7350770fa43a2782a90d4d0b084a8f6e
- Run description: An attempt to improve recall over PC_RM10_1500 by including more expansion terms. Same as PC_RM10_1500 with 20 instead of 10 expansion terms.
UvA-UvAbaseline¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: UvA-UvAbaseline
- Participant: UvA
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/13/2012
- Type: automatic
- Task: main
- MD5:
74d72791b326b1e12c27f5772a8a8b22
- Run description: baseline 2012 run
UvA-UvAIncLearnHigh¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: UvA-UvAIncLearnHigh
- Participant: UvA
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/14/2012
- Type: automatic
- Task: main
- MD5:
33ef7545e21c0f9d89c9fe5a49a7e1f1
- Run description: Learning-to-rerank run with incremental learning and a high threshold.
UvA-UvAIncLearnLow¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: UvA-UvAIncLearnLow
- Participant: UvA
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/14/2012
- Type: automatic
- Task: main
- MD5:
fdb9e15b5e531ee9f41d69fe5a183428
- Run description: Learning-to-rerank run with incremental learning and a low threshold.
UvA-UvAIncLearnT25¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: UvA-UvAIncLearnT25
- Participant: UvA
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/14/2012
- Type: automatic
- Task: main
- MD5:
9d5fe0f1db5c342aa1cb42121203edaf
- Run description: Learning-to-rerank run with incremental learning on the top 25 instances.
UvA-UvAIncLearnT50¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: UvA-UvAIncLearnT50
- Participant: UvA
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/14/2012
- Type: automatic
- Task: main
- MD5:
5a268c4223a0482ea1c65793909aa7e3
- Run description: Learning-to-rerank run with incremental learning on the top 50 instances.
UvA-UvALearning¶
Participants
| Proceedings
| Input
| Appendix
- Run ID: UvA-UvALearning
- Participant: UvA
- Track: Knowledge Base Acceleration
- Year: 2012
- Submission: 9/14/2012
- Type: automatic
- Task: main
- MD5:
cae7dfb9722113d7f5d7cb7959ad562a
- Run description: UvA learning-to-rerank run.