Runs - Knowledge Base Acceleration 2012

CWI-DISAMBIGUATOR

Participants | Proceedings | Input | Appendix

  • Run ID: CWI-DISAMBIGUATOR
  • Participant: CWI
  • Track: Knowledge Base Acceleration
  • Year: 2012
  • Submission: 9/5/2012
  • Type: automatic
  • Task: main
  • MD5: 2c9087cabc1b867fee83c5983a400276
  • Run description: This method uses the words in the DBpedia page of each entity to disambiguate ambiguous entities. A document is considered central if it contains the label of the DBpedia entity and at least one word that occurs in the DBpedia page of that entity.
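
The centrality rule described above can be sketched as follows; the function name, tokenization, and substring matching are illustrative assumptions, not the run's actual implementation.

```python
import re

def is_central(doc_text, entity_label, dbpedia_words):
    """A document is 'central' if it mentions the entity's label and
    contains at least one word from the entity's DBpedia page."""
    if entity_label.lower() not in doc_text.lower():
        return False
    tokens = set(re.findall(r"\w+", doc_text.lower()))
    return any(w.lower() in tokens for w in dbpedia_words)
```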

CWI-google_dic_1

Participants | Proceedings | Input | Appendix

  • Run ID: CWI-google_dic_1
  • Participant: CWI
  • Track: Knowledge Base Acceleration
  • Year: 2012
  • Submission: 9/13/2012
  • Type: automatic
  • Task: main
  • MD5: 7b1362681c4f6bc7f8a8a1f8e6bb2110
  • Run description: This system uses the Google cross-lingual dictionary's strings and probabilities to represent the entities and searches the documents for a match. It does not strip or lowercase the strings and documents. The Google cross-lingual dictionary has two probabilities, P(entity|string) and P(string|entity); the system uses their product to score the doc-entity pair. The score is then normalized by the highest score per entity and multiplied by 1000. However, entities whose maximum score is <0.01 are normalized by 0.01 to discourage them from being equally competitive with other entities. The system's strength is in relevant+central.
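
The scoring and normalization described above can be sketched as follows; the nested-dict data layout and function name are illustrative assumptions.

```python
def score_runs(pair_probs, floor=0.01):
    """Score each (doc, entity) pair by P(entity|string) * P(string|entity),
    then normalize per entity by max(best score, floor) and scale to 0-1000,
    so entities whose best raw score is below the floor are penalized.

    pair_probs: {entity: {doc_id: (p_entity_given_string, p_string_given_entity)}}
    """
    scores = {}
    for entity, docs in pair_probs.items():
        raw = {d: pe * ps for d, (pe, ps) in docs.items()}
        denom = max(max(raw.values()), floor)
        scores[entity] = {d: 1000 * s / denom for d, s in raw.items()}
    return scores
```

With the floor, an entity whose best raw score is 0.0025 tops out at 250 rather than 1000, which matches the "discourage" behaviour in the description.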

CWI-google_dic_2

Participants | Proceedings | Input | Appendix

  • Run ID: CWI-google_dic_2
  • Participant: CWI
  • Track: Knowledge Base Acceleration
  • Year: 2012
  • Submission: 9/13/2012
  • Type: automatic
  • Task: main
  • MD5: 23f06f2f91bf34c167140c5a476fa75f
  • Run description: This system uses the Google cross-lingual dictionary's strings and probabilities to represent the entities and searches the documents for a match. It does not strip or lowercase the strings and documents. The Google cross-lingual dictionary has two probabilities, P(entity|string) and P(string|entity); the system uses their product to score the doc-entity pair. The score is then normalized by the highest score per entity and multiplied by 1000. The system's strength is in relevant+central.

CWI-google_dic_3

Participants | Proceedings | Input | Appendix

  • Run ID: CWI-google_dic_3
  • Participant: CWI
  • Track: Knowledge Base Acceleration
  • Year: 2012
  • Submission: 9/13/2012
  • Type: automatic
  • Task: main
  • MD5: 71df0a95d3e6b889491aa0f5dc4a86d7
  • Run description: This system uses the Google cross-lingual dictionary's strings and probabilities to represent the entities and searches the documents for a match. It does not strip or lowercase the strings and documents. The Google cross-lingual dictionary has two probabilities, P(entity|string) and P(string|entity); the system uses their product to score the doc-entity pair. The score is then normalized by the highest score per entity and multiplied by 1000. The system's strength is in relevant+central.

CWI-google_strip_1

Participants | Proceedings | Input | Appendix

  • Run ID: CWI-google_strip_1
  • Participant: CWI
  • Track: Knowledge Base Acceleration
  • Year: 2012
  • Submission: 9/13/2012
  • Type: automatic
  • Task: main
  • MD5: 268c5a7698522b90e40fb3599b2c9c58
  • Run description: This system uses the Google cross-lingual dictionary's strings and probabilities to represent the entities and searches the documents for a match. It strips punctuation and lowercases the strings and documents. The Google cross-lingual dictionary has two probabilities, P(entity|string) and P(string|entity); the system uses their product to score the doc-entity pair. If multiple strings match, the highest score is chosen as the score for the doc-entity pair. The score is then normalized by the highest score per entity and multiplied by 1000. Entities whose highest score is <0.01 are normalized by 0.01 to discourage them from being equally competitive with other entities.

CWI-google_strip_2

Participants | Proceedings | Input | Appendix

  • Run ID: CWI-google_strip_2
  • Participant: CWI
  • Track: Knowledge Base Acceleration
  • Year: 2012
  • Submission: 9/13/2012
  • Type: automatic
  • Task: main
  • MD5: d960cedea6a30e2ded896b6f7ea0cf19
  • Run description: This system uses the Google cross-lingual dictionary's strings and probabilities to represent the entities and searches the documents for a match. It strips punctuation and lowercases the strings and documents. The Google cross-lingual dictionary has two probabilities, P(entity|string) and P(string|entity); the system uses their product to score the doc-entity pair. If multiple strings match, the highest score is chosen as the score for the doc-entity pair. The score is then normalized by the highest score per entity and multiplied by 1000.

CWI-LANGUAGEMODEL

Participants | Proceedings | Input | Appendix

  • Run ID: CWI-LANGUAGEMODEL
  • Participant: CWI
  • Track: Knowledge Base Acceleration
  • Year: 2012
  • Submission: 9/12/2012
  • Type: automatic
  • Task: main
  • MD5: 5577f8963df9a6bd4957a5fb501f1cbd
  • Run description: ONLY CENTRAL. This run builds a language model for each entity, using only the central documents. These language models are then used to rank the test documents: each document is compared with the perplexity measure. The scores are normalized between 0 and 1000. This method is intended to detect only central documents.
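
A minimal sketch of perplexity scoring under a unigram model. The description does not specify the model order or smoothing, so the unigram model and add-alpha smoothing here are assumptions; a lower perplexity means the document is closer to the entity's central-document language model.

```python
import math

def perplexity(doc_tokens, lm_counts, vocab_size, alpha=1.0):
    """Perplexity of a document under a smoothed unigram language model.

    lm_counts: token counts from the entity's central training documents.
    """
    total = sum(lm_counts.values())
    log_prob = 0.0
    for tok in doc_tokens:
        p = (lm_counts.get(tok, 0) + alpha) / (total + alpha * vocab_size)
        log_prob += math.log(p)
    return math.exp(-log_prob / len(doc_tokens))
```

Per the description, documents would be ranked by this measure and the resulting scores normalized into the 0-1000 range.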

CWI-LEARNING16000

Participants | Proceedings | Input | Appendix

  • Run ID: CWI-LEARNING16000
  • Participant: CWI
  • Track: Knowledge Base Acceleration
  • Year: 2012
  • Submission: 8/31/2012
  • Type: automatic
  • Task: main
  • MD5: 1d639a9420cb65a85e879c8e9b4a09d4
  • Run description: Finds only central documents using a supervised approach. It uses a list of query strings learned from the training data; the documents retrieved are those that exactly match a string in this list. For each entity, a list of strings is used as a query. These strings are the exact matches of the entity's Wikipedia label plus 10 characters before and after each occurrence in the training documents.
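
The string-learning step described above can be sketched as follows; the function and parameter names are illustrative, not the run's actual code.

```python
def learn_query_strings(label, training_docs, pad=10):
    """For each occurrence of the entity's Wikipedia label in a training
    document, keep the label together with `pad` characters of context
    on each side; the resulting strings form the entity's query list."""
    strings = set()
    for doc in training_docs:
        start = doc.find(label)
        while start != -1:
            strings.add(doc[max(0, start - pad):start + len(label) + pad])
            start = doc.find(label, start + 1)
    return strings
```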

helsinki-disgraph

Participants | Proceedings | Input | Appendix

  • Run ID: helsinki-disgraph
  • Participant: helsinki
  • Track: Knowledge Base Acceleration
  • Year: 2012
  • Submission: 9/14/2012
  • Type: automatic
  • Task: main
  • MD5: a96ad7377a819fa6f2763b19a95c1b1d
  • Run description: Relation to named entities is detected by looking at the overlap between named-entity graphs and the document's word-collocation graph.

helsinki-disgraph2

Participants | Proceedings | Input | Appendix

  • Run ID: helsinki-disgraph2
  • Participant: helsinki
  • Track: Knowledge Base Acceleration
  • Year: 2012
  • Submission: 9/17/2012
  • Type: automatic
  • Task: main
  • MD5: c2adb664a5f7c60e8e83d311a9ea4ffa
  • Run description: Relation to named entities is detected by looking at the overlap between named-entity graphs and the document's word-collocation graph.

hltcoe-wordNER

Participants | Proceedings | Input | Appendix

  • Run ID: hltcoe-wordNER
  • Participant: hltcoe
  • Track: Knowledge Base Acceleration
  • Year: 2012
  • Submission: 9/12/2012
  • Type: automatic
  • Task: main
  • MD5: 304afc6ddd8f64cd1d08806f5b7e30e6
  • Run description: Support vector machine using tokenized words and named entities as features.

hltcoe-wordNER500

Participants | Proceedings | Input | Appendix

  • Run ID: hltcoe-wordNER500
  • Participant: hltcoe
  • Track: Knowledge Base Acceleration
  • Year: 2012
  • Submission: 9/12/2012
  • Type: automatic
  • Task: main
  • MD5: b27df2df9708c21008c6cfaef0c62ef6
  • Run description: Support vector machine using tokenized words and named entities as features.

igpi2012-ner50_tuned

Participants | Input | Appendix

  • Run ID: igpi2012-ner50_tuned
  • Participant: igpi2012
  • Track: Knowledge Base Acceleration
  • Year: 2012
  • Submission: 9/12/2012
  • Type: automatic
  • Task: main
  • MD5: 2a605f481ad7ed8c9f379c8c5eaa2b48
  • Run description: Uses the top 50 most popular named entities to compute the Jaccard coefficient between the entity list of each document and the entity list of the positive documents from the annotated set. Each topic has its own threshold, tuned on the annotated dataset.

igpi2012-ner_jaccard

Participants | Input | Appendix

  • Run ID: igpi2012-ner_jaccard
  • Participant: igpi2012
  • Track: Knowledge Base Acceleration
  • Year: 2012
  • Submission: 9/12/2012
  • Type: automatic
  • Task: main
  • MD5: 613947cf008a3c169b954ce9d789d434
  • Run description: Computes the Jaccard coefficient between the entity list of each document and the entity list of the positive documents from the annotated set, then applies an arbitrary threshold. This naive method is used as a baseline against which the following runs are compared.
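
The Jaccard coefficient used by this baseline can be sketched as follows; representing the entity lists as sets is an assumption.

```python
def jaccard(doc_entities, positive_entities):
    """Jaccard coefficient between a document's named-entity list and the
    entity list of the positive (annotated) documents."""
    a, b = set(doc_entities), set(positive_entities)
    return len(a & b) / len(a | b) if a or b else 0.0
```

A document is kept when the coefficient exceeds the chosen threshold.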

LSIS-lsisRFAll

Participants | Proceedings | Input | Appendix

  • Run ID: LSIS-lsisRFAll
  • Participant: LSIS
  • Track: Knowledge Base Acceleration
  • Year: 2012
  • Submission: 9/17/2012
  • Type: automatic
  • Task: main
  • MD5: 793b0fef8a02c83ebf7e9adefb8d91a0
  • Run description: With the help of Wikipedia web pages, variant names have been found for each topic, which in some cases may help to disambiguate the topic itself (e.g., Basic Element the company vs. the music group). Each single hour of the corpus has been indexed separately in order to base our document retrieval on a real stream. The following steps are done for each topic. As the process goes through the stream, each index is queried with the topic's url_name as well as the variants. A queue is built with a day granularity and can contain up to seven days. Then, for each document, statistics are computed based on what can be found in the document (e.g., the number of related entities coming from the Wikipedia page), what can be found in the current day (the number of documents found that particular day), and what has been seen on the previous days currently in the queue (the number of mentions in titles, in previous documents, etc.). These statistics are used to train a RandomCommittee classifier, which uses multiple RandomForest classifiers as a committee to build deep trees that consider all features in all possible cases. Two classifications are done: one to separate garbage from relevant and central (c1), giving a score (s1), and one to choose between relevant and central (c2), giving a score (s2). For this run, the final score is computed as s1 if s2 < 0.5, or ((s1*s2)/2) + 0.5 if s2 >= 0.5.
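
The score combination rule for this run can be written directly from the description; `s1` and `s2` are the two classifier scores.

```python
def combine_scores(s1, s2):
    """Keep the garbage-vs-relevant score s1 when the central classifier is
    unsure (s2 < 0.5); otherwise boost the product into the upper half of
    the score range."""
    return s1 if s2 < 0.5 else (s1 * s2) / 2 + 0.5
```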

LSIS-lsisRFYes

Participants | Proceedings | Input | Appendix

  • Run ID: LSIS-lsisRFYes
  • Participant: LSIS
  • Track: Knowledge Base Acceleration
  • Year: 2012
  • Submission: 9/17/2012
  • Type: automatic
  • Task: main
  • MD5: cec886b91247beccede1f9571ec9dc43
  • Run description: With the help of Wikipedia web pages, variant names have been found for each topic, which in some cases may help to disambiguate the topic itself (e.g., Basic Element the company vs. the music group). Each single hour of the corpus has been indexed separately in order to base our document retrieval on a real stream. The following steps are done for each topic. As the process goes through the stream, each index is queried with the topic's url_name as well as the variants. A queue is built with a day granularity and can contain up to seven days. Then, for each document, statistics are computed based on what can be found in the document (e.g., the number of related entities coming from the Wikipedia page), what can be found in the current day (the number of documents found that particular day), and what has been seen on the previous days currently in the queue (the number of mentions in titles, in previous documents, etc.). These statistics are used to train a RandomCommittee classifier, which uses multiple RandomForest classifiers as a committee to build deep trees that consider all features in all possible cases. Two classifications are done: one to separate garbage from relevant and central (c1), giving a score (s1), and one to choose between relevant and central (c2), giving a score (s2). For this run, the final score is computed as s1*s2.

LSIS-lsisSRFAll

Participants | Proceedings | Input | Appendix

  • Run ID: LSIS-lsisSRFAll
  • Participant: LSIS
  • Track: Knowledge Base Acceleration
  • Year: 2012
  • Submission: 9/17/2012
  • Type: automatic
  • Task: main
  • MD5: ef5bc9b175d3e6456cf75bae8688dee0
  • Run description: With the help of Wikipedia web pages, variant names have been found for each topic, which in some cases may help to disambiguate the topic itself (e.g., Basic Element the company vs. the music group). Each single hour of the corpus has been indexed separately in order to base our document retrieval on a real stream. The following steps are done for each topic. As the process goes through the stream, each index is queried with the topic's url_name as well as the variants. A queue is built with a day granularity and can contain up to seven days. Then, for each document, statistics are computed based on what can be found in the document (e.g., the number of related entities coming from the Wikipedia page), what can be found in the current day (the number of documents found that particular day), and what has been seen on the previous days currently in the queue (the number of mentions in titles, in previous documents, etc.). These statistics are used to train a RandomCommittee classifier, which uses multiple RandomForest classifiers as a committee to build deep trees that consider all features in all possible cases. Two classifications are done: one to separate garbage from relevant and central (c1), giving a score (s1), and one to choose between relevant and central (c2), giving a score (s2). For this run, the final score is computed as s1 if s2 < 0.5, or ((s1*s2)/2) + 0.5 if s2 >= 0.5.

LSIS-lsisSRFYes

Participants | Proceedings | Input | Appendix

  • Run ID: LSIS-lsisSRFYes
  • Participant: LSIS
  • Track: Knowledge Base Acceleration
  • Year: 2012
  • Submission: 9/17/2012
  • Type: automatic
  • Task: main
  • MD5: c551fcad28c1708a2f559c7e35e34675
  • Run description: With the help of Wikipedia web pages, variant names have been found for each topic, which in some cases may help to disambiguate the topic itself (e.g., Basic Element the company vs. the music group). Each single hour of the corpus has been indexed separately in order to base our document retrieval on a real stream. The following steps are done for each topic. As the process goes through the stream, each index is queried with the topic's url_name as well as the variants. A queue is built with a day granularity and can contain up to seven days. Then, for each document, statistics are computed based on what can be found in the document (e.g., the number of related entities coming from the Wikipedia page), what can be found in the current day (the number of documents found that particular day), and what has been seen on the previous days currently in the queue (the number of mentions in titles, in previous documents, etc.). These statistics are used to train a RandomCommittee classifier, which uses multiple RandomForest classifiers as a committee to build deep trees that consider all features in all possible cases. Two classifications are done: one to separate garbage from relevant and central (c1), giving a score (s1), and one to choose between relevant and central (c2), giving a score (s2). For this run, the final score is computed as s1*s2.

LSIS-lsisSys1

Participants | Proceedings | Input | Appendix

  • Run ID: LSIS-lsisSys1
  • Participant: LSIS
  • Track: Knowledge Base Acceleration
  • Year: 2012
  • Submission: 9/16/2012
  • Type: automatic
  • Task: main
  • MD5: 32f9601710d97179f17cde69c77c6479
  • Run description: With the help of Wikipedia web pages, variant names have been found for each topic, which in some cases may help to disambiguate the topic itself (e.g., Basic Element the company vs. the music group). Each single hour of the corpus has been indexed separately in order to base our document retrieval on a real stream. The following steps are done for each topic. As the process goes through the stream, each index is queried with the topic's url_name as well as the variants. A queue is built with a day granularity and can contain up to seven days. Then, for each document, statistics are computed based on what can be found in the document (e.g., the number of related entities coming from the Wikipedia page), what can be found in the current day (the number of documents found that particular day), and what has been seen on the previous days currently in the queue (the number of mentions in titles, in previous documents, etc.). These statistics are used to train a RandomCommittee classifier, which uses multiple RandomForest classifiers as a committee to build deep trees that consider all features in all possible cases. Two classifications are done: one to separate garbage from relevant and central (c1), giving a score (s1), and one to choose between relevant and central (c2), giving a score (s2). For this run, the final score is computed as ((s1*s2)/2) + 0.5.

LSIS-lsisSys2

Participants | Proceedings | Input | Appendix

  • Run ID: LSIS-lsisSys2
  • Participant: LSIS
  • Track: Knowledge Base Acceleration
  • Year: 2012
  • Submission: 9/16/2012
  • Type: automatic
  • Task: main
  • MD5: d5b7e1fcf62fceb27f90ff41c4f518b8
  • Run description: With the help of Wikipedia web pages, variant names have been found for each topic, which in some cases may help to disambiguate the topic itself (e.g., Basic Element the company vs. the music group). Each single hour of the corpus has been indexed separately in order to base our document retrieval on a real stream. The following steps are done for each topic. As the process goes through the stream, each index is queried with the topic's url_name as well as the variants. A queue is built with a day granularity and can contain up to seven days. Then, for each document, statistics are computed based on what can be found in the document (e.g., the number of related entities coming from the Wikipedia page), what can be found in the current day (the number of documents found that particular day), and what has been seen on the previous days currently in the queue (the number of mentions in titles, in previous documents, etc.). These statistics are used to train a RandomCommittee classifier, which uses multiple RandomForest classifiers as a committee to build deep trees that consider all features in all possible cases. Two classifications are done: one to separate garbage from relevant and central (c1), giving a score (s1), and one to choose between relevant and central (c2), giving a score (s2). For this run, the final score is computed as s1*s2.

PRIS-PRIS_Run_1

Participants | Proceedings | Input | Appendix

  • Run ID: PRIS-PRIS_Run_1
  • Participant: PRIS
  • Track: Knowledge Base Acceleration
  • Year: 2012
  • Submission: 9/12/2012
  • Type: automatic
  • Task: main
  • MD5: e6fbaf124ec7146852e64cb20d4bfc1f
  • Run description: Relevance feedback is first applied to our system according to the annotation data. Then a Jaccard coefficient weighting scheme is used to calculate the relevance between streaming documents and the topic entities.

PRIS-PRIS_Run_400

Participants | Proceedings | Input | Appendix

  • Run ID: PRIS-PRIS_Run_400
  • Participant: PRIS
  • Track: Knowledge Base Acceleration
  • Year: 2012
  • Submission: 9/13/2012
  • Type: automatic
  • Task: main
  • MD5: 7e7e7f0a60f8df1ae397a4497e9db5f4
  • Run description: Relevance feedback is first applied to our system according to the annotation data. Then a Jaccard coefficient weighting scheme is used to calculate the relevance between streaming documents and the topic entities.

PRIS-PRIS_Run_500

Participants | Proceedings | Input | Appendix

  • Run ID: PRIS-PRIS_Run_500
  • Participant: PRIS
  • Track: Knowledge Base Acceleration
  • Year: 2012
  • Submission: 9/13/2012
  • Type: automatic
  • Task: main
  • MD5: ceb487611bbfd4f01b5dc2d6bbf4ab55
  • Run description: Relevance feedback is first applied to our system according to the annotation data. Then a Jaccard coefficient weighting scheme is used to calculate the relevance between streaming documents and the topic entities.

PRIS-PRIS_Run_600

Participants | Proceedings | Input | Appendix

  • Run ID: PRIS-PRIS_Run_600
  • Participant: PRIS
  • Track: Knowledge Base Acceleration
  • Year: 2012
  • Submission: 9/13/2012
  • Type: automatic
  • Task: main
  • MD5: 832e1fe45b71316e487128be19201e50
  • Run description: Relevance feedback is first applied to our system according to the annotation data. Then a Jaccard coefficient weighting scheme is used to calculate the relevance between streaming documents and the topic entities.

PRIS-PRIS_Run_700

Participants | Proceedings | Input | Appendix

  • Run ID: PRIS-PRIS_Run_700
  • Participant: PRIS
  • Track: Knowledge Base Acceleration
  • Year: 2012
  • Submission: 9/12/2012
  • Type: automatic
  • Task: main
  • MD5: 69ac6a15e62c757d2d560b58da84622d
  • Run description: Relevance feedback is first applied to our system according to the annotation data. Then a Jaccard coefficient weighting scheme is used to calculate the relevance between streaming documents and the topic entities.

PRIS-PRIS_Run_800

Participants | Proceedings | Input | Appendix

  • Run ID: PRIS-PRIS_Run_800
  • Participant: PRIS
  • Track: Knowledge Base Acceleration
  • Year: 2012
  • Submission: 9/12/2012
  • Type: automatic
  • Task: main
  • MD5: 2d3bfa799cab67bd112b50d7b2641e38
  • Run description: Relevance feedback is first applied to our system according to the annotation data. Then a Jaccard coefficient weighting scheme is used to calculate the relevance between streaming documents and the topic entities.

PRIS-PRIS_Run_900

Participants | Proceedings | Input | Appendix

  • Run ID: PRIS-PRIS_Run_900
  • Participant: PRIS
  • Track: Knowledge Base Acceleration
  • Year: 2012
  • Submission: 9/12/2012
  • Type: automatic
  • Task: main
  • MD5: f9bc188ca4324a415768657af80bb22f
  • Run description: Relevance feedback is first applied to our system according to the annotation data. Then a Jaccard coefficient weighting scheme is used to calculate the relevance between streaming documents and the topic entities.

SCIAITeam-B1

Participants | Input | Appendix

  • Run ID: SCIAITeam-B1
  • Participant: SCIAITeam
  • Track: Knowledge Base Acceleration
  • Year: 2012
  • Submission: 9/15/2012
  • Type: automatic
  • Task: main
  • MD5: ad30eb0a33bd751b6a745379aa6cd02a
  • Run description: test

SCIAITeam-L2

Participants | Input | Appendix

  • Run ID: SCIAITeam-L2
  • Participant: SCIAITeam
  • Track: Knowledge Base Acceleration
  • Year: 2012
  • Submission: 9/14/2012
  • Type: automatic
  • Task: main
  • MD5: c4a47116b4d2bc6c62f60a47503ee9ae
  • Run description: test

SCIAITeam-L3

Participants | Input | Appendix

  • Run ID: SCIAITeam-L3
  • Participant: SCIAITeam
  • Track: Knowledge Base Acceleration
  • Year: 2012
  • Submission: 9/14/2012
  • Type: automatic
  • Task: main
  • MD5: da93635e8807c98ae384dd2705ac305a
  • Run description: test

SCIAITeam-W1

Participants | Input | Appendix

  • Run ID: SCIAITeam-W1
  • Participant: SCIAITeam
  • Track: Knowledge Base Acceleration
  • Year: 2012
  • Submission: 9/16/2012
  • Type: manual
  • Task: main
  • MD5: a99355adecdab74306014d383cbdce58
  • Run description: Wikipedia Query Expansion applied

udel_fang-UDInfoKBA_EX

Participants | Proceedings | Input | Appendix

  • Run ID: udel_fang-UDInfoKBA_EX
  • Participant: udel_fang
  • Track: Knowledge Base Acceleration
  • Year: 2012
  • Submission: 9/12/2012
  • Type: automatic
  • Task: main
  • MD5: 5d7320d5d5e9dd40aeb8d1fbe456495d
  • Run description: If a document has an exact match with the query entity, its ranking score is 1000; otherwise, its ranking score is 0.

udel_fang-UDInfoKBA_WIKI1

Participants | Proceedings | Input | Appendix

  • Run ID: udel_fang-UDInfoKBA_WIKI1
  • Participant: udel_fang
  • Track: Knowledge Base Acceleration
  • Year: 2012
  • Submission: 9/12/2012
  • Type: automatic
  • Task: main
  • MD5: abfa20cd6065d2c41c08b3df702c3e18
  • Run description: We first filter the streaming documents to those that have an exact match with the query entity. For each query entity, we extract the entities from the internal links on its associated Wikipedia page; such entities therefore also have Wikipedia pages, and they are considered the related entities of the query entity. For each filtered document, we then count the occurrences of these related entities, and the score reflects those occurrences. Streaming documents are then filtered based on a pre-defined threshold on the score.
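
The related-entity scoring step can be sketched as follows; the threshold value, substring counting, and function name are assumptions (the run only says "some pre-defined threshold").

```python
def score_by_related_entities(doc_text, related_entities, threshold=2):
    """Count occurrences of the Wikipedia related entities in a document
    that already matched the query entity, and keep the document only if
    the count reaches the threshold."""
    text = doc_text.lower()
    score = sum(text.count(e.lower()) for e in related_entities)
    return score, score >= threshold
```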

udel_fang-UDInfoKBA_WIKI2

Participants | Proceedings | Input | Appendix

  • Run ID: udel_fang-UDInfoKBA_WIKI2
  • Participant: udel_fang
  • Track: Knowledge Base Acceleration
  • Year: 2012
  • Submission: 9/12/2012
  • Type: automatic
  • Task: main
  • MD5: a646f55135a7ba0e1460b25fd4fded98
  • Run description: We first filter the streaming documents to those that have an exact match with the query entity. For each query entity, we extract the entities from the internal links on its associated Wikipedia page; such entities therefore also have Wikipedia pages, and they are considered the related entities of the query entity. For each filtered document, we then count the occurrences of these related entities, and the score reflects those occurrences. Streaming documents are then filtered based on a pre-defined threshold on the score.

udel_fang-UDInfoKBA_WIKI3

Participants | Proceedings | Input | Appendix

  • Run ID: udel_fang-UDInfoKBA_WIKI3
  • Participant: udel_fang
  • Track: Knowledge Base Acceleration
  • Year: 2012
  • Submission: 9/12/2012
  • Type: automatic
  • Task: main
  • MD5: eaa6f4381d488b839e82f8199ea60166
  • Run description: We first filter the streaming documents to those that have an exact match with the query entity. For each query entity, we extract the entities from the internal links on its associated Wikipedia page; such entities therefore also have Wikipedia pages, and they are considered the related entities of the query entity. For each filtered document, we then count the occurrences of these related entities, and the score reflects those occurrences. Streaming documents are then filtered based on a pre-defined threshold on the score.

uiucGSLIS-gslis_adaptive

Participants | Proceedings | Input | Appendix

  • Run ID: uiucGSLIS-gslis_adaptive
  • Participant: uiucGSLIS
  • Track: Knowledge Base Acceleration
  • Year: 2012
  • Submission: 9/10/2012
  • Type: automatic
  • Task: main
  • MD5: 9e935f7ecde49f7605b11ce447a8713a
  • Run description: Initial queries consist of wikitext extracted from each entity's history. We impose a document prior favoring docs with a high in-link count. Only English docs with a near-exact name match on entities are ranked. The query is updated monthly, at which point the weights of features (but not the features themselves) are recalculated based on previously retrieved docs.

uiucGSLIS-gslis_mult

Participants | Proceedings | Input | Appendix

  • Run ID: uiucGSLIS-gslis_mult
  • Participant: uiucGSLIS
  • Track: Knowledge Base Acceleration
  • Year: 2012
  • Submission: 9/11/2012
  • Type: automatic
  • Task: main
  • MD5: 5a1ac23bdb36053613571e7879e146b8
  • Run description: Initial queries consist of wikitext extracted from each entity's history. We impose a document prior favoring docs with a high in-link count. Only English docs with a near-exact name match on entities are ranked. The query is updated monthly, at which point the weights of features (but not the features themselves) are recalculated based on previously retrieved docs. Updates are multiplied (independent probability model).

UMass_CIIR-FS_NV_6000

Participants | Input | Appendix

  • Run ID: UMass_CIIR-FS_NV_6000
  • Participant: UMass_CIIR
  • Track: Knowledge Base Acceleration
  • Year: 2012
  • Submission: 9/17/2012
  • Task: main
  • MD5: 3d9986d84ad22e2ed4891d94db393956
  • Run description: This run performs retrieval over the entire collection without regard to time, using the entity name and simple variants. Galago sequential-dependence retrieval over the entire document stream; Dirichlet smoothing, mu=2000, seq. dep. params uniw=0.29, odw=0.21, uww=0.50. Statistics from the complete collection. Retrieved 6000 documents per topic across all hours. Combines the original topic name with name variants from Wiki redirects and wiki anchor text. Indri query: combine( seqdep(topic_name) combine(seqdep(name_variant_1) ... seqdep(name_variant_n)) )

UMass_CIIR-PC_NV_1500

Participants | Input | Appendix

  • Run ID: UMass_CIIR-PC_NV_1500
  • Participant: UMass_CIIR
  • Track: Knowledge Base Acceleration
  • Year: 2012
  • Submission: 9/16/2012
  • Type: automatic
  • Task: main
  • MD5: 518276880eb0e39dda717a34e83cb48c
  • Run description: This run represents a state-of-the-art retrieval approach using only the entity name and simple variants. Galago sequential-dependence retrieval over post-cutoff documents; Dirichlet smoothing, mu=2000, seq. dep. params uniw=0.29, odw=0.21, uww=0.50. Background statistics come only from pre-cutoff documents. Retrieved 1500 documents per topic across all hours. Combines the original topic name with name variants from Wiki redirects and wiki anchor text. Indri query: combine( seqdep(topic_name) combine(seqdep(name_variant_1) ... seqdep(name_variant_n)) )

UMass_CIIR-PC_RM10_1500

Participants | Input | Appendix

  • Run ID: UMass_CIIR-PC_RM10_1500
  • Participant: UMass_CIIR
  • Track: Knowledge Base Acceleration
  • Year: 2012
  • Submission: 9/16/2012
  • Type: automatic
  • Task: main
  • MD5: 70cb07bd3a0a71c4b70401f798350de2
  • Run description: This run represents a state-of-the-art retrieval approach using relevance feedback from the training documents. Galago RM3 entity expansion model over post-cutoff documents; the initial query is the original query from PC_NV_1500. RM weights: original query = 0.6, expanded unigrams = 0.2, expanded entities = 0.2. The RM incorporates entity concepts from extracted NER tags ([10-20] entities): the 10 top-weighted plus up to 10 entities from the Wikipedia link neighborhood (incoming and outgoing topic names). Indri query: combine 0=0.6 1=0.2 2=0.2( combine( seqdep(topic_name) combine(seqdep(name_variant_1) ... seqdep(name_variant_n)) ) combine unigram_rmweights( unigram_1, ... unigram_n) combine entity_rmweights( seqdep(entity_1) ... seqdep(entity_n)) )

UMass_CIIR-PC_RM10_TACRL

Participants | Input | Appendix

  • Run ID: UMass_CIIR-PC_RM10_TACRL
  • Participant: UMass_CIIR
  • Track: Knowledge Base Acceleration
  • Year: 2012
  • Submission: 9/17/2012
  • Type: automatic
  • Task: main
  • MD5: cb9fed4a5d065c70363c4d60fe06d64d
  • Run description: This run applies a TAC entity linking approach to filter the stream of documents. For this approach, all documents returned from PC_RM10_1500 are converted into TAC EL queries. A supervised TAC EL ranker is applied with the topic entity as the candidate set. KBA documents are re-ranked by their linker score to the topic entity. The ranking model is a linear model optimized with Coordinate Ascent incorporating dozens of features including surface form and document similarity functions.

UMass_CIIR-PC_RM20_1500

Participants | Input | Appendix

  • Run ID: UMass_CIIR-PC_RM20_1500
  • Participant: UMass_CIIR
  • Track: Knowledge Base Acceleration
  • Year: 2012
  • Submission: 9/16/2012
  • Type: automatic
  • Task: main
  • MD5: 7350770fa43a2782a90d4d0b084a8f6e
  • Run description: An attempt to improve recall over PC_RM10_1500 by including more expansion terms. Same as PC_RM10_1500 with 20 instead of 10 expansion terms.

UvA-UvAbaseline

Participants | Proceedings | Input | Appendix

  • Run ID: UvA-UvAbaseline
  • Participant: UvA
  • Track: Knowledge Base Acceleration
  • Year: 2012
  • Submission: 9/13/2012
  • Type: automatic
  • Task: main
  • MD5: 74d72791b326b1e12c27f5772a8a8b22
  • Run description: baseline 2012 run

UvA-UvAIncLearnHigh

Participants | Proceedings | Input | Appendix

  • Run ID: UvA-UvAIncLearnHigh
  • Participant: UvA
  • Track: Knowledge Base Acceleration
  • Year: 2012
  • Submission: 9/14/2012
  • Type: automatic
  • Task: main
  • MD5: 33ef7545e21c0f9d89c9fe5a49a7e1f1
  • Run description: Learning-to-rerank run, with incremental learning using a high threshold.

UvA-UvAIncLearnLow

Participants | Proceedings | Input | Appendix

  • Run ID: UvA-UvAIncLearnLow
  • Participant: UvA
  • Track: Knowledge Base Acceleration
  • Year: 2012
  • Submission: 9/14/2012
  • Type: automatic
  • Task: main
  • MD5: fdb9e15b5e531ee9f41d69fe5a183428
  • Run description: Learning-to-rerank run, with incremental learning using a low threshold.

UvA-UvAIncLearnT25

Participants | Proceedings | Input | Appendix

  • Run ID: UvA-UvAIncLearnT25
  • Participant: UvA
  • Track: Knowledge Base Acceleration
  • Year: 2012
  • Submission: 9/14/2012
  • Type: automatic
  • Task: main
  • MD5: 9d5fe0f1db5c342aa1cb42121203edaf
  • Run description: Learning-to-rerank run, with incremental learning on the top 25 instances.

UvA-UvAIncLearnT50

Participants | Proceedings | Input | Appendix

  • Run ID: UvA-UvAIncLearnT50
  • Participant: UvA
  • Track: Knowledge Base Acceleration
  • Year: 2012
  • Submission: 9/14/2012
  • Type: automatic
  • Task: main
  • MD5: 5a268c4223a0482ea1c65793909aa7e3
  • Run description: Learning-to-rerank run, with incremental learning on the top 50 instances.

UvA-UvALearning

Participants | Proceedings | Input | Appendix

  • Run ID: UvA-UvALearning
  • Participant: UvA
  • Track: Knowledge Base Acceleration
  • Year: 2012
  • Submission: 9/14/2012
  • Type: automatic
  • Task: main
  • MD5: cae7dfb9722113d7f5d7cb7959ad562a
  • Run description: UvA learning-to-rerank run.