Run description: We developed a seq2seq model to map paragraphs to a smaller set and then ranked that set based on the similarity between the query and the set.
Run description: Sequential dependence model (SDM) with a restricted set of bigrams (it can be regarded as a special case of the "Latent Concept Expansion" model). WikiTreeLM (or, more generally, WikiTreeSDM) selects, for each candidate paragraph, a sequence of sections (i.e. a path in the wiki tree) of the corresponding Wikipedia article that maximizes a Dirichlet-prior-smoothed scoring function. Parameters were tuned on the benchmarkY1test automatic qrels.
Run description: SDM with a restricted set of bigrams (it can be regarded as a special case of the "Latent Concept Expansion" model). WikiTreeLM (or, more generally, WikiTreeSDM) selects, for each paragraph, a sequence of sections of the corresponding Wikipedia article that maximizes a Dirichlet-prior-smoothed scoring function (considering a weighted sum of concept frequencies/document lengths of each section). Entity LM: a simple entity language model. TAGME is used for entity linking on Wikipedia 2015-10. Identified query entities that are not in unprocessedAllButBenchmark are removed.
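A minimal sketch of Dirichlet prior smoothing for a unigram language model, as a reference for the kind of scoring function named above. It is not the WikiTreeLM path selection itself; the function name, the argument layout, and the value of mu are assumptions made for illustration.

```python
import math
from collections import Counter


def dirichlet_query_log_likelihood(query_terms, section_terms,
                                   collection_counts, collection_len,
                                   mu=1000.0):
    """Log-likelihood of the query under a Dirichlet-smoothed unigram model.

    mu is a hypothetical smoothing value, not the tuned parameter of the run.
    """
    section_counts = Counter(section_terms)
    section_len = len(section_terms)
    score = 0.0
    for term in query_terms:
        p_collection = collection_counts.get(term, 0) / collection_len
        if p_collection == 0.0:
            # Assumption: terms unseen in the whole collection are skipped.
            continue
        # Smoothed estimate: (tf + mu * P(term | collection)) / (length + mu)
        p_term = (section_counts[term] + mu * p_collection) / (section_len + mu)
        score += math.log(p_term)
    return score
```

Under this sketch, a path through the section tree could be scored by concatenating the text of its sections and keeping the path with the highest log-likelihood, though the run description does not spell out that step.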
Run description: Semantic query expansion with the k nearest neighbors (k-NN) of each query term, combined with BM25 and boosting of specific query terms: terms appearing on lower levels receive higher weights. The nearest neighbors are obtained from the GloVe embedding space. K=10
Run description: Semantic query expansion with the k nearest neighbors (k-NN) of each query term, combined with BM25 and boosting of specific query terms: terms appearing on lower levels receive higher weights. The nearest neighbors are obtained from the GloVe embedding space. K=20
Run description: Semantic query expansion with the k nearest neighbors (k-NN) of each query term, combined with BM25 and boosting of specific query terms: terms appearing on lower levels receive higher weights. The nearest neighbors are obtained from the GloVe embedding space. K=20
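The k-NN expansion step can be sketched as follows, assuming the embeddings are available as a plain dictionary from word to vector. The function names and the use of cosine similarity as the neighborhood measure are assumptions; the run descriptions only state that neighbors come from the GloVe embedding space.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def expand_query(query_terms, embeddings, k=10):
    """Append the k nearest neighbors of every query term to the query.

    embeddings is a hypothetical {word: vector} dictionary standing in for
    a loaded GloVe table.
    """
    expanded = list(query_terms)
    for term in query_terms:
        if term not in embeddings:
            continue  # out-of-vocabulary terms are left unexpanded
        neighbors = sorted(
            (word for word in embeddings if word != term),
            key=lambda word: cosine(embeddings[term], embeddings[word]),
            reverse=True,
        )[:k]
        for word in neighbors:
            if word not in expanded:
                expanded.append(word)
    return expanded
```

The expanded term list would then be issued as a BM25 query; the level-dependent term boosts are not modeled here.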
Run description: In this run, we investigate entity embeddings for the task of passage ranking. Each query and each document has an entity representation. Documents are represented as the average of the embeddings of the entities within the passage. Queries have different representations, including the complete outline and the subqueries. The similarities between the query representations and the document representation are used as features for a learning-to-rank model.
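A minimal sketch of the feature computation described above, assuming entity embeddings are given as a dictionary from entity id to vector and that similarity means cosine similarity. All names, and the choice of cosine, are assumptions rather than details confirmed by the run description.

```python
import math


def average_embedding(entity_ids, entity_vectors):
    """Average the vectors of all entities that have an embedding."""
    known = [entity_vectors[e] for e in entity_ids if e in entity_vectors]
    if not known:
        return []  # no embedded entity: empty representation
    return [sum(column) / len(known) for column in zip(*known)]


def cosine(u, v):
    """Cosine similarity; 0.0 when either vector is empty or all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity_features(query_representations, passage_entities, entity_vectors):
    """One feature per query representation (e.g. outline, subquery)."""
    passage_vec = average_embedding(passage_entities, entity_vectors)
    return {
        name: cosine(average_embedding(entities, entity_vectors), passage_vec)
        for name, entities in query_representations.items()
    }
```

The resulting dictionary would be one row of the feature matrix handed to the learning-to-rank model.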
Run description: PACRR neural ranking architecture modified with heading independence, heading frequency contextual vector, and expanded query terms by heading
Run description: Lucene is the underlying retrieval engine. We train 20 query reformulators from Nogueira and Cho (2017) on random disjoint partitions of the training set. For each query, each reformulator produces a list of ranked documents. We re-rank the union of these 20 lists using a simple ranking model that scores each query-document pair using a feed-forward neural network whose input is the concatenation of the average word embeddings of the query and document. To further improve the performance of the system, we train an ensemble of 10 ranking models whose network architectures are randomly chosen. For each query, we re-rank the union of the 10 lists produced by the 10 ranking models using the best ranking model in the ensemble.
Run description: Lucene is the underlying retrieval engine. We train 20 query reformulators from Nogueira and Cho (2017) on random disjoint partitions of the training set. For each query, each reformulator produces a list of ranked documents. We re-rank the union of these 20 lists using a simple ranking model that scores each query-document pair using a feed-forward neural network whose input is the concatenation of the average word embeddings of the query and document. To further improve the performance of the system, we train an ensemble of 5 ranking models whose network architectures are randomly chosen. For each query, we re-rank the union of the 5 lists produced by the 5 ranking models using the best ranking model in the ensemble.
Run description: Lucene is the underlying retrieval engine. We train 20 query reformulators from Nogueira and Cho (2017) on random disjoint partitions of the training set. For each query, each reformulator produces a list of ranked documents. We re-rank the union of these 20 lists using a simple ranking model that scores each query-document pair using a feed-forward neural network whose input is the concatenation of the average word embeddings of the query and document. To further improve the performance of the system, we train an ensemble of 11 ranking models whose network architectures are randomly chosen. For each query, we re-rank the union of the 11 lists produced by the 11 ranking models using the best ranking model in the ensemble.
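The final re-ranking step shared by these runs, pooling the candidate lists and re-scoring the union, can be sketched as below. The scoring function is a hypothetical stand-in for the feed-forward ranking model, which is not reproduced here.

```python
def rerank_union(query, ranked_lists, score_fn):
    """Pool the documents of several ranked lists and re-rank the union.

    score_fn(query, doc_id) is a placeholder for the learned ranking model.
    Ties are broken by document id so the output is deterministic.
    """
    candidates = set().union(*ranked_lists)
    return sorted(candidates, key=lambda doc: (-score_fn(query, doc), doc))
```

For example, `rerank_union("q", [["d1", "d2"], ["d2", "d3"]], score_fn)` scores each of the three distinct documents once, regardless of how many lists contain it.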
Run description: Trained on benchmarkY1train tree.entity.qrels, with entity links extended using DBpedia Spotlight (retaining only links to entities that already have a link somewhere on the article). The best variant was selected on benchmarkY1test with the extended tree.entity.qrels. Different indexes for retrieving entities and paragraphs are built from allButBenchmark and paragraphCorpus. Rankings of entities provide a feature vector for nodes; rankings of paragraphs provide a feature vector for edges. A specialized Learn-to-walk algorithm is used to learn how node and edge features are best combined to obtain the best degree centrality rankings. Learn-to-walk is trained with mini-batched coordinate ascent to optimize the MAP of entity rankings.
Run description: Four types of indices were built using Lucene v7.2.1: aspect, entity, page, and paragraph. For each index, features were calculated using various combinations of query level, retrieval model, expansion model, and analyzer, as follows:
query-level: section path, all
retrieval-model: BM25, Query Likelihood
expansion-model: Entity Context Model
analyzer: english
Different indexes for retrieving entities and paragraphs are built from allButBenchmark and paragraphCorpus to obtain 10 rankings over Wikipages or paragraphs. Using the entity context model, we build entity relevance models (rankings over entity ids) from the pages/paragraphs, which are used as a ranking of entities. The particular run files are created by using a section path query to retrieve from these indices with BM25 and Query Likelihood. Additionally, paragraphs are retrieved by building a query from all headings of the topic page. These rankings were then combined using learning-to-rank to obtain a combined ranking. This model is trained on benchmarkY1-train, using the document tree ground truth. The rankings were annotated with the highest-ranked paragraph from an additional paragraph ranking, as described below.
PARAGRAPH FEATURES
Fielded BM25: Weighted combination of unigram, bigram, and windowed bigram likelihood from the query to the document using the BM25 algorithm.
SDM: Standard SDM model.
Weighted Section Fielded BM25: Weighted combination of the leaf of a heading and the entire heading, both scored with Fielded BM25.
Expanded Bigrams: Using the rank score obtained from Fielded BM25, create a distribution over likely bigrams given the unigrams in the query. Select the top 5 bigrams, add them to the original query, and rescore using Fielded BM25.
ENTITY -> PARAGRAPH FEATURES
The following are entity features (scoring the relevance of an entity given the query) that have been turned into paragraph features. Paragraphs are scored by integrating over the ranking scores of the entities that they link to, given the entity feature.
Link Frequency: Each entity is scored according to the frequency with which it is linked to by the candidate document (uses entity links).
The following stats are parsed from v2.1 unprocessedAllButBenchmark:
Categories: Unigram model of the enwiki categories of an entity.
Entity Disambiguations: Unigram model of the disambiguation links of an entity.
Entity Inlinks: Unigram model of the disambiguation inlinks of an entity.
Entity Outlinks: Unigram model of the disambiguation outlinks of an entity.
Entity Redirects: Unigram model of the disambiguation redirect links of an entity.
Entity Sections: Unigram model of the section paths contained in the entity page.
In addition, an "entity context" model is used. For each paragraph in which an entity occurs (within the unprocessedAllButBenchmark v2.1), the surrounding unigrams, bigrams, and windowed bigrams are indexed. These n-grams, along with the entity's name, are stored as pseudodocuments in an "entity context index" to be queried.
Context Unigram: The query is tokenized into unigrams and used to query the entity context database.
Context Bigram: The query is tokenized into bigrams and used to query the entity context database.
Context Windowed Bigram: The query is tokenized into windowed bigrams and used to query the entity context database.
TRAINING
This model is trained on benchmarkY1-train, using both the document tree qrels and the entity tree qrels (because the document features can be transformed into entity features and vice versa, this allows us to use both sets of training examples).
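The conversion of entity features into paragraph features can be sketched as follows. The description says scores are integrated over the linked entities; this sketch assumes that means a plain sum, and the function and argument names are hypothetical.

```python
def paragraph_scores_from_entities(entity_scores, paragraph_links):
    """Turn one entity feature into a paragraph feature.

    entity_scores: {entity_id: score} for a single entity feature.
    paragraph_links: {paragraph_id: [entity_id, ...]} from the entity links.
    Assumption: "integrating" is read as summing over linked entities;
    entities without a score contribute nothing.
    """
    return {
        paragraph_id: sum(entity_scores.get(entity, 0.0) for entity in links)
        for paragraph_id, links in paragraph_links.items()
    }
```

Swapping the roles of the two dictionaries gives the reverse direction, in which paragraph scores are aggregated into entity features.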
Run description: Creating entity runs, then annotating entities with the highest-ranked paragraphs in a secondary ranking that contain this entity.
Entity Ranking:
ENTITY FEATURES
Link Frequency: Each entity is scored according to the frequency with which it is linked to by the candidate paragraphs (uses entity links).
The following stats are parsed from v2.1 unprocessedAllButBenchmark:
Categories: Unigram model of the enwiki categories of an entity.
Entity Disambiguations: Unigram model of the disambiguation links of an entity.
Entity Inlinks: Unigram model of the disambiguation inlinks of an entity.
Entity Outlinks: Unigram model of the disambiguation outlinks of an entity.
Entity Redirects: Unigram model of the disambiguation redirect links of an entity.
Entity Sections: Unigram model of the section paths contained in the entity page.
In addition, an "entity context" model is used. For each paragraph in which an entity occurs (within the unprocessedAllButBenchmark v2.1), the surrounding unigrams, bigrams, and windowed bigrams are indexed. These n-grams, along with the entity's name, are stored as pseudodocuments in an "entity context index" to be queried.
Context Unigram: The query is tokenized into unigrams and used to query the entity context database.
Context Bigram: The query is tokenized into bigrams and used to query the entity context database.
Context Windowed Bigram: The query is tokenized into windowed bigrams and used to query the entity context database.
PARAGRAPH -> ENTITY FEATURES
The following are features that score the relevance of a paragraph given a query. These are turned into entity features by integrating over the scores of each paragraph that an entity links to.
Fielded BM25: Weighted combination of unigram, bigram, and windowed bigram likelihood from the query to the document using the BM25 algorithm.
SDM: Standard SDM model.
Weighted Section Fielded BM25: Weighted combination of the leaf of a heading and the entire heading, both scored with Fielded BM25.
Expanded Bigrams: Using the rank score obtained from Fielded BM25, create a distribution over likely bigrams given the unigrams in the query. Select the top 5 bigrams, add them to the original query, and rescore using Fielded BM25.
TRAINING
This model is trained on benchmarkY1-train, using the document tree qrels.
Paragraph Ranking:
Fielded BM25: Weighted combination of unigram, bigram, and windowed bigram likelihood from the query to the document using the BM25 algorithm.
SDM: Standard SDM model.
Weighted Section Fielded BM25: Weighted combination of the leaf of a heading and the entire heading, both scored with Fielded BM25.
Expanded Bigrams: Using the rank score obtained from Fielded BM25, create a distribution over likely bigrams given the unigrams in the query. Select the top 5 bigrams, add them to the original query, and rescore using Fielded BM25.
ENTITY -> PARAGRAPH FEATURES
The following are entity features (scoring the relevance of an entity given the query) that have been turned into paragraph features. Paragraphs are scored by integrating over the ranking scores of the entities that they link to, given the entity feature.
Link Frequency: Each entity is scored according to the frequency with which it is linked to by the candidate document (uses entity links).
The following stats are parsed from v2.1 unprocessedAllButBenchmark:
Categories: Unigram model of the enwiki categories of an entity.
Entity Disambiguations: Unigram model of the disambiguation links of an entity.
Entity Inlinks: Unigram model of the disambiguation inlinks of an entity.
Entity Outlinks: Unigram model of the disambiguation outlinks of an entity.
Entity Redirects: Unigram model of the disambiguation redirect links of an entity.
Entity Sections: Unigram model of the section paths contained in the entity page.
In addition, an "entity context" model is used. For each paragraph in which an entity occurs (within the unprocessedAllButBenchmark v2.1), the surrounding unigrams, bigrams, and windowed bigrams are indexed. These n-grams, along with the entity's name, are stored as pseudodocuments in an "entity context index" to be queried.
Context Unigram: The query is tokenized into unigrams and used to query the entity context database.
Context Bigram: The query is tokenized into bigrams and used to query the entity context database.
Context Windowed Bigram: The query is tokenized into windowed bigrams and used to query the entity context database.
TRAINING
This model is trained on benchmarkY1-train, using both the document tree qrels and the entity tree qrels (because the document features can be transformed into entity features and vice versa, this allows us to use both sets of training examples).
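The three query tokenizations used against the entity context index can be sketched as below. The window size and the treatment of windowed bigrams as unordered pairs are assumptions; the run description does not state either.

```python
def unigrams(tokens):
    """Every token on its own."""
    return list(tokens)


def bigrams(tokens):
    """Adjacent token pairs, in order."""
    return list(zip(tokens, tokens[1:]))


def windowed_bigrams(tokens, window=8):
    """Unordered token pairs that co-occur within `window` tokens.

    window=8 is a hypothetical default, not a value taken from the run.
    """
    pairs = []
    for i, left in enumerate(tokens):
        for right in tokens[i + 1:i + window]:
            pairs.append(tuple(sorted((left, right))))
    return pairs
```

Each resulting list would be issued as a separate query, giving one feature per tokenization.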
Run description: Features: BM25 (no expansion); BM25 with RM3 expansion; query likelihood (no expansion); query likelihood with RM3 expansion. They are combined using learning to rank: RankLib's coordinate ascent method, optimized for MAP, is trained on the data provided as benchmarkY1train. External resources: Ranking: BM25, query likelihood (Lucene 7). Description: Five-fold cross-validation is used with learning to rank to produce the training MAP. The full BY1train dataset is used with learning to rank to produce the runs on BY1test and BY2test. In each run file used for combination with learning to rank, the retrieval scores are normalized: for each query, scores are divided by the maximum score for that query.
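The per-query normalization described above can be sketched as follows, assuming a run is held as a nested dictionary. The data layout and the handling of a non-positive maximum are assumptions.

```python
def normalize_by_max(run):
    """Divide every score by the maximum score of its query.

    run: {query_id: {doc_id: score}}.
    Assumption: when the maximum is not positive (e.g. log-probability
    scores), the scores of that query are left unchanged.
    """
    normalized = {}
    for query_id, scores in run.items():
        top = max(scores.values(), default=0.0)
        if top > 0:
            normalized[query_id] = {doc: s / top for doc, s in scores.items()}
        else:
            normalized[query_id] = dict(scores)
    return normalized
```

After this step the top document of every query has score 1.0, which puts the BM25 and query likelihood features on a comparable scale before they are combined.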
Run description: PARAGRAPH FEATURES
Fielded BM25: Weighted combination of unigram, bigram, and windowed bigram likelihood from the query to the document using the BM25 algorithm.
SDM: Standard SDM model.
Weighted Section Fielded BM25: Weighted combination of the leaf of a heading and the entire heading, both scored with Fielded BM25.
Expanded Bigrams: Using the rank score obtained from Fielded BM25, create a distribution over likely bigrams given the unigrams in the query. Select the top 5 bigrams, add them to the original query, and rescore using Fielded BM25.
ENTITY -> PARAGRAPH FEATURES
The following are entity features (scoring the relevance of an entity given the query) that have been turned into paragraph features. Paragraphs are scored by integrating over the ranking scores of the entities that they link to, given the entity feature.
Link Frequency: Each entity is scored according to the frequency with which it is linked to by the candidate document (uses entity links).
The following stats are parsed from v2.1 unprocessedAllButBenchmark:
Categories: Unigram model of the enwiki categories of an entity.
Entity Disambiguations: Unigram model of the disambiguation links of an entity.
Entity Inlinks: Unigram model of the disambiguation inlinks of an entity.
Entity Outlinks: Unigram model of the disambiguation outlinks of an entity.
Entity Redirects: Unigram model of the disambiguation redirect links of an entity.
Entity Sections: Unigram model of the section paths contained in the entity page.
In addition, an "entity context" model is used. For each paragraph in which an entity occurs (within the unprocessedAllButBenchmark v2.1), the surrounding unigrams, bigrams, and windowed bigrams are indexed. These n-grams, along with the entity's name, are stored as pseudodocuments in an "entity context index" to be queried.
Context Unigram: The query is tokenized into unigrams and used to query the entity context database.
Context Bigram: The query is tokenized into bigrams and used to query the entity context database.
Context Windowed Bigram: The query is tokenized into windowed bigrams and used to query the entity context database.
TRAINING
This model is trained on benchmarkY1-train, using both the document tree qrels and the entity tree qrels (because the document features can be transformed into entity features and vice versa, this allows us to use both sets of training examples).
Run description: PARAGRAPH FEATURES Fielded BM25: Weighted combination of unigram, bigram, and windowed bigram likelihood from the query to the document using BM25 algorithm. TRAINING This model is trained on benchmarkY1-train, using the document tree qrels.
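A minimal sketch of a BM25 term score and of a weighted combination over unigram, bigram, and windowed bigram components, in the spirit of the Fielded BM25 feature above. The idf variant, the k1 and b defaults, and the component weights are assumptions, not the settings used in the run.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, doc_freq, num_docs,
                    k1=1.2, b=0.75):
    """BM25 contribution of one query term (or n-gram) to one document.

    k1 and b are common defaults, assumed here for illustration.
    """
    idf = math.log(1.0 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    length_norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1.0) / (tf + length_norm)


def fielded_bm25(component_scores, weights):
    """Weighted sum of per-component BM25 scores.

    Both arguments are dictionaries keyed by component name, e.g.
    "unigram", "bigram", "windowed"; the weights are hypothetical.
    """
    return sum(weights[name] * component_scores[name] for name in weights)
```

In a learned setting the component weights would be fitted on benchmarkY1-train rather than fixed by hand.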
Run description: A feature combination model that includes baseline retrieval methods (QL, SDM, RM3), as well as entity-based query expansion features from PRF on AllButBench and from query entity linking.
Run description: A feature combination model that includes baseline retrieval methods (QL, SDM, RM3), as well as entity-based query expansion features from PRF on AllButBench and from query entity linking.
Run description: This is a full entity expansion run without a working set. It converts the learned expansion query weights into a Galago weighted query and runs this query against the full paragraph index.
Run description: This method uses an initial BM25 search followed by a neural re-ranking system that uses word embeddings, a cosine similarity matrix, and IR features.
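The cosine similarity matrix that serves as input to the re-ranker can be sketched as below, assuming each query and document term has already been mapped to a word embedding. The function names are hypothetical and the neural model itself is not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity_matrix(query_vectors, doc_vectors):
    """Rows are query terms, columns are document terms."""
    return [[cosine(q, d) for d in doc_vectors] for q in query_vectors]
```

Entry (i, j) of the result holds the similarity between the i-th query term and the j-th document term.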