
Runs - Round 1 2020

10x10.prf.unipd.it

Results | Participants | Input | Appendix

  • Run ID: 10x10.prf.unipd.it
  • Participant: unipd.it
  • Track: Round 1
  • Year: 2020
  • Submission: 4/23/2020
  • Type: automatic
  • MD5: 871dd34389cad9cd9c3b751f30d9b2d0
  • Run description: base + 10 PRF docs, 10 PRF terms

10x20.prf.unipd.it

Results | Participants | Input | Appendix

  • Run ID: 10x20.prf.unipd.it
  • Participant: unipd.it
  • Track: Round 1
  • Year: 2020
  • Submission: 4/23/2020
  • Type: automatic
  • MD5: 44f8b615120ca6df113253587c75c995
  • Run description: base + 10 PRF docs, 20 PRF terms

azimiv_wk1

Results | Participants | Input | Appendix

  • Run ID: azimiv_wk1
  • Participant: azimiv
  • Track: Round 1
  • Year: 2020
  • Submission: 4/22/2020
  • Type: automatic
  • MD5: 546778ef263fde89aa9e3f2071ba3073
  • Run description: A corpus was generated using the 'abstract' field from the metadata file and preprocessed. For each topic, document matching scores were computed against this corpus of abstracts for the query, question, and narrative, yielding three separate match scores for each document for each topic. The three matching scores for each document were then normalized by dividing each match score by the maximum match score within that query, question, or narrative. The three normalized match scores for each document were then summed, generating a single match score for each document for each topic. The documents were then ranked by that match score for each topic. A sketch of this fusion follows.
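
A minimal sketch of the per-field normalization and summation described above, assuming the scores are held in NumPy arrays; names are illustrative, not the participant's code.

    import numpy as np

    def fuse_scores(scores):
        # scores: dict mapping "query" / "question" / "narrative" to an array of
        # raw match scores, one entry per document, for a single topic
        fused = np.zeros_like(scores["query"], dtype=float)
        for field in ("query", "question", "narrative"):
            fused += scores[field] / scores[field].max()  # per-field max-normalization
        return fused

    # rank documents by descending fused score:
    # ranking = np.argsort(-fuse_scores(scores))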

base.unipd.it

Results | Participants | Input | Appendix

  • Run ID: base.unipd.it
  • Participant: unipd.it
  • Track: Round 1
  • Year: 2020
  • Submission: 4/23/2020
  • Type: automatic
  • MD5: 031b0a8a37b2d925460d08852cd77db6
  • Run description: filtered, bool 'should' query, title + abstract, Elasticsearch scoring
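
A hedged Elasticsearch sketch of a filtered bool "should" query over title and abstract, in the spirit of this run's description (elasticsearch-py 7.x style); the index name, field names, and query text are assumptions.

    from elasticsearch import Elasticsearch

    es = Elasticsearch()
    topic_query = "coronavirus origin"   # topic 'query' field (placeholder)
    body = {
        "query": {
            "bool": {
                "should": [              # optional clauses; matching more of them raises the score
                    {"match": {"title": topic_query}},
                    {"match": {"abstract": topic_query}},
                ],
                "minimum_should_match": 1,
            }
        }
    }
    hits = es.search(index="cord19", body=body, size=1000)["hits"]["hits"]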

baseline

Results | Participants | Input | Appendix

  • Run ID: baseline
  • Participant: VirginiaTechHAT
  • Track: Round 1
  • Year: 2020
  • Submission: 4/23/2020
  • Type: automatic
  • MD5: b3c8c235fe3a0f9d137f1e4ae456de28
  • Run description: RM3 relevance feedback: 10 fbdocs, 50 fbterms, mufb=0, mudoc=1500.

BBGhelani1

Results | Participants | Input | Appendix

  • Run ID: BBGhelani1
  • Participant: BBGhelani
  • Track: Round 1
  • Year: 2020
  • Submission: 4/23/2020
  • Type: manual
  • MD5: 137efc24e1c1f76c68790134bc98384f
  • Run description: For each topic, we primed a continuous active learning model with documents found via a Solr+BM25 search interface. Documents were then judged in an active learning judgment system. At most 10 minutes were spent on each topic. For this run, the rank list was produced as (relevant docs -> non-relevant docs -> model ranking).

BBGhelani2

Results | Participants | Input | Appendix

  • Run ID: BBGhelani2
  • Participant: BBGhelani
  • Track: Round 1
  • Year: 2020
  • Submission: 4/23/2020
  • Type: manual
  • MD5: 1652b468897e20b91aea251c58d4084e
  • Run description: For each topic, we primed a continuous active learning model with documents found via a Solr+BM25 search interface. Documents were then judged in an active learning judgment system. At most 10 minutes were spent on each topic. For this run, the rank list was produced as (relevant docs -> model ranking).

BERT

Results | Participants | Input | Appendix

  • Run ID: BERT
  • Participant: THUMSR
  • Track: Round 1
  • Year: 2020
  • Submission: 4/23/2020
  • Type: automatic
  • MD5: 2e82576a17e61074931d82e6037dd0a0
  • Run description: Combination of BM25 score and BERT score with ReInfoSelect training.

BioinfoUA-emb

Results | Participants | Input | Appendix

  • Run ID: BioinfoUA-emb
  • Participant: BioinformaticsUA
  • Track: Round 1
  • Year: 2020
  • Submission: 4/23/2020
  • Type: automatic
  • MD5: 80b05c54f15981aa74f49aef0bb65127
  • Run description: This run corresponds to the results of a system that was tuned for the BioASQ challenge (a broader biomedical ad hoc retrieval challenge), so this submission tries to exploit the possible similarity between the data domains in order to train a neural ranking model. The system uses a standard BM25 + neural ranking model. Only documents that have title+abstract were considered in retrieval, to be more similar to the BioASQ data. The neural ranking is built upon the DeepRank model; a more complete description can be found in [1]. The word embeddings were computed on the CORD+PubMed corpus using word2vec. For each topic, the field "question" was used to express the information need. REFs: [1] T. Almeida and S. Matos, 'Calling Attention to Passages for Biomedical Question Answering,' in Advances in Information Retrieval, 2020, pp. 69--77.

BioinfoUA-emb-q

Results | Participants | Input | Appendix

  • Run ID: BioinfoUA-emb-q
  • Participant: BioinformaticsUA
  • Track: Round 1
  • Year: 2020
  • Submission: 4/23/2020
  • Type: automatic
  • MD5: a5baba833498a4dc15c95b3091007a11
  • Run description: This run corresponds to the results of a system that was tuned for the BioASQ challenge (a broader biomedical ad hoc retrieval challenge), so this submission tries to exploit the possible similarity between the data domains in order to train a neural ranking model. The system uses a standard BM25 + neural ranking model. Only documents that have title+abstract were considered in retrieval, to be more similar to the BioASQ data. The neural ranking is built upon the DeepRank model; a more complete description can be found in [1]. The word embeddings were computed on the CORD+PubMed corpus using word2vec. For each topic, the field "query" was used to express the information need. REFs: [1] T. Almeida and S. Matos, 'Calling Attention to Passages for Biomedical Question Answering,' in Advances in Information Retrieval, 2020, pp. 69--77.

BioinfoUA-noadapt

Results | Participants | Input | Appendix

  • Run ID: BioinfoUA-noadapt
  • Participant: BioinformaticsUA
  • Track: Round 1
  • Year: 2020
  • Submission: 4/22/2020
  • Type: automatic
  • MD5: e95cb8dfa075ebb630a8ab69409228c2
  • Run description: This run corresponds to the results of a system that was tuned for the BioASQ challenge (a broader biomedical ad hoc retrieval challenge), so this submission tries to exploit the possible similarity between the data domains in order to train a neural ranking model. The system uses a standard BM25 + neural ranking model. Only documents that have title+abstract were considered in retrieval, to be more similar to the BioASQ data. The neural ranking is built upon the DeepRank model; a more complete description can be found in [1]. For each topic, the field "question" was used to express the information need. REFs: [1] T. Almeida and S. Matos, 'Calling Attention to Passages for Biomedical Question Answering,' in Advances in Information Retrieval, 2020, pp. 69--77.

BITEM_BL

Results | Participants | Input | Appendix

  • Run ID: BITEM_BL
  • Participant: BITEM
  • Track: Round 1
  • Year: 2020
  • Submission: 4/22/2020
  • Type: automatic
  • MD5: e6c33c13f6360539d1296817244d2a09
  • Run description: Baseline run. Documents with body text indexed in Elasticsearch. Queries made with the topic fields and a subset of words, filtered with their DF computed in PMC.

BITEM_df

Results | Participants | Input | Appendix

  • Run ID: BITEM_df
  • Participant: BITEM
  • Track: Round 1
  • Year: 2020
  • Submission: 4/22/2020
  • Type: automatic
  • MD5: 92576af90fc88417cb9e6f456d0fba99
  • Run description: Baseline run + stemming + boosting of query terms based on DF (computed with PMC)

BITEM_stem

Results | Participants | Input | Appendix

  • Run ID: BITEM_stem
  • Participant: BITEM
  • Track: Round 1
  • Year: 2020
  • Submission: 4/22/2020
  • Type: automatic
  • MD5: adb08fd28081f786235798f5757e40f8
  • Run description: Baseline run + stemming

bm25_baseline

Results | Participants | Input | Appendix

  • Run ID: bm25_baseline
  • Participant: abccaba
  • Track: Round 1
  • Year: 2020
  • Submission: 4/22/2020
  • Type: automatic
  • MD5: 94cc38c1873e7b89cc35c0832ff0124f
  • Run description: Concatenate all the text in each document to build the BM25 model, with all characters lowercased. Uses the shorter query. Fixes a bug in the last submission.

bm25_basline

Results | Participants | Input | Appendix

  • Run ID: bm25_basline
  • Participant: abccaba
  • Track: Round 1
  • Year: 2020
  • Submission: 4/22/2020
  • Type: automatic
  • MD5: c73312c31260129ab27a5aa6dd4d9992
  • Run description: Concatenate all the text in each document to build the BM25 model, with all characters lowercased.

BM25R2

Results | Participants | Input | Appendix

  • Run ID: BM25R2
  • Participant: covidex
  • Track: Round 1
  • Year: 2020
  • Submission: 4/22/2020
  • Type: automatic
  • MD5: a048fb69f82033fa41679cb52747f0a3
  • Run description: run2: BM25 retrieval with Anserini. Index is formed as title+abstract+paragraph. Anserini's CovidQueryGenerator was used to build queries.

bm25t5

Results | Participants | Input | Appendix

  • Run ID: bm25t5
  • Participant: cord19.vespa.ai
  • Track: Round 1
  • Year: 2020
  • Submission: 4/23/2020
  • Type: automatic
  • MD5: 703f06d58e58e2a065f3cc0118684115
  • Run description: Results from https://cord19.vespa.ai/. Queries produced by concatenation of topic query, question and narrative using type=any/Logical OR. Ranking using Vespa's bm25 implementation over abstract, title, body and T5 summary of abstract text.

BRPHJ_NLP1

Results | Participants | Input | Appendix

  • Run ID: BRPHJ_NLP1
  • Participant: BRPHJ_NLP
  • Track: Round 1
  • Year: 2020
  • Submission: 4/23/2020
  • Type: automatic
  • MD5: 213e8837fb035c69c5f6b546b93e0b0c
  • Run description: We developed an information retrieval system that makes use of the Google Universal Sentence Encoder to embed all sentences across all documents. These sentence embeddings are then indexed into FAISS. The user query is also converted into an embedding using the same encoder and searched against the FAISS index. After the sentences are returned by FAISS, we re-rank the corresponding documents using a BM25 approach. A sketch of the embedding index follows.
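
A hedged sketch of the sentence-embedding index and search, assuming the TF-Hub release of the Universal Sentence Encoder and an exact inner-product FAISS index; neither detail is confirmed by the run description.

    import faiss
    import numpy as np
    import tensorflow_hub as hub

    encoder = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

    sentences = ["placeholder sentence one", "placeholder sentence two"]  # all sentences across all docs
    embeddings = encoder(sentences).numpy()         # shape (n_sentences, 512)
    faiss.normalize_L2(embeddings)                  # so inner product = cosine similarity

    index = faiss.IndexFlatIP(embeddings.shape[1])  # exact inner-product index
    index.add(embeddings)

    query_vec = encoder(["what is the origin of COVID-19"]).numpy()
    faiss.normalize_L2(query_vec)
    scores, ids = index.search(query_vec, 100)      # top-100 nearest sentences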

BRPHJ_NLP2

Results | Participants | Input | Appendix

  • Run ID: BRPHJ_NLP2
  • Participant: BRPHJ_NLP
  • Track: Round 1
  • Year: 2020
  • Submission: 4/23/2020
  • Type: automatic
  • MD5: 0c30d48a4004cd5adb817c22f0d481a9
  • Run description: We developed an information retrieval system that makes use of the Google Universal Sentence Encoder to embed all sentences across all documents. These sentence embeddings are then indexed into FAISS. The user query is also converted into an embedding using the same encoder and searched against the FAISS index.

BRPHJ_NLP3

Results | Participants | Input | Appendix

  • Run ID: BRPHJ_NLP3
  • Participant: BRPHJ_NLP
  • Track: Round 1
  • Year: 2020
  • Submission: 4/23/2020
  • Type: automatic
  • MD5: 787e6b8f91aae6a81c25788f28198cc9
  • Run description: We used a closed-domain question answering approach that has two stages, a Retriever and a Reader. The Retriever fetches the most relevant docs based on the question, while the Reader (a BERT QA model) provides the most relevant answers for the question, along with the docs where the answer was found.

CBOWexp.0

Results | Participants | Input | Appendix

  • Run ID: CBOWexp.0
  • Participant: UB_BW
  • Track: Round 1
  • Year: 2020
  • Submission: 4/23/2020
  • Type: automatic
  • MD5: b453381b31694714fdee4538f60b63a6
  • Run description: For this run, we used Terrier-v5.2, an open source Information Retrieval (IR) platform. All the documents (titles and abstracts) used in this study were first pre-processed before indexing; this involved tokenising the text and stemming each token using the full Porter stemming algorithm. Stopword removal was enabled and we used the Terrier-v5.2 stopword list. We used the PL2 Divergence from Randomness term weighting model in the Terrier-v5.2 IR platform to score and rank the documents. The hyper-parameter for PL2 was set to its default value of b = 1.0. We used our document collection to train a word2vec model, which we used to expand our query before retrieval. We used continuous bag of words (CBOW) as our training algorithm with the dimensions of the embeddings set to 100. The window and the minimum count of words were set to 5 and the number of workers set to 3. Using our trained model, we selected the 10 most similar words to our query for expansion and then performed retrieval on the indexed collection. A gensim-style sketch of the expansion step follows.
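
A minimal gensim sketch of the CBOW training and 10-term expansion described above, assuming gensim 4.x (where the dimension argument is vector_size) and in-vocabulary query terms; the corpus shown is a placeholder.

    from gensim.models import Word2Vec

    # placeholder corpus of tokenized titles and abstracts (repeated so that
    # min_count=5 does not prune the toy vocabulary)
    tokenized_docs = [["coronavirus", "transmission", "droplets"]] * 50

    model = Word2Vec(
        tokenized_docs,
        vector_size=100,  # embedding dimensions, as in the run description
        window=5,
        min_count=5,
        workers=3,
        sg=0,             # sg=0 selects the CBOW training algorithm
    )

    def expand(query_terms, topn=10):
        # append the 10 most similar in-vocabulary words to the original query
        similar = model.wv.most_similar(positive=query_terms, topn=topn)
        return query_terms + [w for w, _ in similar]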

CincyMedIR-run1

Results | Participants | Input | Appendix

  • Run ID: CincyMedIR-run1
  • Participant: CincyMedIR
  • Track: Round 1
  • Year: 2020
  • Submission: 4/22/2020
  • Type: automatic
  • MD5: f5b1d88c3f036a01e6957105dc7c0128
  • Run description: Plain text query, searched against title and abstract on ElasticSearch with BM25

CincyMedIR-run2

Results | Participants | Input | Appendix

  • Run ID: CincyMedIR-run2
  • Participant: CincyMedIR
  • Track: Round 1
  • Year: 2020
  • Submission: 4/22/2020
  • Type: automatic
  • MD5: d471ffd376a21e177a7c9741a80a481e
  • Run description: Query parsed by MetamapLite, searched against concepts in meta_terms on ElasticSearch with BM25

CincyMedIR-run3

Results | Participants | Input | Appendix

  • Run ID: CincyMedIR-run3
  • Participant: CincyMedIR
  • Track: Round 1
  • Year: 2020
  • Submission: 4/22/2020
  • Type: automatic
  • MD5: d14ceb5d42096658ded120a5433218d2
  • Run description: All text in query, question, and narrative, parsed by MetamapLite, searched against concepts in meta_terms on ElasticSearch with BM25

Conv_KNRM

Results | Participants | Input | Appendix

  • Run ID: Conv_KNRM
  • Participant: THUMSR
  • Track: Round 1
  • Year: 2020
  • Submission: 4/23/2020
  • Type: automatic
  • MD5: e8476aa0a38ac5541156ba36ce0db21a
  • Run description: Combination of BM25 score and Conv-KNRM score with ReInfoSelect training.

crowd1

Results | Participants | Input | Appendix

  • Run ID: crowd1
  • Participant: VirginiaTechHAT
  • Track: Round 1
  • Year: 2020
  • Submission: 4/23/2020
  • Type: manual
  • MD5: ac82e7bf8673965b83cbaa54d43a7a8d
  • Run description: Relevance feedback using related queries from Amazon Mechanical Turk workers.

crowd2

Results | Participants | Input | Appendix

  • Run ID: crowd2
  • Participant: VirginiaTechHAT
  • Track: Round 1
  • Year: 2020
  • Submission: 4/23/2020
  • Type: manual
  • MD5: ba19268919ceeebc8ed23a710509e68e
  • Run description: Relevance feedback using related queries and brief answers to the question from Amazon Mechanical Turk workers.

CSIROmed_PE

Results | Participants | Input | Appendix

  • Run ID: CSIROmed_PE
  • Participant: CSIROmed
  • Track: Round 1
  • Year: 2020
  • Submission: 4/23/2020
  • Type: automatic
  • MD5: 231276205a8a365839aa79c30bff1685
  • Run description: The initial ranking is obtained with a DFR model (edismax search over all indexed fields using the 'question' value of the topic). The initial ranking step is followed by re-ranking of the top 50 results per topic. For re-ranking we implemented an unsupervised approach for matching document sentences to the query (question), based on max-pooled word embeddings and a fuzzy Jaccard index (https://arxiv.org/abs/1904.13264). Sentence-level scores were 4-max pooled, summed, and interpolated with the original DFR scores for each document. Additionally, a boolean filter was used to match documents published after November 2019.

CSIROmed_RF

Results | Participants | Input | Appendix

  • Run ID: CSIROmed_RF
  • Participant: CSIROmed
  • Track: Round 1
  • Year: 2020
  • Submission: 4/23/2020
  • Type: manual
  • MD5: d58da66819fd6dbcafa5a00f63ba7f76
  • Run description: This is an interactive run based on a relevance model (RM3) used for automatic query expansion. The relevance model is built upon a small sample of manually produced relevance judgements (https://docs.google.com/spreadsheets/d/19EZSBw2j7FvN5UF-eUGB678_DybCTUVc1AbOhCCtQGw/edit?usp=sharing), using document titles and abstracts. The expanded query uses all topic fields and the corresponding relevance model (with interpolated boosts). Documents are searched over an aggregation of all indexed fields, with DFR as the retrieval model. Additionally, a boolean filter was used to match documents published after 1st November 2019.

CSIROmedNIR

Results | Participants | Input | Appendix

  • Run ID: CSIROmedNIR
  • Participant: CSIROmed
  • Track: Round 1
  • Year: 2020
  • Submission: 4/23/2020
  • Type: automatic
  • MD5: 6a85af5026332f73a846088a15ee6e7f
  • Run description: A neural index was built on the title and abstract fields of the COVID corpus, alongside a traditional inverted index built on the title, abstract, and body text of each document. The neural index was built from the pooled classification token (the 1st token of the final BERT layer) of the covidbert-nli model (https://huggingface.co/gsarti/covidbert-nli) applied to the title, based on the sentence-transformer approach (Reimers et al., Sentence-BERT, 2019). For the abstract, we took a bag-of-sentences approach, averaging the individual sentence embeddings (sentences were segmented using segtok). All embeddings had a final dimension of [1, 768]. We searched the neural index using the query, narrative, and question fields of the topics, embedded with the same approach as the document titles, against the title and abstract neural index fields, giving a total of 6 cosine similarity computations (3 topic fields x 2 document fields). We combine these with BM25 scores from traditional search using the query, narrative, and question fields over all document facets (body, title, abstract), giving a total of 9 query-facet combinations (3 topic fields x 3 facets). We take the natural logarithm of the total BM25 score (to match the range of the cosine scores) and add it to the cosine scores: final_score = log(sum of BM25 query-facet scores) + sum of cosine scores. Additionally, we filter documents by date: documents created before December 31st 2019 (before the first reported case) had their scores automatically set to zero. A sketch of this fusion follows.
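
A minimal sketch of the stated score fusion and date filter, assuming ISO-format dates and plain lists of per-document scores; variable names are illustrative.

    import math

    def final_score(bm25_scores, cosine_scores, publish_date):
        # bm25_scores: the 9 query-facet BM25 scores; cosine_scores: the 6 cosine
        # similarities from the neural index, for one document and one topic
        if publish_date < "2019-12-31":   # created before the first reported case
            return 0.0
        return math.log(sum(bm25_scores)) + sum(cosine_scores)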

cu_dbmi_bm25_1

Results | Participants | Input | Appendix

  • Run ID: cu_dbmi_bm25_1
  • Participant: columbia_university_dbmi
  • Track: Round 1
  • Year: 2020
  • Submission: 4/22/2020
  • Type: manual
  • MD5: 931bed72a82ec44a21b86a2ba53b3ecb
  • Run description: Define COVID-19 keywords: to find COVID-19-related articles, we defined a list of keywords; an article is considered COVID-19-related if any of its fields (title, abstract, and full text) mentions any of the keywords. To make sure we included all the keywords for COVID-19, we trained a word2vec model on all full texts for phrase embeddings, then tried to find all synonyms for COVID-19 from the word2vec model. We used an iterative approach, where we start looking for synonyms of one keyword and add new phrases or words to the keyword list, then use the newly found keywords to repeat the same process until no new keyword is found. Here is the list of synonyms for COVID-19: ['ncov', 'covid19', 'covid-19', 'sars cov2', 'sars cov-2', 'sars-cov-2', 'sars coronavirus 2', '2019-ncov', '2019 novel coronavirus', '2019-ncov sars', 'cov-2', 'cov2', 'novel coronvirus', 'coronavirus 2019-ncov']. Retrieve relevant articles for COVID-19 (BM25): we use the Python library whoosh (https://whoosh.readthedocs.io/en/latest/index.html) as the indexing engine to enable fast search in title, abstract, and full_text across all documents. The standard tokenizer and the stemming analyzer are applied during indexing, and we retrieve relevant articles using the BM25 algorithm. We construct the search query for each topic from the query, question, and narrative fields provided in the topic: words are lower-cased, and punctuation marks and stop words are removed from query, question, and narrative. The search query has two parts, a main query and a subquery. The main query is constructed using the query and question fields following the pattern ((query) OR (question)); the OR operator allows us to retrieve the maximum number of documents related to the main topic, and the purpose of the main query is to decide the "scope" of the search. The subquery is constructed using the narrative only: we run spaCy to extract the noun phrases and construct the subquery with the OR operator following the pattern (phrase_1 OR phrase_2 OR phrase_3 OR ... OR phrase_n); the purpose of the subquery is to decide the priorities of the relevant documents, since the more keywords a document contains, the higher score it will receive. The main query and subquery are assembled with the AND operator: ((query) OR (question)) AND ((query) OR (question) OR phrase_1 OR phrase_2 OR ... OR phrase_n). Note that a copy of the main query is also added to the subquery because we do not want to lose relevant documents that do not contain any of the phrases extracted from the narrative. A whoosh sketch of this query assembly follows.
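
A rough whoosh sketch of the two-part query assembly described above; the index location, field name, and topic inputs are placeholders, not the participants' code.

    from whoosh.index import open_dir
    from whoosh.qparser import QueryParser

    topic_query = "coronavirus origin"                  # topic 'query' field (placeholder)
    topic_question = "what is the origin of COVID-19"   # topic 'question' field (placeholder)
    noun_phrases = ["animal source", "bats"]            # spaCy noun chunks from the narrative

    ix = open_dir("cord19_whoosh_index")                # existing index built as described
    with ix.searcher() as searcher:
        parser = QueryParser("full_text", ix.schema)
        main = f"(({topic_query}) OR ({topic_question}))"
        # the main query is copied into the subquery so documents matching no
        # narrative phrase are not lost
        sub = " OR ".join([main] + [f"({p})" for p in noun_phrases])
        q = parser.parse(f"{main} AND ({sub})")
        results = searcher.search(q, limit=1000)        # whoosh scores with BM25(F) by default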

cu_dbmi_bm25_2

Results | Participants | Input | Appendix

  • Run ID: cu_dbmi_bm25_2
  • Participant: columbia_university_dbmi
  • Track: Round 1
  • Year: 2020
  • Submission: 4/22/2020
  • Type: manual
  • MD5: f8dd0956837aa4e1aeb0da570c5b23de
  • Run description: Define COVID-19 keywords: to find COVID-19-related articles, we defined a list of keywords; an article is considered COVID-19-related if any of its fields (title, abstract, and full text) mentions any of the keywords. To make sure we included all the keywords for COVID-19, we trained a word2vec model on all full texts for phrase embeddings, then tried to find all synonyms for COVID-19 from the word2vec model. We used an iterative approach, where we start looking for synonyms of one keyword and add new phrases or words to the keyword list, then use the newly found keywords to repeat the same process until no new keyword is found. Here is the list of synonyms for COVID-19: ['ncov', 'covid19', 'covid-19', 'sars cov2', 'sars cov-2', 'sars-cov-2', 'sars coronavirus 2', '2019-ncov', '2019 novel coronavirus', '2019-ncov sars', 'cov-2', 'cov2', 'novel coronvirus', 'coronavirus 2019-ncov']. Retrieve relevant articles for COVID-19 (BM25): we use the Python library whoosh (https://whoosh.readthedocs.io/en/latest/index.html) as the indexing engine to enable fast search in title, abstract, and full_text across all documents. The standard tokenizer and the stemming analyzer are applied during indexing, and we retrieve relevant articles using the BM25 algorithm. We construct the search query for each topic from the query, question, and narrative fields provided in the topic: words are lower-cased, and punctuation marks and stop words are removed from query, question, and narrative. The search query has two parts, a main query and a subquery. The main query is constructed using the query and question fields following the pattern ((query) OR (question)); the OR operator allows us to retrieve the maximum number of documents related to the main topic, and the purpose of the main query is to decide the "scope" of the search. The subquery is constructed using the narrative only: we run spaCy to extract the noun phrases and construct the subquery with the OR operator following the pattern (phrase_1 OR phrase_2 OR phrase_3 OR ... OR phrase_n); the purpose of the subquery is to decide the priorities of the relevant documents, since the more keywords a document contains, the higher score it will receive. While generating the subquery, the PMI (pointwise mutual information) measure is applied to filter out common narrative keywords that carry less information related to the query: the PMI score is computed for each collocation pair of a query term and a narrative keyword, and any keyword with PMI higher than the overall median is kept in the subquery while the others are removed. The main query and subquery are assembled with the AND operator: ((query) OR (question)) AND ((query) OR (question) OR phrase_1 OR phrase_2 OR ... OR phrase_n). Note that a copy of the main query is also added to the subquery because we do not want to lose relevant documents that do not contain any of the phrases extracted from the narrative. A sketch of the PMI filter follows.
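
An illustrative PMI filter over narrative keywords, per the description above; the co-occurrence and document-frequency inputs (co_count, df, n_docs) are placeholders, not the participants' implementation.

    import math
    import statistics

    def pmi(count_xy, count_x, count_y, n_docs):
        # pointwise mutual information of a (query term, narrative keyword) pair
        if count_xy == 0:
            return float("-inf")
        return math.log((count_xy / n_docs) / ((count_x / n_docs) * (count_y / n_docs)))

    def filter_keywords(query_term, narrative_keywords, co_count, df, n_docs):
        scores = {kw: pmi(co_count[(query_term, kw)], df[query_term], df[kw], n_docs)
                  for kw in narrative_keywords}
        median = statistics.median(scores.values())
        # keep only keywords whose PMI is above the overall median
        return [kw for kw, s in scores.items() if s > median]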

DA_IICT_all

Results | Participants | Input | Appendix

  • Run ID: DA_IICT_all
  • Participant: DA_IICT
  • Track: Round 1
  • Year: 2020
  • Submission: 4/23/2020
  • Type: automatic
  • MD5: 5f48158b063bc580e7075b2d5bca8c0e
  • Run description: In this run, all three fields (query, question, and narrative) are considered as the query, with the In_expC2 retrieval model.

DA_IICT_narr

Results | Participants | Input | Appendix

  • Run ID: DA_IICT_narr
  • Participant: DA_IICT
  • Track: Round 1
  • Year: 2020
  • Submission: 4/23/2020
  • Type: automatic
  • MD5: a50db3feb7c7a41eb7968075ed61731e
  • Run description: This run uses the In_expC2 retrieval model and the narrative as the query.

DA_IICT_narr_qe

Results | Participants | Input | Appendix

  • Run ID: DA_IICT_narr_qe
  • Participant: DA_IICT
  • Track: Round 1
  • Year: 2020
  • Submission: 4/23/2020
  • Type: automatic
  • MD5: 5449e23b4ccf98aae09ce3aa0189e63b
  • Run description: This run used the In_expC2 retrieval model with automatic query expansion on the narrative, using the Bo1 model over the top 10 retrieved documents.

dmis-rnd1-run1

Results | Participants | Input | Appendix

  • Run ID: dmis-rnd1-run1
  • Participant: KoreaUniversity_DMIS
  • Track: Round 1
  • Year: 2020
  • Submission: 4/23/2020
  • Type: automatic
  • MD5: 4cbaf4783a81da616b59927cb1d0f59d
  • Run description: We mainly used covidAsk (https://covidask.korea.ac.kr), a real-time QA system based on DenSPI [1], for the submission. While the initial purpose of the system was to give answers to natural questions in fine-grained phrases, covidAsk implicitly performs IR as documents that contain correct answer phrases can be regarded as relevant. For this submission, we used only subsets of CORD-19 documents that contain synonyms of 'COVID-19' in their titles or abstracts. This gave us approximately 3K documents from which we indexed about 800K phrase vectors. As our document representation of each phrase was too simple (BM25), we also combined document scores from Covidex [2]. We found the hyperparameters with our small validation set (100 QA pairs) and used 'question' in each topic with DenSPI trained on SQuAD (Dense-First Search). [1] Real-Time Open-Domain Question Answering with Dense-Sparse Phrase Index, Seo et al., 2019 [2] Rapidly Deploying a Neural Search Engine for the COVID-19 Open Research Dataset: Preliminary Thoughts and Lessons Learned, Zhang et al., 2020

dmis-rnd1-run2

Results | Participants | Input | Appendix

  • Run ID: dmis-rnd1-run2
  • Participant: KoreaUniversity_DMIS
  • Track: Round 1
  • Year: 2020
  • Submission: 4/23/2020
  • Type: automatic
  • MD5: 6a41ab9e886b2dce8ee83d67c86b05c7
  • Run description: We mainly used covidAsk (https://covidask.korea.ac.kr), a real-time QA system based on DenSPI [1] for the submission. While the initial purpose of the system was to give answers to natural questions in fine-grained phrases, covidAsk implicitly performs IR as documents that contain correct answer phrases can be regarded as relevant. For this submission, we used only subsets of CORD-19 documents that contain synonyms of 'COVID-19' in their titles or abstracts. This gave us about 3K documents from which we indexed about 800K phrase vectors. As our document representation of each phrase was too simple (BM25), we also combined document scores from Covidex [2]. We found the hyperparameters with our small validation set (100 QA pairs) and used 'query' in each topic with DenSPI trained on SQuAD+NaturalQuestions (Hybrid Search). [1] Real-Time Open-Domain Question Answering with Dense-Sparse Phrase Index, Seo et al., 2019 [2] Rapidly Deploying a Neural Search Engine for the COVID-19 Open Research Dataset: Preliminary Thoughts and Lessons Learned, Zhang et al., 2020

dmis-rnd1-run3

Results | Participants | Input | Appendix

  • Run ID: dmis-rnd1-run3
  • Participant: KoreaUniversity_DMIS
  • Track: Round 1
  • Year: 2020
  • Submission: 4/23/2020
  • Type: manual
  • MD5: 7de60ebc89b85e1cc7cdc358bc45309b
  • Run description: We mainly used covidAsk (https://covidask.korea.ac.kr), a real-time QA system based on DenSPI [1], for the submission. While the initial purpose of the system was to give answers to natural questions in fine-grained phrases, covidAsk implicitly performs IR, as documents that contain correct answer phrases can be regarded as relevant. For this submission, we used only the subset of CORD-19 documents that contain synonyms of 'COVID-19' in their titles or abstracts. This gave us about 3K documents from which we indexed about 800K phrase vectors. As our document representation of each phrase was too simple (BM25), we also combined document scores from Covidex [2]. We found the hyperparameters with our small validation set (100 QA pairs) and chose between 'question' and 'query' in each topic manually as the input to DenSPI trained on SQuAD (Dense-First Search) or SQuAD+NaturalQuestions (Hybrid Search), respectively. We also manually fixed typos and ambiguities in the queries. [1] Real-Time Open-Domain Question Answering with Dense-Sparse Phrase Index, Seo et al., 2019 [2] Rapidly Deploying a Neural Search Engine for the COVID-19 Open Research Dataset: Preliminary Thoughts and Lessons Learned, Zhang et al., 2020

elhuyar_indri

Results | Participants | Input | Appendix

  • Run ID: elhuyar_indri
  • Participant: Elhuyar_NLP_team
  • Track: Round 1
  • Year: 2020
  • Submission: 4/23/2020
  • Type: automatic
  • MD5: 1b8322d059a152cc61a2fa0f4cb83277
  • Run description: We tackle this document retrieval task as a passage retrieval task. Our system returns docids of their best scored passages. For ranking of relevant passages of the collection corresponding to the queries, we use a language modeling based information retrieval approach (Ponte & Croft, 1998). For that purpose, we used the Indri search engine (Strohman, 2005), which combines Bayesian networks with language models.

elhuyar_rRnk_cbert

Results | Participants | Input | Appendix

  • Run ID: elhuyar_rRnk_cbert
  • Participant: Elhuyar_NLP_team
  • Track: Round 1
  • Year: 2020
  • Submission: 4/23/2020
  • Type: automatic
  • MD5: 7111a37cad8a12c8ba3680c4d641a7b7
  • Run description: We tackle this document retrieval task as a passage retrieval task performed in two steps: (a) first ranking and (b) re-ranking. Our system returns the docids of the best-scored passages. To obtain the first ranking of relevant passages of the collection for the queries, we use a language-modeling-based information retrieval approach (Ponte & Croft, 1998). For that purpose, we used the Indri search engine (Strohman, 2005), which combines Bayesian networks with language models. Then, we re-rank based on BERT, following a strategy similar to the one proposed by Nogueira and Cho (2019). As we do not have a collection of queries paired with relevant paragraphs for tuning BERT for this passage retrieval task, we simulate a training collection composed of titles and their corresponding abstracts from the COVID-19 Open Research Dataset. With this training collection we tuned the Clinical BERT model (Alsentzer et al., 2019) to the task of identifying paragraphs relevant to a query.

elhuyar_rRnk_sbert

Results | Participants | Input | Appendix

  • Run ID: elhuyar_rRnk_sbert
  • Participant: Elhuyar_NLP_team
  • Track: Round 1
  • Year: 2020
  • Submission: 4/23/2020
  • Type: automatic
  • MD5: 8615aa228992997bc281f3a91552c760
  • Run description: We tackle this document retrieval task as a passage retrieval task performed in two steps: (a) first ranking and (b) re-ranking. Our system returns the docids of the best-scored passages. To obtain the first ranking of relevant passages of the collection for the queries, we use a language-modeling-based information retrieval approach (Ponte & Croft, 1998). For that purpose, we used the Indri search engine (Strohman, 2005), which combines Bayesian networks with language models. Then, we re-rank by combining Indri scores with cosine similarities between the query and the first ranking's passages, modeled by SBERT (Reimers and Gurevych, 2020). Specifically, we use the bert-large-nli-mean-tokens model trained on the SNLI and MultiNLI datasets. This model achieves a performance of 79.19 on the STS benchmark. A sentence-transformers sketch of the cosine step follows.
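
A minimal sentence-transformers sketch of the SBERT cosine signal and its combination with Indri scores; the linear interpolation and the weight alpha are assumptions, since the run description does not specify how the two scores are combined.

    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("bert-large-nli-mean-tokens")

    query = "what is the origin of COVID-19"        # placeholder topic text
    passages = ["passage one ...", "passage two ..."]  # first-ranking passages
    indri_scores = [0.0, 0.0]                       # Indri scores for those passages

    q_emb = model.encode(query, convert_to_tensor=True)
    p_embs = model.encode(passages, convert_to_tensor=True)
    cos = util.pytorch_cos_sim(q_emb, p_embs)[0]    # cosine similarity per passage

    alpha = 0.5                                     # assumed interpolation weight
    combined = [alpha * s + (1 - alpha) * float(c) for s, c in zip(indri_scores, cos)]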

ERST_NARRATIVE

Results | Participants | Input | Appendix

  • Run ID: ERST_NARRATIVE
  • Participant: KAROTENE_SYNAPTIQ_UMBC
  • Track: Round 1
  • Year: 2020
  • Submission: 4/23/2020
  • Type: automatic
  • MD5: d2adb76bdd32f98fb2d91fd3a0b2f5e0
  • Run description: Unsupervised document tagging using document text heuristics, indexing the documents in Elasticsearch using the mined tags, and using the 'narrative' field of the topic as the retrieval query.

ERST_PROSE

Results | Participants | Input | Appendix

  • Run ID: ERST_PROSE
  • Participant: KAROTENE_SYNAPTIQ_UMBC
  • Track: Round 1
  • Year: 2020
  • Submission: 4/23/2020
  • Type: automatic
  • MD5: 939195bd30d5eb0378ff1f5d5f57f939
  • Run description: Unsupervised document tagging using document text heuristics, indexing the documents in Elasticsearch using the mined tags, and using a concatenation of the 'query', 'question', and 'narrative' fields as the retrieval query.

ERST_QUESTION

Results | Participants | Input | Appendix

  • Run ID: ERST_QUESTION
  • Participant: KAROTENE_SYNAPTIQ_UMBC
  • Track: Round 1
  • Year: 2020
  • Submission: 4/23/2020
  • Type: automatic
  • MD5: 463474b1b030e0961ed3703456acd969
  • Run description: Unsupervised document tagging using document text heuristics, indexing the documents in Elasticsearch using the mined tags, and using the 'question' field of the topic as the retrieval query.

factum-1

Results | Participants | Input | Appendix

  • Run ID: factum-1
  • Participant: Factum
  • Track: Round 1
  • Year: 2020
  • Submission: 4/23/2020
  • Type: automatic
  • MD5: 09c50f18a0bb831c071aa47027e4978e
  • Run description: End-to-end retrieval with siamese BERT encoder trained on NLI data (SNLI, Multi-NLI and a new dataset with artificial inference examples). We used the narratives as queries.

ielab-prf

Results | Participants | Input | Appendix

  • Run ID: ielab-prf
  • Participant: ielab
  • Track: Round 1
  • Year: 2020
  • Submission: 4/23/2020
  • Type: automatic
  • MD5: fb3c2d4b7a6367c0a156d41b466a4337
  • Run description: Cascade retrieval which uses an initial retrieval step with BM25 over full index, followed by a PRF re-ranking. Documents that only have titles but not abstract or full text are removed.

ielab-prf.2query.v3

Results | Participants | Input | Appendix

  • Run ID: ielab-prf.2query.v3
  • Participant: ielab
  • Track: Round 1
  • Year: 2020
  • Submission: 4/23/2020
  • Type: automatic
  • MD5: 4ae0c08eb3d7d6b60d4aac27b9e98485
  • Run description: Cascade retrieval which uses an initial retrieval step with BM25 over full index, followed by a PRF re-ranking, where the query is extracted from the query field of the topic. Then results are re-ranked using a query constructed by concatenating the query field, the narrative field and the question field. Documents that only have titles but not abstract or full text are removed.

ielab-prf.recency

Results | Participants | Input | Appendix

  • Run ID: ielab-prf.recency
  • Participant: ielab
  • Track: Round 1
  • Year: 2020
  • Submission: 4/23/2020
  • Type: automatic
  • MD5: c9430d4fe111b9d110183419526ba4bc
  • Run description: Cascade retrieval which uses an initial retrieval step with BM25 over full index, followed by a PRF re-ranking, followed by a fusion re-ranking that accounts for both relevance and recency. Documents that only have titles but not abstract or full text are removed.

ir_covid19_cle_dfr

Results | Participants | Input | Appendix

  • Run ID: ir_covid19_cle_dfr
  • Participant: IR_COVID19_CLE
  • Track: Round 1
  • Year: 2020
  • Submission: 4/22/2020
  • Type: automatic
  • MD5: 1fbb02c24889b1ce7092565929b7393e
  • Run description: We used the data set with all the documents from the commercial use subset, non-commercial use subset, custom license subset, and bioRxiv/medRxiv subset of the corpus. We used "Paper_id", "Title Id", and "Abstract" to index all the documents using Apache Lucene. We indexed every document for all tokens present within the document. However, in a collection these tokens can repeat across multiple documents, so we use an inverted index to store them; when searching for a specific token, we can then narrow the search down to exactly the documents in which that token is present. We used the query of the topic for querying the index. We parsed the query with the English Analyzer and searched on the abstract text field of the index. For each query, we retrieved the top 100 documents and their relevance scores using the Divergence from Randomness (DFR) similarity model, which is based on a randomness model, first normalization, and term frequency normalization. Reference: Amati, G., & van Rijsbergen, C. J. (2002). Probabilistic models of information retrieval based on measuring the divergence from randomness. ACM Transactions on Information Systems, 20(4), October 2002.

ir_covid19_cle_ib

Results | Participants | Input | Appendix

  • Run ID: ir_covid19_cle_ib
  • Participant: IR_COVID19_CLE
  • Track: Round 1
  • Year: 2020
  • Submission: 4/22/2020
  • Type: automatic
  • MD5: d4f9f8e649a00ad9c205d820df625d0c
  • Run description: We used the data set with all the documents from the commercial use subset, non-commercial use subset, custom license subset, and bioRxiv/medRxiv subset of the corpus. We used "Paper_id", "Title Id", and "Abstract" to index all the documents using Apache Lucene. We indexed every document for all tokens present within the document. However, in a collection these tokens can repeat across multiple documents, so we use an inverted index to store them; when searching for a specific token, we can then narrow the search down to exactly the documents in which that token is present. We used the query of the topic for querying the index. We parsed the query with the English Analyzer and searched on the abstract text field of the index. For each query, we retrieved the top 100 documents and their relevance scores using Information-Based (IB) similarity, whose models rely on normalized values of the occurrences of a word in documents. Information models are characterized by three elements: a normalization function, a probability distribution, and a retrieval function. Reference: Clinchant, S., & Gaussier, E. (2010). Information-based models for ad hoc IR. In Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '10).

ir_covid19_cle_lmd

Results | Participants | Input | Appendix

  • Run ID: ir_covid19_cle_lmd
  • Participant: IR_COVID19_CLE
  • Track: Round 1
  • Year: 2020
  • Submission: 4/22/2020
  • Type: automatic
  • MD5: d2e941cff6790c29af651a8898c1e9f1
  • Run description: We used the data set with all the documents from the commercial use subset, non-commercial use subset, custom license subset, and bioRxiv/medRxiv subset of the corpus. We used "Paper_id", "Title Id", and "Abstract" to index all the documents using Apache Lucene. We indexed every document for all tokens present within the document. However, in a collection these tokens can repeat across multiple documents, so we use an inverted index to store them; when searching for a specific token, we can then narrow the search down to exactly the documents in which that token is present. We used the query of the topic for querying the index. We parsed the query with the English Analyzer and searched on the abstract text field of the index. For each query, we retrieved the top 100 documents and their relevance scores using LM Dirichlet similarity, a language-model-based similarity that relies on the relative frequency of word occurrences in documents. The LMD model is the best out-of-the-box model for short queries; for our approach we used the default value μ = 2000. Reference: Zhai, C., & Lafferty, J. (2001). A Study of Smoothing Methods for Language Models Applied to Ad Hoc Information Retrieval. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '01). ACM. A worked sketch of the Dirichlet-smoothed scoring follows.
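
A worked sketch of Dirichlet-smoothed query-likelihood scoring with the default μ = 2000 used in this run; the term-statistics inputs are placeholders.

    import math

    def lm_dirichlet_score(query_terms, doc_tf, doc_len, coll_prob, mu=2000):
        # log p(q|d) with Dirichlet smoothing:
        #   p(w|d) = (tf(w,d) + mu * p(w|C)) / (|d| + mu)
        # doc_tf: term frequencies in the document; coll_prob: p(w|C) over the collection
        return sum(
            math.log((doc_tf.get(w, 0) + mu * coll_prob[w]) / (doc_len + mu))
            for w in query_terms
        )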

irc_entrez

Results | Participants | Input | Appendix

  • Run ID: irc_entrez
  • Participant: IRC
  • Track: Round 1
  • Year: 2020
  • Submission: 4/22/2020
  • Type: automatic
  • MD5: 5325f557f420633811a9684a6e61108a
  • Run description: As part of TREC-COVID, we submit automatic runs based on (pseudo) relevance feedback in combination with a reranking approach. The reranker is trained on relevance feedback data that is retrieved from PubMed/PubMed Central (PMC). The training data is retrieved with queries using the contents of the <query> tags only. For each topic a new reranker is trained. We consider those documents retrieved by the specific topic query as relevant training data, and the documents of the other 29 topics as non-relevant training data. Given a baseline run, the trained system reranks documents. The baseline run is retrieved with the default ranker of Elasticsearch/Lucene (BM25) and queries using the contents of the <query> tags only. For our reranker we use GloVe embeddings in combination with the Deep Relevance Matching Model (DRMM). Our three run submissions differ by the way training data is retrieved from PubMed/PMC. irc_entrez: This run is trained on titles and abstracts retrieved from the Entrez Programming Utilities API with "type=relevance".

irc_pmc

Results | Participants | Input | Appendix

  • Run ID: irc_pmc
  • Participant: IRC
  • Track: Round 1
  • Year: 2020
  • Submission: 4/22/2020
  • Type: automatic
  • MD5: 2b87692a4e586e3e9d76ad6cea0cff08
  • Run description: As part of TREC-COVID, we submit automatic runs based on (pseudo) relevance feedback in combination with a reranking approach. The reranker is trained on relevance feedback data that is retrieved from PubMed/PubMed Central (PMC). The training data is retrieved with queries using the contents of the <query> tags only. For each topic a new reranker is trained. We consider those documents retrieved by the specific topic query as relevant training data, and the documents of the other 29 topics as non-relevant training data. Given a baseline run, the trained system reranks documents. The baseline run is retrieved with the default ranker of Elasticsearch/Lucene (BM25) and queries using the contents of the <query> tags only. For our reranker we use GloVe embeddings in combination with the Deep Relevance Matching Model (DRMM). Our three run submissions differ by the way training data is retrieved from PubMed/PMC. irc_pmc: This run is trained on full text documents retrieved from PMC.

irc_pubmed

Results | Participants | Input | Appendix

  • Run ID: irc_pubmed
  • Participant: IRC
  • Track: Round 1
  • Year: 2020
  • Submission: 4/22/2020
  • Type: automatic
  • MD5: 15874a1065f3c64c1990da103ab5ae27
  • Run description: As part of TREC-COVID, we submit automatic runs based on (pseudo) relevance feedback in combination with a reranking approach. The reranker is trained on relevance feedback data that is retrieved from PubMed/PubMed Central (PMC). The training data is retrieved with queries using the contents of the <query> tags only. For each topic a new reranker is trained. We consider those documents retrieved by the specific topic query as relevant training data, and the documents of the other 29 topics as non-relevant training data. Given a baseline run, the trained system reranks documents. The baseline run is retrieved with the default ranker of Elasticsearch/Lucene (BM25) and queries using the contents of the <query> tags only. For our reranker we use GloVe embeddings in combination with the Deep Relevance Matching Model (DRMM). Our three run submissions differ by the way training data is retrieved from PubMed/PMC. irc_pubmed: This run is trained on titles and abstracts retrieved from PubMed's search interface with "best match". We scrape the PMIDs and retrieve the titles and abstracts afterwards.

IRIT_marked_base

Results | Participants | Input | Appendix

  • Run ID: IRIT_marked_base
  • Participant: IRIT_markers
  • Track: Round 1
  • Year: 2020
  • Submission: 4/19/2020
  • Type: automatic
  • MD5: 92b931a05982c293989f81c418fcd4a9
  • Run description: We use a BERT-base model (12 layers, 768 hidden size) fine-tuned on the MS MARCO passage set. We use a full ranking strategy with two stages: in the first stage, we use Anserini BM25 + RM3 to retrieve the top-1000 candidate documents for each topic using an index on the title+abstract of the CORD-19 documents; then we use the fine-tuned BERT to re-rank this list.

IRIT_marked_mu_pair

Results | Participants | Input | Appendix

  • Run ID: IRIT_marked_mu_pair
  • Participant: IRIT_markers
  • Track: Round 1
  • Year: 2020
  • Submission: 4/19/2020
  • Type: automatic
  • MD5: 936f88b66b6525b2e154cc4a5796e88c
  • Run description: We use a BERT-base model (12 layers, 768 hidden size) fine-tuned on the MS MARCO passage set with a marking strategy that puts focus on exact-match signals between query and document terms. We use a full ranking strategy with two stages: in the first stage, we use Anserini BM25 + RM3 to retrieve the top-1000 candidate documents for each topic using an index on the title+abstract of the CORD-19 documents; then we use the fine-tuned BERT to re-rank this list.

IRIT_marked_un_pair

Results | Participants | Input | Appendix

  • Run ID: IRIT_marked_un_pair
  • Participant: IRIT_markers
  • Track: Round 1
  • Year: 2020
  • Submission: 4/19/2020
  • Type: automatic
  • MD5: 3b0430040584d41ba9922d3b2887d406
  • Run description: We use a BERT-base model (12 layers, 768 hidden size) fine-tuned on the MS MARCO passage set with a marking strategy that puts focus on exact-match signals between query and document terms. We use a full ranking strategy with two stages: in the first stage, we use Anserini BM25 + RM3 to retrieve the top-1000 candidate documents for each topic using an index on the title+abstract of the CORD-19 documents; then we use the fine-tuned BERT to re-rank this list.

ixa-ir-filter-narr

Results | Participants | Input | Appendix

  • Run ID: ixa-ir-filter-narr
  • Participant: ixa
  • Track: Round 1
  • Year: 2020
  • Submission: 4/23/2020
  • Type: automatic
  • MD5: 8acecee90c754bb67690f62860542796
  • Run description: - Whoosh library of Python to implement the indexing and searching engines. - Only papers related to COVID-19 (not other coronaviruses like SARS-CoV and MERS) are indexed. For that purpose, we created a list of synonyms of COVID-19, and we check if a synonym appears in the title or the abstract of a paper. - Abstracts and full text provided in PMC or PDF JSON format are indexed. The indexing unit will be an abstract or each of the paragraphs of the full text (as marked in JSON files). - BM25F scoring algorithm. - Content of "narrative" field of the topics is used to query. - All automatic
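
A small sketch of the COVID-19 synonym filter applied before indexing, as described above; the synonym list shown here is abbreviated and illustrative.

    SYNONYMS = ["covid-19", "covid19", "sars-cov-2", "2019-ncov", "ncov"]

    def is_covid_paper(title, abstract):
        # a paper is kept if any synonym appears in its title or abstract
        text = f"{title} {abstract}".lower()
        return any(s in text for s in SYNONYMS)

    # only papers passing the filter are indexed, one indexing unit per abstract
    # or full-text paragraph
    papers = [{"title": "SARS-CoV-2 spike protein", "abstract": "..."}]  # placeholder
    to_index = [p for p in papers if is_covid_paper(p["title"], p["abstract"])]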

ixa-ir-filter-query

Results | Participants | Input | Appendix

  • Run ID: ixa-ir-filter-query
  • Participant: ixa
  • Track: Round 1
  • Year: 2020
  • Submission: 4/22/2020
  • Type: automatic
  • MD5: 83338b60068dd232221ea5dde7bbef7b
  • Run description: - Whoosh library of Python to implement the indexing and searching engines. - Only papers related to COVID-19 (not other coronaviruses like SARS-CoV and MERS) are indexed. For that purpose, we created a list of synonyms of COVID-19, and we check if a synonym appears in the title or the abstract of a paper. - Abstracts and full text provided in PMC or PDF JSON format are indexed. The indexing unit will be an abstract or each of the paragraphs of the full text (as marked in JSON files). - BM25F scoring algorithm. - Content of "query" field of the topics is used to query. - All automatic

ixa-ir-filter-quest

Results | Participants | Input | Appendix

  • Run ID: ixa-ir-filter-quest
  • Participant: ixa
  • Track: Round 1
  • Year: 2020
  • Submission: 4/22/2020
  • Type: automatic
  • MD5: f1a2a9a4b363d89d8fe9f34eea88b78b
  • Run description: - Whoosh library of Python to implement the indexing and searching engines. - Only papers related to COVID-19 (not other coronaviruses like SARS-CoV and MERS) are indexed. For that purpose, we created a list of synonyms of COVID-19, and we check if a synonym appears in the title or the abstract of a paper. - Abstracts and full text provided in PMC or PDF JSON format are indexed. The indexing unit will be an abstract or each of the paragraphs of the full text (as marked in JSON files). - BM25F scoring algorithm. - Content of "question" field of the topics is used to query. - All automatic

jlbase

Results | Participants | Input | Appendix

  • Run ID: jlbase
  • Participant: julielab
  • Track: Round 1
  • Year: 2020
  • Submission: 4/23/2020
  • Type: manual
  • MD5: e34e6f9d685b1d9488fc034a48769ef4
  • Run description: Indexing of text contents into ElasticSearch 7.0.1 with default settings. Searching: Mandatory disjunctive clause for topic "query" field. Mandatory dismax clause for "COVID19, COVID-19, SARS-CoV-2, SARS-CoV2, 2019-nCoV"

jlprec

Results | Participants | Input | Appendix

  • Run ID: jlprec
  • Participant: julielab
  • Track: Round 1
  • Year: 2020
  • Submission: 4/23/2020
  • Type: manual
  • MD5: c7bb91155e1a7ed200f63031bed0ed26
  • Run description: Indexing of text contents into ElasticSearch 7.0.1 with default settings. Searching: In most queries the token "coronavirus" is present. However, Coronaviridae are a family of viruses not limited to SARS-CoV-2, so many false positives are likely to be found. This also holds for other terms, such as "animal model": this term does not occur often in the literature, as most researchers specify the animal they used as the model organism (such as mice or rats). So we created a list of synonyms to specify these general terms (we use "mouse" as a model indicator). We hypothesize that nouns are the terms that carry the most information, so we used a part-of-speech tagger to isolate nouns and a manually created blacklist (built by query inspection) to filter them. To mine important terms we used the narrative topic field; the nouns of the query and question were part of the narrative in most cases.

jlrecall

Results | Participants | Input | Appendix

  • Run ID: jlrecall
  • Participant: julielab
  • Track: Round 1
  • Year: 2020
  • Submission: 4/23/2020
  • Type: automatic
  • MD5: b50a73152e81d70364aaa5ef856e4439
  • Run description: Indexing of text contents into ElasticSearch 7.0.1 with default settings. Searching: We downloaded the UMLS (version 2019AB) and used the Jena UMLS Filter Tool (JuFiT, https://github.com/JULIELab/jufit) to filter the UMLS for English terms belonging to a manually composed list of nearly 50 semantic types. The semantic types were selected to compile a dictionary of anatomy, drugs, diseases, symptoms, disorders, and common behavior of people. The goal was to augment the topic information with terms that could help in the search for COVID-19 literature. The employed UMLS semantic types (TUI, group, name) are: T052 ACTI Activity; T053 ACTI Behavior; T055 ACTI Individual_Behavior; T056 ACTI Daily_or_Recreational_Activity; T017 ANAT Anatomical_Structure; T021 ANAT Fully_Formed_Anatomical_Structure; T022 ANAT Body_System; T023 ANAT Body_Part,_Organ,_or_Organ_Component; T024 ANAT Tissue; T029 ANAT Body_Location_or_Region; T030 ANAT Body_Space_or_Junction; T031 ANAT Body_Substance; T121 CHEM Pharmacologic_Substance; T129 CHEM Immunologic_Factor; T130 CHEM Indicator,_Reagent,_or_Diagnostic_Aid; T195 CHEM Antibiotic; T200 CHEM Clinical_Drug; T079 CONC Temporal_Concept; T081 CONC Quantitative_Concept; T170 CONC Intellectual_Product; T074 DEVI Medical_Device; T075 DEVI Research_Device; T203 DEVI Drug_Delivery_Device; T020 DISO Acquired_Abnormality; T033 DISO Finding; T047 DISO Disease_or_Syndrome; T184 DISO Sign_or_Symptom; T190 DISO Anatomical_Abnormality; T083 GEOG Geographic_Area; T005 LIVB Virus; T016 LIVB Human; T098 LIVB Population_Group; T099 LIVB Family_Group; T100 LIVB Age_Group; T101 LIVB Patient_or_Disabled_Group; T073 OBJC Manufactured_Object; T093 ORGA Health_Care_Related_Organization; T034 PHEN Laboratory_or_Test_Result; T038 PHEN Biologic_Function; T032 PHYS Organism_Attribute; T039 PHYS Physiologic_Function; T040 PHYS Organism_Function; T042 PHYS Organ_or_Tissue_Function; T201 PHYS Clinical_Attribute; T058 PROC Health_Care_Activity; T059 PROC Laboratory_Procedure; T060 PROC Diagnostic_Procedure; T061 PROC Therapeutic_or_Preventive_Procedure; T062 PROC Research_Activity; T063 PROC Molecular_Biology_Research_Technique

KU_run1

Results | Participants | Input | Appendix

  • Run ID: KU_run1
  • Participant: IRLabKU
  • Track: Round 1
  • Year: 2020
  • Submission: 4/22/2020
  • Type: automatic
  • MD5: 993cb699ee8213bb14a0e2c06b1f8ff8
  • Run description: This is a baseline run using the BM25 implementation from the Indri search engine with k1=1.2 and b=0.75. After retrieving the initial ranking, we re-rank the top-50 documents, prioritizing articles published in 2020. A sketch of the recency re-ranking follows.
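
A minimal sketch of the recency re-ranking step, assuming a simple "2020 first" stable reorder of the top 50; the actual prioritization scheme is not specified in the run description.

    def rerank_recent(ranked_docids, year_of, top_k=50):
        # ranked_docids: initial BM25 ranking; year_of: docid -> publication year
        head, tail = ranked_docids[:top_k], ranked_docids[top_k:]
        head.sort(key=lambda d: year_of[d] != 2020)  # stable sort: 2020 papers first
        return head + tail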

KU_run2

Results | Participants | Input | Appendix

  • Run ID: KU_run2
  • Participant: IRLabKU
  • Track: Round 1
  • Year: 2020
  • Submission: 4/22/2020
  • Type: manual
  • MD5: eaad6d525d06f3223fb9a37b394e237b
  • Run description: This run uses query expansion using external data sources with different weights for different types of expanded terms. The external data sources we employ include Human Disease Ontology and Lexigram API. The HDO has been developed as a standardized human disease ontology, playing the role of providing consistent, reusable, and sustainable descriptions of human disease terms. As HDO is organized hierarchically, we retrieve the parent-class and the sub-class disease concepts. From Lexigram, we extract the entities from the original queries and types of entities. As a result, a query is expanded with potentially alternative disease terms, entity labels, and entity types. Specifically, we assign original query terms with weight 1.0, alternative disease terms 0.7, entity labels 0.4, and entity types 0.1. In practice, we use the Indri Language Model with Dirichlet smoothing with mu=2500.
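
An illustrative construction of the weighted query described above using Indri's #weight operator; the group names and example terms are placeholders.

    WEIGHTS = {"original": 1.0, "alt_disease": 0.7, "entity_label": 0.4, "entity_type": 0.1}

    def indri_weight_query(groups):
        # groups: e.g. {"original": ["coronavirus"], "alt_disease": ["covid-19"], ...}
        parts = [f"{WEIGHTS[g]} {term}" for g, terms in groups.items() for term in terms]
        return "#weight( " + " ".join(parts) + " )"

    # indri_weight_query({"original": ["coronavirus"], "alt_disease": ["covid-19"]})
    # -> '#weight( 1.0 coronavirus 0.7 covid-19 )'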

KU_run3

Results | Participants | Input | Appendix

  • Run ID: KU_run3
  • Participant: IRLabKU
  • Track: Round 1
  • Year: 2020
  • Submission: 4/22/2020
  • Type: automatic
  • MD5: 8ef345bef28eec1f4109a1e1510fc067
  • Run description: First, we built a paragraph-level index where we create a document for indexing comprising the title, abstract, and that paragraph. We use the BM25 implementation from the Indri search engine with k1=1.2 and b=0.75 to generate an initial ranking. After retrieving the initial ranking, we re-rank the top-50 documents, prioritizing articles published in 2020. Next, using pre-trained BioBERT from the Hugging Face Transformers library, we generate representations of each paragraph and query, compute the similarity between the two, and finally re-sort the documents according to the similarity score. A sketch of the BioBERT similarity step follows.
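
A hedged sketch of the BioBERT similarity computation; the model checkpoint and mean-pooling over the last hidden states are assumptions, since the run description does not specify how representations are pooled.

    import torch
    from transformers import AutoModel, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("dmis-lab/biobert-base-cased-v1.1")
    model = AutoModel.from_pretrained("dmis-lab/biobert-base-cased-v1.1")

    def embed(text):
        inputs = tok(text, return_tensors="pt", truncation=True, max_length=512)
        with torch.no_grad():
            out = model(**inputs).last_hidden_state   # (1, seq_len, 768)
        return out.mean(dim=1).squeeze(0)             # mean-pooled vector

    def similarity(query, paragraph):
        q, p = embed(query), embed(paragraph)
        return torch.nn.functional.cosine_similarity(q, p, dim=0).item()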

lda400s2000

Results | Participants | Input | Appendix

  • Run ID: lda400s2000
  • Participant: TM_IR_HITZ
  • Track: Round 1
  • Year: 2020
  • Submission: 4/22/2020
  • Type: automatic
  • MD5: 0d01b54b65ec112e3a735e93c5088f15
  • Run description: The submitted system has two components. The first component is an LDA-based recommender system, which helps identify papers topically related to the narrative and question and automatically creates the subset to be indexed; the LDA models 400 topics on abstracts. The second component is an information retrieval system based on the classical BM25F search algorithm. This system indexes not only the abstracts but also paragraphs of the full text of the papers.

lda400s5000

Results | Participants | Input | Appendix

  • Run ID: lda400s5000
  • Participant: TM_IR_HITZ
  • Track: Round 1
  • Year: 2020
  • Submission: 4/23/2020
  • Type: automatic
  • MD5: 9bb393ca1cc014eecad524622df9e78c
  • Run description: The submitted system has two components. The first component is an LDA-based recommender system, which helps identify papers topically related to the narrative and question and automatically creates the subset to be indexed; the LDA models 400 topics on abstracts. The second component is an information retrieval system based on the classical BM25F search algorithm. This system indexes not only the abstracts but also paragraphs of the full text of the papers. A gensim sketch of the LDA component follows.
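
A rough gensim sketch of the LDA recommender component (400 topics on abstracts); the corpus is a placeholder, and the topical-similarity step used to pick the indexing subset is simplified.

    from gensim import corpora, models

    # placeholder corpus of tokenized abstracts
    tokenized_abstracts = [["coronavirus", "transmission"], ["vaccine", "trial"]] * 25

    dictionary = corpora.Dictionary(tokenized_abstracts)
    bow = [dictionary.doc2bow(doc) for doc in tokenized_abstracts]
    lda = models.LdaModel(bow, id2word=dictionary, num_topics=400)

    # topic distribution for a topic's narrative+question text; papers with
    # similar distributions would be selected into the subset to be indexed
    query_bow = dictionary.doc2bow(["coronavirus", "vaccine"])
    print(lda.get_document_topics(query_bow))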

Meta-Conv-KNRM

Results | Participants | Input | Appendix

  • Run ID: Meta-Conv-KNRM
  • Participant: THUMSR
  • Track: Round 1
  • Year: 2020
  • Submission: 4/23/2020
  • Type: automatic
  • MD5: bb3351d723a1ae69f343e08a5eda3481
  • Run description: Conv-KNRM with meta-learning.

NTU_NMLAB_BM25_ALLQQ

Results | Participants | Input | Appendix

  • Run ID: NTU_NMLAB_BM25_ALLQQ
  • Participant: NTU_NMLab
  • Track: Round 1
  • Year: 2020
  • Submission: 4/22/2020
  • Type: automatic
  • MD5: ca284fb0e2c43b310c098df7cd06a069
  • Run description: Used the Okapi BM25 model, with parameters k1=1.5 and b=0.75. We captured all words of the title, abstract, and whole body text sections. For the search query, we used the query and question parts of the topic.

NTU_NMLAB_BM25_Hum2

Results | Participants | Input | Appendix

  • Run ID: NTU_NMLAB_BM25_Hum2
  • Participant: NTU_NMLab
  • Track: Round 1
  • Year: 2020
  • Submission: 4/23/2020
  • Type: manual
  • MD5: 147b6e753862fe928eb874a52f4842a1
  • Run description: Used the Okapi BM25 model, with parameters k1=1.5 and b=0.75. We captured all words of the title, abstract, and whole body text sections. For the search query, we used human-crafted terms to search through the system.

NTU_NMLAB_BM25_Human

Results | Participants | Input | Appendix

  • Run ID: NTU_NMLAB_BM25_Human
  • Participant: NTU_NMLab
  • Track: Round 1
  • Year: 2020
  • Submission: 4/23/2020
  • Type: manual
  • MD5: 312b34c57c3a3bebb8de70baae28f253
  • Run description: Used the Okapi BM25 model, with parameters k1=1.5 and b=0.75. We captured all words of the title, abstract, and whole body text sections. For the search query, we used human-crafted terms to search through the system.

OHSU_RUN1

Results | Participants | Input | Appendix

  • Run ID: OHSU_RUN1
  • Participant: OHSU
  • Track: Round 1
  • Year: 2020
  • Submission: 4/22/2020
  • Type: automatic
  • MD5: 77a8a6605717eabfe87c7cf03071d838
  • Run description: Pyserini, a Python interface to Anserini, was configured to search a Lucene index of full-text articles from the CORD-19 dataset. The BM25 and RM3 reranking hyperparameters were tuned. The unedited query was used as the search string for the Pyserini SimpleSearcher class, and documents were ranked using the default Anserini scoring.
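
A sketch of the search step using the SimpleSearcher API named above (as found in older Pyserini releases); the index path and the BM25/RM3 parameter values are illustrative, not the tuned ones.

    from pyserini.search import SimpleSearcher

    searcher = SimpleSearcher('lucene-index-cord19-full-text')   # placeholder path
    searcher.set_bm25(k1=0.9, b=0.4)                             # tuned values not stated
    searcher.set_rm3(fb_terms=10, fb_docs=10, original_query_weight=0.5)

    hits = searcher.search('coronavirus origin', k=1000)
    for rank, hit in enumerate(hits[:10], start=1):
        print(rank, hit.docid, round(hit.score, 4))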

OHSU_RUN2

Results | Participants | Input | Appendix

  • Run ID: OHSU_RUN2
  • Participant: OHSU
  • Track: Round 1
  • Year: 2020
  • Submission: 4/22/2020
  • Type: manual
  • MD5: 2ac34e3a2bbb52b379a5a74286d25e2e
  • Run description: Pyserini, a Python interface to Anserini, was used to search a pre-built Lucene index of full-text articles from CORD-19. The query, question, and narrative for each topic were combined, tokenized, and filtered to remove stopwords; the stopword list was manually curated to remove redundant words and unhelpful query terms. Keywords specific to COVID-19 were also manually added to each combined query to narrow the search. The BM25 and RM3 reranking parameters were tuned manually prior to search. The combined, preprocessed string was input to the Pyserini SimpleSearcher class to generate 1000 ranked documents per topic. Scoring was performed according to Anserini's default scoring function.

OHSU_RUN3

Results | Participants | Input | Appendix

  • Run ID: OHSU_RUN3
  • Participant: OHSU
  • Track: Round 1
  • Year: 2020
  • Submission: 4/22/2020
  • Type: manual
  • MD5: c3881bc7886690cea98b20e23c8a74bf
  • Run description: Pyserini, a Python interface to Anserini, was configured to search a Lucene index of full-text CORD-19 articles. The query and question were combined and tokenized, with manual stopword removal, before submission to Bio Entrez to generate a MeSH search. This MeSH search contained a small set of synonyms relevant to the original search terms and was further tokenized to remove stopwords. The final, preprocessed MeSH search was used as the input string to the Pyserini SimpleSearcher class, which was tuned with BM25 and RM3 reranking. The SimpleSearcher retrieved the top 1000 documents per topic, scored using Anserini's default scoring function.

PL2c1.0

Results | Participants | Input | Appendix

  • Run ID: PL2c1.0
  • Participant: UB_BW
  • Track: Round 1
  • Year: 2020
  • Submission: 4/23/2020
  • Type: automatic
  • MD5: 7b4988b929dedc3fd4f785ff21d16f02
  • Run description: For this run, we used Terrier-v5.2, an open source Information Retrieval (IR) platform. All the documents (titles and abstracts) used in this study were pre-processed before indexing: the text was tokenised and each token stemmed with the full Porter stemming algorithm. Block indexing was enabled so that we could deploy proximity search. Stopword removal was enabled, using the Terrier-v5.2 stopword list. We used the PL2 Divergence from Randomness term weighting model in Terrier-v5.2 to score and rank the documents; the PL2 hyper-parameter was set to its default value of c = 1.0. During retrieval we used only the query field of the topic. As an improvement, we used Markov Random Fields for term dependencies, in the full dependence variant of the model, which models dependencies among query terms. In this work we explore a window size of 4, to see what impact it has on retrieval effectiveness.

PL2c1.0_Bo1

Results | Participants | Input | Appendix

  • Run ID: PL2c1.0_Bo1
  • Participant: UB_BW
  • Track: Round 1
  • Year: 2020
  • Submission: 4/23/2020
  • Type: automatic
  • MD5: 1ebeba124db1ed778b7323f74ab5fccf
  • Run description: For this run, we used Terrier-v5.2, an open source Information Retrieval (IR) platform. All the documents (titles and abstracts) used in this study were pre-processed before indexing: the text was tokenised and each token stemmed with the full Porter stemming algorithm. Stopword removal was enabled, using the Terrier-v5.2 stopword list. We used the PL2 Divergence from Randomness term weighting model in Terrier-v5.2 to score and rank the documents; the PL2 hyper-parameter was set to its default value of c = 1.0. During retrieval we used only the query field of the topic. As an improvement, we used the Terrier-4.0 Divergence from Randomness (DFR) Bose-Einstein 1 (Bo1) model for query expansion, selecting the 10 most informative terms from the top 3 ranked documents after the first-pass retrieval (on the local collection). We then performed a second-pass retrieval on the local collection with the expanded query.
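
The run used Terrier directly; this PyTerrier sketch approximates the same two-pass pipeline (PL2 with c = 1.0, Bo1 selecting 10 terms from the top 3 documents). The index path is a placeholder.

    import pyterrier as pt
    pt.init()

    index = pt.IndexFactory.of("./cord19-index")        # placeholder index
    pl2 = pt.BatchRetrieve(index, wmodel="PL2", controls={"c": 1.0})
    bo1 = pt.rewrite.Bo1QueryExpansion(index, fb_terms=10, fb_docs=3)

    pipeline = pl2 >> bo1 >> pl2                        # retrieve, expand, re-retrieve
    results = pipeline.search("coronavirus response to weather changes")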

poznan_run1

Results | Participants | Input | Appendix

  • Run ID: poznan_run1
  • Participant: POZNAN
  • Track: Round 1
  • Year: 2020
  • Submission: 4/23/2020
  • Type: manual
  • MD5: 0ef76be8d7896a7fd5a57a4eda1cfcb2
  • Run description: The information retrieval process is based on TF-IDF; the IR tool is Elasticsearch. Queries were expanded manually with terms from potentially relevant articles.

poznan_run2

Results | Participants | Input | Appendix

  • Run ID: poznan_run2
  • Participant: POZNAN
  • Track: Round 1
  • Year: 2020
  • Submission: 4/23/2020
  • Type: manual
  • MD5: 3fcb07ff1e873a461bf68cb2a3ac7413
  • Run description: The information retrieval process is based on TF-IDF; the IR tool is Elasticsearch. Queries were expanded manually with terms from potentially relevant articles, using a second set of query expansion terms.

poznan_run3

Results | Participants | Input | Appendix

  • Run ID: poznan_run3
  • Participant: POZNAN
  • Track: Round 1
  • Year: 2020
  • Submission: 4/23/2020
  • Type: manual
  • MD5: 427dd36aa37b8f5b1db84dc842364f82
  • Run description: The information retrieval process is based on TF-IDF; the IR tool is Elasticsearch. Queries were expanded manually with terms from potentially relevant articles. A field weighting scheme was applied.

PS-r1-bm25all

Results | Participants | Input | Appendix

  • Run ID: PS-r1-bm25all
  • Participant: PITTSCI
  • Track: Round 1
  • Year: 2020
  • Submission: 4/21/2020
  • Type: manual
  • MD5: da9f7fd2a6391d2dc4ecd3c7c3a4d384
  • Run description: The query was generated manually by reading the topic, question, and narrative sections of the topics file and selecting keywords. The query is then expanded through the MetaMap API, which extracts UMLS entities from the original query text and uses similar layperson and medical entities and variations from the Consumer Health Vocabulary to expand the original query. The expanded query is then sent to the Elasticsearch engine, which indexes with a tf-idf model and ranks results with the BM25 retrieval model. Currently, all words in the query are weighted equally.
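
A sketch of the final retrieval step with the elasticsearch-py 8.x client; the index name, field names, and expanded terms are illustrative. BM25 is Elasticsearch's default similarity, and all terms carry equal weight, as in the description.

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")
    expanded_query = "coronavirus origin COVID-19 SARS-CoV-2 novel coronavirus"

    resp = es.search(index="cord19", query={
        "multi_match": {"query": expanded_query,
                        "fields": ["title", "abstract"]}}, size=1000)
    for hit in resp["hits"]["hits"][:10]:
        print(hit["_id"], hit["_score"])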

PS-r1-bm25medical

Results | Participants | Input | Appendix

  • Run ID: PS-r1-bm25medical
  • Participant: PITTSCI
  • Track: Round 1
  • Year: 2020
  • Submission: 4/21/2020
  • Type: manual
  • MD5: 3c39b769942b22073e41bdec9ca9daf2
  • Run description: The query was generated manually by reading the topic, question, and narrative sections of the topics file and selecting keywords. The query is then expanded through the MetaMap API, which extracts UMLS entities from the original query text and adds similar medical entities to expand the original query. The expanded query is then sent to the Elasticsearch engine, which indexes with a tf-idf model and ranks results with the BM25 retrieval model. Currently, all words in the query are weighted equally.

PS-r1-bm25none

Results | Participants | Input | Appendix

  • Run ID: PS-r1-bm25none
  • Participant: PITTSCI
  • Track: Round 1
  • Year: 2020
  • Submission: 4/21/2020
  • Type: manual
  • MD5: 58252469854828de2cf5cd4682c8a84e
  • Run description: The query was generated manually by reading the topic, question, and narrative sections of the topics file and selecting keywords. The query is then sent to the Elasticsearch engine, which indexes with a tf-idf model and ranks results with the BM25 retrieval model. Currently, all words in the query are weighted equally.

RMITBFuseM2

Results | Participants | Input | Appendix

  • Run ID: RMITBFuseM2
  • Participant: RMITB
  • Track: Round 1
  • Year: 2020
  • Submission: 4/20/2020
  • Type: manual
  • MD5: 6cdc02b8dd10982f0eb4c11bf853530e
  • Run description: 10 human query variations were produced for each topic narrative. Each query was fed into 16 Terrier retrieval models with no query expansion applied, and the results were fused with CombSUM. No boosts were applied to favor recently published articles.

RMITBM1

Results | Participants | Input | Appendix

  • Run ID: RMITBM1
  • Participant: RMITB
  • Track: Round 1
  • Year: 2020
  • Submission: 4/20/2020
  • Type: manual
  • MD5: a1c9e0821f884df63c41d7fb62f84632
  • Run description: 10 human query variations were produced for each topic narrative. Each query was fed into 16 Terrier retrieval models with no query expansion applied, and the results were fused with CombSUM. The relevance scores were then min-max normalized to [0, 1] and linearly combined with an exponential decay function over recency of publication (based on the CORD metadata), where docs older than 4 months receive no boost.
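
A sketch of the fusion and recency weighting: CombSUM over the per-model result lists, min-max normalization to [0, 1], then a linear combination with an exponential decay over publication age that gives no boost past 4 months. The decay rate and mixing weight are illustrative.

    import math

    def combsum(runs):                            # runs: list of {docid: score}
        fused = {}
        for run in runs:
            for docid, score in run.items():
                fused[docid] = fused.get(docid, 0.0) + score
        return fused

    def minmax(scores):
        lo, hi = min(scores.values()), max(scores.values())
        return {d: (s - lo) / (hi - lo) for d, s in scores.items()}

    def recency_boost(age_months, rate=0.5):      # no boost past 4 months
        return math.exp(-rate * age_months) if age_months <= 4 else 0.0

    fused = minmax(combsum([{"d1": 3.0, "d2": 1.0}, {"d1": 2.5, "d3": 4.0}]))
    ages = {"d1": 1, "d2": 6, "d3": 2}            # months since publication
    final = {d: 0.7 * s + 0.3 * recency_boost(ages[d]) for d, s in fused.items()}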

RUIR-bm25-at-exp

Results | Participants | Input | Appendix

  • Run ID: RUIR-bm25-at-exp
  • Participant: RUIR
  • Track: Round 1
  • Year: 2020
  • Submission: 4/23/2020
  • Type: automatic
  • MD5: ea368fe83f83d94c2352c16a96baf0d7
  • Run description: RUIR-bm25-at-exp. We interpreted the Kaggle tasks[0] as descriptions of search tasks, and performed query expansion using keywords typical for a given search task. TREC topics were automatically classified into the most appropriate search task/Kaggle task by indexing the fulltext of the tasks and ranking them against the topic query; the top result was selected as the classification. The keywords for expansion were automatically selected from the fulltext task descriptions on Kaggle: the words in each description were ranked by TF-IDF, words clearly not about the topic were filtered out (~1 per task, except for the sample task), and the top n=10 words were selected (between 10 and 22 keywords were identified for each task). These keywords were appended to the topic query as expansion terms. Using this enriched query we ranked the documents with Anserini BM25 (fulltext+title+abstract). [0] https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge/tasks

RUIR-bm25-mt-exp

Results | Participants | Input | Appendix

  • Run ID: RUIR-bm25-mt-exp
  • Participant: RUIR
  • Track: Round 1
  • Year: 2020
  • Submission: 4/23/2020
  • Type: manual
  • MD5: 7c51ac7d016a8500b87cf65322a8663f
  • Run description: RUIR-bm25-mt-exp. We interpreted the Kaggle tasks[0] as descriptions of search tasks, and performed query expansion using keywords typical for a given search task. TREC topics were manually classified into the most appropriate search task/Kaggle task (in topic order: [2,0,3,0,3,7,3,7,6,5,4,5,0,0,0,0,7,5,5,1,0,0,1,1,1,7,7,3,3,3], referring to tasks in the order of the Kaggle tasks page accessed 4/23). The keywords for expansion were automatically selected from the fulltext task descriptions on Kaggle: the words in each description were ranked by TF-IDF, words clearly not about the topic were filtered out (~1 per task, except for the sample task), and the top n=10 words were selected. These keywords were appended to the topic query of the corresponding task as expansion terms. Using this enriched query we ranked the documents with Anserini BM25 (fulltext+title+abstract). [0] https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge/tasks

RUIR-doc2vec

Results | Participants | Input | Appendix

  • Run ID: RUIR-doc2vec
  • Participant: RUIR
  • Track: Round 1
  • Year: 2020
  • Submission: 4/23/2020
  • Type: manual
  • MD5: b9df9c37d217f20bebd9aa22919842c1
  • Run description: Manual run RUIR-doc2vec. We interpreted the Kaggle tasks[0] as descriptions of search tasks, and boosted documents relevant to the given search task. TREC topics were manually classified into the most appropriate search task/Kaggle task (in topic order: [2,0,3,0,3,7,3,7,6,5,4,5,0,0,0,0,7,5,5,1,0,0,1,1,1,7,7,3,3,3], referring to tasks in the order of the Kaggle tasks page accessed 4/23). To find out which documents are relevant to which tasks, we trained a doc2vec model on the paper abstracts and the fulltext task descriptions on Kaggle. We retrieved the top 1000 results using Anserini BM25 (fulltext+title+abstract) and re-ranked them based on the distance between the task description and the paper abstract in doc2vec space. BM25 scores of the top 1000 documents were normalized to range from 0 to 1; the same was done for the distances between paper abstracts and task descriptions, and the two scores were then added. [0] https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge/tasks
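
A minimal sketch of the doc2vec component with the gensim 4.x API; the texts, tags, and hyperparameters are placeholders.

    from gensim.models.doc2vec import Doc2Vec, TaggedDocument

    docs = [TaggedDocument(["coronavirus", "transmission", "humans"], ["paper_1"]),
            TaggedDocument(["what", "is", "known", "about", "transmission"], ["task_0"])]
    model = Doc2Vec(docs, vector_size=100, epochs=20, min_count=1)

    sim = model.dv.similarity("paper_1", "task_0")   # cosine in doc2vec space
    norm_sim = (sim + 1) / 2                         # map [-1, 1] to [0, 1] before adding to BM25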

run1

Results | Participants | Input | Appendix

  • Run ID: run1
  • Participant: GUIR_S2
  • Track: Round 1
  • Year: 2020
  • Submission: 4/22/2020
  • Type: manual
  • MD5: d56e16849d676166b3aacfca44db1f96
  • Run description: Initial ranking with BM25 using the query over the full text; re-ranking with SciBERT (trained on MS MARCO) using only the title and abstract together with the question.

run2

Results | Participants | Input | Appendix

  • Run ID: run2
  • Participant: GUIR_S2
  • Track: Round 1
  • Year: 2020
  • Submission: 4/22/2020
  • Type: automatic
  • MD5: 14b4806d1c67e876a9d17afe0618a2df
  • Run description: Initial ranking with BM25 using the question over the title and abstract; re-ranking with SciBERT (trained on a medical subset of MS MARCO) using the title and abstract together with the question.

run3

Results | Participants | Input | Appendix

  • Run ID: run3
  • Participant: GUIR_S2
  • Track: Round 1
  • Year: 2020
  • Submission: 4/22/2020
  • Type: manual
  • MD5: ce338293eae452a58aa36c5f027873bc
  • Run description: Initial ranking with BM25 using the question over the title and abstract; re-ranking with SciBERT (trained on MS MARCO) using the title and abstract together with the question. Filtered to 2020 articles only.

sab20.1.blind

Results | Participants | Input | Appendix

  • Run ID: sab20.1.blind
  • Participant: sabir
  • Track: Round 1
  • Year: 2020
  • Submission: 4/23/2020
  • Type: automatic
  • MD5: e2a462ec68db452032a117ad4ced09f5
  • Run description: Blind (pseudo) relevance feedback run. SMART vector run, Lnu docs and ltu queries, run on the JSON docs only. An initial run retrieves the top 10 docs, which are assumed relevant; Rocchio feedback then expands the topic to the top 20 terms. The initial queries are the full topics, including the narrative.
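
A sketch of the blind-feedback loop in a generic vector space model, substituting plain TF-IDF for SMART's Lnu/ltu weighting; the documents and the Rocchio beta are illustrative.

    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer

    docs = ["coronavirus transmission in humans", "weather and viral spread",
            "vaccine antibody response"]
    vec = TfidfVectorizer()
    D = vec.fit_transform(docs).toarray()
    q = vec.transform(["coronavirus weather"]).toarray()[0]

    scores = D @ q
    top = np.argsort(-scores)[:10]              # top 10 docs, assumed relevant
    q_new = q + 0.75 * D[top].mean(axis=0)      # Rocchio update (beta = 0.75)
    keep = np.argsort(-q_new)[:20]              # keep the top 20 terms
    expanded = {vec.get_feature_names_out()[i]: q_new[i] for i in keep if q_new[i] > 0}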

sab20.1.merged

Results | Participants | Input | Appendix

  • Run ID: sab20.1.merged
  • Participant: sabir
  • Track: Round 1
  • Year: 2020
  • Submission: 4/22/2020
  • Type: automatic
  • MD5: f658976f3f4b42b90cc3093cc40c2caf
  • Run description: Simple SMART vector run, Lnu docs and ltu queries. All document representations were combined into one vector at indexing time (tf-weighted vectors for the metadata file plus all JSON representations, added together and then reweighted as Lnu). Full topics including narrative.

sab20.1.meta.docs

Results | Participants | Input | Appendix

  • Run ID: sab20.1.meta.docs
  • Participant: sabir
  • Track: Round 1
  • Year: 2020
  • Submission: 4/22/2020
  • Type: automatic
  • MD5: 33722480a6beaa7194fbb5bcc689dd57
  • Run description: Simple SMART vector run, Lnu docs and ltu queries. Separate inverted files for metadata and JSON docs. Final score = 1.5 * metadata score + JSON score. Full topics including narrative.

savantx_nist_run_1

Results | Participants | Input | Appendix

  • Run ID: savantx_nist_run_1
  • Participant: SavantX
  • Track: Round 1
  • Year: 2020
  • Submission: 4/22/2020
  • Type: automatic
  • MD5: 0a71e410779059c635f037008dccd9f8
  • Run description: An automated process was scripted to extract topics from the supplied XML. The searches were automatically issued to SavantX PRO. The system employs a patented, unsupervised machine-learning hyper-dimensional relationship analysis technology. This technology, coupled with PRO's full-stack AI, output ranked results without any human intervention.

savantx_nist_run_2

Results | Participants | Input | Appendix

  • Run ID: savantx_nist_run_2
  • Participant: SavantX
  • Track: Round 1
  • Year: 2020
  • Submission: 4/22/2020
  • Type: automatic
  • MD5: 731a0aa5ae0047fb04ce7e5229a28860
  • Run description: An automated process was scripted to extract topics from the supplied XML. The searches were automatically issued to SavantX PRO. The system employs a patented, unsupervised machine-learning hyper-dimensional relationship analysis technology. This technology, coupled with PRO's full-stack AI, output ranked results without any human intervention.

savantx_nist_run_3

Results | Participants | Input | Appendix

  • Run ID: savantx_nist_run_3
  • Participant: SavantX
  • Track: Round 1
  • Year: 2020
  • Submission: 4/22/2020
  • Type: automatic
  • MD5: a6c6cc101b08f67da5258a0afad8ac45
  • Run description: An automated process was scripted to extract topics from the supplied XML. The searches were automatically issued to SavantX PRO. The system employs a patented, unsupervised machine-learning hyper-dimensional relationship analysis technology. This technology, coupled with PRO's full-stack AI, output ranked results without any human intervention.

SFDC-23April-run1

Results | Participants | Input | Appendix

  • Run ID: SFDC-23April-run1
  • Participant: SFDC
  • Track: Round 1
  • Year: 2020
  • Submission: 4/23/2020
  • Type: automatic
  • MD5: 0a62c88dfc1130302cfe9aed9eeb9ec5
  • Run description: Semantic search using a novel methodology to generate substantial query-paragraph training data, combined with Sentence BERT. There was no manual processing.

SFDC-23April-run2

Results | Participants | Input | Appendix

  • Run ID: SFDC-23April-run2
  • Participant: SFDC
  • Track: Round 1
  • Year: 2020
  • Submission: 4/23/2020
  • Type: automatic
  • MD5: cf5b519cbedd4e6f00a682c7893a8a56
  • Run description: Semantic search using a novel methodology to generate substantial query-paragraph training data, combined with Sentence BERT. There was no manual processing.

sheikh_bm25_all

Results | Participants | Input | Appendix

  • Run ID: sheikh_bm25_all
  • Participant: UMASS_CIIR
  • Track: Round 1
  • Year: 2020
  • Submission: 4/23/2020
  • Type: automatic
  • MD5: b1a85c58f0b4befe1e183b4055ff3816
  • Run description: We retrieved 50 documents using the BM25 ranking algorithm; our queries were a concatenation of the query, question, and narrative parts of the topic. To the 50 retrieved documents we applied a search result diversification algorithm based on explicit topic modeling: we found the top 10 topics in the 50 retrieved documents using LDA, then used PM-2, the proportionality-based diversification algorithm proposed by Dang and Croft at SIGIR 2012, to diversify the ranked list. The algorithm repeatedly selects a topic and then the document that best matches that topic.
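
A sketch of PM-2 selection over LDA topic proportions of the BM25 results; topic_dist maps each docid to its P(topic|doc) vector, and the topic quotas here are uniform.

    import numpy as np

    def pm2(docids, topic_dist, k, lam=0.5):
        n_topics = len(next(iter(topic_dist.values())))
        v = np.ones(n_topics) / n_topics       # desired (uniform) topic quotas
        s = np.zeros(n_topics)                 # positions already given to each topic
        remaining, ranked = list(docids), []
        for _ in range(min(k, len(docids))):
            qt = v / (2 * s + 1)               # Sainte-Lague quotients
            i_star = int(qt.argmax())          # topic owed the next position
            def gain(d):
                p = topic_dist[d]
                rest = sum(qt[j] * p[j] for j in range(n_topics) if j != i_star)
                return lam * qt[i_star] * p[i_star] + (1 - lam) * rest
            best = max(remaining, key=gain)    # doc that best covers that topic
            remaining.remove(best)
            ranked.append(best)
            p = topic_dist[best]
            s += p / p.sum()                   # charge the position to covered topics
        return ranked

    topic_dist = {"d1": np.array([0.7, 0.2, 0.1]),
                  "d2": np.array([0.1, 0.8, 0.1]),
                  "d3": np.array([0.2, 0.2, 0.6])}
    print(pm2(["d1", "d2", "d3"], topic_dist, k=3))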

sheikh_bm25_manual

Results | Participants | Input | Appendix

  • Run ID: sheikh_bm25_manual
  • Participant: UMASS_CIIR
  • Track: Round 1
  • Year: 2020
  • Submission: 4/23/2020
  • Type: manual
  • MD5: c8ef70a1664e41c8d8538f8c73bc1c4f
  • Run description: We manually constructed queries from the query, question, and narrative parts of the topic by selecting keywords.

SINEQUA

Results | Participants | Input | Appendix

  • Run ID: SINEQUA
  • Participant: Sinequa2
  • Track: Round 1
  • Year: 2020
  • Submission: 4/23/2020
  • Type: manual
  • MD5: bf481b44258797f33c6b3537bbd5d75d
  • Run description: Results were generated using the Sinequa search engine's relevance computation. We used our baseline computation, with additional configuration as well as query expansion.

SinequaR1_1

Results | Participants | Input | Appendix

  • Run ID: SinequaR1_1
  • Participant: Sinequa
  • Track: Round 1
  • Year: 2020
  • Submission: 4/23/2020
  • Type: automatic
  • MD5: f33d2c072a5fabce06e1f5cdc7c5506f
  • Run description: Results were generated using the Sinequa search engine's relevance computation. We used our baseline computation, that is, the default settings, with no additional fine-tuning or query expansion/enrichment.

SinequaR1_2

Results | Participants | Input | Appendix

  • Run ID: SinequaR1_2
  • Participant: Sinequa
  • Track: Round 1
  • Year: 2020
  • Submission: 4/23/2020
  • Type: manual
  • MD5: 2c2bed2b7aaddbadcba2e5f36209ccd0
  • Run description: Results have been generated using Sinequa search engine relevance computation. We used our baseline computation, with additional configuration as well as query expansion, to adapt to the covid-19 use case. This relevance computation is similar to the one we use at https://covidsearch.sinequa.com/app/covid-search/#/home

smith.bm25

Results | Participants | Input | Appendix

  • Run ID: smith.bm25
  • Participant: smith
  • Track: Round 1
  • Year: 2020
  • Submission: 4/19/2020
  • Type: automatic
  • MD5: 38971e46186511f7b8b8b5768f66710d
  • Run description: BM25 run over as much text as was available. https://github.com/jjfiv/trec-covid/tree/round1-submit

smith.ql

Results | Participants | Input | Appendix

  • Run ID: smith.ql
  • Participant: smith
  • Track: Round 1
  • Year: 2020
  • Submission: 4/19/2020
  • Type: automatic
  • MD5: 003a24338e8abce1ec18552b5b705974
  • Run description: QL run over as much text as was available. https://github.com/jjfiv/trec-covid/tree/round1-submit

smith.rm3

Results | Participants | Input | Appendix

  • Run ID: smith.rm3
  • Participant: smith
  • Track: Round 1
  • Year: 2020
  • Submission: 4/19/2020
  • Type: automatic
  • MD5: 64a92df23f8b90bf1b7692171b52e24e
  • Run description: RM3 run over as much text as was available (orig_weight=0.3, fb_docs=20, fb_terms=100). https://github.com/jjfiv/trec-covid/tree/round1-submit

T5R1

Results | Participants | Input | Appendix

  • Run ID: T5R1
  • Participant: covidex
  • Track: Round 1
  • Year: 2020
  • Submission: 4/22/2020
  • Type: automatic
  • MD5: caf8b25915de239d4d2ba49a189e3d77
  • Run description: run1: BM25 retrieval with Anserini followed by T5-11B reranker trained on MS MARCO. Index is formed as title+abstract+paragraph. Anserini's CovidQueryGenerator was used to build queries.

T5R3

Results | Participants | Input | Appendix

  • Run ID: T5R3
  • Participant: covidex
  • Track: Round 1
  • Year: 2020
  • Submission: 4/22/2020
  • Type: automatic
  • MD5: d62c266c6b3e8f46a48672670b1c70d0
  • Run description: run3: BM25 retrieval with Anserini followed by a T5-11B reranker trained on MS MARCO. The index is formed as title+paragraph. Anserini's CovidQueryGenerator was used to build queries.

tcs_ilabs_gg_r1

Results | Participants | Input | Appendix

  • Run ID: tcs_ilabs_gg_r1
  • Participant: tcs_ilabs_gg
  • Track: Round 1
  • Year: 2020
  • Submission: 4/23/2020
  • Type: manual
  • MD5: d425cfa24fe77c747fc22a990c39136c
  • Run description: Manual process: selection of topic phrases for the query. Retrieval model & weighting scheme: a word2vec model trained on the dataset; stacked vector representations of the query words; tf-idf-weighted similarity computation against sentences; document scores are generated from the sentence similarity scores.

Technion-JPD

Results | Participants | Input | Appendix

  • Run ID: Technion-JPD
  • Participant: Technion
  • Track: Round 1
  • Year: 2020
  • Submission: 4/22/2020
  • Type: automatic
  • MD5: 3d9fd520a4064c20ad14e481ae03f257
  • Run description: The topics' query tags served as the queries. We applied Krovetz stemming to queries and documents, and removed stopwords on the INQUERY list from queries only. The Indri toolkit was used for all experiments. We used a standard unigram language model approach to retrieve an initial list of 1000 documents per query; a passage-based document retrieval method then re-ranks the initially retrieved list. For this run, we used the JPDm-max method, a learning-to-rank-based document retrieval method that utilizes information induced from multiple passages, specifically feature-based representations of the document's passages. We used fixed-length windows of 300 terms as passages. For more details, please refer to [Sheetrit E, Shtok A, Kurland O (2020) A Passage-Based Approach to Learning to Rank Documents. Information Retrieval Journal 23, 159-186. https://doi.org/10.1007/s10791-020-09369-x]. The INEX dataset [Geva S, Kamps J, Lethonen M, Schenkel R, Thom JA, Trotman A (2010) Overview of the INEX 2009 ad hoc track. In: Focused Retrieval and Evaluation, pp 4-25] was used to train the document learning-to-rank method, which was then applied to the CORD-19 collection. The following two-phase procedure was used to learn the model: we first randomly split the set of queries into train (80%) and validation (20%), the latter used to set the hyper-parameters of the LTR method; once the best parameter values were selected, a final model was learned using all the queries. NDCG@20 was the optimization criterion. To train and apply the document ranker, we used a linear RankSVM [Joachims T (2006) Training linear SVMs in linear time. In: Proc. of KDD, pp 217-226]. The regularization parameter was set to a value in {0.0001, 0.01, 0.1}.

Technion-MEDMM

Results | Participants | Input | Appendix

  • Run ID: Technion-MEDMM
  • Participant: Technion
  • Track: Round 1
  • Year: 2020
  • Submission: 4/22/2020
  • Type: automatic
  • MD5: 15710d481bf9b5f041ac209e5093e871
  • Run description: The topics' query tags served as the queries. We applied Krovetz stemming to queries and documents, and removed stopwords on the INQUERY list from queries only. The Indri toolkit was used for all of our experiments. We used the maximum entropy divergence minimization model (MEDMM), a highly effective pseudo-feedback-based query expansion approach [Lv, Y. & Zhai, C. 2014. Revisiting the divergence minimization feedback model. In: Proc. of CIKM, pp 1863-1866]. MEDMM was used to rank the entire collection. We set both the number of documents and the number of terms from which the MEDMM model is constructed to 50. The interpolation parameter that controls the weight of the original query model was set to 0.5. All other parameters were set to default values.

Technion-RRF

Results | Participants | Input | Appendix

  • Run ID: Technion-RRF
  • Participant: Technion
  • Track: Round 1
  • Year: 2020
  • Submission: 4/22/2020
  • Type: automatic
  • MD5: 875e905a3439534443f2d81322c141be
  • Run description: The topics' query tags served as the queries. We applied Krovetz stemming to queries and documents, and removed stopwords on the INQUERY list from queries only. The Indri toolkit was used for all of our experiments. We used the reciprocal rank fusion method [Cormack GV, Clarke CL, Buettcher S (2009) Reciprocal rank fusion outperforms Condorcet and individual rank learning methods. In: Proc. of SIGIR, pp 758-759] to combine the rankings of Run1 and Run3. The model's free parameter was set to 60.
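
Reciprocal rank fusion is compact enough to show in full; this minimal sketch uses the free parameter k=60 from the description, with illustrative input rankings.

    def rrf(rankings, k=60):
        """rankings: list of docid lists, best first."""
        scores = {}
        for ranking in rankings:
            for rank, docid in enumerate(ranking, start=1):
                scores[docid] = scores.get(docid, 0.0) + 1.0 / (k + rank)
        return sorted(scores, key=scores.get, reverse=True)

    fused = rrf([["d1", "d2", "d3"], ["d3", "d1", "d4"]])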

Tetralogie0Fr

Results | Participants | Input | Appendix

  • Run ID: Tetralogie0Fr
  • Participant: IRIT_LSIS_FR
  • Track: Round 1
  • Year: 2020
  • Submission: 4/23/2020
  • Type: manual
  • MD5: bc55fcadc2d8f09808f12044ff77dd60
  • Run description: Tetralogie0Fr uses our Tetralogie system, which was initially designed to mine textual documents such as scientific publications and patents. Here we focus on document parts and consider phrases. Each topic is represented by a term set, and at least one term must occur in a document for it to be retrieved. There is no document re-ranking; documents are ordered as they occurred.

Tetralogie1Fr

Results | Participants | Input | Appendix

  • Run ID: Tetralogie1Fr
  • Participant: IRIT_LSIS_FR
  • Track: Round 1
  • Year: 2020
  • Submission: 4/22/2020
  • Type: manual
  • MD5: a10233dc8840dad707f35c07d955193a
  • Run description: Tetralogie1Fr uses our Tetralogie system, which was initially designed to mine textual documents such as scientific publications and patents. Here we focus on document parts and consider phrases. Each topic is represented by a term set, and at least one term must occur in a document for it to be retrieved. The document ranking is based on term frequency and the number of topic terms in the document.

tm_lda400

Results | Participants | Input | Appendix

  • Run ID: tm_lda400
  • Participant: TM_IR_HITZ
  • Track: Round 1
  • Year: 2020
  • Submission: 4/22/2020
  • Type: automatic
  • MD5: af83c25d3868f70b27cf3529d29e36f2
  • Run description: The system models the whole corpus with an LDA-based topic model; we used only the abstracts of the articles to fit 400 topics. We retrieve documents by first obtaining the topic distribution of the question and narrative, then ranking the papers in the corpus by Jensen-Shannon divergence.
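
A sketch of the ranking step, with random Dirichlet vectors standing in for the inferred 400-topic distributions; scipy's jensenshannon returns the JS distance, which orders documents the same way as the divergence.

    import numpy as np
    from scipy.spatial.distance import jensenshannon

    rng = np.random.default_rng(0)
    query_topics = rng.dirichlet(np.ones(400))     # P(topic | question+narrative)
    doc_topics = {f"doc{i}": rng.dirichlet(np.ones(400)) for i in range(3)}

    # Most similar (smallest divergence) first.
    ranking = sorted(doc_topics, key=lambda d: jensenshannon(query_topics, doc_topics[d]))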

TMACC_SeTA_baseline

Results | Participants | Input | Appendix

  • Run ID: TMACC_SeTA_baseline
  • Participant: TMACC_SeTA
  • Track: Round 1
  • Year: 2020
  • Submission: 4/23/2020
  • Type: automatic
  • MD5: 9f91713016995a35a07032fbb45ce00b
  • Run description: The submission was fully automatic. Topic elements were stopword- and POS-tag-filtered (using nltk) to produce a keyword list, which was expanded by querying a word2vec model (gensim) of the CORD-19 dataset for the 10 most related terms. The expanded keyword list was encoded as a vector in a doc2vec model, trained with gensim after pre-processing for phrase extraction with spaCy, textacy, and the ABNER NER for multi-word named entities. The 1000 most related documents were returned.
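
A minimal sketch of the expansion step with the gensim 4.x API; the training sentences are placeholders for the CORD-19 corpus.

    from gensim.models import Word2Vec

    sentences = [["coronavirus", "transmission", "droplets"],
                 ["vaccine", "antibody", "immune", "response"]]   # placeholder corpus
    w2v = Word2Vec(sentences, vector_size=100, min_count=1)

    keywords = ["coronavirus"]
    expanded = list(keywords)
    for kw in keywords:
        expanded += [term for term, _ in w2v.wv.most_similar(kw, topn=10)]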

TU_Vienna_TKL_1

Results | Participants | Input | Appendix

  • Run ID: TU_Vienna_TKL_1
  • Participant: TU_Vienna
  • Track: Round 1
  • Year: 2020
  • Submission: 4/23/2020
  • Type: automatic
  • MD5: 0657ab32a5953d9802a1c8430b4faf3d
  • Run description: Re-ranked the top 100 of BM25 with TKL (a novel Transformer-Kernel model for long documents, accepted at SIGIR 2020). TKL was trained only on the MS MARCO document training data from TREC DL 2019 (TREC's annotations were not used). This version uses GloVe word embeddings as its base.

TU_Vienna_TKL_2

Results | Participants | Input | Appendix

  • Run ID: TU_Vienna_TKL_2
  • Participant: TU_Vienna
  • Track: Round 1
  • Year: 2020
  • Submission: 4/23/2020
  • Type: automatic
  • MD5: d2476ee0394b2655e692ec214449171a
  • Run description: Re-ranked the top 100 of BM25 with TKL (a novel Transformer-Kernel model for long documents, accepted at SIGIR 2020). TKL was trained only on the MS MARCO document training data from TREC DL 2019 (TREC's annotations were not used). This version uses the BERT embedding layer as its base (but no BERT attention layers).

UB_NLP_RUN_1

Results | Participants | Input | Appendix

  • Run ID: UB_NLP_RUN_1
  • Participant: UB_NLP
  • Track: Round 1
  • Year: 2020
  • Submission: 4/21/2020
  • Type: automatic
  • MD5: 6a1d33bcdaa611422ec67d9776069ada
  • Run description: We implemented a composite traditional-IR + state-of-the-art-NLP system. Details: 1) We trained an NER model on the NCBI disease corpus using BioBERT and used it to capture disease names from document abstracts, from which we created our initial "disease-document" index. 2) We generated extracts from the document bodies and indexed them with a BM25 model; this is our second-level index. 3) We then perform BERT question answering on the extracts returned by step 2. 4) Query preprocessing: we generate two kinds of queries, i) a tokenized bag-of-words query and ii) a contextual query. For the bag-of-words query, we tokenize the query and remove stopwords; if the query mentions a disease, we add the disease's alternate names to the tokens. Example: the query "What is the incubation period of COVID-19?" yields the tokens "incubation", "period", "COVID-19", enriched to "incubation", "period", "COVID-19", "ncovid", "ncov", "novel", "coronavirus", "19-covid". For the contextual query, we detect disease name(s) in the query and replicate the query n times, once for each alternate name of the disease in our data; e.g., "What is the incubation period of COVID-19?" is replicated as "What is the incubation period of ncovid?", "What is the incubation period of novel coronavirus?", "What is the incubation period of 19-covid?", and "What is the incubation period of ncov?". 5) Retrieving results: i) we maintain separate indexes for separate diseases (MERS, SARS, and NCOVID) and, based on the disease detected in the query, filter out non-relevant documents; ii) we query the BM25 model for that disease with the bag-of-words tokens and retrieve the top 4000 extracts; iii) based on the BM25 ranking, we perform BERT question answering on the top 500 extracts (note that an extract is not a document: a document is broken into extracts of 256 words); iv) BERT question answering outputs a logit for each position of the sequence, and the higher the logit, the more likely the extract contains the answer to the query. We rank extracts by the logit value of the answer's starting word; a document's score is the maximum over all of its extracts with a logit value, and we rank documents by decreasing score. To bring the document count to 1000, we append the remaining documents returned by BM25 after scaling their scores relative to the minimum logit score.

udel_fang_run1

Results | Participants | Input | Appendix

  • Run ID: udel_fang_run1
  • Participant: udel_fang
  • Track: Round 1
  • Year: 2020
  • Submission: 4/22/2020
  • Type: automatic
  • MD5: ffb3235dad903705a6897955723f5fcf
  • Run description: We merge the title, abstract, and paragraphs of a document into a single text to build an index. We use all non-stopwords from the query and question fields as queries. F2EXP is used as the retrieval function.

udel_fang_run2

Results | Participants | Input | Appendix

  • Run ID: udel_fang_run2
  • Participant: udel_fang
  • Track: Round 1
  • Year: 2020
  • Submission: 4/22/2020
  • Type: automatic
  • MD5: c68d558398b1f39b14ff88b61c955896
  • Run description: We build two indices: one with only the title and abstract, and the other with the combination of all paragraphs. The narrative fields, minus stopwords, are used as queries. F2EXP is used as the retrieval function, and CombSUM merges the results from the two indices. The top 10 results are re-ranked in descending order of publication year, with ties broken by the scores from the previous step.
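
A sketch of the final re-ranking step: sort the fused top 10 by publication year, newest first, with ties broken by the CombSUM score; the tuples are illustrative.

    results = [("d1", 12.3, 2020), ("d2", 11.8, 2015),
               ("d3", 11.1, 2020), ("d4", 10.9, 2019)]   # (docid, score, year)

    top10 = sorted(results[:10], key=lambda r: (-r[2], -r[1]))
    final = top10 + results[10:]                          # rest keeps its score order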

udel_fang_run3

Results | Participants | Input | Appendix

  • Run ID: udel_fang_run3
  • Participant: udel_fang
  • Track: Round 1
  • Year: 2020
  • Submission: 4/22/2020
  • Type: automatic
  • MD5: 666988d05f328f3a8eef0ce7f979b3f1
  • Run description: We build two indices: one with only the title and abstract, and the other with the combination of all paragraphs. We use all non-stopwords from the query and question fields as queries. F2EXP is used as the retrieval function, and we also apply the axiomatic approach to select semantically related terms for query expansion. CombSUM merges the results from the two indices.

UIowaS_Run1

Results | Participants | Input | Appendix

  • Run ID: UIowaS_Run1
  • Participant: UIowaS
  • Track: Round 1
  • Year: 2020
  • Submission: 4/22/2020
  • Type: automatic
  • MD5: efe0f56829fb851e37fd91e0fc5add07
  • Run description: The dataset was reduced to documents containing at least one of the following patterns: covid, Covid, corona, Corona, cov, CoV, CORONA, COVID, SARS, COV. We also limited the topic to the query and narrative fields, ignoring the question field. We used Terrier as the retrieval system. All runs are automatic. This run is the basic DPH model.

UIowaS_Run2

Results | Participants | Input | Appendix

  • Run ID: UIowaS_Run2
  • Participant: UIowaS
  • Track: Round 1
  • Year: 2020
  • Submission: 4/22/2020
  • Type: automatic
  • MD5: e6213ad52ec1b84e9d4c6488564e962c
  • Run description: The dataset was reduced to documents containing at least one of the following patterns: covid, Covid, corona, Corona, cov, CoV, CORONA, COVID, SARS, COV. We also limited the topic to the query and narrative fields, ignoring the question field. We used Terrier as the retrieval system. All runs are automatic. This run is the default BM25 model (c = 0.4).

UIowaS_Run3

Results | Participants | Input | Appendix

  • Run ID: UIowaS_Run3
  • Participant: UIowaS
  • Track: Round 1
  • Year: 2020
  • Submission: 4/22/2020
  • Type: automatic
  • MD5: d919acbc8d25fbbdf5d78f604ff88d67
  • Run description: The dataset was reduced to documents containing at least one of the following patterns: covid, Covid, corona, Corona, cov, CoV, CORONA, COVID, SARS, COV. We also limited the topic to the query and narrative fields, ignoring the question field. We used Terrier as the retrieval system. All runs are automatic. This run is BM25 with retrieval-feedback-based query expansion (top 10 docs, top 5 terms).

UIUC_DMG_setrank

Results | Participants | Input | Appendix

  • Run ID: UIUC_DMG_setrank
  • Participant: UIUC_DMG
  • Track: Round 1
  • Year: 2020
  • Submission: 4/22/2020
  • Type: manual
  • MD5: 5bc4ab4f5ae86cb9b8ce594cd200d7f8
  • Run description: Entity-aware, Dirichlet-smoothed language-model-based ranking. Field weights: title 16, abstract 6, full text 1, entity type 6. Some manual processing is used in learning the NER model.
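
A sketch of a field-weighted, Dirichlet-smoothed query-likelihood score under the stated weights; the statistics and mu value are placeholders, and for simplicity the document length here is not field-weighted.

    import math

    FIELD_W = {"title": 16, "abstract": 6, "fulltext": 1, "entity_type": 6}

    def lm_score(query_terms, field_tf, doc_len, coll_prob, mu=2500):
        """field_tf[t][f]: tf of term t in field f; coll_prob[t]: P(t | collection)."""
        score = 0.0
        for t in query_terms:
            tf = sum(w * field_tf.get(t, {}).get(f, 0) for f, w in FIELD_W.items())
            score += math.log((tf + mu * coll_prob[t]) / (doc_len + mu))
        return score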

UIUC_DMG_setrank_re

Results | Participants | Input | Appendix

  • Run ID: UIUC_DMG_setrank_re
  • Participant: UIUC_DMG
  • Track: Round 1
  • Year: 2020
  • Submission: 4/23/2020
  • Type: manual
  • MD5: 3ce32914230fad07fb1e459341f7f64b
  • Run description: A rank ensemble of four unsupervised relevance models. The NER model involves manual effort.

UIUC_DMG_setrank_ret

Results | Participants | Input | Appendix

  • Run ID: UIUC_DMG_setrank_ret
  • Participant: UIUC_DMG
  • Track: Round 1
  • Year: 2020
  • Submission: 4/23/2020
  • Type: manual
  • MD5: 8d27b79c9cc0d2715efdc7a0d53ae3c0
  • Run description: LambdaRank with 8 features derived from 4 unsupervised relevance models. Each relevance model leverages weighted field information and entity information. Entities are extracted using an ensemble of NER models.

uogTrDPH_prox_QQ

Results | Participants | Input | Appendix

  • Run ID: uogTrDPH_prox_QQ
  • Participant: uogTr
  • Track: Round 1
  • Year: 2020
  • Submission: 4/22/2020
  • Type: automatic
  • MD5: 253a30e58482e3fe26edd4b4a0c1dd5a
  • Run description: pyTerrier, DPH DFR model plus DFR sequential dependence proximity; QQ tags.

uogTrDPH_QE

Results | Participants | Input | Appendix

  • Run ID: uogTrDPH_QE
  • Participant: uogTr
  • Track: Round 1
  • Year: 2020
  • Submission: 4/22/2020
  • Type: automatic
  • MD5: 7261d84ca6337dffdafc6788b856f256
  • Run description: pyTerrier, DPH DFR model plus Bo1 query expansion; QQN tags.

uogTrDPH_QE_QQ

Results | Participants | Input | Appendix

  • Run ID: uogTrDPH_QE_QQ
  • Participant: uogTr
  • Track: Round 1
  • Year: 2020
  • Submission: 4/22/2020
  • Type: automatic
  • MD5: bf933a65333bcf16b149ebc59feab22d
  • Run description: pyTerrier, DPH DFR model plus Bo1 query expansion; QQ tags.

UP-cqqrnd1

Results | Participants | Input | Appendix

  • Run ID: UP-cqqrnd1
  • Participant: unique_ptr
  • Track: Round 1
  • Year: 2020
  • Submission: 4/23/2020
  • Type: automatic
  • MD5: ae32f134c1f8ef0b3446ed40b174573f
  • Run description: A weighted combination of queries and question n-grams extracted using universal sentence encoders (Cer et al., 2018)

UP-rrf5rnd1

Results | Participants | Input | Appendix

  • Run ID: UP-rrf5rnd1
  • Participant: unique_ptr
  • Track: Round 1
  • Year: 2020
  • Submission: 4/23/2020
  • Type: automatic
  • MD5: eed6b472e7ec0563e201a57c8887fd8f
  • Run description: An unsupervised reciprocal rank fusion (Cormack et al., 2009) of 5 runs: bag-of-words queries; sequential dependence queries; bag-of-words questions; sequential dependence questions; and a weighted combination of queries and question n-grams extracted using universal sentence encoders (Cer et al., 2018).

UP-sdqrnd1

Results | Participants | Input | Appendix

  • Run ID: UP-sdqrnd1
  • Participant: unique_ptr
  • Track: Round 1
  • Year: 2020
  • Submission: 4/23/2020
  • Type: automatic
  • MD5: 097dd3ea1c74b743bc559bb60abadda4
  • Run description: Simple sequential dependence model (Metzler, 2005) applied to queries
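
A sketch of building a sequential dependence query in Indri syntax, with the conventional 0.85/0.10/0.05 weights (the run's actual weights are not stated): unigrams, exact ordered pairs, and unordered windows over adjacent term pairs.

    def sdm_query(terms, w=(0.85, 0.10, 0.05)):
        pairs = list(zip(terms, terms[1:]))
        unigrams = " ".join(terms)
        ordered = " ".join(f"#1({a} {b})" for a, b in pairs)       # exact phrases
        unordered = " ".join(f"#uw8({a} {b})" for a, b in pairs)   # window of 8
        return (f"#weight( {w[0]} #combine({unigrams}) "
                f"{w[1]} #combine({ordered}) "
                f"{w[2]} #combine({unordered}) )")

    print(sdm_query(["coronavirus", "origin"]))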

wistud_bing

Results | Participants | Input | Appendix

  • Run ID: wistud_bing
  • Participant: wistud
  • Track: Round 1
  • Year: 2020
  • Submission: 4/23/2020
  • Type: manual
  • MD5: c4d071c41405cc888d152935aad6964a
  • Run description: 44 participants undertook a manual task using our experimental search system (with results returned from the Bing Search API). Upon completion of the task, participants selected the two queries that they felt returned good results; this was repeated over three topics. Post-experiment, rankings for each query were obtained, and the Borda method was run over the rankings to score and highlight the queries that were broadly accepted by participants for each topic.

wistud_indri

Results | Participants | Input | Appendix

  • Run ID: wistud_indri
  • Participant: wistud
  • Track: Round 1
  • Year: 2020
  • Submission: 4/23/2020
  • Type: manual
  • MD5: 7056c286810f527614da35345be581f9
  • Run description: 45 participants undertook a manual task using our experimental search system (the collection indexed with Indri, queried with QL, mu=2500). Upon completion of the task, participants selected the two queries that they felt returned good results; this was repeated over three topics. Post-experiment, rankings for each query were obtained, and the Borda method was run over the rankings to score and highlight the queries that were broadly accepted by participants for each topic.

wistud_noSearch

Results | Participants | Input | Appendix

  • Run ID: wistud_noSearch
  • Participant: wistud
  • Track: Round 1
  • Year: 2020
  • Submission: 4/23/2020
  • Type: manual
  • MD5: a4aa4d753be0649f66314b0b1e8ddba1
  • Run description: The results submitted here come from a manual, crowdsourced run. We asked 15 participants to provide two queries each that they thought would likely return documents relevant to the ten topics they were shown. Post-experiment, rankings for each query were obtained, and the Borda method was run over the rankings to score and highlight the queries that were broadly accepted by participants for each topic.

xj4wang_run1

Results | Participants | Input | Appendix

  • Run ID: xj4wang_run1
  • Participant: xj4wang
  • Track: Round 1
  • Year: 2020
  • Submission: 4/22/2020
  • Type: manual
  • MD5: 8acc390ac36b9ae8a71492b6c59e236e
  • Run description: The retrieval model used is BMI (Baseline Model Implementation), provided as a starter by Gordon Cormack for the TREC 2015/2016 Total Recall Track, with human assessors in place of the server (manual processing). [1] In more detail: it uses the CAL (Continuous Active Learning) method, starting with one synthetic file created from the given topics, word for word; this method is described by Grossman and Cormack in [4]. Feature vectors are created using the BMI tools. [1] SofiaML is used as the learner. The weighting schemes were chosen based heavily on the work of Cormack and Grossman in [2], and the stopping conditions for manual labeling based heavily on the work of Grossman et al. in [3]. References: [1] https://cormack.uwaterloo.ca/trecvm/ [2] https://doi.org/10.1145/2600428.2609601 [3] https://trec.nist.gov/pubs/trec25/papers/Overview-TR.pdf [4] https://cormack.uwaterloo.ca/caldemo/AprMay16_EdiscoveryBulletin.pdf

yn-r1-alltext

Results | Participants | Input | Appendix

  • Run ID: yn-r1-alltext
  • Participant: NI_CCHMC
  • Track: Round 1
  • Year: 2020
  • Submission: 4/22/2020
  • Type: manual
  • MD5: a116f2578224ea6f1928c994f264fbe4
  • Run description: In our earlier research we assembled state-of-the-art natural language processing (NLP), information retrieval, and machine learning technologies and developed an automated clinical trial eligibility screener. The system identifies and transforms relevant words or phrases (e.g., diseases, medications, procedures) in an article to medical terms using clinical terminologies including UMLS, SNOMED-CT, RxNorm and CPT codes. Assertion detection is then applied to convert the terms to the corresponding format. Finally, the system matches query terms manually derived from a topic with the extracted medical terms to identify relevant articles for the topic. Reference: Ni Y, Kennebeck S, Dexheimer JW, et al. Automated clinical trial eligibility prescreening: increasing the efficiency of patient identification for clinical trials in the emergency department. J Am Med Inform Assoc. 2014; 21(5):776-784.

yn-r1-concepttext

Results | Participants | Input | Appendix

  • Run ID: yn-r1-concepttext
  • Participant: NI_CCHMC
  • Track: Round 1
  • Year: 2020
  • Submission: 4/22/2020
  • Type: manual
  • MD5: c208d3cce688eaeacc8b4c1c5c03ad9b
  • Run description: In our earlier research we assembled state-of-the-art natural language processing (NLP), information retrieval, and machine learning technologies and developed an automated clinical trial eligibility screener. The system identifies and transforms relevant words or phrases (e.g., diseases, medications, procedures) in an article to medical terms using clinical terminologies including UMLS, SNOMED-CT, RxNorm and CPT codes. Assertion detection is then applied to convert the terms to the corresponding format. Finally, the system matches query terms manually derived from a topic with the extracted medical terms to identify relevant articles for the topic. Reference: Ni Y, Kennebeck S, Dexheimer JW, et al. Automated clinical trial eligibility prescreening: increasing the efficiency of patient identification for clinical trials in the emergency department. J Am Med Inform Assoc. 2014; 21(5):776-784.

yn-r1-hierarchy

Results | Participants | Input | Appendix

  • Run ID: yn-r1-hierarchy
  • Participant: NI_CCHMC
  • Track: Round 1
  • Year: 2020
  • Submission: 4/22/2020
  • Type: manual
  • MD5: b6ddae8fd68ad110c4aa97f928473235
  • Run description: In our earlier research we assembled state-of-the-art natural language processing (NLP), information retrieval, and machine learning technologies and developed an automated clinical trial eligibility screener. The system identifies and transforms relevant words or phrases (e.g., diseases, medications, procedures) in an article to medical terms using clinical terminologies including UMLS, SNOMED-CT, RxNorm and CPT codes. Assertion detection is then applied to convert the terms to the corresponding format. Finally, the system matches query terms manually derived from a topic with the extracted medical terms to identify relevant articles for the topic. Reference: Ni Y, Kennebeck S, Dexheimer JW, et al. Automated clinical trial eligibility prescreening: increasing the efficiency of patient identification for clinical trials in the emergency department. J Am Med Inform Assoc. 2014; 21(5):776-784.