Runs - CrisisFACTs 2023

drdqn-all

  • Run ID: drdqn-all
  • Participant: DarthReca
  • Track: CrisisFACTs
  • Year: 2023
  • Submission: 8/29/2023
  • Type: automatic
  • MD5: 4a30cf93d6a34e4e3f47f19fe7c52ed8
  • Run description: The system uses a DQN for text retrieval, topic modeling (BERTopic) on the selected relevant texts, and an abstractive summarizer (BART-large-CNN) for each generated cluster.
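
A minimal sketch of the clustering and per-cluster summarization stages, assuming the bertopic and transformers packages; the DQN retrieval step is omitted, and `retrieved_texts` stands in for its output:

```python
from bertopic import BERTopic
from transformers import pipeline

# Stand-in for the output of the DQN retrieval step; in practice
# BERTopic needs a much larger corpus than this toy list.
retrieved_texts = [
    "Evacuations ordered as the wildfire crosses Highway 9.",
    "Fire crews report 10% containment near the ridge.",
    "Shelter opened at the county fairgrounds for evacuees.",
    # ... more retrieved documents
]

# Cluster the selected relevant texts into topics.
topic_model = BERTopic(min_topic_size=2)
topics, _ = topic_model.fit_transform(retrieved_texts)

# Summarize each generated cluster with BART-large-CNN.
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
for topic_id in sorted(set(topics)):
    cluster = [t for t, tid in zip(retrieved_texts, topics) if tid == topic_id]
    summary = summarizer(" ".join(cluster), max_length=60, min_length=10)
    print(topic_id, summary[0]["summary_text"])
```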

drdqn-notopic

  • Run ID: drdqn-notopic
  • Participant: DarthReca
  • Track: CrisisFACTs
  • Year: 2023
  • Submission: 8/30/2023
  • Type: automatic
  • MD5: c87a1f87207ba93f8eb7dd65a4307175
  • Run description: The retrieval system is based on a DQN, and an abstractive summarizer (BART-large-CNN) is applied to each retrieved text.

Human_Info_Lab-FM-A

  • Run ID: Human_Info_Lab-FM-A
  • Participant: Human_Info_Lab
  • Track: CrisisFACTs
  • Year: 2023
  • Submission: 9/2/2023
  • Type: automatic
  • MD5: 3600804d62a7fb6140231aa97eecdb6b
  • Run description: In this system, the indicative terms provided in the user profile are extended using the KeyBERT library. The streams are then filtered using the extended indicative terms. After that, for each stream a set of facts is generated using ClausIE (FM-A), a clause-based approach to open information extraction. The generated facts are then filtered again using the extended indicative terms. To assign an importance to each fact, we compute the closeness centrality of each fact in a graph built from the similarity of facts to each other and to the extended indicative terms and queries from the user profiles. Finally, the importance scores are scaled to [0,1] and duplicates are dropped.
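
A sketch of the KeyBERT expansion and graph-centrality scoring under stated assumptions: the ClausIE extraction step is omitted, and the fact list, similarity model, and edge threshold are illustrative.

```python
import networkx as nx
from keybert import KeyBERT
from sentence_transformers import SentenceTransformer, util

# 1. Extend the indicative terms from the user profile with KeyBERT.
profile_text = "wildfire evacuation road closures shelter locations"
kw_model = KeyBERT()
extended_terms = [kw for kw, _ in kw_model.extract_keywords(profile_text, top_n=5)]

# 2. Build a similarity graph over the facts plus the expanded terms,
#    then use closeness centrality as the importance score.
facts = [
    "Highway 9 is closed due to the fire.",
    "A shelter opened at the fairgrounds.",
    "The fire has burned 2,000 acres.",
]
encoder = SentenceTransformer("all-MiniLM-L6-v2")
emb = encoder.encode(facts + [" ".join(extended_terms)])
sim = util.cos_sim(emb, emb)

graph = nx.Graph()
n = len(facts) + 1
graph.add_nodes_from(range(n))
threshold = 0.2  # illustrative edge threshold
for i in range(n):
    for j in range(i + 1, n):
        if sim[i, j] > threshold:
            graph.add_edge(i, j)

centrality = nx.closeness_centrality(graph)

# 3. Scale the importance scores to [0, 1].
max_c = max(centrality.values()) or 1.0
importance = {facts[i]: centrality[i] / max_c for i in range(len(facts))}
print(importance)
```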

Human_Info_Lab-FM-B

  • Run ID: Human_Info_Lab-FM-B
  • Participant: Human_Info_Lab
  • Track: CrisisFACTs
  • Year: 2023
  • Submission: 9/2/2023
  • Type: automatic
  • MD5: 9a06072f64db28545c0eba003a13e059
  • Run description: In this system, the indicative terms provided in the user profile are extended using the KeyBERT library. The streams are then filtered using the extended indicative terms. After that, for each stream a set of facts is generated using Constituency Parsing with a Self-Attentive Encoder (FM-B). The generated facts are then filtered again using the extended indicative terms. To assign an importance to each fact, we compute the closeness centrality of each fact in a graph built from the similarity of facts to each other and to the extended indicative terms and queries from the user profiles. Finally, the importance scores are scaled to [0,1] and duplicates are dropped.
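
The cited parser ("Constituency Parsing with a Self-Attentive Encoder") is distributed as the benepar package; whether the team used benepar specifically is an assumption. A sketch of extracting clause-level constituents as candidate facts:

```python
import benepar
import spacy

# One-time setup: benepar.download("benepar_en3") and
# python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("benepar", config={"model": "benepar_en3"})

doc = nlp("Highway 9 is closed and a shelter opened at the fairgrounds.")
for sent in doc.sents:
    # Treat clause-level constituents (S, SBAR) as candidate facts.
    for const in sent._.constituents:
        if set(const._.labels) & {"S", "SBAR"}:
            print(const.text)
```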

IDACCS_GPT3.5

  • Run ID: IDACCS_GPT3.5
  • Participant: IDACCS
  • Track: CrisisFACTs
  • Year: 2023
  • Submission: 8/31/2023
  • Type: automatic
  • MD5: da22cb97452a819a48d68469152af79a
  • Run description: We used GPT-3.5 to generate a summary, then segmented it and found the best-matching factText for attribution.
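
A sketch of the attribution step: segment the generated summary and match each segment to its closest factText by embedding similarity. The similarity model and segmentation rule are assumptions, and the LLM call is stubbed out.

```python
from sentence_transformers import SentenceTransformer, util

summary = (  # stand-in for the GPT-3.5 output
    "The fire forced evacuations along Highway 9. "
    "A shelter was opened at the county fairgrounds."
)
fact_texts = [
    "Evacuations ordered as the wildfire crosses Highway 9.",
    "Shelter opened at the county fairgrounds for evacuees.",
    "Fire crews report 10% containment near the ridge.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
segments = [s.strip() + "." for s in summary.split(".") if s.strip()]
sims = util.cos_sim(model.encode(segments), model.encode(fact_texts))

# Attribute each summary segment to its best-matching factText.
for i, seg in enumerate(segments):
    best = int(sims[i].argmax())
    print(f"{seg} -> attributed to: {fact_texts[best]}")
```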

IDACCS_occams_extract

  • Run ID: IDACCS_occams_extract
  • Participant: IDACCS
  • Track: CrisisFACTs
  • Year: 2023
  • Submission: 8/30/2023
  • Type: automatic
  • MD5: 45b0e5d11557c2829d7e6e0e1dd66e64
  • Run description: occams is an extractive summarization system that approximately solves the bounded maximal coverage problem. We used bigrams with the LOG_COUNTS term weighting scheme.
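
A generic greedy approximation to the bounded maximal coverage objective with log-count bigram weights, sketched for illustration; this is not the occams implementation itself.

```python
from collections import Counter
from math import log

def bigrams(sentence):
    toks = sentence.lower().split()
    return {(a, b) for a, b in zip(toks, toks[1:])}

def summarize(sentences, budget_words=30):
    # LOG_COUNTS-style weights: log(1 + corpus frequency) per bigram.
    counts = Counter(bg for s in sentences for bg in bigrams(s))
    weight = {bg: log(1 + c) for bg, c in counts.items()}

    chosen, covered, used = [], set(), 0
    while True:
        # Greedily add the sentence with the highest weighted gain in
        # newly covered bigrams, subject to the word budget.
        best, best_gain = None, 0.0
        for s in sentences:
            if s in chosen or used + len(s.split()) > budget_words:
                continue
            gain = sum(weight[bg] for bg in bigrams(s) - covered)
            if gain > best_gain:
                best, best_gain = s, gain
        if best is None:
            return chosen
        chosen.append(best)
        covered |= bigrams(best)
        used += len(best.split())

print(summarize([
    "Evacuations ordered along Highway 9.",
    "Highway 9 evacuations are underway tonight.",
    "A shelter opened at the county fairgrounds.",
], budget_words=12))
```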

IDACCS_occamsHybridGPT3.5

  • Run ID: IDACCS_occamsHybridGPT3.5
  • Participant: IDACCS
  • Track: CrisisFACTs
  • Year: 2023
  • Submission: 8/31/2023
  • Type: automatic
  • MD5: 764ac96d9e3bed89e391cde22c1bfb10
  • Run description: We use a hybrid approach that first generates an extractive summary via occams and then uses GPT-3.5 to produce an abstractive summary paraphrasing the occams extract.

ilp_mmr

  • Run ID: ilp_mmr
  • Participant: OHM
  • Track: CrisisFACTs
  • Year: 2023
  • Submission: 8/28/2023
  • Type: automatic
  • MD5: f9c89814685c92ad212c33409bb8cb44
  • Run description: The system consists of three successive components: (1) lexical retrieval with BM25 (+ Bo1 query expansion) based on the indicative terms plus query text (top 250 per query); (2) re-ranking with monoT5-large based on the query text (top 50 per query); (3) an ILP system for diversified sentence selection with respect to covered entities, with MMR for re-ranking (top 150-200 stream items).
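
The MMR re-ranking in stage 3 can be sketched as follows; the embeddings, k, and the lambda trade-off are assumptions, and the ILP selection is omitted.

```python
import numpy as np

def mmr(query_vec, doc_vecs, k=10, lam=0.7):
    """Select k documents by Maximal Marginal Relevance:
    lam * sim(query, d) - (1 - lam) * max sim(d, already selected)."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

    relevance = [cos(query_vec, d) for d in doc_vecs]
    selected, candidates = [], list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            redundancy = max((cos(doc_vecs[i], doc_vecs[j]) for j in selected),
                             default=0.0)
            return lam * relevance[i] - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Toy usage with random document vectors.
print(mmr(np.ones(4), list(np.random.default_rng(0).random((5, 4))), k=3))
```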

IRLabIITBHU_BM25_1

  • Run ID: IRLabIITBHU_BM25_1
  • Participant: IRLAB_IIT_BHU
  • Track: CrisisFACTs
  • Year: 2023
  • Submission: 8/31/2023
  • Type: manual
  • MD5: 9a35ff8be3f089e2a4e574022ff8b921
  • Run description: We calculate importance using BM25, an enhanced TF-IDF (Term Frequency-Inverse Document Frequency) model.
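
A sketch of BM25 importance scoring with the rank_bm25 package; the team's actual toolkit is not stated, and the corpus and query are illustrative.

```python
from rank_bm25 import BM25Okapi

corpus = [
    "evacuations ordered along highway 9",
    "shelter opened at the county fairgrounds",
    "fire crews report ten percent containment",
]
bm25 = BM25Okapi([doc.split() for doc in corpus])

query = "where did the evacuation shelter open".split()
scores = bm25.get_scores(query)  # one importance score per document
for doc, score in sorted(zip(corpus, scores), key=lambda x: -x[1]):
    print(f"{score:.3f}  {doc}")
```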

IRLabIITBHU_DFReeKLIM_1

  • Run ID: IRLabIITBHU_DFReeKLIM_1
  • Participant: IRLAB_IIT_BHU
  • Track: CrisisFACTs
  • Year: 2023
  • Submission: 9/1/2023
  • Type: manual
  • MD5: 1e76a46f3e86800c9e7fe294a85bf20d
  • Run description: We use the DFReeKLIM model. The Divergence from Randomness (DFR) models in information retrieval aim to estimate the importance of a term or a combination of terms in a document with respect to a query.

IRLabIITBHU_DFReeKLIM_2

  • Run ID: IRLabIITBHU_DFReeKLIM_2
  • Participant: IRLAB_IIT_BHU
  • Track: CrisisFACTs
  • Year: 2023
  • Submission: 9/1/2023
  • Type: manual
  • MD5: e1e43ae63bf17c091753907912769653
  • Run description: We use the DFReeKLIM model. The Divergence from Randomness (DFR) models in information retrieval aim to estimate the importance of a term or a combination of terms in a document with respect to a query.

llama

  • Run ID: llama
  • Participant: umd_hcil
  • Track: CrisisFACTs
  • Year: 2023
  • Submission: 8/30/2023
  • Type: automatic
  • MD5: 3cacd7daaaada168dace8e6df37766ab
  • Run description: This method uses a standard retrieval model to get a list of relevant sentences for each query. Then, for that query, we prompt a transformer model to summarize the list of facts, ranked by their relevance to the query. This step produces a one-to-three-sentence summary for each query on each event-day pair. We then aggregate all the queries for a given event-day into a single document, which we ask GPT-3.5 to rewrite into a summary of the most critical content. Finally, we score the importance of facts for this event-day pair based on overlap with this summary.
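
A sketch of the final overlap-based scoring, using Jaccard overlap; the team's exact overlap measure is not stated.

```python
import re

def tokens(text):
    return set(re.findall(r"\w+", text.lower()))

def overlap_importance(fact, summary):
    # Jaccard overlap between fact tokens and summary tokens.
    f, s = tokens(fact), tokens(summary)
    return len(f & s) / len(f | s) if f | s else 0.0

summary = "Evacuations continue along Highway 9 as a shelter opens."
facts = [
    "Evacuations ordered along Highway 9.",
    "Fire crews report 10% containment.",
]
for fact in facts:
    print(f"{overlap_importance(fact, summary):.2f}  {fact}")
```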

llama_13b_chat

  • Run ID: llama_13b_chat
  • Participant: OHM
  • Track: CrisisFACTs
  • Year: 2023
  • Submission: 9/1/2023
  • Type: automatic
  • MD5: 3c0228530c40b91a9022563cd9ede162
  • Run description: The system consists of three successive components: (1) lexical retrieval with BM25 (+ Bo1 query expansion) based on the indicative terms plus query text (top 250 per query); (2) re-ranking with monoT5-large based on the query text (top 50 per query); (3) LLaMA-2 (13B chat model) extracts and summarizes facts with respect to the query (top 10 per query).

nm-gpt35

  • Run ID: nm-gpt35
  • Participant: NM
  • Track: CrisisFACTs
  • Year: 2023
  • Submission: 8/30/2023
  • Type: automatic
  • MD5: 4f51b15ced305acc0d150e891f683b8e
  • Run description: We employ a two-step pipeline for constructing abstractive facts from social media and online news. The first step is a retrieval step that uses the pre-defined user queries to search for relevant documents. Then, we use the top-k documents to compose a prompt that is submitted to a large language model (LLM). The second step of our pipeline consists of using the LLM to summarize the most important facts given the top-k documents. This pipeline is executed for each event-day pair. In this run, we use BM25+monoT5 in the retrieval step and GPT-3.5-turbo-16k in the LLM reasoning step. We use the top 30 documents from the retrieval step.
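
A sketch of the prompt-composition and LLM-summarization step under stated assumptions: the prompt wording is invented, the retrieval output is stubbed, and the openai v1 client is assumed.

```python
from openai import OpenAI

top_k_docs = [  # stand-in for the BM25+monoT5 retrieval output
    "Evacuations ordered along Highway 9.",
    "Shelter opened at the county fairgrounds.",
]

# Compose a prompt from the top-k documents (wording is illustrative).
prompt = (
    "Summarize the most important facts about the ongoing crisis "
    "from the documents below, one fact per line.\n\n"
    + "\n".join(f"- {d}" for d in top_k_docs)
)

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-3.5-turbo-16k",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```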

nm-gpt35-bm25

  • Run ID: nm-gpt35-bm25
  • Participant: NM
  • Track: CrisisFACTs
  • Year: 2023
  • Submission: 9/2/2023
  • Type: automatic
  • MD5: 09adc3751bb07d986a8d442482769fe6
  • Run description: We employ a two-step pipeline for constructing abstractive facts from social media and online news. The first step is a retrieval step that uses the pre-defined user queries to search for relevant documents. Then, we use the top-k documents to compose a prompt that is submitted to a large language model (LLM). The second step of our pipeline consists of using the LLM to summarize the most important facts given the top-k documents. This pipeline is executed for each event-day pair. In this run, we use BM25 in the retrieval step and GPT-3.5-turbo-16k in the LLM reasoning step. We use the top 10 documents from the retrieval step.

nm-gpt4

  • Run ID: nm-gpt4
  • Participant: NM
  • Track: CrisisFACTs
  • Year: 2023
  • Submission: 9/1/2023
  • Type: automatic
  • MD5: ddace1b8dae98e67a1b88fbb27d93e69
  • Run description: We employ a two-step pipeline for constructing abstractive facts from social media and online news. The first step is a retrieval step that uses the pre-defined user queries to search for relevant documents. Then, we use the top-k documents to compose a prompt that is submitted to a large language model (LLM). The second step of our pipeline consists of using the LLM to summarize the most important facts given the top-k documents. This pipeline is executed for each event-day pair. In this run, we use BM25+monoT5 in the retrieval step and GPT-4-8k in the LLM reasoning step. We use the top 10 documents from the retrieval step.

nut-kslab01

  • Run ID: nut-kslab01
  • Participant: nut-kslab
  • Track: CrisisFACTs
  • Year: 2023
  • Submission: 8/31/2023
  • Type: automatic
  • MD5: 826783be3eb090672bc9b54581a0ef93
  • Run description: The system leverages the BM25 model to process the CrisisFACTS dataset, identify relevant facts using the queries, and compute importance metrics.

Siena.Baseline1

  • Run ID: Siena.Baseline1
  • Participant: SienaCLTeam
  • Track: CrisisFACTs
  • Year: 2023
  • Submission: 8/29/2023
  • Type: automatic
  • MD5: 96e07c0b0ab11ce9ee3e01aa078c44fe
  • Run description: This is a baseline run generated from the baseline script present in the CrisisFACTS GitHub repository.

Siena.FactTrigrams1

  • Run ID: Siena.FactTrigrams1
  • Participant: SienaCLTeam
  • Track: CrisisFACTs
  • Year: 2023
  • Submission: 8/29/2023
  • Type: manual
  • MD5: 8b66011c46d6a8b519d673f5984a8e21
  • Run description: The system uses the facts to gather a large set of trigrams. The trigrams are then scored against the queries to see which trigrams perform best. These trigrams are then added to the queries they scored well against and the baseline script is run using these expanded queries.
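
A sketch of the trigram gathering and query-expansion idea; the scoring rule is an assumption (here a trigram scores by shared terms with the query, then by corpus frequency).

```python
from collections import Counter

def trigrams(text):
    toks = text.lower().split()
    return [" ".join(toks[i:i + 3]) for i in range(len(toks) - 2)]

facts = [
    "evacuations ordered along highway 9 tonight",
    "shelter opened at the county fairgrounds tonight",
]
pool = Counter(tg for f in facts for tg in trigrams(f))

def expand(query, top_n=2):
    q_terms = set(query.lower().split())
    # Rank trigrams by overlap with the query, then by corpus frequency.
    scored = sorted(pool, reverse=True,
                    key=lambda tg: (len(set(tg.split()) & q_terms), pool[tg]))
    best = [tg for tg in scored if set(tg.split()) & q_terms][:top_n]
    return query + " " + " ".join(best)

# Append the best-scoring trigrams to the query before retrieval.
print(expand("shelter locations tonight"))
```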

Siena.WikiTrigrams1

  • Run ID: Siena.WikiTrigrams1
  • Participant: SienaCLTeam
  • Track: CrisisFACTs
  • Year: 2023
  • Submission: 8/29/2023
  • Type: automatic
  • MD5: 77b9de1d542999dd3dcef4ca671f8945
  • Run description: The system uses the Wikipedia page associated with each event to gather a large set of trigrams. The trigrams are then scored against the queries to see which trigrams perform best. These trigrams are then added to the queries they scored well against, and the baseline script is run using these expanded queries.

Siena.WikiTrigrams2

  • Run ID: Siena.WikiTrigrams2
  • Participant: SienaCLTeam
  • Track: CrisisFACTs
  • Year: 2023
  • Submission: 8/29/2023
  • Type: automatic
  • MD5: e0796e545a46f596aa99fd6003df2fd0
  • Run description: The system uses the Wikipedia page associated with each event to gather a large set of trigrams. The trigrams are then scored against the queries to see which trigrams perform best. These trigrams are then added to the queries they scored well against, and the baseline script is run using these expanded queries.

TorontoMU_Word2Vec_TFIDF

  • Run ID: TorontoMU_Word2Vec_TFIDF
  • Participant: V-TorontoMU
  • Track: CrisisFACTs
  • Year: 2023
  • Submission: 9/1/2023
  • Type: automatic
  • MD5: 6d7beb2fb50de25dc1ad9b1e03f91af2
  • Run description: To produce the run, the code employs TF-IDF for vector representation based on word importance in documents and Word2Vec to derive context-aware embeddings; a combination of these approaches through weighted summation yields the final document rankings.
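
A sketch of the combined ranking: TF-IDF cosine scores and mean-pooled Word2Vec cosine scores merged by weighted summation. The 0.5/0.5 weights and the tiny corpus are illustrative.

```python
import numpy as np
from gensim.models import Word2Vec
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "evacuations ordered along highway nine",
    "shelter opened at the county fairgrounds",
    "fire crews report ten percent containment",
]
query = "shelter opened along highway nine"

# Signal 1: TF-IDF cosine similarity (word importance in documents).
tfidf = TfidfVectorizer()
doc_mat = tfidf.fit_transform(docs)
tfidf_scores = cosine_similarity(tfidf.transform([query]), doc_mat)[0]

# Signal 2: Word2Vec embeddings, mean-pooled per text.
w2v = Word2Vec([d.split() for d in docs], vector_size=50, min_count=1,
               epochs=50, seed=0)

def embed(text):
    vecs = [w2v.wv[t] for t in text.split() if t in w2v.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(50)

w2v_scores = np.array(
    [cosine_similarity([embed(query)], [embed(d)])[0, 0] for d in docs])

# Weighted summation of the two signals yields the final ranking.
final = 0.5 * tfidf_scores + 0.5 * w2v_scores
print(sorted(zip(docs, final), key=lambda x: -x[1]))
```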

V-TorontoMU-DFReeKLIM

  • Run ID: V-TorontoMU-DFReeKLIM
  • Participant: V-TorontoMU
  • Track: CrisisFACTs
  • Year: 2023
  • Submission: 9/1/2023
  • Type: automatic
  • MD5: d41d8cd98f00b204e9800998ecf8427e
  • Run description: The code leverages the pyTerrier library to perform information retrieval using the DFReeKLIM weighting model, indexing preprocessed documents from specific crisis datasets. The queries, derived from user profiles, are then processed and matched against the indexed corpus to retrieve and rank relevant documents based on the significance of terms in the document collection relative to their occurrence in the query.
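
A sketch of DFReeKLIM retrieval with pyTerrier; the document list and query are stand-ins for the preprocessed crisis streams and the profile-derived queries.

```python
import pyterrier as pt

if not pt.started():
    pt.init()

docs = [  # stand-in for the preprocessed crisis stream items
    {"docno": "d1", "text": "evacuations ordered along highway 9"},
    {"docno": "d2", "text": "shelter opened at the county fairgrounds"},
]
index_ref = pt.IterDictIndexer("./crisis_index").index(docs)

# Retrieve and rank with the DFReeKLIM weighting model.
retriever = pt.BatchRetrieve(index_ref, wmodel="DFReeKLIM")
results = retriever.search("evacuation shelter")
print(results[["docno", "score", "rank"]])
```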

V-TorontoMU-DFReeKLIM-v2

  • Run ID: V-TorontoMU-DFReeKLIM-v2
  • Participant: V-TorontoMU
  • Track: CrisisFACTs
  • Year: 2023
  • Submission: 9/1/2023
  • Type: automatic
  • MD5: b6a6b6afe2940b4e01b7f6dacd0800a1
  • Run description: The code leverages the pyTerrier library to perform information retrieval using the DFReeKLIM weighting model, indexing preprocessed documents from specific crisis datasets. The queries, derived from user profiles, are then processed and matched against the indexed corpus to retrieve and rank relevant documents based on the significance of terms in the document collection relative to their occurrence in the query.

V-TorontoMU_SBERT_Semanti

  • Run ID: V-TorontoMU_SBERT_Semanti
  • Participant: V-TorontoMU
  • Track: CrisisFACTs
  • Year: 2023
  • Submission: 9/1/2023
  • Type: automatic
  • MD5: 72815f44bff004abd897edd261465bbb
  • Run description: Leveraging the 'paraphrase-distilroberta-base-v1' Sentence-BERT model, the run achieves high-dimensional semantic embeddings of the queries and facts, capturing intricate linguistic nuances. Through cosine similarity metrics, the result discerns the semantic proximity between these embeddings, culminating in a refined selection of the top 200 contextually-aligned texts for each query.
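
A sketch of the SBERT ranking with the run's 'paraphrase-distilroberta-base-v1' model; top_k is reduced from the run's 200 to fit the toy fact list.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-distilroberta-base-v1")
queries = ["Where are evacuation shelters?"]
facts = [
    "Shelter opened at the county fairgrounds.",
    "Fire crews report 10% containment.",
    "Evacuees directed to the high school gym.",
]

q_emb = model.encode(queries, convert_to_tensor=True)
f_emb = model.encode(facts, convert_to_tensor=True)

# For each query, return the top_k facts by cosine similarity.
hits = util.semantic_search(q_emb, f_emb, top_k=2)
for hit in hits[0]:
    print(f"{hit['score']:.3f}  {facts[hit['corpus_id']]}")
```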

V-TorontoMU_USE_4

  • Run ID: V-TorontoMU_USE_4
  • Participant: V-TorontoMU
  • Track: CrisisFACTs
  • Year: 2023
  • Submission: 9/2/2023
  • Type: automatic
  • MD5: dc9ce762d20a12359f4469df39850929
  • Run description: The code leverages the Universal Sentence Encoder (USE) to transform both textual documents and user queries into dense vector embeddings. Utilizing cosine similarity, it assesses and ranks the semantic proximity between these embeddings, thereby identifying and prioritizing documents that are most relevant to user inquiries.
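
A sketch of the USE-based ranking; "USE_4" in the run ID suggests version 4 of the TF-Hub module, which is assumed here.

```python
import numpy as np
import tensorflow_hub as hub

embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

docs = [
    "Shelter opened at the county fairgrounds.",
    "Fire crews report 10% containment.",
]
query = ["Where are evacuation shelters?"]

doc_emb = embed(docs).numpy()
q_emb = embed(query).numpy()[0]

# Rank documents by cosine similarity to the query embedding.
def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

scores = [cosine(q_emb, d) for d in doc_emb]
for doc, s in sorted(zip(docs, scores), key=lambda x: -x[1]):
    print(f"{s:.3f}  {doc}")
```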