Runs - CrisisFACTs 2023

drdqn-all

  • Run ID: drdqn-all
  • Participant: DarthReca
  • Track: CrisisFACTs
  • Year: 2023
  • Submission: 8/29/2023
  • Type: automatic
  • MD5: 4a30cf93d6a34e4e3f47f19fe7c52ed8
  • Run description: The system uses a DQN for text retrieval, topic modeling (BERTopic) on the selected relevant texts, and an abstractive summarizer (BART-large-CNN) for each generated cluster.
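
A minimal sketch of the clustering and per-cluster summarization stages, assuming the bertopic and transformers packages; the DQN retrieval step is omitted, and `retrieved_texts` stands in for its output:

```python
from bertopic import BERTopic
from transformers import pipeline

# Stand-in for the output of the DQN retrieval step; in practice
# BERTopic needs a much larger corpus than this toy list.
retrieved_texts = [
    "Evacuations ordered as the wildfire crosses Highway 9.",
    "Fire crews report 10% containment near the ridge.",
    "Shelter opened at the county fairgrounds for evacuees.",
    # ... more retrieved documents
]

# Cluster the selected relevant texts into topics.
topic_model = BERTopic(min_topic_size=2)
topics, _ = topic_model.fit_transform(retrieved_texts)

# Summarize each generated cluster with BART-large-CNN.
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
for topic_id in sorted(set(topics)):
    cluster = [t for t, tid in zip(retrieved_texts, topics) if tid == topic_id]
    summary = summarizer(" ".join(cluster), max_length=60, min_length=10)
    print(topic_id, summary[0]["summary_text"])
```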

drdqn-notopic

  • Run ID: drdqn-notopic
  • Participant: DarthReca
  • Track: CrisisFACTs
  • Year: 2023
  • Submission: 8/30/2023
  • Type: automatic
  • MD5: c87a1f87207ba93f8eb7dd65a4307175
  • Run description: The retrieval system is based on a DQN, and an abstractive summarizer (BART-large-CNN) is applied to each retrieved text.

Human_Info_Lab-FM-A

  • Run ID: Human_Info_Lab-FM-A
  • Participant: Human_Info_Lab
  • Track: CrisisFACTs
  • Year: 2023
  • Submission: 9/2/2023
  • Type: automatic
  • MD5: 3600804d62a7fb6140231aa97eecdb6b
  • Run description: In this system, the indicative terms provided in the user profile are extended using the KeyBERT library. The streams are then filtered using the extended indicative terms. After that, for each stream a set of facts is generated using ClausIE (FM-A), a clause-based approach to open information extraction. The generated facts are then filtered again using the extended indicative terms. To assign an importance to each fact, we compute the closeness centrality of each fact in a graph built from the similarity of facts to each other and to the extended indicative terms and queries from the user profiles. Finally, the importance scores are scaled to [0,1] and duplicates are dropped.
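
A sketch of the KeyBERT expansion and graph-centrality scoring under stated assumptions: the ClausIE extraction step is omitted, and the fact list, similarity model, and edge threshold are illustrative.

```python
import networkx as nx
from keybert import KeyBERT
from sentence_transformers import SentenceTransformer, util

# 1. Extend the indicative terms from the user profile with KeyBERT.
profile_text = "wildfire evacuation road closures shelter locations"
kw_model = KeyBERT()
extended_terms = [kw for kw, _ in kw_model.extract_keywords(profile_text, top_n=5)]

# 2. Build a similarity graph over the facts plus the expanded terms,
#    then use closeness centrality as the importance score.
facts = [
    "Highway 9 is closed due to the fire.",
    "A shelter opened at the fairgrounds.",
    "The fire has burned 2,000 acres.",
]
encoder = SentenceTransformer("all-MiniLM-L6-v2")
emb = encoder.encode(facts + [" ".join(extended_terms)])
sim = util.cos_sim(emb, emb)

graph = nx.Graph()
n = len(facts) + 1
graph.add_nodes_from(range(n))
threshold = 0.2  # illustrative edge threshold
for i in range(n):
    for j in range(i + 1, n):
        if sim[i, j] > threshold:
            graph.add_edge(i, j)

centrality = nx.closeness_centrality(graph)

# 3. Scale the importance scores to [0, 1].
max_c = max(centrality.values()) or 1.0
importance = {facts[i]: centrality[i] / max_c for i in range(len(facts))}
print(importance)
```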

Human_Info_Lab-FM-B

  • Run ID: Human_Info_Lab-FM-B
  • Participant: Human_Info_Lab
  • Track: CrisisFACTs
  • Year: 2023
  • Submission: 9/2/2023
  • Type: automatic
  • MD5: 9a06072f64db28545c0eba003a13e059
  • Run description: In this system, the indicative terms provided in the user profile are extended using the KeyBERT library. The streams are then filtered using the extended indicative terms. After that, for each stream a set of facts is generated using Constituency Parsing with a Self-Attentive Encoder (FM-B). The generated facts are then filtered again using the extended indicative terms. To assign an importance to each fact, we compute the closeness centrality of each fact in a graph built from the similarity of facts to each other and to the extended indicative terms and queries from the user profiles. Finally, the importance scores are scaled to [0,1] and duplicates are dropped.
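
The cited parser ("Constituency Parsing with a Self-Attentive Encoder") is distributed as the benepar package; whether the team used benepar specifically is an assumption. A sketch of extracting clause-level constituents as candidate facts:

```python
import benepar
import spacy

# One-time setup: benepar.download("benepar_en3") and
# python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("benepar", config={"model": "benepar_en3"})

doc = nlp("Highway 9 is closed and a shelter opened at the fairgrounds.")
for sent in doc.sents:
    # Treat clause-level constituents (S, SBAR) as candidate facts.
    for const in sent._.constituents:
        if set(const._.labels) & {"S", "SBAR"}:
            print(const.text)
```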

IDACCS_GPT3.5

  • Run ID: IDACCS_GPT3.5
  • Participant: IDACCS
  • Track: CrisisFACTs
  • Year: 2023
  • Submission: 8/31/2023
  • Type: automatic
  • MD5: da22cb97452a819a48d68469152af79a
  • Run description: We used GPT-3.5 to generate a summary, then segmented it and found the best-matching factText for attribution.
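
A sketch of the attribution step: segment the generated summary and match each segment to its closest factText by embedding similarity. The similarity model and segmentation rule are assumptions, and the LLM call is stubbed out.

```python
from sentence_transformers import SentenceTransformer, util

summary = (  # stand-in for the GPT-3.5 output
    "The fire forced evacuations along Highway 9. "
    "A shelter was opened at the county fairgrounds."
)
fact_texts = [
    "Evacuations ordered as the wildfire crosses Highway 9.",
    "Shelter opened at the county fairgrounds for evacuees.",
    "Fire crews report 10% containment near the ridge.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
segments = [s.strip() + "." for s in summary.split(".") if s.strip()]
sims = util.cos_sim(model.encode(segments), model.encode(fact_texts))

# Attribute each summary segment to its best-matching factText.
for i, seg in enumerate(segments):
    best = int(sims[i].argmax())
    print(f"{seg} -> attributed to: {fact_texts[best]}")
```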

IDACCS_occams_extract

  • Run ID: IDACCS_occams_extract
  • Participant: IDACCS
  • Track: CrisisFACTs
  • Year: 2023
  • Submission: 8/30/2023
  • Type: automatic
  • MD5: 45b0e5d11557c2829d7e6e0e1dd66e64
  • Run description: occams is an extractive summarization system that approximately solves the bounded maximal coverage problem. We used bigrams with the LOG_COUNTS term weighting scheme.
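
A generic greedy approximation to the bounded maximal coverage objective with log-count bigram weights, sketched for illustration; this is not the occams implementation itself.

```python
from collections import Counter
from math import log

def bigrams(sentence):
    toks = sentence.lower().split()
    return {(a, b) for a, b in zip(toks, toks[1:])}

def summarize(sentences, budget_words=30):
    # LOG_COUNTS-style weights: log(1 + corpus frequency) per bigram.
    counts = Counter(bg for s in sentences for bg in bigrams(s))
    weight = {bg: log(1 + c) for bg, c in counts.items()}

    chosen, covered, used = [], set(), 0
    while True:
        # Greedily add the sentence with the highest weighted gain in
        # newly covered bigrams, subject to the word budget.
        best, best_gain = None, 0.0
        for s in sentences:
            if s in chosen or used + len(s.split()) > budget_words:
                continue
            gain = sum(weight[bg] for bg in bigrams(s) - covered)
            if gain > best_gain:
                best, best_gain = s, gain
        if best is None:
            return chosen
        chosen.append(best)
        covered |= bigrams(best)
        used += len(best.split())

print(summarize([
    "Evacuations ordered along Highway 9.",
    "Highway 9 evacuations are underway tonight.",
    "A shelter opened at the county fairgrounds.",
], budget_words=12))
```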

IDACCS_occamsHybridGPT3.5

  • Run ID: IDACCS_occamsHybridGPT3.5
  • Participant: IDACCS
  • Track: CrisisFACTs
  • Year: 2023
  • Submission: 8/31/2023
  • Type: automatic
  • MD5: 764ac96d9e3bed89e391cde22c1bfb10
  • Run description: We use a hybrid approach that first generates an extractive summary via occams and then uses GPT-3.5 to produce an abstractive summary paraphrasing the occams extract.

ilp_mmr

  • Run ID: ilp_mmr
  • Participant: OHM
  • Track: CrisisFACTs
  • Year: 2023
  • Submission: 8/28/2023
  • Type: automatic
  • MD5: f9c89814685c92ad212c33409bb8cb44
  • Run description: The system consists of three successive components: (1) lexical retrieval with BM25 (+ Bo1 query expansion) based on the indicative terms plus query text (top 250 per query); (2) re-ranking with monoT5-large based on the query text (top 50 per query); (3) an ILP system for diversified sentence selection with respect to covered entities, with MMR for re-ranking (top 150-200 stream items).
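
The MMR re-ranking in stage 3 can be sketched as follows; the embeddings, k, and the lambda trade-off are assumptions, and the ILP selection is omitted.

```python
import numpy as np

def mmr(query_vec, doc_vecs, k=10, lam=0.7):
    """Select k documents by Maximal Marginal Relevance:
    lam * sim(query, d) - (1 - lam) * max sim(d, already selected)."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

    relevance = [cos(query_vec, d) for d in doc_vecs]
    selected, candidates = [], list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            redundancy = max((cos(doc_vecs[i], doc_vecs[j]) for j in selected),
                             default=0.0)
            return lam * relevance[i] - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Toy usage with random document vectors.
print(mmr(np.ones(4), list(np.random.default_rng(0).random((5, 4))), k=3))
```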

IRLabIITBHU_BM25_1

  • Run ID: IRLabIITBHU_BM25_1
  • Participant: IRLAB_IIT_BHU
  • Track: CrisisFACTs
  • Year: 2023
  • Submission: 8/31/2023
  • Type: manual
  • MD5: 9a35ff8be3f089e2a4e574022ff8b921
  • Run description: We calculate importance using BM25, an enhanced TF-IDF (Term Frequency-Inverse Document Frequency) model.
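
A sketch of BM25 importance scoring with the rank_bm25 package; the team's actual toolkit is not stated, and the corpus and query are illustrative.

```python
from rank_bm25 import BM25Okapi

corpus = [
    "evacuations ordered along highway 9",
    "shelter opened at the county fairgrounds",
    "fire crews report ten percent containment",
]
bm25 = BM25Okapi([doc.split() for doc in corpus])

query = "where did the evacuation shelter open".split()
scores = bm25.get_scores(query)  # one importance score per document
for doc, score in sorted(zip(corpus, scores), key=lambda x: -x[1]):
    print(f"{score:.3f}  {doc}")
```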

IRLabIITBHU_DFReeKLIM_1

  • Run ID: IRLabIITBHU_DFReeKLIM_1
  • Participant: IRLAB_IIT_BHU
  • Track: CrisisFACTs
  • Year: 2023
  • Submission: 9/1/2023
  • Type: manual
  • MD5: 1e76a46f3e86800c9e7fe294a85bf20d
  • Run description: We use the DFReeKLIM model. The Divergence from Randomness (DFR) models in information retrieval aim to estimate the importance of a term or a combination of terms in a document with respect to a query.

IRLabIITBHU_DFReeKLIM_2

  • Run ID: IRLabIITBHU_DFReeKLIM_2
  • Participant: IRLAB_IIT_BHU
  • Track: CrisisFACTs
  • Year: 2023
  • Submission: 9/1/2023
  • Type: manual
  • MD5: e1e43ae63bf17c091753907912769653
  • Run description: We use the DFReeKLIM model. The Divergence from Randomness (DFR) models in information retrieval aim to estimate the importance of a term or a combination of terms in a document with respect to a query.

llama

  • Run ID: llama
  • Participant: umd_hcil
  • Track: CrisisFACTs
  • Year: 2023
  • Submission: 8/30/2023
  • Type: automatic
  • MD5: 3cacd7daaaada168dace8e6df37766ab
  • Run description: This method uses a standard retrieval model to get a list of relevant sentences for each query. Then, for that query, we prompt a transformer model to summarize the list of facts, ranked by their relevance to the query. This step produces a one-to-three-sentence summary for each query on each event-day pair. We then aggregate all the queries for a given event-day into a single document, which we ask GPT-3.5 to rewrite into a summary of the most critical content. Finally, we score the importance of facts for this event-day pair based on overlap with this summary.
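
A sketch of the final overlap-based scoring, using Jaccard overlap; the team's exact overlap measure is not stated.

```python
import re

def tokens(text):
    return set(re.findall(r"\w+", text.lower()))

def overlap_importance(fact, summary):
    # Jaccard overlap between fact tokens and summary tokens.
    f, s = tokens(fact), tokens(summary)
    return len(f & s) / len(f | s) if f | s else 0.0

summary = "Evacuations continue along Highway 9 as a shelter opens."
facts = [
    "Evacuations ordered along Highway 9.",
    "Fire crews report 10% containment.",
]
for fact in facts:
    print(f"{overlap_importance(fact, summary):.2f}  {fact}")
```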

llama_13b_chat

  • Run ID: llama_13b_chat
  • Participant: OHM
  • Track: CrisisFACTs
  • Year: 2023
  • Submission: 9/1/2023
  • Type: automatic
  • MD5: 3c0228530c40b91a9022563cd9ede162
  • Run description: The system consists of three successive components: (1) lexical retrieval with BM25 (+ Bo1 query expansion) based on the indicative terms plus query text (top 250 per query); (2) re-ranking with monoT5-large based on the query text (top 50 per query); (3) LLaMA-2 (13B chat model) extracts and summarizes facts with respect to the query (top 10 per query).

nm-gpt35

  • Run ID: nm-gpt35
  • Participant: NM
  • Track: CrisisFACTs
  • Year: 2023
  • Submission: 8/30/2023
  • Type: automatic
  • MD5: 4f51b15ced305acc0d150e891f683b8e
  • Run description: We employ a two-step pipeline for constructing abstractive facts from social media and online news. The first step is a retrieval step that uses the pre-defined user queries to search for relevant documents. Then, we use the top-k documents to compose a prompt that is submitted to a large language model (LLM). The second step of our pipeline consists of using the LLM to summarize the most important facts given the top-k documents. This pipeline is executed for each event-day pair. In this run, we use BM25+monoT5 in the retrieval step and GPT-3.5-turbo-16k in the LLM reasoning step. We use the top 30 documents from the retrieval step.
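
A sketch of the prompt-composition and LLM-summarization step under stated assumptions: the prompt wording is invented, the retrieval output is stubbed, and the openai v1 client is assumed.

```python
from openai import OpenAI

top_k_docs = [  # stand-in for the BM25+monoT5 retrieval output
    "Evacuations ordered along Highway 9.",
    "Shelter opened at the county fairgrounds.",
]

# Compose a prompt from the top-k documents (wording is illustrative).
prompt = (
    "Summarize the most important facts about the ongoing crisis "
    "from the documents below, one fact per line.\n\n"
    + "\n".join(f"- {d}" for d in top_k_docs)
)

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-3.5-turbo-16k",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```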

nm-gpt35-bm25

  • Run ID: nm-gpt35-bm25
  • Participant: NM
  • Track: CrisisFACTs
  • Year: 2023
  • Submission: 9/2/2023
  • Type: automatic
  • MD5: 09adc3751bb07d986a8d442482769fe6
  • Run description: We employ a two-step pipeline for constructing abstractive facts from social media and online news. The first step is a retrieval step that uses the pre-defined user queries to search for relevant documents. Then, we use the top-k documents to compose a prompt that is submitted to a large language model (LLM). The second step of our pipeline consists of using the LLM to summarize the most important facts given the top-k documents. This pipeline is executed for each event-day pair. In this run, we use BM25 in the retrieval step and GPT-3.5-turbo-16k in the LLM reasoning step. We use the top 10 documents from the retrieval step.

nm-gpt4

  • Run ID: nm-gpt4
  • Participant: NM
  • Track: CrisisFACTs
  • Year: 2023
  • Submission: 9/1/2023
  • Type: automatic
  • MD5: ddace1b8dae98e67a1b88fbb27d93e69
  • Run description: We employ a two-step pipeline for constructing abstractive facts from social media and online news. The first step is a retrieval step that uses the pre-defined user queries to search for relevant documents. Then, we use the top-k documents to compose a prompt that is submitted to a large language model (LLM). The second step of our pipeline consists of using the LLM to summarize the most important facts given the top-k documents. This pipeline is executed for each event-day pair. In this run, we use BM25+monoT5 in the retrieval step and GPT-4-8k in the LLM reasoning step. We use the top 10 documents from the retrieval step.

nut-kslab01

  • Run ID: nut-kslab01
  • Participant: nut-kslab
  • Track: CrisisFACTs
  • Year: 2023
  • Submission: 8/31/2023
  • Type: automatic
  • MD5: 826783be3eb090672bc9b54581a0ef93
  • Run description: The system leverages the BM25 model to process the CrisisFACTS dataset, identify relevant facts using the queries, and compute importance metrics.

Siena.Baseline1

  • Run ID: Siena.Baseline1
  • Participant: SienaCLTeam
  • Track: CrisisFACTs
  • Year: 2023
  • Submission: 8/29/2023
  • Type: automatic
  • MD5: 96e07c0b0ab11ce9ee3e01aa078c44fe
  • Run description: This is a baseline run generated from the baseline script present in the CrisisFACTS GitHub repository.

Siena.FactTrigrams1

  • Run ID: Siena.FactTrigrams1
  • Participant: SienaCLTeam
  • Track: CrisisFACTs
  • Year: 2023
  • Submission: 8/29/2023
  • Type: manual
  • MD5: 8b66011c46d6a8b519d673f5984a8e21
  • Run description: The system uses the facts to gather a large set of trigrams. The trigrams are then scored against the queries to see which trigrams perform best. These trigrams are then added to the queries they scored well against and the baseline script is run using these expanded queries.
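
A sketch of the trigram gathering and query-expansion idea; the scoring rule is an assumption (here a trigram scores by shared terms with the query, then by corpus frequency).

```python
from collections import Counter

def trigrams(text):
    toks = text.lower().split()
    return [" ".join(toks[i:i + 3]) for i in range(len(toks) - 2)]

facts = [
    "evacuations ordered along highway 9 tonight",
    "shelter opened at the county fairgrounds tonight",
]
pool = Counter(tg for f in facts for tg in trigrams(f))

def expand(query, top_n=2):
    q_terms = set(query.lower().split())
    # Rank trigrams by overlap with the query, then by corpus frequency.
    scored = sorted(pool, reverse=True,
                    key=lambda tg: (len(set(tg.split()) & q_terms), pool[tg]))
    best = [tg for tg in scored if set(tg.split()) & q_terms][:top_n]
    return query + " " + " ".join(best)

# Append the best-scoring trigrams to the query before retrieval.
print(expand("shelter locations tonight"))
```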

Siena.WikiTrigrams1

  • Run ID: Siena.WikiTrigrams1
  • Participant: SienaCLTeam
  • Track: CrisisFACTs
  • Year: 2023
  • Submission: 8/29/2023
  • Type: automatic
  • MD5: 77b9de1d542999dd3dcef4ca671f8945
  • Run description: The system uses the Wikipedia page associated with each event to gather a large set of trigrams. The trigrams are then scored against the queries to see which trigrams perform best. These trigrams are then added to the queries they scored well against, and the baseline script is run using these expanded queries.

Siena.WikiTrigrams2

  • Run ID: Siena.WikiTrigrams2
  • Participant: SienaCLTeam
  • Track: CrisisFACTs
  • Year: 2023
  • Submission: 8/29/2023
  • Type: automatic
  • MD5: e0796e545a46f596aa99fd6003df2fd0
  • Run description: The system uses the Wikipedia page associated with each event to gather a large set of trigrams. The trigrams are then scored against the queries to see which trigrams perform best. These trigrams are then added to the queries they scored well against, and the baseline script is run using these expanded queries.

TorontoMU_Word2Vec_TFIDF

  • Run ID: TorontoMU_Word2Vec_TFIDF
  • Participant: V-TorontoMU
  • Track: CrisisFACTs
  • Year: 2023
  • Submission: 9/1/2023
  • Type: automatic
  • MD5: 6d7beb2fb50de25dc1ad9b1e03f91af2
  • Run description: To produce the run, the code employs TF-IDF for vector representation based on word importance in documents and Word2Vec to derive context-aware embeddings; a combination of these approaches through weighted summation yields the final document rankings.
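
A sketch of the combined ranking: TF-IDF cosine scores and mean-pooled Word2Vec cosine scores merged by weighted summation. The 0.5/0.5 weights and the tiny corpus are illustrative.

```python
import numpy as np
from gensim.models import Word2Vec
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "evacuations ordered along highway nine",
    "shelter opened at the county fairgrounds",
    "fire crews report ten percent containment",
]
query = "shelter opened along highway nine"

# Signal 1: TF-IDF cosine similarity (word importance in documents).
tfidf = TfidfVectorizer()
doc_mat = tfidf.fit_transform(docs)
tfidf_scores = cosine_similarity(tfidf.transform([query]), doc_mat)[0]

# Signal 2: Word2Vec embeddings, mean-pooled per text.
w2v = Word2Vec([d.split() for d in docs], vector_size=50, min_count=1,
               epochs=50, seed=0)

def embed(text):
    vecs = [w2v.wv[t] for t in text.split() if t in w2v.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(50)

w2v_scores = np.array(
    [cosine_similarity([embed(query)], [embed(d)])[0, 0] for d in docs])

# Weighted summation of the two signals yields the final ranking.
final = 0.5 * tfidf_scores + 0.5 * w2v_scores
print(sorted(zip(docs, final), key=lambda x: -x[1]))
```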

V-TorontoMU-DFReeKLIM

  • Run ID: V-TorontoMU-DFReeKLIM
  • Participant: V-TorontoMU
  • Track: CrisisFACTs
  • Year: 2023
  • Submission: 9/1/2023
  • Type: automatic
  • MD5: d41d8cd98f00b204e9800998ecf8427e
  • Run description: The code leverages the pyTerrier library to perform information retrieval using the DFReeKLIM weighting model, indexing preprocessed documents from specific crisis datasets. The queries, derived from user profiles, are then processed and matched against the indexed corpus to retrieve and rank relevant documents based on the significance of terms in the document collection relative to their occurrence in the query.
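
A sketch of DFReeKLIM retrieval with pyTerrier; the document list and query are stand-ins for the preprocessed crisis streams and the profile-derived queries.

```python
import pyterrier as pt

if not pt.started():
    pt.init()

docs = [  # stand-in for the preprocessed crisis stream items
    {"docno": "d1", "text": "evacuations ordered along highway 9"},
    {"docno": "d2", "text": "shelter opened at the county fairgrounds"},
]
index_ref = pt.IterDictIndexer("./crisis_index").index(docs)

# Retrieve and rank with the DFReeKLIM weighting model.
retriever = pt.BatchRetrieve(index_ref, wmodel="DFReeKLIM")
results = retriever.search("evacuation shelter")
print(results[["docno", "score", "rank"]])
```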

V-TorontoMU-DFReeKLIM-v2

  • Run ID: V-TorontoMU-DFReeKLIM-v2
  • Participant: V-TorontoMU
  • Track: CrisisFACTs
  • Year: 2023
  • Submission: 9/1/2023
  • Type: automatic
  • MD5: b6a6b6afe2940b4e01b7f6dacd0800a1
  • Run description: The code leverages the pyTerrier library to perform information retrieval using the DFReeKLIM weighting model, indexing preprocessed documents from specific crisis datasets. The queries, derived from user profiles, are then processed and matched against the indexed corpus to retrieve and rank relevant documents based on the significance of terms in the document collection relative to their occurrence in the query.

V-TorontoMU_SBERT_Semanti

  • Run ID: V-TorontoMU_SBERT_Semanti
  • Participant: V-TorontoMU
  • Track: CrisisFACTs
  • Year: 2023
  • Submission: 9/1/2023
  • Type: automatic
  • MD5: 72815f44bff004abd897edd261465bbb
  • Run description: Leveraging the 'paraphrase-distilroberta-base-v1' Sentence-BERT model, the run achieves high-dimensional semantic embeddings of the queries and facts, capturing intricate linguistic nuances. Through cosine similarity metrics, the result discerns the semantic proximity between these embeddings, culminating in a refined selection of the top 200 contextually-aligned texts for each query.
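
A sketch of the SBERT ranking with the run's 'paraphrase-distilroberta-base-v1' model; top_k is reduced from the run's 200 to fit the toy fact list.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-distilroberta-base-v1")
queries = ["Where are evacuation shelters?"]
facts = [
    "Shelter opened at the county fairgrounds.",
    "Fire crews report 10% containment.",
    "Evacuees directed to the high school gym.",
]

q_emb = model.encode(queries, convert_to_tensor=True)
f_emb = model.encode(facts, convert_to_tensor=True)

# For each query, return the top_k facts by cosine similarity.
hits = util.semantic_search(q_emb, f_emb, top_k=2)
for hit in hits[0]:
    print(f"{hit['score']:.3f}  {facts[hit['corpus_id']]}")
```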

V-TorontoMU_USE_4

  • Run ID: V-TorontoMU_USE_4
  • Participant: V-TorontoMU
  • Track: CrisisFACTs
  • Year: 2023
  • Submission: 9/2/2023
  • Type: automatic
  • MD5: dc9ce762d20a12359f4469df39850929
  • Run description: The code leverages the Universal Sentence Encoder (USE) to transform both textual documents and user queries into dense vector embeddings. Utilizing cosine similarity, it assesses and ranks the semantic proximity between these embeddings, thereby identifying and prioritizing documents that are most relevant to user inquiries.
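
A sketch of the USE-based ranking; "USE_4" in the run ID suggests version 4 of the TF-Hub module, which is assumed here.

```python
import numpy as np
import tensorflow_hub as hub

embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

docs = [
    "Shelter opened at the county fairgrounds.",
    "Fire crews report 10% containment.",
]
query = ["Where are evacuation shelters?"]

doc_emb = embed(docs).numpy()
q_emb = embed(query).numpy()[0]

# Rank documents by cosine similarity to the query embedding.
def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

scores = [cosine(q_emb, d) for d in doc_emb]
for doc, s in sorted(zip(docs, scores), key=lambda x: -x[1]):
    print(f"{s:.3f}  {doc}")
```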