Runs - Podcast 2021

baseline-BM25

Results | Participants | Input | Summary (QD) | Summary (QE) | Summary (QR) | Summary (QS) | Appendix

  • Run ID: baseline-BM25
  • Participant: BASELINES
  • Track: Podcast
  • Year: 2021
  • Submission: 9/2/2021
  • Type: automatic
  • Task: retrieval
  • MD5: 3dc8635299b59fe1058267fa48fdef30
  • Run description: Baseline using Pyserini BM25
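
The scoring behind this baseline can be sketched in a few lines of plain Python. This is a minimal illustration of the BM25 formula with Anserini/Pyserini's default parameters (k1 = 0.9, b = 0.4), not the actual Pyserini implementation:

```python
import math

def bm25_score(query_terms, doc_terms, doc_freqs, num_docs, avg_doc_len,
               k1=0.9, b=0.4):
    """Score one document against a query with BM25 (Anserini/Pyserini defaults)."""
    score = 0.0
    doc_len = len(doc_terms)
    for term in query_terms:
        tf = doc_terms.count(term)
        if tf == 0:
            continue  # term absent from document, contributes nothing
        df = doc_freqs.get(term, 0)
        idf = math.log(1 + (num_docs - df + 0.5) / (df + 0.5))
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
    return score
```

A real run would use Pyserini's searcher over the prebuilt segment index; the sketch only shows what the ranking function computes.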

baseline-BM25-D

  • Run ID: baseline-BM25-D
  • Participant: BASELINES
  • Track: Podcast
  • Year: 2021
  • Submission: 9/2/2021
  • Type: automatic
  • Task: retrieval
  • MD5: f3dd830db3c2e2a2af4296bdbc2513d1
  • Run description: Baseline using Pyserini BM25, including Description field

Baseline-oneminute

  • Run ID: Baseline-oneminute
  • Participant: BASELINES
  • Track: Podcast
  • Year: 2021
  • Submission: 9/3/2021
  • Type: automatic
  • Task: summarization
  • Run description: First minute of podcast

baseline-QL-D

  • Run ID: baseline-QL-D
  • Participant: BASELINES
  • Track: Podcast
  • Year: 2021
  • Submission: 9/2/2021
  • Type: automatic
  • Task: retrieval
  • MD5: 13282e5078271250d5b481b378f7f130
  • Run description: Baseline run using Pyserini QL, including Description field

baseline-QL-Q

  • Run ID: baseline-QL-Q
  • Participant: BASELINES
  • Track: Podcast
  • Year: 2021
  • Submission: 9/2/2021
  • Type: automatic
  • Task: retrieval
  • MD5: 712e62a030bbdb6159b8233348d8d8f9
  • Run description: Baseline run using Pyserini QL

f_b25_coil

  • Run ID: f_b25_coil
  • Participant: CFDA_CLIP
  • Track: Podcast
  • Year: 2021
  • Submission: 9/6/2021
  • Type: automatic
  • Task: retrieval
  • MD5: de79c36f26c4a07beefe01efc6cafdf0
  • Run description: encoding: transcripts only; bm25 + tct-coil trained on passage ranking dataset; score fusion
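
Score fusion of the kind this run describes is commonly implemented by min-max normalizing each system's scores and taking a weighted sum. A hedged sketch (the actual fusion weights used by CFDA_CLIP are not stated in the description; `alpha=0.5` is an assumption):

```python
def minmax(scores):
    """Min-max normalize a {doc_id: score} dict to [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo or 1.0
    return {d: (s - lo) / span for d, s in scores.items()}

def fuse(run_a, run_b, alpha=0.5):
    """Weighted sum of two normalized runs; docs missing from a run contribute 0."""
    a, b = minmax(run_a), minmax(run_b)
    docs = set(a) | set(b)
    return {d: alpha * a.get(d, 0.0) + (1 - alpha) * b.get(d, 0.0) for d in docs}
```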

f_b25_tct

  • Run ID: f_b25_tct
  • Participant: CFDA_CLIP
  • Track: Podcast
  • Year: 2021
  • Submission: 9/6/2021
  • Type: automatic
  • Task: retrieval
  • MD5: ad83573c2736aff405006a86422185bf
  • Run description: encoding: transcripts only; bm25 + tct trained on document ranking dataset; score fusion

f_coil_tct

  • Run ID: f_coil_tct
  • Participant: CFDA_CLIP
  • Track: Podcast
  • Year: 2021
  • Submission: 9/6/2021
  • Type: automatic
  • Task: retrieval
  • MD5: b2364b98564c2dc00b7a54008de70c51
  • Run description: encoding: transcripts only; tct-coil trained on psg ranking + tct trained on document ranking dataset; score fusion

Hotspot1

  • Run ID: Hotspot1
  • Participant: Spotify
  • Track: Podcast
  • Year: 2021
  • Submission: 9/5/2021
  • Type: automatic
  • Task: summarization
  • Run description: This run is based on hotspot detection. Each episode's audio is split into clips, where each clip corresponds to one sentence of the transcript. Speech emotion recognition is then performed on each clip, and the clips with the highest emotion scores are selected as "hotspots" and added to the summary. A SentenceBERT model is used to generate embeddings for the sentences within the first four minutes of each episode, as well as a document embedding that sums all sentence embeddings. A similarity score between each sentence embedding and the document embedding is calculated, and the sentence with the highest score is inserted at the beginning of the summary. Finally, the summary is generated in both audio and text form.
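
The lead-sentence step of this run (summing sentence embeddings into a document embedding and picking the most similar sentence) can be sketched with toy vectors; the run itself uses SentenceBERT embeddings, which this illustration does not reproduce:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def lead_sentence(sent_embs):
    """Index of the sentence whose embedding is most similar to the
    document embedding formed by summing all sentence embeddings."""
    doc = [sum(dims) for dims in zip(*sent_embs)]
    return max(range(len(sent_embs)), key=lambda i: cosine(sent_embs[i], doc))
```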

ms_mt5

  • Run ID: ms_mt5
  • Participant: h2oloo
  • Track: Podcast
  • Year: 2021
  • Submission: 9/4/2021
  • Type: automatic
  • Task: retrieval
  • MD5: c528fd2fde147b96b3367b78fb4e331d
  • Run description: Pyserini Default BM25 using segments. 6-3 sliding window MaxP with a monoT5-3B trained on MS-MARCO 1K (Query Format: (Q + D))
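
The "6-3 sliding window MaxP" strategy means scoring overlapping windows (here assumed to be six sentences long with a stride of three) and taking the maximum window score as the segment's score. A sketch with a placeholder scoring function standing in for monoT5:

```python
def windows(sentences, size=6, stride=3):
    """Overlapping windows of `size` sentences advancing by `stride`,
    always covering the tail of the document."""
    out = []
    start = 0
    while True:
        out.append(sentences[start:start + size])
        if start + size >= len(sentences):
            break
        start += stride
    return out

def maxp_score(sentences, score_fn, size=6, stride=3):
    """MaxP: document score = max score over its windows."""
    return max(score_fn(" ".join(w)) for w in windows(sentences, size, stride))
```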

osc_tok_vec

  • Run ID: osc_tok_vec
  • Participant: OSC
  • Track: Podcast
  • Year: 2021
  • Submission: 9/6/2021
  • Type: automatic
  • Task: retrieval
  • MD5: c1e11aa5a3379c796a6c71e675af6d09
  • Run description: Max-normalized scores combined with cosine similarity for embeddings in a window.

osc_token

  • Run ID: osc_token
  • Participant: OSC
  • Track: Podcast
  • Year: 2021
  • Submission: 9/6/2021
  • Type: automatic
  • Task: retrieval
  • MD5: 4f2c86cafcfd659d53730ebfaa561565
  • Run description: Elasticsearch's combined_fields query with field boosting to prioritize transcript

osc_vec_tok

  • Run ID: osc_vec_tok
  • Participant: OSC
  • Track: Podcast
  • Year: 2021
  • Submission: 9/6/2021
  • Type: automatic
  • Task: retrieval
  • MD5: a9fccc9240435ed70d29604f556baabb
  • Run description: Cosine similarity on SBERT embeddings for recall

osc_vector

  • Run ID: osc_vector
  • Participant: OSC
  • Track: Podcast
  • Year: 2021
  • Submission: 9/6/2021
  • Type: automatic
  • Task: retrieval
  • MD5: f5030ce207ea53a6dda34d6ee744e43d
  • Run description: Cosine similarity on SBERT embeddings
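
Dense retrieval by cosine similarity, as in the OSC runs, reduces to ranking precomputed segment embeddings against a query embedding. A minimal sketch with toy vectors (a real run would use SBERT embeddings and typically an ANN index rather than exhaustive scoring):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def rank_by_cosine(query_emb, segment_embs):
    """Segment ids sorted by cosine similarity to the query, best first."""
    return sorted(segment_embs,
                  key=lambda sid: cosine(query_emb, segment_embs[sid]),
                  reverse=True)
```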

PoliTO_100_32-128

  • Run ID: PoliTO_100_32-128
  • Participant: PoliTO
  • Track: Podcast
  • Year: 2021
  • Submission: 9/3/2021
  • Type: automatic
  • Task: summarization
  • Run description: In this run, we set the maximum number of selected sentences to 100. The abstractive summarization model was limited to produce summaries with minimum and maximum lengths of 32 and 128, respectively. Podcast transcripts were used as input both for supervised extraction and for the abstractive summarization step. Episode descriptions from the training set were used as references for the supervised selection and abstractive summarization (during training only). The audio files weren't used directly; instead, we use the openSMILE feature representations. Please note: the audio summaries have the same content for all our runs.

PoliTO_25_32-128

  • Run ID: PoliTO_25_32-128
  • Participant: PoliTO
  • Track: Podcast
  • Year: 2021
  • Submission: 9/3/2021
  • Type: automatic
  • Task: summarization
  • Run description: In this run, we set the maximum number of selected sentences to 25. The abstractive summarization model was limited to produce summaries with minimum and maximum lengths of 32 and 128, respectively. Podcast transcripts were used as input both for supervised extraction and for the abstractive summarization step. Episode descriptions from the training set were used as references for the supervised selection and abstractive summarization (during training only). The audio files weren't used directly; instead, we use the openSMILE feature representations. Please note: the audio summaries have the same content for all our runs.

PoliTO_50_32-128

  • Run ID: PoliTO_50_32-128
  • Participant: PoliTO
  • Track: Podcast
  • Year: 2021
  • Submission: 9/3/2021
  • Type: automatic
  • Task: summarization
  • Run description: In this run, we set the maximum number of selected sentences to 50. The abstractive summarization model was limited to produce summaries with minimum and maximum lengths of 32 and 128, respectively. Podcast transcripts were used as input both for supervised extraction and for the abstractive summarization step. Episode descriptions from the training set were used as references for the supervised selection and abstractive summarization (during training only). The audio files weren't used directly; instead, we use the openSMILE feature representations. Please note: the audio summaries have the same content for all our runs.

PoliTO_50_64-128

  • Run ID: PoliTO_50_64-128
  • Participant: PoliTO
  • Track: Podcast
  • Year: 2021
  • Submission: 9/4/2021
  • Type: automatic
  • Task: summarization
  • Run description: In this run, we set the maximum number of selected sentences to 50. The abstractive summarization model was limited to produce summaries with minimum and maximum lengths of 64 and 128, respectively. Podcast transcripts were used as input both for supervised extraction and for the abstractive summarization step. Episode descriptions from the training set were used as references for the supervised selection and abstractive summarization (during training only). The audio files weren't used directly; instead, we use the openSMILE feature representations. Please note: the audio summaries have the same content for all our runs.

s_tasb

  • Run ID: s_tasb
  • Participant: CFDA_CLIP
  • Track: Podcast
  • Year: 2021
  • Submission: 9/6/2021
  • Type: automatic
  • Task: retrieval
  • MD5: 616db4d91602953ae7c216c1a46f5b1f
  • Run description: encoding: transcripts only; tas-b model trained on msmarco psg ranking dataset

s_tct

  • Run ID: s_tct
  • Participant: CFDA_CLIP
  • Track: Podcast
  • Year: 2021
  • Submission: 9/6/2021
  • Type: automatic
  • Task: retrieval
  • MD5: 99a091911d996d5c46cd9d65bd7f6dd1
  • Run description: encoding: transcripts only; tct model trained on msmarco doc ranking dataset

theTuringTest1

  • Run ID: theTuringTest1
  • Participant: theTuringTest
  • Track: Podcast
  • Year: 2021
  • Submission: 9/2/2021
  • Type: automatic
  • Task: summarization
  • Run description: Used feature engineering, including a unigram model, and metrics such as ROUGE-1, ROUGE-2, ROUGE-L, and METEOR to obtain the best possible extractive summary. Applied TOPSIS to rank each sentence based on the selected features, iteratively removing the worst-performing sentences until the best-scoring extractive summary remains.
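
TOPSIS, the multi-criteria ranking method this run applies, scores each alternative by its relative closeness to an ideal solution. A compact sketch treating every feature as a benefit criterion (the team's actual feature set and weights are not given; equal weights in the test are an assumption):

```python
import math

def topsis(matrix, weights):
    """Rank alternatives (rows) by TOPSIS closeness to the ideal solution.
    All criteria are treated as benefit criteria; higher score = better."""
    ncols = len(matrix[0])
    # vector-normalize each column, then apply criterion weights
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0 for j in range(ncols)]
    w = [[weights[j] * row[j] / norms[j] for j in range(ncols)] for row in matrix]
    ideal = [max(col) for col in zip(*w)]   # best value per criterion
    anti = [min(col) for col in zip(*w)]    # worst value per criterion
    scores = []
    for row in w:
        d_pos = math.sqrt(sum((x - i) ** 2 for x, i in zip(row, ideal)))
        d_neg = math.sqrt(sum((x - a) ** 2 for x, a in zip(row, anti)))
        scores.append(d_neg / (d_pos + d_neg or 1.0))
    return scores
```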

theTuringTest2

  • Run ID: theTuringTest2
  • Participant: theTuringTest
  • Track: Podcast
  • Year: 2021
  • Submission: 9/2/2021
  • Type: automatic
  • Task: summarization
  • Run description: Used feature engineering, including a unigram model, and metrics such as ROUGE-1, ROUGE-2, ROUGE-L, and METEOR to obtain the best possible extractive summary. Applied TOPSIS to rank each sentence based on the selected features, iteratively removing the worst-performing sentences until the best set of sentences remains. A T5 model is then applied to this set to produce the final abstractive summary.

tp_mt5

  • Run ID: tp_mt5
  • Participant: h2oloo
  • Track: Podcast
  • Year: 2021
  • Submission: 9/4/2021
  • Type: automatic
  • Task: retrieval
  • MD5: db1cc52cbfd64b42a26603047fbadd6a
  • Run description: Pyserini Default BM25 using segments. 6-3 sliding window MaxP with a monoT5-3B trained on MS-MARCO 1K -> 2020 TREC Podcasts topics transcripts (Query Format: (Q + D))

tp_mt5_f1

  • Run ID: tp_mt5_f1
  • Participant: h2oloo
  • Track: Podcast
  • Year: 2021
  • Submission: 9/4/2021
  • Type: automatic
  • Task: retrieval
  • MD5: f1a265db63711673a69fff16d916766b
  • Run description: Pyserini Default BM25 using segments. 6-3 sliding window MaxP with a monoT5-3B trained on MS-MARCO 1K -> 2020 TREC Podcasts topics transcripts (Query Format: (Q + D)); feature weight YAMNet 1

tp_mt5_f2

  • Run ID: tp_mt5_f2
  • Participant: h2oloo
  • Track: Podcast
  • Year: 2021
  • Submission: 9/4/2021
  • Type: automatic
  • Task: retrieval
  • MD5: 2c2a250869c4c1e1142bc9b900fbfd67
  • Run description: Pyserini Default BM25 using segments. 6-3 sliding window MaxP with a monoT5-3B trained on MS-MARCO 1K -> 2020 TREC Podcasts topics transcripts (Query Format: (Q + D)); feature weight YAMNet 2

TUW_hybrid_cat

  • Run ID: TUW_hybrid_cat
  • Participant: TU_Vienna
  • Track: Podcast
  • Year: 2021
  • Submission: 9/6/2021
  • Type: automatic
  • Task: retrieval
  • MD5: 98eac69f6dc90146413637c6a1a53b8e
  • Run description: This run first combines a standard BM25 (Pyserini) run and our full TAS-B run (both top-1000) and then applies a knowledge-distilled DistilBERT_Cat re-ranking model (https://huggingface.co/sebastian-hofstaetter/distilbert-cat-margin_mse-T2-msmarco) to generate the final ranking. For QE re-rankings we utilize a BERT-based emotion classifier trained on the go-emotions dataset, and for QS re-rankings we utilize a RoBERTa classifier trained on an argument/non-argument labeled dataset and combine it with a simple dictionary-based subjectivity classifier.

TUW_hybrid_ws

  • Run ID: TUW_hybrid_ws
  • Participant: TU_Vienna
  • Track: Podcast
  • Year: 2021
  • Submission: 9/6/2021
  • Type: automatic
  • Task: retrieval
  • MD5: 83c31c51fff66efbe129409d392efea7
  • Run description: We use our publicly available checkpoint (https://huggingface.co/sebastian-hofstaetter/distilbert-dot-tas_b-b256-msmarco), trained on the MS MARCO passage collection v1, to encode the segments and generate a faiss index. We generate a BM25 sparse index (Pyserini), and using both indices we follow a hybrid sparse-dense retrieval approach (Pyserini). For QE re-rankings we utilize a BERT-based emotion classifier trained on the go-emotions dataset, and for QS re-rankings we utilize a RoBERTa classifier trained on an argument/non-argument labeled dataset and combine it with a simple dictionary-based subjectivity classifier.
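
A hybrid sparse-dense combination of the kind Pyserini supports interpolates the two score lists, filling in a minimum score for documents missing from one of the runs. A hedged sketch (the interpolation weight here is illustrative, not the one used in this run):

```python
def hybrid_scores(sparse, dense, alpha=0.1):
    """Interpolate BM25 and dense scores: final = dense + alpha * sparse.
    Documents missing from one run receive that run's minimum score."""
    min_s = min(sparse.values())
    min_d = min(dense.values())
    docs = set(sparse) | set(dense)
    return {d: dense.get(d, min_d) + alpha * sparse.get(d, min_s) for d in docs}
```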

TUW_tasb192_ann

  • Run ID: TUW_tasb192_ann
  • Participant: TU_Vienna
  • Track: Podcast
  • Year: 2021
  • Submission: 9/6/2021
  • Type: automatic
  • Task: retrieval
  • MD5: 1bab5f8d3dbe880723bf05324b587088
  • Run description: This TAS-Balanced-trained model (based on DistilBERT) uses a compression layer at the end to produce 192-dimensional embeddings in fp16 (an 8x reduction from the default 768-dim fp32 output); we then indexed the vectors with HNSW (using 128 neighbors per vector). For inference we use the ONNX runtime and BERT optimizations with fp16 (resulting vectors are also fp16). For QE re-rankings we utilize a BERT-based emotion classifier trained on the go-emotions dataset, and for QS re-rankings we utilize a RoBERTa classifier trained on an argument/non-argument labeled dataset and combine it with a simple dictionary-based subjectivity classifier.

TUW_tasb_cat

  • Run ID: TUW_tasb_cat
  • Participant: TU_Vienna
  • Track: Podcast
  • Year: 2021
  • Submission: 9/6/2021
  • Type: automatic
  • Task: retrieval
  • MD5: 3db5f8ed432ba1850701077adcaeb031
  • Run description: We use our publicly available checkpoint (https://huggingface.co/sebastian-hofstaetter/distilbert-dot-tas_b-b256-msmarco) of our TAS-Balanced-trained DistilBERT dense retrieval model in a brute-force search configuration. For inference we use the ONNX runtime and BERT optimizations with fp16 (resulting vectors are also fp16). For QE re-rankings we utilize a BERT-based emotion classifier trained on the go-emotions dataset, and for QS re-rankings we utilize a RoBERTa classifier trained on an argument/non-argument labeled dataset and combine it with a simple dictionary-based subjectivity classifier.

UCL_audio_1

  • Run ID: UCL_audio_1
  • Participant: podcast2021_ucl
  • Track: Podcast
  • Year: 2021
  • Submission: 9/3/2021
  • Type: automatic
  • Task: retrieval
  • MD5: 089e960fb7967ffa1cd459a137254b30
  • Run description: This run trains three classification models based on a small, manually labelled subset of the podcast dataset. The inputs are three selected eGeMAPS or YAMNet features. Either a Random Forest or a Support Vector Classifier with an RBF kernel is trained for each emotion. The initial ranked list is reranked based on the probability predicted by each emotion classification model.

UCL_audio_2

  • Run ID: UCL_audio_2
  • Participant: podcast2021_ucl
  • Track: Podcast
  • Year: 2021
  • Submission: 9/3/2021
  • Type: automatic
  • Task: retrieval
  • MD5: be1a6979287516748fdb4dbd210153a0
  • Run description: This run uses both eGeMAPS and YAMNet features to create three mood metrics for each emotion. By labelling a small subset of the podcast dataset, an exploratory approach is employed on a case-by-case basis to establish preliminary, "proof of concept" mood metrics.

Unicamp1

  • Run ID: Unicamp1
  • Participant: Unicamp
  • Track: Podcast
  • Year: 2021
  • Submission: 9/3/2021
  • Type: automatic
  • Task: summarization
  • Run description: mBART adapted to a Longformer version and trained on Portuguese and English podcast transcripts and episode descriptions.

Unicamp2

  • Run ID: Unicamp2
  • Participant: Unicamp
  • Track: Podcast
  • Year: 2021
  • Submission: 9/4/2021
  • Type: automatic
  • Task: summarization
  • Run description: This is a multilingual Longformer model capable of generating abstractive summaries. It is the mBART-50 model converted to a Longformer version, capable of processing 4096 input tokens. It was finetuned on the XL-SUM dataset (English and Portuguese) and then finetuned on podcast transcripts and episode descriptions (English and Portuguese).

Webis_pc_abstr

  • Run ID: Webis_pc_abstr
  • Participant: Webis
  • Track: Podcast
  • Year: 2021
  • Submission: 9/3/2021
  • Type: automatic
  • Task: summarization
  • Run description: Trained a Cola model (https://github.com/google-research/google-research/tree/master/cola) unsupervised on 10,000h of podcast audio files. Use the combined Cola embeddings and embeddings generated by a pretrained RoBERTa model as the basis to train a model on 1000 manually annotated podcast snippets to classify how 'entertaining' a snippet is. Retrieve the 5 sentences from the episode that are most entertaining. Add their surrounding sentences. Concatenate them all and use as input for a distilbart model trained on the CNN summarization dataset. Audio summary: audio clips of the sentences used as input for the summarization model.
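
The "top entertaining sentences plus surrounding context" selection step can be sketched as follows; the per-sentence scores would come from the Cola/RoBERTa classifier, which is stubbed out here as a plain list:

```python
def select_snippets(sentences, scores, k=5, context=1):
    """Pick the k highest-scoring sentences, expand each with its
    neighbouring sentences, and return the result in document order."""
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    keep = set()
    for i in top:
        # include `context` sentences on each side, clipped to the document
        for j in range(max(i - context, 0), min(i + context + 1, len(sentences))):
            keep.add(j)
    return " ".join(sentences[i] for i in sorted(keep))
```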

Webis_pc_bs

  • Run ID: Webis_pc_bs
  • Participant: Webis
  • Track: Podcast
  • Year: 2021
  • Submission: 9/3/2021
  • Type: automatic
  • Task: retrieval
  • MD5: 8825f4ece431d4376b6c0cee2e185563
  • Run description: Retrieve using the Elasticsearch implementation of BM25. No reranking. This run is just a baseline for our other runs.

Webis_pc_co_rob

  • Run ID: Webis_pc_co_rob
  • Participant: Webis
  • Track: Podcast
  • Year: 2021
  • Submission: 9/3/2021
  • Type: automatic
  • Task: retrieval
  • MD5: a5b0e524df19a5bf7e0a3d5a2ffb1c79
  • Run description: Retrieve using the Elasticsearch implementation of BM25. Trained a Cola model (https://github.com/google-research/google-research/tree/master/cola) unsupervised on 10,000h of podcast audio files. Use the combined Cola embeddings and embeddings generated by a pretrained RoBERTa model as the basis to train a classifier for every feature (entertaining, subjective, discussion) on 1000 manually annotated podcast snippets. Rerank using the generated features.

Webis_pc_cola

  • Run ID: Webis_pc_cola
  • Participant: Webis
  • Track: Podcast
  • Year: 2021
  • Submission: 9/3/2021
  • Type: automatic
  • Task: retrieval
  • MD5: 30f4a211c00990637a7af15eaf8bca6b
  • Run description: Retrieve using the Elasticsearch implementation of BM25. Trained a Cola model (https://github.com/google-research/google-research/tree/master/cola) unsupervised on 10,000h of podcast audio files. Use embeddings generated by this model as the basis to train a classifier for every feature (entertaining, subjective, discussion) on 1000 manually annotated podcast snippets. Rerank using the generated features.

Webis_pc_extr

  • Run ID: Webis_pc_extr
  • Participant: Webis
  • Track: Podcast
  • Year: 2021
  • Submission: 9/3/2021
  • Type: automatic
  • Task: summarization
  • Run description: Trained a Cola model (https://github.com/google-research/google-research/tree/master/cola) unsupervised on 10,000h of podcast audio files. Use the combined Cola embeddings and embeddings generated by a pretrained RoBERTa model as the basis to train a model (SVM) on 1000 manually annotated podcast snippets to classify how 'entertaining' a snippet is. Add all sentences from the episode to a graph as nodes. Set starting edge weights to a value calculated from the semantic similarity and the entertainment scores of both sentences. Use the TextRank algorithm to rank all sentences in the episode. Use the 10 highest-ranked sentences as the summary. Audio summary: audio clips of the extracted sentences.
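
The weighted TextRank step (a PageRank-style power iteration over a sentence graph) can be sketched as follows; the weight matrix here is a toy stand-in for the similarity-times-entertainment edge weights the run describes:

```python
def textrank(weights, damping=0.85, iters=50):
    """Weighted TextRank: weights[i][j] is the edge weight between sentences
    i and j; returns one centrality score per sentence, higher = better."""
    n = len(weights)
    scores = [1.0 / n] * n
    for _ in range(iters):
        new = []
        for i in range(n):
            rank = 0.0
            for j in range(n):
                if j == i:
                    continue
                out_j = sum(weights[j][k] for k in range(n) if k != j)
                if out_j > 0:
                    # j passes a share of its score to i, proportional to w[j][i]
                    rank += weights[j][i] / out_j * scores[j]
            new.append((1 - damping) / n + damping * rank)
        scores = new
    return scores
```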

Webis_pc_rob

  • Run ID: Webis_pc_rob
  • Participant: Webis
  • Track: Podcast
  • Year: 2021
  • Submission: 9/3/2021
  • Type: automatic
  • Task: retrieval
  • MD5: 84ad9aa522099d1bfc442ae375c2aeb3
  • Run description: Retrieve using the Elasticsearch implementation of BM25. Use embeddings generated by a pretrained RoBERTa model as the basis to train a classifier for every feature (entertaining, subjective, discussion) on 1000 manually annotated podcast snippets. Rerank using the generated features.