Runs - Podcast 2020
2306987O_abs_run1
Results | Participants | Proceedings | Input | Summary (manual) | Summary (rouge) | Appendix
- Run ID: 2306987O_abs_run1
- Participant: UoGTr
- Track: Podcast
- Year: 2020
- Submission: 8/24/2020
- Type: automatic
- Task: summarization
- Run description: For this run, a pre-trained T5 model, fine-tuned on the provided episode descriptions, was used to generate the summaries. As part of the summary generation pipeline, the model's outputs were post-processed to remove as much promotional material (links, hashtags, etc.) as possible.
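As a rough illustration of that post-processing step, here is a minimal Python sketch of promotional-material stripping; the run's actual filter rules are not published, so the regexes and the function name are assumptions:

```python
import re

def strip_promotional(summary: str) -> str:
    """Remove links, hashtags, and @-handles from a generated summary.
    A hypothetical sketch; the run's real patterns are not published."""
    summary = re.sub(r"https?://\S+|www\.\S+", "", summary)  # links
    summary = re.sub(r"[#@]\w+", "", summary)                # hashtags, handles
    return re.sub(r"\s{2,}", " ", summary).strip()           # collapse whitespace

print(strip_promotional("Great chat with Jo! Listen at https://example.com #podcast"))
# -> "Great chat with Jo! Listen at"
```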
2306987O_extabs_run2
Results | Participants | Proceedings | Input | Summary (manual) | Summary (rouge) | Appendix
- Run ID: 2306987O_extabs_run2
- Participant: UoGTr
- Track: Podcast
- Year: 2020
- Submission: 9/1/2020
- Type: automatic
- Task: summarization
- Run description: For this run, the first 15 sentences were extracted from the podcast transcript and fed as input to a T5 model that was fine-tuned/trained using the podcast transcripts and episode descriptions.
2306987O_extabs_run3
Results | Participants | Proceedings | Input | Summary (manual) | Summary (rouge) | Appendix
- Run ID: 2306987O_extabs_run3
- Participant: UoGTr
- Track: Podcast
- Year: 2020
- Submission: 9/2/2020
- Type: automatic
- Task: summarization
- Run description: For this run, the podcast transcripts were first fed through an extractive pipeline to pick out the 15 most representative sentences. This pipeline used SpanBERT to generate embeddings of the text and K-means to cluster those embeddings into 15 clusters (the number of desired sentences). The pipeline's output consists of the 15 sentences closest to the K-means cluster centroids. That output is then given to a T5 model fine-tuned on podcast transcripts and their respective episode descriptions.
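A minimal sketch of this kind of centroid-based extractive step, assuming mean-pooled SpanBERT embeddings and scikit-learn's K-means (the pooling strategy and the checkpoint name are assumptions, not details from the run):

```python
import torch
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances_argmin_min
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("SpanBERT/spanbert-base-cased")
enc = AutoModel.from_pretrained("SpanBERT/spanbert-base-cased")

def embed(sentences):
    # Mean-pooled last-layer states; the run's pooling choice is unknown.
    batch = tok(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = enc(**batch).last_hidden_state
    mask = batch["attention_mask"].unsqueeze(-1).float()
    return ((hidden * mask).sum(1) / mask.sum(1)).numpy()

def extract_top_sentences(sentences, k=15):
    # Cluster embeddings into k groups; keep the sentence nearest each centroid.
    X = embed(sentences)
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    nearest, _ = pairwise_distances_argmin_min(km.cluster_centers_, X)
    return [sentences[i] for i in sorted(set(nearest))]  # keep transcript order
```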
bartcnn
Results | Participants | Input | Summary (manual) | Summary (rouge) | Appendix
- Run ID: bartcnn
- Participant: podcast_baselines
- Track: Podcast
- Year: 2020
- Submission: 9/3/2020
- Type: automatic
- Task: summarization
- Run description: The model inference code was used out of the box from huggingface/transformers.
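For reference, the out-of-the-box usage looks roughly like this; the checkpoint name and generation limit are assumptions, since the baseline description does not state them:

```python
from transformers import pipeline

# Off-the-shelf BART CNN/DailyMail summarizer from huggingface/transformers.
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

transcript = "Welcome back to the show. Today we talk about ..."  # transcript text
summary = summarizer(transcript, max_length=250, truncation=True)[0]["summary_text"]
print(summary)
```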
bartpodcasts
Results | Participants | Input | Summary (manual) | Summary (rouge) | Appendix
- Run ID: bartpodcasts
- Participant: podcast_baselines
- Track: Podcast
- Year: 2020
- Submission: 9/3/2020
- Type: automatic
- Task: summarization
- Run description: We fine-tuned the pretrained BART summarization model from huggingface/transformers using the first 1024 tokens of the transcripts as inputs and the descriptions as outputs.
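A sketch of how such training pairs might be prepared with the BART tokenizer, assuming the 1024-token input cap described above; the target-length cap is an assumption and the Trainer setup is omitted:

```python
from transformers import BartTokenizerFast

tok = BartTokenizerFast.from_pretrained("facebook/bart-large-cnn")

def make_training_example(transcript: str, description: str):
    # Inputs: first 1024 transcript tokens; labels: the episode description.
    features = tok(transcript, max_length=1024, truncation=True)
    labels = tok(text_target=description, max_length=256, truncation=True)
    features["labels"] = labels["input_ids"]
    return features
```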
BERT-DESC-Q
Results | Participants | Proceedings | Input | Summary | Appendix
- Run ID: BERT-DESC-Q
- Participant: spotify
- Track: Podcast
- Year: 2020
- Submission: 9/2/2020
- Type: automatic
- Task: retrieval
- MD5: f10d10202c6189f6ec9a2b8d5b192c20
- Run description: (1) Generate a pool of the top 50 candidates with BM25 using the queries; (2) rerank topic description-segment pairs using a BERT reranking model. The model was pre-trained on the MS MARCO passage reranking data (Nogueira et al.) and fine-tuned on automatically generated question-segment pairs.
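The two-stage retrieve-then-rerank pattern shared by these BERT-DESC runs can be sketched as below; the index path is hypothetical, and a public MS MARCO cross-encoder stands in for the team's fine-tuned BERT reranker:

```python
from pyserini.search.lucene import LuceneSearcher
from sentence_transformers import CrossEncoder

searcher = LuceneSearcher("indexes/podcast-segments")  # hypothetical index path
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # stand-in reranker

def retrieve_and_rerank(topic_description: str, pool_size: int = 50):
    hits = searcher.search(topic_description, k=pool_size)        # (1) BM25 pool
    pairs = [(topic_description, searcher.doc(h.docid).raw()) for h in hits]
    scores = reranker.predict(pairs)                              # (2) BERT rerank
    reranked = sorted(zip(hits, scores), key=lambda hs: -hs[1])
    return [(h.docid, float(s)) for h, s in reranked]
```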
BERT-DESC-S
Results | Participants | Proceedings | Input | Summary | Appendix
- Run ID: BERT-DESC-S
- Participant: spotify
- Track: Podcast
- Year: 2020
- Submission: 9/2/2020
- Type: automatic
- Task: retrieval
- MD5: e30b716ce266292d377aae5752e0c35c
- Run description: (1) Generate a pool of the top 50 candidates with BM25 using the queries; (2) rerank topic description-segment pairs using a BERT reranking model. The model was pre-trained on the MS MARCO passage reranking data (Nogueira et al.) and fine-tuned on extra topics and relevance judgments from crowdsourcing.
BERT-DESC-TD
Results | Participants | Proceedings | Input | Summary | Appendix
- Run ID: BERT-DESC-TD
- Participant: spotify
- Track: Podcast
- Year: 2020
- Submission: 9/2/2020
- Type: automatic
- Task: retrieval
- MD5: ce584a7f5c82374a7edfb52ce3dc5771
- Run description: (1) Generate a pool of the top 50 candidates with BM25 using the queries; (2) rerank topic description-segment pairs using a BERT reranking model. The model was pre-trained on the MS MARCO passage reranking data (Nogueira et al.) and fine-tuned on synthetic data from the podcast dataset: the top relevant segments within each episode were retrieved using the episode title as the query, and the episode description-segment pairs were used as reranking pairs.
BM25
Results | Participants | Input | Summary | Appendix
- Run ID: BM25
- Participant: podcast_baselines
- Track: Podcast
- Year: 2020
- Submission: 9/2/2020
- Type: automatic
- Task: retrieval
- MD5: 8f43c4bb18e80cc8ef24794d3961678e
- Run description: Traditional IR model, BM25; implemented with the Anserini toolkit and default parameters (k1=0.9 and b=0.4).
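With pyserini (Anserini's Python interface), reproducing this baseline takes a few lines; the index path and example query are hypothetical:

```python
from pyserini.search.lucene import LuceneSearcher

searcher = LuceneSearcher("indexes/podcast-segments")  # hypothetical index path
searcher.set_bm25(k1=0.9, b=0.4)                       # Anserini's default BM25 parameters
hits = searcher.search("coronavirus impact on schools", k=1000)
for hit in hits[:3]:
    print(hit.docid, round(hit.score, 3))
```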
categoryaware1
Results | Participants | Proceedings | Input | Summary (manual) | Summary (rouge) | Appendix
- Run ID: categoryaware1
- Participant: spotify
- Track: Podcast
- Year: 2020
- Submission: 9/3/2020
- Type: automatic
- Task: summarization
- Run description: This run is after one epoch of fine-tuning.
categoryaware2
Results | Participants | Proceedings | Input | Summary (manual) | Summary (rouge) | Appendix
- Run ID: categoryaware2
- Participant: spotify
- Track: Podcast
- Year: 2020
- Submission: 9/3/2020
- Type: automatic
- Task: summarization
- Run description: This run is after two epochs of fine-tuning.
coarse2fine
Results | Participants | Proceedings | Input | Summary (manual) | Summary (rouge) | Appendix
- Run ID: coarse2fine
- Participant: spotify
- Track: Podcast
- Year: 2020
- Submission: 9/3/2020
- Type: automatic
- Task: summarization
- Run description: We used TextRank to extract central regions (chunks of sentences) of the transcript. The most central regions (up to about 1000 tokens) were concatenated in order of appearance and used as input for fine-tuning the BART CNN/DailyMail summarization model from huggingface/transformers, with episode descriptions as output. Output summaries were constrained to a maximum of 250 tokens.
cued_speechUniv1
Results | Participants | Proceedings | Input | Summary (manual) | Summary (rouge) | Appendix
- Run ID: cued_speechUniv1
- Participant: cued_speechUniv
- Track: Podcast
- Year: 2020
- Submission: 9/2/2020
- Type: automatic
- Task: summarization
- Run description: Two-step approach: (1) sentence filtering based on the sentence-level attention scores of a hierarchical model; (2) BART summarisation using the filtered sentences as input at both training and inference time. We optimised BART on the maximum likelihood criterion and subsequently on a reinforcement learning (sequence-level optimisation) criterion. Finally, we ensemble BART models from different checkpoints and different data shuffles.
cued_speechUniv2
Results | Participants | Proceedings | Input | Summary (manual) | Summary (rouge) | Appendix
- Run ID: cued_speechUniv2
- Participant: cued_speechUniv
- Track: Podcast
- Year: 2020
- Submission: 9/2/2020
- Type: automatic
- Task: summarization
- Run description: same as Run 1 (cued_speechUniv1), the difference being that the ensemble consists of 3 models
cued_speechUniv3
Results | Participants | Proceedings | Input | Summary (manual) | Summary (rouge) | Appendix
- Run ID: cued_speechUniv3
- Participant: cued_speechUniv
- Track: Podcast
- Year: 2020
- Submission: 9/2/2020
- Type: automatic
- Task: summarization
- Run description: This is meant to be the most standard approach (i.e. our baseline): fine-tuning a CNN/DailyMail-trained BART model on the podcast data. If the transcription at training or inference time exceeds 1,024 tokens, it is truncated.
cued_speechUniv4
Results | Participants | Proceedings | Input | Summary (manual) | Summary (rouge) | Appendix
- Run ID: cued_speechUniv4
- Participant: cued_speechUniv
- Track: Podcast
- Year: 2020
- Submission: 9/3/2020
- Type: automatic
- Task: summarization
- Run description: same as Run 1 (cued_speechUniv1), with the difference being that this system is not optimised on the RL criterion and is a single-model system rather than an ensemble
hk_uu_podcast1
Results | Participants | Proceedings | Input | Summary (manual) | Summary (rouge) | Appendix
- Run ID: hk_uu_podcast1
- Participant: hk_uu_podcast
- Track: Podcast
- Year: 2020
- Submission: 9/3/2020
- Type: automatic
- Task: summarization
- Run description: The model was trained for 3 epochs, and the checkpoint with the best ROUGE-2 score on a created validation split was chosen. The model was trained using an input sequence length of 4096 and a target max length of 200.
hltcoe1
Results | Participants | Input | Summary | Appendix
- Run ID: hltcoe1
- Participant: hltcoe
- Track: Podcast
- Year: 2020
- Submission: 9/4/2020
- Type: automatic
- Task: retrieval
- MD5: d2ee581babda85ed80422c954a8df344
- Run description: Statistical language model with linear interpolation. Rocchio-style relevance feedback and term reweighting. Overlapping, word-spanning, character 5-gram tokenization.
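A minimal sketch of overlapping, word-spanning character 5-gram tokenization: whitespace is normalized but kept, so grams can cross word boundaries.

```python
def char_ngrams(text: str, n: int = 5):
    # Normalize whitespace, then slide an n-character window across the string;
    # keeping spaces lets grams span word boundaries.
    text = " ".join(text.lower().split())
    return [text[i:i + n] for i in range(len(text) - n + 1)]

print(char_ngrams("hello world")[:4])
# ['hello', 'ello ', 'llo w', 'lo wo']
```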
hltcoe2
Results | Participants | Input | Summary | Appendix
- Run ID: hltcoe2
- Participant: hltcoe
- Track: Podcast
- Year: 2020
- Submission: 9/4/2020
- Type: automatic
- Task: retrieval
- MD5: 9c86dc3b9eb1d0f49512f8d11ca9603c
- Run description: Statistical language model with linear interpolation. Rocchio-style relevance feedback and term reweighting. Unstemmed words used for tokenization.
hltcoe3
Results | Participants | Input | Summary | Appendix
- Run ID: hltcoe3
- Participant: hltcoe
- Track: Podcast
- Year: 2020
- Submission: 9/4/2020
- Type: automatic
- Task: retrieval
- MD5: 51b7a815b8a665f7e8dddc71327939d8
- Run description: Statistical language model with linear interpolation. No query modification or relevance feedback was employed. Unstemmed words used for tokenization.
hltcoe4
Results | Participants | Input | Summary | Appendix
- Run ID: hltcoe4
- Participant: hltcoe
- Track: Podcast
- Year: 2020
- Submission: 9/4/2020
- Type: automatic
- Task: retrieval
- MD5: 5b505126e2ac891314aa11124e7afacc
- Run description: Statistical language model with linear interpolation. Rocchio-style relevance feedback and term reweighting. Unstemmed words used for tokenization.
hltcoe5
Results | Participants | Input | Summary | Appendix
- Run ID: hltcoe5
- Participant: hltcoe
- Track: Podcast
- Year: 2020
- Submission: 9/4/2020
- Type: automatic
- Task: retrieval
- MD5: b94954bf6db0fd88d8bf3f4df4f26b77
- Run description: Independently decoded audio data (baseline transcript was not used). Statistical language model with linear interpolation. Rocchio-style relevance feedback and term reweighting. Overlapping, word-spanning character 4-gram tokenization.
LRGREtvrs-r_1
Results | Participants | Proceedings | Input | Summary | Appendix
- Run ID: LRGREtvrs-r_1
- Participant: LRG_REtrievers
- Track: Podcast
- Year: 2020
- Submission: 8/31/2020
- Type: automatic
- Task: retrieval
- MD5: f8902c4b7a18df3a165e5d2f7fa0e8a9
- Run description: We scored every podcast episode in the dataset against the user's query with BM25 and kept the top 200 podcasts. We then divided the filtered podcasts into 2-minute segments, re-ranked them with a regressive XLNet model, and returned the top 1000 results.
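A sketch of the 2-minute segmentation step used across these runs, assuming word-level ASR timestamps are available as (token, start_time) pairs; the team's exact segmentation rules are not published:

```python
def two_minute_segments(timed_words, window=120.0):
    # timed_words: iterable of (token, start_time_in_seconds) pairs.
    segments, current, seg_start = [], [], 0.0
    for token, start in timed_words:
        if current and start - seg_start >= window:
            segments.append(" ".join(current))
            current, seg_start = [], start
        current.append(token)
    if current:
        segments.append(" ".join(current))
    return segments
```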
LRGREtvrs-r_2
Results | Participants | Proceedings | Input | Summary | Appendix
- Run ID: LRGREtvrs-r_2
- Participant: LRG_REtrievers
- Track: Podcast
- Year: 2020
- Submission: 9/1/2020
- Type: automatic
- Task: retrieval
- MD5: a7c7e79b62c6c05f13afa50370b7f40e
- Run description: To tackle the problem statement, we adopt a neural re-ranking approach, using BM25 to filter episodes and RM3 for query expansion. We then split the episodes into two-minute segments. For each query-segment pair in the training set, we use a transformer-based model: we find the contextual embeddings using XLNet (keeping two layers unfrozen), compute the similarity matrix between the query and the document, and apply kernel pooling and linear layers to finally arrive at a relevance score for the document.
LRGREtvrs-r_3
Results | Participants | Proceedings | Input | Summary | Appendix
- Run ID: LRGREtvrs-r_3
- Participant: LRG_REtrievers
- Track: Podcast
- Year: 2020
- Submission: 9/1/2020
- Type: automatic
- Task: retrieval
- MD5: 25e2a11f66071c0d2e208529867ea3dc
- Run description: To tackle the problem statement, we adopt a neural re-ranking approach: we first split each episode into 2-minute segments, then use BM25 to filter episodes and RM3 for query expansion to create a curated list of 5000 segments. For each query-segment pair in the training set, we use a regression-based transformer model, after which we re-rank the documents according to the regression scores.
LRGREtvrs-r_4
Results | Participants | Proceedings | Input | Summary | Appendix
- Run ID: LRGREtvrs-r_4
- Participant: LRG_REtrievers
- Track: Podcast
- Year: 2020
- Submission: 9/2/2020
- Type: automatic
- Task: retrieval
- MD5: 3490b3763a141e7a47637f941e7a38bc
- Run description: We scored every podcast episode in the dataset against the user's query with BM25 and kept the top 400 podcasts. We then divided the filtered podcasts into 2-minute segments, re-ranked them with a regressive XLNet model, and returned the top 1000 results.
onemin
Results | Participants | Input | Summary (manual) | Summary (rouge) | Appendix
- Run ID: onemin
- Participant: podcast_baselines
- Track: Podcast
- Year: 2020
- Submission: 9/2/2020
- Type: automatic
- Task: summarization
- Run description: The first minute of the transcript is extracted and used as the summary.
oudalab1
Results | Participants | Input | Summary | Appendix
- Run ID: oudalab1
- Participant: oudalab
- Track: Podcast
- Year: 2020
- Submission: 9/4/2020
- Type: automatic
- Task: retrieval
- MD5: 8ff5420f22d01244c61c617e8baf8347
- Run description: Using the above-mentioned method, this run was a trial run with a few data points. We used the top 10 closest segments based on distance to find the episodes to use for our BERT QA task. We then chose the top 3 answers with the lowest similarity scores according to BERT, removing duplicates.
QL
Results | Participants | Input | Summary | Appendix
- Run ID: QL
- Participant: podcast_baselines
- Track: Podcast
- Year: 2020
- Submission: 9/2/2020
- Type: automatic
- Task: retrieval
- MD5: 29f529f98bbbb230c34347f19fc61217
- Run description: Traditional IR model, query likelihood; implemented with the Anserini toolkit and default hyperparameters (Dirichlet smoothing, μ = 1000).
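This baseline can likewise be sketched in pyserini; the index path and query are hypothetical:

```python
from pyserini.search.lucene import LuceneSearcher

searcher = LuceneSearcher("indexes/podcast-segments")  # hypothetical index path
searcher.set_qld(mu=1000)  # query likelihood with Dirichlet smoothing, mu = 1000
hits = searcher.search("halloween stories and chat", k=1000)
```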
RERANK-DESC
Results | Participants | Input | Summary | Appendix
- Run ID: RERANK-DESC
- Participant: podcast_baselines
- Track: Podcast
- Year: 2020
- Submission: 9/2/2020
- Type: automatic
- Task: retrieval
- MD5: 178140590bd272270d8e1a52e9c04c5d
- Run description: (1) Generate a pool of the top 50 candidates with BM25 using the queries; (2) rerank topic description-segment pairs using a BERT reranking model pre-trained on the MS MARCO passage reranking data (Nogueira et al.). The model was used without any further fine-tuning.
RERANK-QUERY
Results | Participants | Input | Summary | Appendix
- Run ID: RERANK-QUERY
- Participant: podcast_baselines
- Track: Podcast
- Year: 2020
- Submission: 9/2/2020
- Type: automatic
- Task: retrieval
- MD5: 75e41ba41563c7448b30378fb13f2386
- Run description: (1) Generate a pool of the top 50 candidates with BM25 using the queries; (2) rerank topic query-segment pairs using a BERT reranking model pre-trained on the MS MARCO passage reranking data (Nogueira et al.). The model was used without any further fine-tuning.
run_dcu1
Results | Participants | Proceedings | Input | Summary | Appendix
- Run ID: run_dcu1
- Participant: DCU-ADAPT
- Track: Podcast
- Year: 2020
- Submission: 9/1/2020
- Type: automatic
- Task: retrieval
- MD5: e5eb3bb396ee53bb8894a10f77414a9b
- Run description: Nouns and proper nouns are identified automatically using the spaCy natural language processing toolkit, and those words are added to the queries. From the documents of first-pass retrieval, words relevant to the query nouns are identified using WordNet, and these words are added after being ranked by the Robertson offer weight. The queries are processed by the DPH model, and the Bo1 query expansion model is further applied.
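The first expansion step (appending nouns and proper nouns to the query) might look like this with spaCy; the model name is an assumption, and the WordNet and offer-weight stages are omitted:

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # assumed model; the run does not name one

def expand_with_nouns(query: str) -> str:
    # Append nouns and proper nouns identified by spaCy back onto the query.
    doc = nlp(query)
    extra = [tok.text for tok in doc if tok.pos_ in ("NOUN", "PROPN")]
    return query + " " + " ".join(extra)

print(expand_with_nouns("coronavirus impact on schools"))
```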
run_dcu2
Results | Participants | Proceedings | Input | Summary | Appendix
- Run ID: run_dcu2
- Participant: DCU-ADAPT
- Track: Podcast
- Year: 2020
- Submission: 9/1/2020
- Type: automatic
- Task: retrieval
- MD5: df6299db1ef8194672883331ae579edc
- Run description: Nouns and named entities are identified automatically using the spaCy natural language processing toolkit, and those words are added to the queries. The queries are processed by the DPH model, and the Bo1 query expansion model is further applied.
run_dcu3
Results | Participants | Proceedings | Input | Summary | Appendix
- Run ID: run_dcu3
- Participant: DCU-ADAPT
- Track: Podcast
- Year: 2020
- Submission: 9/1/2020
- Type: automatic
- Task: retrieval
- MD5: 0be485c98b8ce2978fbee980e261519a
- Run description: Nouns and named entities are identified automatically using the spaCy natural language processing toolkit, and those words are added to the queries. From the documents of first-pass retrieval, words relevant to the query nouns are identified using WordNet, and these words are added after being ranked by the Robertson offer weight. The queries are processed by the DPH model, and the Bo1 query expansion model is further applied.
run_dcu4
Results | Participants | Proceedings | Input | Summary | Appendix
- Run ID: run_dcu4
- Participant: DCU-ADAPT
- Track: Podcast
- Year: 2020
- Submission: 9/1/2020
- Type: automatic
- Task: retrieval
- MD5: 6245881af107cb477116c70a8faa531e
- Run description: A collection of web text was compiled using the Google search engine. From the collection, terms relevant to the queries were found using the Robertson offer weight and added to the queries. Nouns and named entities from the query description were also added to the queries.
run_dcu5
Results | Participants | Proceedings | Input | Summary | Appendix
- Run ID: run_dcu5
- Participant: DCU-ADAPT
- Track: Podcast
- Year: 2020
- Submission: 9/1/2020
- Type: automatic
- Task: retrieval
- MD5: b439b372162c454e6abc26926983bdc0
- Run description: This is a combination of all of the query expansion approaches from the previous submissions.
textranksegments
Results | Participants | Input | Summary (manual) | Summary (rouge) | Appendix
- Run ID: textranksegments
- Participant: podcast_baselines
- Track: Podcast
- Year: 2020
- Submission: 9/3/2020
- Type: automatic
- Task: summarization
- Run description: We chunked the transcript into ~50 word segments (respecting sentence boundaries), and ran TextRank, using TF-IDF cosine similarity as the edge weights, with aggregate vertex degree as the centrality measure (not PageRank). Up to ~150 words from the top segments were selected for the summary, with segments kept in order. We specified a set of stopwords, consisting of the most common terms in the whole corpus, to be ignored.
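A minimal sketch of the ranking step shared by the two TextRank baselines, assuming TF-IDF cosine edge weights and aggregate vertex degree as the centrality measure (the stopword handling and any similarity thresholding are assumptions):

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def degree_centrality_ranking(units, stopwords=None):
    # Build the TF-IDF similarity graph; rows are L2-normalized by default,
    # so X @ X.T gives cosine similarity. Centrality = aggregate vertex degree.
    vec = TfidfVectorizer(stop_words=list(stopwords) if stopwords else None)
    X = vec.fit_transform(units)
    sim = (X @ X.T).toarray()
    np.fill_diagonal(sim, 0.0)           # drop self-loops
    return np.argsort(-sim.sum(axis=1))  # most central units first

def summarize(units, max_words=150):
    # Greedily take the most central units up to the word budget,
    # then emit them in their original transcript order.
    chosen, total = [], 0
    for i in degree_centrality_ranking(units):
        n = len(units[i].split())
        if total + n > max_words:
            break
        chosen.append(i)
        total += n
    return " ".join(units[i] for i in sorted(chosen))
```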
textranksentences
Results | Participants | Input | Summary (manual) | Summary (rouge) | Appendix
- Run ID: textranksentences
- Participant: podcast_baselines
- Track: Podcast
- Year: 2020
- Submission: 9/3/2020
- Type: automatic
- Task: summarization
- Run description: We split the transcript into sentences using spaCy, and ran TextRank, using TF-IDF cosine similarity as the edge weights, with aggregate vertex degree as the centrality measure (not PageRank). The top sentences, up to about 150 words in total, were selected for the summary, with sentences kept in the order they appear. We specified a set of stopwords, consisting of the most common terms in the whole corpus, to be ignored.
UCF_NLP1
Results | Participants | Proceedings | Input | Summary (manual) | Summary (rouge) | Appendix
- Run ID: UCF_NLP1
- Participant: UCF_NLP
- Track: Podcast
- Year: 2020
- Submission: 9/3/2020
- Type: automatic
- Task: summarization
- Run description: Our summarization system (UCF_NLP1) focuses on generating abstractive summaries from podcast transcripts. It employs an encoder-decoder model to condense the first few segments of the transcript into an abstractive summary.
UCF_NLP2
Results | Participants | Proceedings | Input | Summary (manual) | Summary (rouge) | Appendix
- Run ID: UCF_NLP2
- Participant: UCF_NLP
- Track: Podcast
- Year: 2020
- Submission: 9/3/2020
- Type: automatic
- Task: summarization
- Run description: Our summarization system (UCF_NLP2) focuses on generating abstractive summaries from podcast transcripts. It consists of an abstractor that employs an encoder-decoder model to compose summaries and an extractor that enhances content selection by identifying summary-worthy segments from lengthy transcripts and providing them as input to the abstractor.
udel_wang_zheng1
Results | Participants | Proceedings | Input | Summary (manual) | Summary (rouge) | Appendix
- Run ID: udel_wang_zheng1
- Participant: udel_wang_zheng
- Track: Podcast
- Year: 2020
- Submission: 8/28/2020
- Type: automatic
- Task: summarization
- Run description: We build a model from distilBART-cnndm, fine-tuning it using the first 1024 tokens of each transcript.
udel_wang_zheng2
Results | Participants | Proceedings | Input | Summary (manual) | Summary (rouge) | Appendix
- Run ID: udel_wang_zheng2
- Participant: udel_wang_zheng
- Track: Podcast
- Year: 2020
- Submission: 9/1/2020
- Type: automatic
- Task: summarization
- Run description: We perform LDA on the transcript to extract the topics covered in the episode, and then select top-scoring sentences for fine-tuning.
udel_wang_zheng3
Results | Participants | Proceedings | Input | Summary (manual) | Summary (rouge) | Appendix
- Run ID: udel_wang_zheng3
- Participant: udel_wang_zheng
- Track: Podcast
- Year: 2020
- Submission: 9/1/2020
- Type: automatic
- Task: summarization
- Run description: Select sentences for fine-tuning.
udel_wang_zheng4
Results | Participants | Proceedings | Input | Summary (manual) | Summary (rouge) | Appendix
- Run ID: udel_wang_zheng4
- Participant: udel_wang_zheng
- Track: Podcast
- Year: 2020
- Submission: 9/1/2020
- Type: automatic
- Task: summarization
- Run description: We combine the outputs from our previous three submissions and generate the summary again.
UMD_ID_run4
Results | Participants | Proceedings | Input | Summary | Appendix
- Run ID: UMD_ID_run4
- Participant: UMD_IR
- Track: Podcast
- Year: 2020
- Submission: 9/3/2020
- Type: automatic
- Task: retrieval
- MD5: eb8c143b43c146995afb00785cfbdf47
- Run description: 7 systems (unstemmed LM, unstemmed LM + word2vec query expansion, stemmed weighted LM with stopwords, unstemmed TF-IDF, stemmed LM + SDM, unstemmed 5-minute-segment LM, and stemmed LM with documents expanded with metadata), each re-ranked using either T5 or BERT and then combined into a single system.
UMD_IR_run1
Results | Participants | Proceedings | Input | Summary | Appendix
- Run ID: UMD_IR_run1
- Participant: UMD_IR
- Track: Podcast
- Year: 2020
- Submission: 9/2/2020
- Type: automatic
- Task: retrieval
- MD5: b4a90a6bed2ffc141f49cb4962c7240d
- Run description: Baseline model prepared from an Indri LM with the sequential dependency model applied; the results are re-ranked using a T5 BERT model trained on the MS MARCO dataset.
UMD_IR_run2
Results | Participants | Proceedings | Input | Summary | Appendix
- Run ID: UMD_IR_run2
- Participant: UMD_IR
- Track: Podcast
- Year: 2020
- Submission: 9/2/2020
- Type: automatic
- Task: retrieval
- MD5: da9e8e1e56a3b84dcaae8da759b1df2d
- Run description: Indri LM with sequential dependency model.
UMD_IR_run3
Results | Participants | Proceedings | Input | Summary | Appendix
- Run ID: UMD_IR_run3
- Participant: UMD_IR
- Track: Podcast
- Year: 2020
- Submission: 9/3/2020
- Type: automatic
- Task: retrieval
- MD5: 75f5f2cf929466b63a7856511bf308ed
- Run description: 7 systems (unstemmed LM, unstemmed LM + word2vec query expansion, stemmed weighted LM with stopwords, unstemmed TF-IDF, stemmed LM + SDM, unstemmed 5-minute-segment LM, and stemmed LM with documents expanded with metadata) combined into a single run, which is re-ranked using 3 MS MARCO-trained models (T5 and BERT) and combined with the baseline run.
UMD_IR_run5
Results | Participants | Proceedings | Input | Summary | Appendix
- Run ID: UMD_IR_run5
- Participant: UMD_IR
- Track: Podcast
- Year: 2020
- Submission: 9/3/2020
- Type: automatic
- Task: retrieval
- MD5: 23429c601ffc40adce398db7c3450e29
- Run description: CombMNZ combination of run1 through run4.
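CombMNZ multiplies each document's summed (normalized) score by the number of runs that retrieved it; a minimal sketch, assuming min-max normalization per run (the normalization actually used here is not stated):

```python
from collections import defaultdict

def combmnz(runs):
    # runs: list of {docid: score} dicts, one per input run (run1..run4 here).
    summed = defaultdict(float)
    count = defaultdict(int)
    for run in runs:
        lo, hi = min(run.values()), max(run.values())
        for doc, score in run.items():
            summed[doc] += (score - lo) / (hi - lo) if hi > lo else 0.0
            count[doc] += 1
    fused = {doc: summed[doc] * count[doc] for doc in summed}
    return sorted(fused.items(), key=lambda kv: -kv[1])
```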
unhtrema1
Results | Participants | Proceedings | Input | Summary (manual) | Summary (rouge) | Appendix
- Run ID: unhtrema1
- Participant: TREMA-UNH
- Track: Podcast
- Year: 2020
- Submission: 9/2/2020
- Type: automatic
- Task: summarization
- Run description: A GAN model is used to generate abstractive summaries of chunks of the input text. The sentence-transformer method is used to embed each of these summary lines as a fixed-length vector. Another LSTM network is trained to output a summary embedding vector given the input summary embedding vectors. The generated summary lines are then sorted by the cosine similarity of their embedding vectors to this synthetic summary vector, and the top k lines are chosen as the overall summary. For this run, k=3 with a max output sequence length of 15 for the GAN model; the input text is split into 10 chunks of 1,000 words each.
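The final selection step shared by the four TREMA-UNH runs, sorting generated lines by cosine similarity to the LSTM-predicted summary embedding, can be sketched as follows; the sentence-transformer checkpoint is a stand-in, and `target_vec` is assumed to come from the LSTM:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in for the run's sentence-transformer

def select_top_lines(summary_lines, target_vec, k=3):
    # Embed each generated line, rank by cosine similarity to the synthetic
    # summary vector, and keep the top-k lines in their original order.
    E = model.encode(summary_lines, normalize_embeddings=True)
    t = np.asarray(target_vec, dtype=float)
    t /= np.linalg.norm(t)
    keep = np.argsort(-(E @ t))[:k]
    return [summary_lines[i] for i in sorted(keep)]
```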
unhtrema2
Results | Participants | Proceedings | Input | Summary (manual) | Summary (rouge) | Appendix
- Run ID: unhtrema2
- Participant: TREMA-UNH
- Track: Podcast
- Year: 2020
- Submission: 9/2/2020
- Type: automatic
- Task: summarization
- Run description: A GAN model is used to generate abstractive summaries of chunks of the input text. The sentence-transformer method is used to embed each of these summary lines as a fixed-length vector. Another LSTM network is trained to output a summary embedding vector given the input summary embedding vectors. The generated summary lines are then sorted by the cosine similarity of their embedding vectors to this synthetic summary vector, and the top k lines are chosen as the overall summary. For this run, k=10 with a max output sequence length of 15 for the GAN model; the input text is split into 10 chunks of 1,000 words each.
unhtrema3
Results | Participants | Proceedings | Input | Summary (manual) | Summary (rouge) | Appendix
- Run ID: unhtrema3
- Participant: TREMA-UNH
- Track: Podcast
- Year: 2020
- Submission: 9/2/2020
- Type: automatic
- Task: summarization
- Run description: A GAN model is used to generate abstractive summaries of chunks of the input text. The sentence-transformer method is used to embed each of these summary lines as a fixed-length vector. Another LSTM network is trained to output a summary embedding vector given the input summary embedding vectors. The generated summary lines are then sorted by the cosine similarity of their embedding vectors to this synthetic summary vector, and the top k lines are chosen as the overall summary. For this run, k=10 with a max output sequence length of 20 for the GAN model; the input text is split into 100 chunks of 100 words each.
unhtrema4
Results | Participants | Proceedings | Input | Summary (manual) | Summary (rouge) | Appendix
- Run ID: unhtrema4
- Participant: TREMA-UNH
- Track: Podcast
- Year: 2020
- Submission: 9/3/2020
- Type: automatic
- Task: summarization
- Run description: A GAN model is used to generate abstractive summaries of chunks of the input text. The sentence-transformer method is used to embed each of these summary lines as a fixed-length vector. Another LSTM network is trained to output a summary embedding vector given the input summary embedding vectors. The generated summary lines are then sorted by the cosine similarity of their embedding vectors to this synthetic summary vector, and the top k lines are chosen as the overall summary. For this run, k=20 with a max output sequence length of 20 for the GAN model; the input text is split into 100 chunks of 100 words each.
UTDThesis1
Results | Participants | Input | Summary (manual) | Summary (rouge) | Appendix
- Run ID: UTDThesis1
- Participant: UTDThesis
- Track: Podcast
- Year: 2020
- Submission: 9/2/2020
- Type: automatic
- Task: summarization
- Run description: These are the abstractive summaries generated by the Dialogue Action Tokenized T5-Transformer described above.
UTDThesis_Run1
Results | Participants | Input | Summary | Appendix
- Run ID: UTDThesis_Run1
- Participant: UTDThesis
- Track: Podcast
- Year: 2020
- Submission: 9/2/2020
- Type: automatic
- Task: retrieval
- MD5: bf94389052d00735052d02c0c8324a8d
- Run description: This run collects 100 ranked documents for each query, using the previously described ranking method.