Proceedings - Video-To-Text 2024

Softbank-Meisei at TREC 2024 Ad-hoc Video Search and Video to Text Tasks

Kazuya Ueki (Meisei University), Yuma Suzuki (SoftBank Corp.), Hiroki Takushima (SoftBank Corp.), Haruki Sato (Agoop Corp.), Takumi Takada (SB Intuitions Corp.), Aiswariya Manoj Kumar (SoftBank Corp.), Hayato Tanoue (SoftBank Corp.), Hiroki Nishihara (SoftBank Corp.), Yuki Shibata (SoftBank Corp.), Takayuki Hori (SoftBank Corp.)

Abstract

The Softbank-Meisei team participated in the ad-hoc video search (AVS) and video-to-text (VTT) tasks at TREC 2024. For this year's AVS task, we submitted four fully automatic systems to both the main and progress tasks. Our systems used pre-trained vision-and-language models, including CLIP, BLIP, and BLIP-2, along with several other advanced models. We also expanded the original query texts with text-generation and image-generation techniques to increase data diversity. The integration ratios of these models were optimized on results from previous benchmark test datasets. For this year's VTT task, as last year, we submitted four main-task methods that combine captioning with multiple models, reranking, and generative AI for summarization. For the subtasks, we submitted three methods based on the outputs of the individual models. On last year's main-task test data, our methods improved by about 0.04 points in CIDEr-D and about 0.03 points in SPICE, according to the metrics we were able to compute ourselves.
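
The AVS systems described above hinge on late fusion: each vision-and-language model scores every query-shot pair, and the per-model scores are combined with tuned integration ratios. The following is a minimal sketch of that fusion step in Python, assuming each model (CLIP, BLIP, BLIP-2) has already produced a query-by-shot similarity matrix; the random matrices, the z-score normalization, and the weight values are illustrative assumptions, not the authors' actual configuration.

import numpy as np

# Stand-in similarity matrices (queries x video shots); in the actual systems
# these would come from CLIP, BLIP, BLIP-2, and other pre-trained models.
rng = np.random.default_rng(0)
n_queries, n_shots = 3, 1000
scores = {
    "clip":  rng.standard_normal((n_queries, n_shots)),
    "blip":  rng.standard_normal((n_queries, n_shots)),
    "blip2": rng.standard_normal((n_queries, n_shots)),
}

def zscore(s):
    # Normalize each model's scores per query so different models are
    # comparable before they are mixed.
    return (s - s.mean(axis=1, keepdims=True)) / (s.std(axis=1, keepdims=True) + 1e-8)

# Integration ratios; the paper tunes these on earlier benchmark test sets,
# whereas the values below are arbitrary placeholders.
weights = {"clip": 0.5, "blip": 0.3, "blip2": 0.2}

# Weighted late fusion of the normalized score matrices.
fused = sum(w * zscore(scores[name]) for name, w in weights.items())

# Rank shots per query by fused score and keep the top 10.
top10 = np.argsort(-fused, axis=1)[:, :10]
print(top10)

The same weighted-combination idea would carry over to the VTT side, e.g. reranking candidate captions by a fused query-caption similarity, though the exact reranking procedure is not specified in the abstract.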

Bibtex
@inproceedings{softbank-meisei-trec2024-papers-proc-3,
    author = {Kazuya Ueki (Meisei University) and Yuma Suzuki (SoftBank Corp.) and Hiroki Takushima (SoftBank Corp.) and Haruki Sato (Agoop Corp.) and Takumi Takada (SB Intuitions Corp.) and Aiswariya Manoj Kumar (SoftBank Corp.) and Hayato Tanoue (SoftBank Corp.) and Hiroki Nishihara (SoftBank Corp.) and Yuki Shibata (SoftBank Corp.) and Takayuki Hori (SoftBank Corp.)},
    title = {Softbank-Meisei at TREC 2024 Ad-hoc Video Search and Video to Text Tasks},
    booktitle = {The Thirty-Third Text REtrieval Conference Proceedings (TREC 2024), Gaithersburg, MD, USA, November 15-18, 2024},
    series = {NIST Special Publication},
    volume = {xxx-xxx},
    publisher = {National Institute of Standards and Technology (NIST)},
    year = {2024},
    trec_org = {softbank-meisei},
    trec_runs = {SoftbankMeisei - Progress Run 1, SoftbankMeisei - Progress Run 2, SoftbankMeisei - Progress Run 3, SoftbankMeisei - Progress Run 4, SoftbankMeisei - Main Run 1, SoftbankMeisei - Main Run 2, SoftbankMeisei - Main Run 3, SoftbankMeisei - Main Run 4, rtask-bm25-colbert_faiss, rtask-bm25-rank_zephyr, rag_bm25-colbert_faiss-gpt4o-llama70b, ragtask-bm25-rank_zephyr-gpt4o-llama70b, agtask-bm25-colbert_faiss-gpt4o-llama70b, SoftbankMeisei_vtt_main_run1, SoftbankMeisei_vtt_main_run2, SoftbankMeisei_vtt_main_run3, SoftbankMeisei_vtt_main_run4, SoftbankMeisei_vtt_sub_run2, SoftbankMeisei_vtt_sub_run3, SoftbankMeisei_vtt_sub_run1},
    trec_tracks = {avs.vtt},
    url = {https://trec.nist.gov/pubs/trec33/papers/softbank-meisei.avs.vtt.pdf}
}