Proceedings - Adhoc Video Search 2024

WHU-NERCMS AT TRECVID2024: AD-HOC VIDEO SEARCH TASK

Heng Liu (Hubei Key Laboratory of Multimedia and Network Communication Engineering, NERCMS, WHU), Jiangshan He (NERCMS), Zeyuan Zhang (Hubei Key Laboratory of Multimedia and Network Communication Engineering, NERCMS, WHU), Yuanyuan Xu (NERCMS), Chao Liang (Hubei Key Laboratory of Multimedia and Network Communication Engineering, NERCMS, WHU)

Abstract

The WHU-NERCMS team participated in the ad-hoc video search (AVS) task of TRECVID 2024. In this year's AVS task, we continued to use multiple visual semantic embedding methods, combined with interactive feedback-guided ranking aggregation techniques, to integrate different models and their outputs into the final ranked list of video shots. We submitted 4 runs each for the automatic and interactive tasks, along with one attempt at the manual assistance task. Table 1 shows our results for this year.
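The abstract does not give implementation details; the following is only a minimal sketch of score-level fusion across several embedding models with a simple interactive-feedback re-weighting step. All function and variable names (e.g. model_scores, relevant_ids) are hypothetical and are not taken from the paper.

```python
import numpy as np

def aggregate_rankings(model_scores, weights=None):
    """Fuse per-model shot-query scores into one ranked list.

    model_scores: dict mapping model name -> np.ndarray of shape (num_shots,)
    weights:      optional dict of per-model fusion weights (uniform if None)
    """
    names = list(model_scores)
    if weights is None:
        weights = {n: 1.0 / len(names) for n in names}
    fused = np.zeros_like(next(iter(model_scores.values())), dtype=float)
    for n in names:
        # Min-max normalize each model's scores so they are comparable,
        # then take a weighted sum as the aggregated relevance score.
        s = model_scores[n].astype(float)
        s = (s - s.min()) / (s.max() - s.min() + 1e-8)
        fused += weights[n] * s
    return np.argsort(-fused), fused

def reweight_from_feedback(model_scores, weights, relevant_ids, lr=0.5):
    """Toy interactive-feedback step: boost models that rank the
    user-marked relevant shots highly, then renormalize the weights."""
    new_w = {}
    for n, s in model_scores.items():
        order = np.argsort(-s)
        ranks = np.empty_like(order)
        ranks[order] = np.arange(len(s))
        # Reward models whose mean rank of the relevant shots is low.
        reward = 1.0 / (1.0 + ranks[relevant_ids].mean())
        new_w[n] = weights[n] * (1.0 + lr * reward)
    z = sum(new_w.values())
    return {n: w / z for n, w in new_w.items()}
```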

Bibtex
@inproceedings{WHU-NERCMS-trec2024-papers-proc-1,
    author = {Heng Liu (Hubei Key Laboratory of Multimedia and Network Communication Engineering, NERCMS, WHU) and Jiangshan He (NERCMS) and Zeyuan Zhang (Hubei Key Laboratory of Multimedia and Network Communication Engineering, NERCMS, WHU) and Yuanyuan Xu (NERCMS) and Chao Liang (Hubei Key Laboratory of Multimedia and Network Communication Engineering, NERCMS, WHU)},
    title = {WHU-NERCMS AT TRECVID2024: AD-HOC VIDEO SEARCH TASK},
    booktitle = {The Thirty-Third Text REtrieval Conference Proceedings (TREC 2024), Gaithersburg, MD, USA, November 15-18, 2024},
    series = {NIST Special Publication},
    volume = {xxx-xxx},
    publisher = {National Institute of Standards and Technology (NIST)},
    year = {2024},
    trec_org = {WHU-NERCMS},
    trec_runs = {run4, run3, run2, Manual_run1, relevance_feedback_run4, relevance_feedback_run1, auto_run1, rf_run2, RF_run3},
    trec_tracks = {avs},
    url = {https://trec.nist.gov/pubs/trec33/papers/WHU-NERCMS.avs.pdf}
}

ITI-CERTH participation in ActEV and AVS Tracks of TRECVID 2024

Konstantinos Gkountakos, Damianos Galanopoulos, Antonios Leventakis, Georgios Tsionkis, Klearchos Stavrothanasopoulos, Konstantinos Ioannidis, Stefanos Vrochidis, Vasileios Mezaris, Ioannis Kompatsiaris

Abstract

This report presents an overview of the ITI-CERTH team's runs for the Ad-hoc Video Search (AVS) and Activities in Extended Video (ActEV) tasks. Our participation in the AVS task involves a collection of five cross-modal deep network architectures and numerous pre-trained models, which are used to calculate the similarities between video shots and queries. These similarities serve as input to a trainable neural network that effectively combines them. During the retrieval stage, we also introduce a normalization step that uses both the current and previous AVS queries to revise the combined video shot-query similarities. For the ActEV task, we adapt our framework to support rule-based classification, overcoming the challenges of detecting and recognizing activities in a multi-label manner, while experimenting with two separate activity classifiers.
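As a rough illustration only (the paper's actual architecture and normalization formula are not specified here), the sketch below shows a tiny network that maps the vector of per-model shot-query similarities to a single fused score, followed by a query-wise normalization that standardizes each shot's fused score against statistics pooled over the current and previously issued queries. All names and the exact normalization are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class SimilarityCombiner(nn.Module):
    """Maps a vector of per-model similarities to one fused score."""
    def __init__(self, num_models, hidden=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_models, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, sims):                 # sims: (num_shots, num_models)
        return self.net(sims).squeeze(-1)    # (num_shots,)

def normalize_against_queries(current, previous):
    """Hypothetical normalization: z-score each shot's fused score for the
    current query against the score distribution pooled over the current
    and previous queries.

    current:  (num_shots,) fused scores for the current query
    previous: (num_queries, num_shots) fused scores for earlier queries
    """
    pooled = torch.cat([current.unsqueeze(0), previous], dim=0)
    mean = pooled.mean(dim=0)
    std = pooled.std(dim=0) + 1e-8
    return (current - mean) / std
```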

Bibtex
@inproceedings{CERTH-ITI-trec2024-papers-proc-1,
    author = {Konstantinos Gkountakos, Damianos Galanopoulos, Antonios Leventakis, Georgios Tsionkis, Klearchos Stavrothanasopoulos, Konstantinos Ioannidis, Stefanos Vrochidis, Vasileios Mezaris, Ioannis Kompatsiaris},
    title = {ITI-CERTH participation in ActEV and AVS Tracks of TRECVID 2024},
    booktitle = {The Thirty-Third Text REtrieval Conference Proceedings (TREC 2024), Gaithersburg, MD, USA, November 15-18, 2024},
    series = {NIST Special Publication},
    volume = {xxx-xxx},
    publisher = {National Institute of Standards and Technology (NIST)},
    year = {2024},
    trec_org = {CERTH-ITI},
    trec_runs = {certh.iti.avs.24.main.run.1, certh.iti.avs.24.main.run.2, certh.iti.avs.24.main.run.3, certh.iti.avs.24.progress.run.1, certh.iti.avs.24.progress.run.2, certh.iti.avs.24.progress.run.3},
    trec_tracks = {avs.actev},
    url = {https://trec.nist.gov/pubs/trec33/papers/CERTH-ITI.avs.actev.pdf}
}

RUC_AIM3 at TRECVID 2024: Ad-hoc Video Search

Xueyan Wang (Renmin University of China), Yang Du (Renmin University of China), Yuqi Liu (Renmin University of China), Qin Jin (Renmin University of China)

Abstract

This report presents our solution for the Ad-hoc Video Search (AVS) task of TRECVID 2024. Building on our baseline AVS model from TRECVID 2023, we further improve search performance by integrating multiple visual-embedding models, performing video captioning to enable topic-to-caption search, and applying a re-ranking strategy to the top candidates. Our submissions from the improved AVS model rank 3rd in the TRECVID AVS 2024 main task by mean average precision (mAP), with our best run achieving 36.8.
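The abstract above combines embedding-based topic-to-video scores with topic-to-caption scores and then re-ranks the top candidates. Below is a minimal, generic sketch of that kind of two-stage fusion; the score arrays and the re-ranking callable are placeholders (assumptions), not the authors' models.

```python
import numpy as np

def fuse_and_rerank(topic_video_sims, topic_caption_sims,
                    rerank_fn, alpha=0.7, top_k=1000):
    """Two-stage retrieval sketch.

    topic_video_sims:   (num_shots,) similarity of the topic to each shot
                        from visual-embedding models (already fused).
    topic_caption_sims: (num_shots,) similarity of the topic to each shot's
                        automatically generated caption.
    rerank_fn:          placeholder callable that rescores a list of shot
                        indices for the topic (e.g. a QA-style re-ranker).
    """
    # Stage 1: weighted fusion of embedding-based and caption-based scores.
    fused = alpha * topic_video_sims + (1.0 - alpha) * topic_caption_sims
    candidates = np.argsort(-fused)[:top_k]

    # Stage 2: re-rank only the top candidates with the (expensive) re-ranker.
    rerank_scores = np.asarray(rerank_fn(candidates))
    return candidates[np.argsort(-rerank_scores)]
```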

Bibtex
@inproceedings{ruc_aim3-trec2024-papers-proc-1,
    author = {Xueyan Wang (Renmin University of China) and Yang Du (Renmin University of China) and Yuqi Liu (Renmin University of China) and Qin Jin (Renmin University of China)},
    title = {RUC_AIM3 at TRECVID 2024: Ad-hoc Video Search},
    booktitle = {The Thirty-Third Text REtrieval Conference Proceedings (TREC 2024), Gaithersburg, MD, USA, November 15-18, 2024},
    series = {NIST Special Publication},
    volume = {xxx-xxx},
    publisher = {National Institute of Standards and Technology (NIST)},
    year = {2024},
    trec_org = {ruc_aim3},
    trec_runs = {add_captioning, baseline, add_QArerank, add_captioning_QArerank, VTM and VTC for two model, VTC for two model, VTM for two model, VTM and VTC for videollama2 robust, VTM and VTC for two model primary, VTC for two model primary, VTM for two model primary, VTM and VTC for videollama2 primary},
    trec_tracks = {avs},
    url = {https://trec.nist.gov/pubs/trec33/papers/ruc_aim3.avs.pdf}
}

Softbank-Meisei at TREC 2024 Ad-hoc Video Search and Video to Text Tasks

Kazuya Ueki (Meisei University), Yuma Suzuki (SoftBank Corp.), Hiroki Takushima (SoftBank Corp.), Haruki Sato (Agoop Corp.), Takumi Takada (SB Intuitions Corp.), Aiswariya Manoj Kumar (SoftBank Corp.), Hayato Tanoue (SoftBank Corp.), Hiroki Nishihara (SoftBank Corp.), Yuki Shibata (SoftBank Corp.), Takayuki Hori (SoftBank Corp.)

Abstract

The Softbank-Meisei team participated in the ad-hoc video search (AVS) and video-to-text (VTT) tasks at TREC 2024. In this year's AVS task, we submitted four fully automatic systems for both the main and progress tasks. Our systems utilized pre-trained vision and language models, including CLIP, BLIP, and BLIP-2, along with several other advanced models. We also expanded the original query texts using text generation and image generation techniques to enhance data diversity. The integration ratios of these models were optimized based on results from previous benchmark test datasets. In this year's VTT task, as last year, we submitted four methods for the main task that use captioning with multiple models, reranking, and generative AI for summarization. For the subtasks, we submitted three methods using the output of each model. On last year's main-task test data, our methods improved by about 0.04 points in CIDEr-D and about 0.03 points in SPICE, based on the metrics we had on hand.
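The AVS part of this abstract describes late fusion of several pre-trained vision-language models with integration ratios tuned on earlier benchmark data, plus query expansion via generated text. The snippet below is only an illustrative sketch of such weighted score fusion over expanded queries; the weights, score layout, and model names are assumptions rather than the team's actual system.

```python
import numpy as np

def fused_retrieval(query_variant_scores, model_weights):
    """Late-fusion sketch over expanded queries.

    query_variant_scores: dict mapping model name -> array of shape
        (num_query_variants, num_shots); each row holds that model's
        similarity of one query variant (original or expanded) to all shots.
    model_weights: dict of integration ratios per model, e.g. tuned by
        grid search against a previous year's ground truth.
    """
    num_shots = next(iter(query_variant_scores.values())).shape[1]
    fused = np.zeros(num_shots)
    for name, scores in query_variant_scores.items():
        # Average over the original and generated query variants, normalize,
        # then add this model's contribution scaled by its integration ratio.
        per_model = scores.mean(axis=0)
        per_model = (per_model - per_model.min()) / (per_model.ptp() + 1e-8)
        fused += model_weights.get(name, 0.0) * per_model
    return np.argsort(-fused)   # ranked shot indices
```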

Bibtex
@inproceedings{softbank-meisei-trec2024-papers-proc-3,
    author = {Kazuya Ueki (Meisei University) and Yuma Suzuki (SoftBank Corp.) and Hiroki Takushima (SoftBank Corp.) and Haruki Sato (Agoop Corp.) and Takumi Takada (SB Intuitions Corp.) and Aiswariya Manoj Kumar (SoftBank Corp.) and Hayato Tanoue (SoftBank Corp.) and Hiroki Nishihara (SoftBank Corp.) and Yuki Shibata (SoftBank Corp.) and Takayuki Hori (SoftBank Corp.)},
    title = {Softbank-Meisei at TREC 2024 Ad-hoc Video Search and Video to Text Tasks},
    booktitle = {The Thirty-Third Text REtrieval Conference Proceedings (TREC 2024), Gaithersburg, MD, USA, November 15-18, 2024},
    series = {NIST Special Publication},
    volume = {xxx-xxx},
    publisher = {National Institute of Standards and Technology (NIST)},
    year = {2024},
    trec_org = {softbank-meisei},
    trec_runs = {SoftbankMeisei - Progress Run 1, SoftbankMeisei - Progress Run 2, SoftbankMeisei - Progress Run 3, SoftbankMeisei - Progress Run 4, SoftbankMeisei - Main Run 1, SoftbankMeisei - Main Run 2, SoftbankMeisei - Main Run 3, SoftbankMeisei - Main Run 4, rtask-bm25-colbert_faiss, rtask-bm25-rank_zephyr, rag_bm25-colbert_faiss-gpt4o-llama70b, ragtask-bm25-rank_zephyr-gpt4o-llama70b, agtask-bm25-colbert_faiss-gpt4o-llama70b, SoftbankMeisei_vtt_main_run1, SoftbankMeisei_vtt_main_run2, SoftbankMeisei_vtt_main_run3, SoftbankMeisei_vtt_main_run4, SoftbankMeisei_vtt_sub_run2, SoftbankMeisei_vtt_sub_run3, SoftbankMeisei_vtt_sub_run1},
    trec_tracks = {avs.vtt},
    url = {https://trec.nist.gov/pubs/trec33/papers/softbank-meisei.avs.vtt.pdf}
}