Runs - Adhoc Video Search 2025

BLIP BLIP2 CLIP LaCLIP SLIP diffusion

Participants | Proceedings | Input | sample_eval | Appendix

  • Run ID: BLIP BLIP2 CLIP LaCLIP SLIP diffusion
  • Participant: WHU-NERCMS
  • Track: Adhoc Video Search
  • Year: 2025
  • Submission: 2025-07-28
  • Type: automatic
  • Task: trec2025-avs-main
  • MD5: 9132f6cfa8d3e46c38d35939ee46bfc7
  • Run description: 16:4:10:3:3:3

ccilab1

  • Run ID: ccilab1
  • Participant: ccilab
  • Track: Adhoc Video Search
  • Year: 2025
  • Submission: 2025-07-20
  • Type: automatic
  • Task: trec2025-avs-main
  • MD5: 94223a9f27bd7d9b26222373aecbb4f3
  • Run description: This run is obtained by computing similarities between shots and each topic using OpenAI CLIP's image and text encoders.
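
The similarity computation described in this run can be sketched with plain cosine similarity over embedding vectors. The vectors below are toy stand-ins, not real CLIP outputs; in the actual run they would come from OpenAI CLIP's image and text encoders.

```python
import numpy as np

def rank_shots(text_emb, shot_embs):
    """Rank shots by cosine similarity between a topic's text embedding
    and each shot's keyframe image embedding."""
    t = text_emb / np.linalg.norm(text_emb)
    s = shot_embs / np.linalg.norm(shot_embs, axis=1, keepdims=True)
    sims = s @ t                 # one similarity score per shot
    order = np.argsort(-sims)    # best match first
    return order, sims[order]

# Toy 2-D vectors standing in for CLIP embeddings.
text = np.array([1.0, 0.0])
shots = np.array([[0.9, 0.1],    # shot 0: close to the text
                  [0.0, 1.0],    # shot 1: orthogonal
                  [0.7, 0.7]])   # shot 2: in between
order, scores = rank_shots(text, shots)
```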

clap

  • Run ID: clap
  • Participant: ncsu-las
  • Track: Adhoc Video Search
  • Year: 2025
  • Submission: 2025-07-28
  • Type: automatic
  • Task: trec2025-avs-main
  • MD5: 218b44a5a549e2b4228575f70c3f201e
  • Run description: gpt-4.1-mini decomposes the query into visual and (non-speech) audio components. The visual component is searched using SigLIP2-base-patch16-naflex embeddings, and the audio component is searched on CLAP embeddings. The normalized scores from both search techniques are added together for the final ranking. If the LLM decides there is no audio component, only the SigLIP2 embeddings are used.

decomp

  • Run ID: decomp
  • Participant: ncsu-las
  • Track: Adhoc Video Search
  • Year: 2025
  • Submission: 2025-07-29
  • Type: automatic
  • Task: trec2025-avs-main
  • MD5: 9d02b728d3a0dbc6058437541a35b868
  • Run description: We extract SigLIP2-base-patch16-naflex embeddings at 1 keyframe per second. Each user query is decomposed into visual components with each component expanded to 100 variants using GPT-4.1-mini, and their text embeddings are averaged and merged into a single query vector. Initial retrieval is done directly using SigLIP similarity, returning the top 2,500 candidates. Each candidate shot is then evaluated 10 times using Phi-3.5-Vision, and the scores are averaged. The final results are re-ranked based on these aggregated judgments, and the top 1,000 are submitted.
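
The first-stage retrieval in this pipeline (averaging variant embeddings into one query vector, then ranking keyframes by similarity) can be sketched as below. The vectors are toy values, not real SigLIP2 or GPT-4.1-mini outputs.

```python
import numpy as np

def build_query_vector(variant_embs):
    """Average the text embeddings of all query variants (the run uses
    100 GPT-4.1-mini rephrasings) into one unit-norm query vector."""
    v = np.mean(variant_embs, axis=0)
    return v / np.linalg.norm(v)

def retrieve(query_vec, keyframe_embs, k):
    """Return indices of the top-k keyframes by cosine similarity."""
    kf = keyframe_embs / np.linalg.norm(keyframe_embs, axis=1, keepdims=True)
    return np.argsort(-(kf @ query_vec))[:k]

variants = np.array([[1.0, 0.1], [0.9, 0.0], [1.1, -0.1]])
q = build_query_vector(variants)
frames = np.array([[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]])
top = retrieve(q, frames, k=2)
```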

fg-clip

  • Run ID: fg-clip
  • Participant: ncsu-las
  • Track: Adhoc Video Search
  • Year: 2025
  • Submission: 2025-07-28
  • Type: automatic
  • Task: trec2025-avs-main
  • MD5: 99c2723085e2afab2f4646c9f065fc32
  • Run description: This run uses FG-CLIP embeddings to retrieve the most relevant keyframes. FG-CLIP is a fine-tuned version of OpenAI's clip-vit-base-patch32, trained on V3C1 keyframes with captions generated by Phi-3-Vision. The fine-tuning used a modified loss function for fine-grained, token-level comparison.

Fuse all sub-models

  • Run ID: Fuse all sub-models
  • Participant: WHU-NERCMS
  • Track: Adhoc Video Search
  • Year: 2025
  • Submission: 2025-07-27
  • Type: automatic
  • Task: trec2025-avs-main
  • MD5: 3b11bc9aa8e8948c180dd1335b19e8cc
  • Run description: Fuse all sub-models

gpt

  • Run ID: gpt
  • Participant: ncsu-las
  • Track: Adhoc Video Search
  • Year: 2025
  • Submission: 2025-07-27
  • Type: automatic
  • Task: trec2025-avs-main
  • MD5: 825e4d3975ac1d1438a66abb5ae6bb7d
  • Run description: We extract SigLIP2-base-patch16-naflex embeddings at 1 keyframe per second. Each user query is expanded to 100 variants using GPT-4.1-mini, and their text embeddings are averaged into a single query vector. Initial retrieval is done directly using SigLIP similarity, returning the top 2,500 candidates. Each candidate shot is then evaluated 3 times using GPT-4.1-mini, and the scores are averaged. The final results are re-ranked based on these aggregated judgments, and the top 1,000 are submitted.

HPA

  • Run ID: HPA
  • Participant: WHU-NERCMS
  • Track: Adhoc Video Search
  • Year: 2025
  • Submission: 2025-07-27
  • Type: automatic
  • Task: trec2025-avs-main
  • MD5: 557faae54948c29961c030be5dbe23cb
  • Run description: BEIT3 BLIP BLIP2 CLIP internVL LaCLIP SLIP diffusion

InternVL3 Baseline

  • Run ID: InternVL3 Baseline
  • Participant: AFRL
  • Track: Adhoc Video Search
  • Year: 2025
  • Submission: 2025-07-28
  • Type: manual
  • Task: trec2025-avs-main
  • MD5: 8c4492a49502aa210fdcc54e33cd361a
  • Run description: Modified InternVL3 VLLM with basic cosine-similarity distance measurement to establish a baseline.

Paraphrase_T2V_VILA_NVILA_VideoLLaMA3

  • Run ID: Paraphrase_T2V_VILA_NVILA_VideoLLaMA3
  • Participant: NII_UIT
  • Track: Adhoc Video Search
  • Year: 2025
  • Submission: 2025-07-28
  • Type: manual
  • Task: trec2025-avs-main
  • MD5: ef713cd4afa144a64168eb57b99a4a6d
  • Run description: InternVL-G BEiT-3 CLIP-L/14 DataComp CLIP-H/14 Laion2B CLIP-H/14 DFN5b OpenAI RN101 BLIP-2 XCLIP InternVideo2 TeachCLIP Side4Video CLIP4Clip TS2Net VILA-1.5-40B NVILA-15B VideoLLaMA3-7B

phi-only

  • Run ID: phi-only
  • Participant: ncsu-las
  • Track: Adhoc Video Search
  • Year: 2025
  • Submission: 2025-07-28
  • Type: automatic
  • Task: trec2025-avs-main
  • MD5: a8eea8638d13e26a36f4adbd60e45c6b
  • Run description: We extract SigLIP2-base-patch16-naflex embeddings at 1 keyframe per second. Each user query is expanded to 100 variants using GPT-4.1-mini, and their text embeddings are averaged into a single query vector. Initial retrieval is done directly using SigLIP similarity, returning the top 2,500 candidates. Each candidate shot is then evaluated 10 times using Phi-3.5-Vision, and the scores are averaged. The final results are re-ranked based on these aggregated judgments, and the top 1,000 are submitted.

phi-subgroup

  • Run ID: phi-subgroup
  • Participant: ncsu-las
  • Track: Adhoc Video Search
  • Year: 2025
  • Submission: 2025-07-28
  • Type: automatic
  • Task: trec2025-avs-main
  • MD5: 157c1e2d8a956e4348065814f667579e
  • Run description: We extract SigLIP2-base-patch16-naflex embeddings at 1 keyframe per second. Each user query is expanded to 100 variants using GPT-4.1-mini, and their text embeddings are averaged into a single query vector. Initial retrieval is done directly using SigLIP similarity, returning the top 2,500 candidates. Each candidate shot is then evaluated 10 times using Phi-3.5-Vision, and the scores are averaged. An overlapping subgroup sort is then applied during re-ranking to limit how far each result can move from its initial rank, and the top 1,000 results are submitted.
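
The "overlapping subgroup sort" is described only briefly; one plausible implementation re-sorts overlapping windows of the initial ranking by the aggregated VLM score, so each shot can only move a bounded distance from its initial rank. The window and stride sizes below are invented for illustration.

```python
def subgroup_sort(ranked_ids, scores, window=4, stride=2):
    """Re-rank within overlapping windows of the initial ranking.
    Each item can move at most a few positions per window, which
    limits how far a VLM score can pull a shot from its initial rank."""
    ids = list(ranked_ids)
    i = 0
    while i < len(ids):
        block = ids[i:i + window]
        block.sort(key=lambda x: -scores[x])  # higher VLM score first
        ids[i:i + window] = block
        i += stride
    return ids

# Item 7 has the best VLM score but starts last; the windows keep it
# from jumping all the way to the top.
scores = [8, 7, 6, 5, 4, 3, 2, 9]
result = subgroup_sort(range(8), scores, window=4, stride=2)
```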

Proportional fusion

  • Run ID: Proportional fusion
  • Participant: WHU-NERCMS
  • Track: Adhoc Video Search
  • Year: 2025
  • Submission: 2025-07-27
  • Type: automatic
  • Task: trec2025-avs-main
  • MD5: 22dc56889067adea5825e7578e559812
  • Run description: 6, 16, 4, 10, 5, 3, 3, 3
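
The description is only a weight string. One plausible reading of "proportional fusion" is that each sub-model contributes a share of the final list proportional to its weight; the sketch below is a guess under that assumption, not the team's actual method.

```python
def proportional_fuse(run_lists, weights, depth):
    """Fill the fused list by taking each sub-model's top results in
    proportion to its weight, skipping duplicates."""
    total = sum(weights)
    fused, seen = [], set()
    for run, w in zip(run_lists, weights):
        quota = max(1, round(depth * w / total))
        for item in run[:quota]:
            if item not in seen:
                seen.add(item)
                fused.append(item)
    return fused[:depth]

# Model 1 (weight 3) contributes three results, model 2 (weight 1) one.
runs = [["a", "b", "c", "d"], ["e", "f", "g"]]
fused = proportional_fuse(runs, weights=[3, 1], depth=4)
```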

run1

  • Run ID: run1
  • Participant: SZUAI
  • Track: Adhoc Video Search
  • Year: 2025
  • Submission: 2025-07-29
  • Type: automatic
  • Task: trec2025-avs-main
  • MD5: a6156168e8a308bf7df276800d356e6a
  • Run description: rerank run4

run2

  • Run ID: run2
  • Participant: SZUAI
  • Track: Adhoc Video Search
  • Year: 2025
  • Submission: 2025-07-29
  • Type: automatic
  • Task: trec2025-avs-main
  • MD5: 06640621d51b1cab6363d484cdf527c1
  • Run description: IITV+owl

run3

  • Run ID: run3
  • Participant: SZUAI
  • Track: Adhoc Video Search
  • Year: 2025
  • Submission: 2025-07-29
  • Type: automatic
  • Task: trec2025-avs-main
  • MD5: 37c050bf8290317957b59ef127179093
  • Run description: IITV+qwen2.5VL

run4

  • Run ID: run4
  • Participant: SZUAI
  • Track: Adhoc Video Search
  • Year: 2025
  • Submission: 2025-07-29
  • Type: automatic
  • Task: trec2025-avs-main
  • MD5: 7e2837882ae9a8d0d8205cf2ca331499
  • Run description: overlap of IITV, owl3, and qwen
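
"Overlap" of three model rankings suggests preferring shots retrieved by more of the models; the exact combination rule is not stated, so the sketch below (count overlaps, break ties by best rank) is one plausible reading.

```python
from collections import defaultdict

def overlap_fuse(runs):
    """Order shots by how many runs retrieved them (more overlap first),
    breaking ties by the best rank any run gave them."""
    count, best = defaultdict(int), {}
    for run in runs:
        for rank, item in enumerate(run):
            count[item] += 1
            best[item] = min(best.get(item, rank), rank)
    return sorted(count, key=lambda x: (-count[x], best[x]))

# "c" appears in all three toy runs, so it leads the fused list.
runs = [["a", "b", "c"], ["b", "c", "d"], ["c", "e", "a"]]
fused = overlap_fuse(runs)
```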

run_1

  • Run ID: run_1
  • Participant: CERTH-ITI
  • Track: Adhoc Video Search
  • Year: 2025
  • Submission: 2025-07-28
  • Type: automatic
  • Task: trec2025-avs-main
  • MD5: e63bf1a69bd9e456beece650bf4954e3
  • Run description: Textual queries are expanded using 20 rephrasings generated by the LLaMA 3.2 large language model to enrich semantic understanding. Retrieved results are re-ranked using cross-modal similarities computed by Qwen-VL 2.5 across a depth of 4000 videos. The similarities are normalized with respect to queries from 2022, 2023, and 2024.
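
The cross-query normalization can be sketched as z-scoring against similarity statistics gathered from the 2022-2024 queries; the exact scheme is not specified in the description, so the simple z-score below is an assumption.

```python
import numpy as np

def normalize_scores(scores, reference_scores):
    """Z-normalize one query's similarity scores using the mean and
    standard deviation of scores observed for past-year queries,
    making scores comparable across queries."""
    ref = np.asarray(reference_scores, dtype=float)
    return (np.asarray(scores, dtype=float) - ref.mean()) / ref.std()

# Toy reference scores standing in for the 2022-2024 query statistics.
z = normalize_scores([0.5, 0.7], reference_scores=[0.1, 0.3, 0.5, 0.7])
```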

run_2

  • Run ID: run_2
  • Participant: CERTH-ITI
  • Track: Adhoc Video Search
  • Year: 2025
  • Submission: 2025-07-28
  • Type: automatic
  • Task: trec2025-avs-main
  • MD5: 70d2780a4059ef8ba62709660ade8b26
  • Run description: Textual queries are expanded using 20 rephrasings generated by the LLaMA 3.2 large language model to enrich semantic understanding. Retrieved results are re-ranked using cross-modal similarities computed by Qwen-VL 2.5 across a depth of 2000 videos. The similarities are normalized with respect to queries from 2022, 2023, and 2024.

run_3

  • Run ID: run_3
  • Participant: CERTH-ITI
  • Track: Adhoc Video Search
  • Year: 2025
  • Submission: 2025-07-28
  • Type: automatic
  • Task: trec2025-avs-main
  • MD5: 6a83dcf688503a364cf675e02e3c6516
  • Run description: Textual queries are expanded using 20 rephrasings generated by the LLaMA 3.2 large language model to enrich semantic understanding. Retrieved results are re-ranked using cross-modal similarities computed by Qwen-VL 2.5 across a depth of 1000 videos. The similarities are normalized with respect to queries from 2022, 2023, and 2024.

run_4

  • Run ID: run_4
  • Participant: CERTH-ITI
  • Track: Adhoc Video Search
  • Year: 2025
  • Submission: 2025-07-28
  • Type: automatic
  • Task: trec2025-avs-main
  • MD5: 5ca4c9ff36f3b085420cf867b623f1fd
  • Run description: Textual queries are expanded using 20 rephrasings generated by the LLaMA 3.2 large language model to enrich semantic understanding. No re-ranking is applied. The similarities are normalized with respect to queries from 2022, 2023, and 2024.

T2V_VILA_NVILA_VideoLLaMA3_Aria

  • Run ID: T2V_VILA_NVILA_VideoLLaMA3_Aria
  • Participant: NII_UIT
  • Track: Adhoc Video Search
  • Year: 2025
  • Submission: 2025-07-29
  • Type: automatic
  • Task: trec2025-avs-main
  • MD5: d40d846e3af71fe8a1d2eec773880d2a
  • Run description: InternVL-G BEiT-3 CLIP-L/14 DataComp CLIP-H/14 Laion2B CLIP-H/14 DFN5b OpenAI RN101 BLIP-2 XCLIP InternVideo2 TeachCLIP Side4Video CLIP4Clip TS2Net VILA-1.5-40B NVILA-15B VideoLLaMA3-7B Aria-8x3.5B

T2V_VILA_NVILA_VideoLLaMA3_v2

  • Run ID: T2V_VILA_NVILA_VideoLLaMA3_v2
  • Participant: NII_UIT
  • Track: Adhoc Video Search
  • Year: 2025
  • Submission: 2025-07-28
  • Type: automatic
  • Task: trec2025-avs-main
  • MD5: 488040411b639f5fa1e616e0feea0623
  • Run description: InternVL-G BEiT-3 CLIP-L/14 DataComp CLIP-H/14 Laion2B CLIP-H/14 DFN5b OpenAI RN101 BLIP-2 XCLIP InternVideo2 TeachCLIP Side4Video CLIP4Clip TS2Net VILA-1.5-40B NVILA-15B VideoLLaMA3-7B

T2V_VILA_NVILA_VideoLLaMA3_weights

  • Run ID: T2V_VILA_NVILA_VideoLLaMA3_weights
  • Participant: NII_UIT
  • Track: Adhoc Video Search
  • Year: 2025
  • Submission: 2025-07-28
  • Type: automatic
  • Task: trec2025-avs-main
  • MD5: 90d016830abfb04740f597b0008726d7
  • Run description: InternVL-G BEiT-3 CLIP-L/14 DataComp CLIP-H/14 Laion2B CLIP-H/14 DFN5b OpenAI RN101 BLIP-2 XCLIP InternVideo2 TeachCLIP Side4Video CLIP4Clip TS2Net VILA-1.5-40B NVILA-15B VideoLLaMA3-7B

T2V_VILA_v2

  • Run ID: T2V_VILA_v2
  • Participant: NII_UIT
  • Track: Adhoc Video Search
  • Year: 2025
  • Submission: 2025-07-28
  • Type: automatic
  • Task: trec2025-avs-main
  • MD5: 1ef3c2ffcfa4ee51368b046ee507123e
  • Run description: InternVL-G BEiT-3 CLIP-L/14 DataComp CLIP-H/14 Laion2B CLIP-H/14 DFN5b OpenAI RN101 BLIP-2 XCLIP InternVideo2 TeachCLIP Side4Video CLIP4Clip TS2Net VILA-1.5-40B

tv25_Meisei_A1

  • Run ID: tv25_Meisei_A1
  • Participant: meisei
  • Track: Adhoc Video Search
  • Year: 2025
  • Submission: 2025-07-27
  • Type: automatic
  • Task: trec2025-avs-main
  • MD5: b22daa76f08d8d38bd68d4ce674d2911
  • Run description: We used a two-stage retrieval pipeline. In the first stage, we employed pretrained embedding models such as CLIP to compute text–image similarity and retrieve relevant candidates. In the second stage, for tasks requiring fine-grained understanding (e.g., VQA), we applied a vision-language model (VLM) to perform detailed re-ranking or YES/NO verification.
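
The second stage can be sketched as a verification re-rank: candidates the VLM answers YES for keep their first-stage similarity order, and NO candidates are pushed below all of them. The `verify` function here is a toy stand-in for a real VLM call.

```python
def verification_rerank(candidates, sims, verify):
    """Order YES-verified candidates by similarity, ahead of all
    NO-verified candidates (which keep similarity order among themselves)."""
    keyed = [(0 if verify(c) else 1, -sims[c], c) for c in candidates]
    return [c for _, _, c in sorted(keyed)]

sims = {"s1": 0.9, "s2": 0.8, "s3": 0.7}
verify = lambda shot: shot != "s1"   # toy verifier rejects shot s1
order = verification_rerank(["s1", "s2", "s3"], sims, verify)
```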

tv25_Meisei_A2

  • Run ID: tv25_Meisei_A2
  • Participant: meisei
  • Track: Adhoc Video Search
  • Year: 2025
  • Submission: 2025-07-27
  • Type: automatic
  • Task: trec2025-avs-main
  • MD5: 42779b09506d3172fb54430a9fa46bdd
  • Run description: We used a two-stage retrieval pipeline. In the first stage, we employed pretrained embedding models such as CLIP to compute text–image similarity and retrieve relevant candidates. In the second stage, for tasks requiring fine-grained understanding (e.g., VQA), we applied a vision-language model (VLM) to perform detailed re-ranking or YES/NO verification.

tv25_Meisei_A3

  • Run ID: tv25_Meisei_A3
  • Participant: meisei
  • Track: Adhoc Video Search
  • Year: 2025
  • Submission: 2025-07-27
  • Type: automatic
  • Task: trec2025-avs-main
  • MD5: 1a061d778cccba5241f934f72e025220
  • Run description: We used a two-stage retrieval pipeline. In the first stage, we employed pretrained embedding models such as CLIP to compute text–image similarity and retrieve relevant candidates. In the second stage, for tasks requiring fine-grained understanding (e.g., VQA), we applied a vision-language model (VLM) to perform detailed re-ranking or YES/NO verification.

tv25_Meisei_A4

  • Run ID: tv25_Meisei_A4
  • Participant: meisei
  • Track: Adhoc Video Search
  • Year: 2025
  • Submission: 2025-07-27
  • Type: automatic
  • Task: trec2025-avs-main
  • MD5: 99fc92cb3f493b7f8bc9940db3a33590
  • Run description: We used a two-stage retrieval pipeline. In the first stage, we employed pretrained embedding models such as CLIP to compute text–image similarity and retrieve relevant candidates. In the second stage, for tasks requiring fine-grained understanding (e.g., VQA), we applied a vision-language model (VLM) to perform detailed re-ranking or YES/NO verification.