Proceedings - Federated Web Search 2013

Overview of the TREC 2013 Federated Web Search Track

Thomas Demeester, Dolf Trieschnigg, Dong Nguyen, Djoerd Hiemstra

Paper: http://trec.nist.gov/pubs/trec22/papers/FEDERATED.OVERVIEW.pdf

Abstract

The TREC Federated Web Search track is intended to promote research related to federated search in a realistic web setting, and to this end provides a large data collection gathered from a series of online search engines. This overview paper discusses the results of the first edition of the track, FedWeb 2013. The focus was on two basic challenges in federated search: (1) resource selection and (2) results merging. After an overview of the provided data collection and the relevance judgments for the test topics, the participants' individual approaches and results on both tasks are discussed. Promising research directions and an outlook on the 2014 edition of the track are provided as well.

Bibtex

@inproceedings{DBLP:conf/trec/DemeesterTNH13,
    author = {Thomas Demeester and Dolf Trieschnigg and Dong Nguyen and Djoerd Hiemstra},
    editor = {Ellen M. Voorhees},
    title = {Overview of the {TREC} 2013 Federated Web Search Track},
    booktitle = {Proceedings of The Twenty-Second Text REtrieval Conference, {TREC} 2013, Gaithersburg, Maryland, USA, November 19-22, 2013},
    series = {{NIST} Special Publication},
    volume = {500-302},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2013},
    url = {http://trec.nist.gov/pubs/trec22/papers/FEDERATED.OVERVIEW.pdf},
    timestamp = {Tue, 24 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/DemeesterTNH13.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Mirex and Taily at TREC 2013

Robin Aly, Djoerd Hiemstra, Dolf Trieschnigg, Thomas Demeester

Participant: ut
Paper: http://trec.nist.gov/pubs/trec22/papers/lowlands-web-federated.pdf
Runs: utTailyM400 | utTailyNormM400

Abstract

We describe the participation of the Lowlands at the Web Track and the FedWeb track of TREC 2013. For the Web Track we used the Mirex Map-Reduce library with out-of-the-box approaches and for the FedWeb Track we adapted our shard selection method Taily for resource selection. Here, our results were above median and close to the maximum performance achieved.

Bibtex

@inproceedings{DBLP:conf/trec/AlyHTD13,
    author = {Robin Aly and Djoerd Hiemstra and Dolf Trieschnigg and Thomas Demeester},
    editor = {Ellen M. Voorhees},
    title = {Mirex and Taily at {TREC} 2013},
    booktitle = {Proceedings of The Twenty-Second Text REtrieval Conference, {TREC} 2013, Gaithersburg, Maryland, USA, November 19-22, 2013},
    series = {{NIST} Special Publication},
    volume = {500-302},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2013},
    url = {http://trec.nist.gov/pubs/trec22/papers/lowlands-web-federated.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/AlyHTD13.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Collection and Document Language Models for Resource Selection

Krisztian Balog

Participant: UiS
Paper: http://trec.nist.gov/pubs/trec22/papers/uis-federated.pdf
Runs: UiSS | UiSP | UiSSP

Abstract

This paper describes our participation in the resource selection task of the Federated Web Search track at TREC 2013. We employ two general strategies, collection-centric and document-centric, formulated in a language modeling framework. Results show that the document-centric approach delivers solid performance.

Bibtex

@inproceedings{DBLP:conf/trec/Balog13,
    author = {Krisztian Balog},
    editor = {Ellen M. Voorhees},
    title = {Collection and Document Language Models for Resource Selection},
    booktitle = {Proceedings of The Twenty-Second Text REtrieval Conference, {TREC} 2013, Gaithersburg, Maryland, USA, November 19-22, 2013},
    series = {{NIST} Special Publication},
    volume = {500-302},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2013},
    url = {http://trec.nist.gov/pubs/trec22/papers/uis-federated.pdf},
    timestamp = {Tue, 08 Sep 2020 01:00:00 +0200},
    biburl = {https://dblp.org/rec/conf/trec/Balog13.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

University of Padua at TREC 2013: Federated Web Search Track

Emanuele Di Buccio, Ivano Masiero, Massimo Melucci

Participant: UPD
Paper: http://trec.nist.gov/pubs/trec22/papers/upd-federated.pdf
Runs: UPDFW13sh | UPDFW13mu | UPDFW13rrsh | UPDFW13rrmu

Abstract

This paper reports on the participation of the University of Padua in the TREC 2013 Federated Web Search track. The objective was the experimental investigation, in a Federated Web Search setting, of TWF·IRF, a recursive weighting scheme for resource selection. The experimental results show that the TWF component, which is specific to this scheme, is sufficient to obtain an effective search engine ranking in terms of NDCG@20 when compared with the baseline and the runs of other track participants.

Bibtex

@inproceedings{DBLP:conf/trec/BuccioMM13,
    author = {Emanuele Di Buccio and Ivano Masiero and Massimo Melucci},
    editor = {Ellen M. Voorhees},
    title = {University of Padua at {TREC} 2013: Federated Web Search Track},
    booktitle = {Proceedings of The Twenty-Second Text REtrieval Conference, {TREC} 2013, Gaithersburg, Maryland, USA, November 19-22, 2013},
    series = {{NIST} Special Publication},
    volume = {500-302},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2013},
    url = {http://trec.nist.gov/pubs/trec22/papers/upd-federated.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/BuccioMM13.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

ICTNET at Federated Web Search Track 2013

Feng Guan, Yuanhai Xue, Xiaoming Yu, Yue Liu, Xueqi Cheng

Participant: ICTNET
Paper: http://trec.nist.gov/pubs/trec22/papers/ICTNET-federated.pdf
Runs: ICTNETRun1 | ICTNETRun2 | ICTNETRun3

Abstract

This paper describes our work on the results merging task of the TREC 2013 Federated Web Search Track. We introduce three methods for calculating document scores. All three are based on a linear combination of document field scores; they differ in their weight factors. The baseline method combines the scores of basic HTML fields. The second method adds a page rank score. The third method filters out documents with lower scores.
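As a rough illustration of the three scoring variants outlined in this abstract, the sketch below scores documents with a weighted linear combination of per-field scores, optionally adds a page rank field, and optionally drops low-scoring documents. The field names, weights, threshold, and toy scores are all assumptions made up for the example; the abstract does not state the actual values, and this is not the authors' implementation.

```python
def combine(field_scores, weights):
    """Weighted linear combination of per-field scores (missing fields count as 0)."""
    return sum(w * field_scores.get(field, 0.0) for field, w in weights.items())


def merge(docs, weights, min_score=None):
    """Rank documents by combined score; optionally drop those below min_score."""
    scored = {doc_id: combine(fields, weights) for doc_id, fields in docs.items()}
    if min_score is not None:
        scored = {d: s for d, s in scored.items() if s >= min_score}
    # Descending score, ties broken by document id for determinism.
    return sorted(scored, key=lambda d: (-scored[d], d))


# Hypothetical per-field scores for three documents.
docs = {
    "a": {"title": 0.8, "body": 0.4, "pagerank": 0.1},
    "b": {"title": 0.2, "body": 0.9, "pagerank": 0.9},
    "c": {"title": 0.1, "body": 0.1, "pagerank": 0.2},
}

# Hypothetical weight settings for the three variants.
BASELINE = {"title": 0.6, "body": 0.4}
WITH_PAGERANK = {"title": 0.5, "body": 0.3, "pagerank": 0.2}

run1 = merge(docs, BASELINE)                      # basic fields only
run2 = merge(docs, WITH_PAGERANK)                 # page rank score added
run3 = merge(docs, WITH_PAGERANK, min_score=0.3)  # low scores filtered out

print(run1, run2, run3)
```

With these toy numbers, adding the page rank field moves document "b" ahead of "a", and the threshold removes "c" from the third run.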

Bibtex

@inproceedings{DBLP:conf/trec/GuanXYLC13,
    author = {Feng Guan and Yuanhai Xue and Xiaoming Yu and Yue Liu and Xueqi Cheng},
    editor = {Ellen M. Voorhees},
    title = {{ICTNET} at Federated Web Search Track 2013},
    booktitle = {Proceedings of The Twenty-Second Text REtrieval Conference, {TREC} 2013, Gaithersburg, Maryland, USA, November 19-22, 2013},
    series = {{NIST} Special Publication},
    volume = {500-302},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2013},
    url = {http://trec.nist.gov/pubs/trec22/papers/ICTNET-federated.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/GuanXYLC13.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

NovaSearch at TREC 2013 Federated Web Search Track: Experiments with rank fusion

André Mourão, Flávio Martins, João Magalhães

Participant: NOVASEARCH
Paper: http://trec.nist.gov/pubs/trec22/papers/novasearch-federated.pdf
Runs: nsCondor | nsISR | nsRRF

Abstract

We propose an unsupervised late-fusion approach for the results merging task, based on combining the ranks from all the search engines. Our idea is based on the known pressure for Web search engines to put the most relevant documents at the very top of their rankings and the intuition that the relevance of a document should increase as it appears on more search engines [9]. We performed experiments with the state-of-the-art rank fusion algorithms RRF and Condorcet Fuse, and with our proposed method, the Inverse Square Rank (ISR) fusion algorithm. Rank fusion algorithms have low computational complexity and need neither document scores from the engines nor training data. Inverse Square Rank is a novel, fully unsupervised rank fusion algorithm based on quadratic decay and on logarithmic document frequency normalization. The results achieved in the competition were very positive, and we were able to improve them further post-TREC.
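To make the rank-only fusion idea concrete, the sketch below implements Reciprocal Rank Fusion (RRF) in its commonly cited form, where each document receives the sum over engines of 1 / (k + rank), with k = 60 as the usual default. The function name and the toy engine result lists are assumptions for illustration. The ISR method proposed in the paper is not reproduced here, since the abstract does not give its formula.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists with RRF: score(d) = sum over lists of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based; documents absent from a list contribute nothing.
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += 1.0 / (k + rank)
    # Descending score, ties broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists from three search engines.
engine_a = ["d1", "d2", "d3"]
engine_b = ["d2", "d4", "d1"]
engine_c = ["d2", "d1", "d5"]

fused = reciprocal_rank_fusion([engine_a, engine_b, engine_c])
print(fused)
```

Only ranks are used, so no engine needs to expose document scores, and documents returned by several engines accumulate a higher fused score, in line with the intuition stated in the abstract.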

Bibtex

@inproceedings{DBLP:conf/trec/MouraoMM13,
    author = {Andr{\'{e}} Mour{\~{a}}o and Fl{\'{a}}vio Martins and Jo{\~{a}}o Magalh{\~{a}}es},
    editor = {Ellen M. Voorhees},
    title = {{NovaSearch} at {TREC} 2013 Federated Web Search Track: Experiments with rank fusion},
    booktitle = {Proceedings of The Twenty-Second Text REtrieval Conference, {TREC} 2013, Gaithersburg, Maryland, USA, November 19-22, 2013},
    series = {{NIST} Special Publication},
    volume = {500-302},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2013},
    url = {http://trec.nist.gov/pubs/trec22/papers/novasearch-federated.pdf},
    timestamp = {Thu, 14 May 2020 01:00:00 +0200},
    biburl = {https://dblp.org/rec/conf/trec/MouraoMM13.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

ISI at the TREC 2013 Federated task

Dipasree Pal, Mandar Mitra

Participant: isi_pal
Paper: http://trec.nist.gov/pubs/trec22/papers/isipal-federated.pdf
Runs: incgqd | incgqdv2 | mergv1

Abstract

The resource selection task involves a variety of search engines (SEs), many of which are domain-specific. These SEs can retrieve results for domain-related queries more efficiently, whereas they are not good at retrieving results for out-of-domain queries. Thus, it is difficult to predict the performance of an SE on a given query using the results of other queries. In our approach, each query is searched on the web by all the SEs (using the search site option of the Google search API). We try to predict the performance of each SE from only the top 8 retrieved results. Based on the term frequency of each query term in the retrieved results, our method ranks the SEs for that query. At the time of run submission, we missed a few queries; afterwards, we ranked the SEs for all queries using the same method. We observed that the result is better than the median result in nDCG@20. When measured in ERR@20, the result is also better on more queries. However, the median ERR@20 is substantially higher than our result for one query, which affects the average. In the results merging task, the same concept is applied. Here too, we did not use the actual retrieved results, as processing them after retrieval would take time. The score of an SE obtained in the resource selection task is used here. Each document retrieved by an SE is assigned a score that combines its rank with the SE's score. This can be computed without retrieving the actual results.
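The resource selection step described in this abstract can be sketched as follows: each search engine is scored by how often the query terms occur in its top retrieved results, and engines are ranked by that score. The tokenizer, the function names, and the toy snippets are assumptions for illustration; the abstract does not specify how term frequencies are aggregated, so a plain sum is used here.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def engine_score(query, snippets, top_n=8):
    """Sum of query-term frequencies over the top_n retrieved snippets."""
    query_terms = set(tokenize(query))
    counts = Counter()
    for snippet in snippets[:top_n]:
        counts.update(t for t in tokenize(snippet) if t in query_terms)
    return sum(counts.values())


def rank_engines(query, results_by_engine):
    """Return engine names sorted by descending score, plus the raw scores."""
    scores = {e: engine_score(query, s) for e, s in results_by_engine.items()}
    ranking = sorted(scores, key=lambda e: (-scores[e], e))
    return ranking, scores


# Hypothetical top results per engine for one query.
results = {
    "music": ["Jazz guitar lessons", "Learn guitar chords"],
    "news": ["Local election results", "Jazz festival opens"],
    "shop": ["Garden tools on sale"],
}

ranking, scores = rank_engines("jazz guitar", results)
print(ranking, scores)
```

The resulting per-engine scores could then feed a merging step that combines an engine's score with each document's rank, as the abstract outlines, although the exact combination is not given there.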

Bibtex

@inproceedings{DBLP:conf/trec/PalM13,
    author = {Dipasree Pal and Mandar Mitra},
    editor = {Ellen M. Voorhees},
    title = {{ISI} at the {TREC} 2013 Federated task},
    booktitle = {Proceedings of The Twenty-Second Text REtrieval Conference, {TREC} 2013, Gaithersburg, Maryland, USA, November 19-22, 2013},
    series = {{NIST} Special Publication},
    volume = {500-302},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2013},
    url = {http://trec.nist.gov/pubs/trec22/papers/isipal-federated.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/PalM13.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

CWI and TU Delft Notebook TREC 2013: Contextual Suggestion, Federated Web Search, KBA, and Web Tracks

Alejandro Bellogín, Gebrekirstos G. Gebremeskel, Jiyin He, Alan Said, Thaer Samar, Arjen P. de Vries, Jimmy Lin, Jeroen B. P. Vuurens

Participant: CWI
Paper: http://trec.nist.gov/pubs/trec22/papers/cwi-context-federated-kba-web.pdf
Runs: cwi13ODPJac | cwi13SniTI | cwi13ODPTI | CWI13bstTODPJ | CWI13iaTODPJ | CWI13IndriQL

Abstract

This paper provides an overview of the work done at the Centrum Wiskunde & Informatica (CWI) and Delft University of Technology (TU Delft) for different tracks of TREC 2013. We participated in the Contextual Suggestion Track, the Federated Web Search Track, the Knowledge Base Acceleration (KBA) Track, and the Web Ad-hoc Track. In the Contextual Suggestion track, we focused on filtering the entire ClueWeb12 collection to generate recommendations according to the provided user profiles and contexts. For the Federated Web Search track, we exploited both categories from ODP and document relevance to merge result lists. In the KBA track, we focused on the Cumulative Citation Recommendation task, where we applied different features to two classification algorithms. For the Web track, we extended an ad-hoc baseline with a proximity model that promotes documents in which the query terms are positioned closer together.

Bibtex

@inproceedings{DBLP:conf/trec/BelloginGHSSVLV13,
    author = {Alejandro Bellog{\'{\i}}n and Gebrekirstos G. Gebremeskel and Jiyin He and Alan Said and Thaer Samar and Arjen P. de Vries and Jimmy Lin and Jeroen B. P. Vuurens},
    editor = {Ellen M. Voorhees},
    title = {{CWI} and {TU} Delft Notebook {TREC} 2013: Contextual Suggestion, Federated Web Search, {KBA}, and Web Tracks},
    booktitle = {Proceedings of The Twenty-Second Text REtrieval Conference, {TREC} 2013, Gaithersburg, Maryland, USA, November 19-22, 2013},
    series = {{NIST} Special Publication},
    volume = {500-302},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2013},
    url = {http://trec.nist.gov/pubs/trec22/papers/cwi-context-federated-kba-web.pdf},
    timestamp = {Fri, 27 Aug 2021 01:00:00 +0200},
    biburl = {https://dblp.org/rec/conf/trec/BelloginGHSSVLV13.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}