Proceedings - Spoken Document Retrieval 2000

Spoken Document Retrieval Track Slides

John S. Garofolo, J. Lard, Ellen M. Voorhees

Bibtex
@inproceedings{DBLP:conf/trec/GarofoloLV00,
    author = {John S. Garofolo and J. Lard and Ellen M. Voorhees},
    editor = {Ellen M. Voorhees and Donna K. Harman},
    title = {Spoken Document Retrieval Track Slides},
    booktitle = {Proceedings of The Ninth Text REtrieval Conference, {TREC} 2000, Gaithersburg, Maryland, USA, November 13-16, 2000},
    series = {{NIST} Special Publication},
    volume = {500-249},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2000},
    url = {http://trec.nist.gov/pubs/trec9/sdrt9\_slides/index.htm},
    timestamp = {Wed, 07 Jul 2021 01:00:00 +0200},
    biburl = {https://dblp.org/rec/conf/trec/GarofoloLV00.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

The THISL SDR System at TREC-9

Steve Renals, Dave Abberley

Abstract

This paper describes our participation in the TREC-9 Spoken Document Retrieval (SDR) track. The THISL SDR system consists of a realtime version of a hybrid connectionist/HMM large vocabulary speech recognition system and a probabilistic text retrieval system. This paper describes the configuration of the speech recognition and text retrieval systems, including segmentation and query expansion. We report our results for development tests using the TREC-8 queries, and for the TREC-9 evaluation.

Bibtex
@inproceedings{DBLP:conf/trec/RenalsA00,
    author = {Steve Renals and Dave Abberley},
    editor = {Ellen M. Voorhees and Donna K. Harman},
    title = {The Thisl {SDR} System at {TREC-9}},
    booktitle = {Proceedings of The Ninth Text REtrieval Conference, {TREC} 2000, Gaithersburg, Maryland, USA, November 13-16, 2000},
    series = {{NIST} Special Publication},
    volume = {500-249},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2000},
    url = {http://trec.nist.gov/pubs/trec9/papers/sheffield-sdr.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/RenalsA00.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Spoken Document Retrieval for TREC-9 at Cambridge University

Sue E. Johnson, Pierre Jourlin, Karen Sparck Jones, Philip C. Woodland

Abstract

This paper presents work done at Cambridge University for the TREC-9 Spoken Document Retrieval (SDR) track. The CU-HTK transcriptions from TREC-8 with Word Error Rate (WER) of 20.5% were used in conjunction with stopping, Porter stemming, Okapi-style weighting and query expansion using a contemporaneous corpus of newswire. A windowing/recombination strategy was applied for the case where story boundaries were unknown (SU) obtaining a final result of 38.8% and 43.0% Average Precision for the TREC-9 short and terse queries respectively. The corresponding results for the story boundaries known runs (SK) were 49.5% and 51.9%. Document expansion was used in the SK runs and shown to also be beneficial for SU under certain circumstances. Non-lexical information was generated, which although not used within the evaluation, should prove useful to enrich the transcriptions in real-world applications. Finally, cross recogniser experiments again showed there is little performance degradation as WER increases and thus SDR now needs new challenges such as integration with video data.
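
The windowing/recombination idea named in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the 30-second window, the 15-second shift, the `Window` type, and the merge rule (overlapping retrieved windows collapse into one span that keeps the best score) are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Window:
    start: float  # window start time in seconds
    end: float    # window end time in seconds
    words: list   # recognised words whose timestamps fall in [start, end)


def make_windows(timed_words, length=30.0, shift=15.0):
    """Cut a time-sorted list of (time_seconds, word) pairs into
    fixed-length overlapping windows. Length and shift are assumed values."""
    if not timed_words:
        return []
    windows = []
    t = timed_words[0][0]
    last = timed_words[-1][0]
    while t <= last:
        words = [w for (ts, w) in timed_words if t <= ts < t + length]
        if words:  # skip windows that contain no speech
            windows.append(Window(t, t + length, words))
        t += shift
    return windows


def recombine(scored_windows):
    """Merge retrieved windows that overlap in time (hypothetical rule).

    scored_windows is a list of (Window, score) pairs from one broadcast.
    Returns (start, end, score) spans, best score first.
    """
    merged = []
    for win, score in sorted(scored_windows, key=lambda p: p[0].start):
        if merged and win.start < merged[-1][1]:
            s, e, best = merged[-1]
            merged[-1] = (s, max(e, win.end), max(best, score))
        else:
            merged.append((win.start, win.end, score))
    return sorted(merged, key=lambda m: m[2], reverse=True)
```

Each window would be indexed as a pseudo-document by the text retrieval system; recombination then prevents several overlapping windows about the same stretch of audio from occupying multiple ranks in the returned list.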

Bibtex
@inproceedings{DBLP:conf/trec/JohnsonJJW00,
    author = {Sue E. Johnson and Pierre Jourlin and Karen Sparck Jones and Philip C. Woodland},
    editor = {Ellen M. Voorhees and Donna K. Harman},
    title = {Spoken Document Retrieval for {TREC-9} at Cambridge University},
    booktitle = {Proceedings of The Ninth Text REtrieval Conference, {TREC} 2000, Gaithersburg, Maryland, USA, November 13-16, 2000},
    series = {{NIST} Special Publication},
    volume = {500-249},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2000},
    url = {http://trec.nist.gov/pubs/trec9/papers/cuhtk\_trec9.pdf},
    timestamp = {Wed, 05 May 2021 01:00:00 +0200},
    biburl = {https://dblp.org/rec/conf/trec/JohnsonJJW00.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

The LIMSI SDR System for TREC-9

Jean-Luc Gauvain, Lori Lamel, Claude Barras, Gilles Adda, Yannick de Kercadio

Abstract

In this paper we describe the LIMSI Spoken Document Retrieval system used in the TREC-9 evaluation. This system combines an adapted version of the LIMSI 1999 Hub-4E transcription system for speech recognition with text-based IR methods. Compared with the LIMSI TREC-8 system, this year's system is able to index the audio data without knowledge of the story boundaries using a double windowing approach. The query expansion procedure of the information retrieval component has been revised and makes use of contemporaneous text sources. Experimental results are reported in terms of mean average precision for both the TREC SDR'99 and SDR'00 queries using the same 557h data set. The mean average precision of this year's system is 0.5250 for SDR'99 and 0.3706 for SDR'00 for the focus unknown story boundary condition with a 20% word error rate.
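
For readers unfamiliar with the metric reported above, the following is a minimal sketch of the standard (non-interpolated) mean average precision computation. The function names and the input layout are illustrative assumptions. The official SDR scoring for the unknown story boundary condition also maps returned time points to stories and handles duplicates, which this sketch does not attempt.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average of the precision values at each rank where a relevant
    document is retrieved, divided by the total number of relevant documents."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """runs is a list of (ranked_ids, relevant_ids) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

A relevant document that is never retrieved contributes zero to the sum but still counts in the denominator, so missing relevant material lowers the score.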

Bibtex
@inproceedings{DBLP:conf/trec/GauvainLBAK00,
    author = {Jean{-}Luc Gauvain and Lori Lamel and Claude Barras and Gilles Adda and Yannick de Kercadio},
    editor = {Ellen M. Voorhees and Donna K. Harman},
    title = {The {LIMSI} {SDR} System for {TREC-9}},
    booktitle = {Proceedings of The Ninth Text REtrieval Conference, {TREC} 2000, Gaithersburg, Maryland, USA, November 13-16, 2000},
    series = {{NIST} Special Publication},
    volume = {500-249},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2000},
    url = {http://trec.nist.gov/pubs/trec9/papers/limsi-sdr00.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/GauvainLBAK00.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}