Proceedings - Routing 1997¶
Okapi at TREC-6 Automatic ad hoc, VLC, routing, filtering and QSDR¶
Steve Walker, Stephen E. Robertson, Mohand Boughanem, Gareth J. F. Jones, Karen Sparck Jones
- Participant: City
- Paper: http://trec.nist.gov/pubs/trec6/papers/city_proc_auto.ps.gz
- Runs: city6r1 | city6r2
Abstract
The filtering work was essentially only a small extension of the routing task effort. The pool of merged routing queries was used, but query selection was based on maximizing (over the training data) each of the utility functions for each topic. Two triples of runs were submitted. Both these sets compared very favourably with other participants' results.
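The selection criterion is easy to sketch. Below is a minimal illustration, not City's actual code: each candidate query from the merged routing pool is evaluated on the training judgments under a linear utility function, and the query maximizing that utility is kept for the topic. The coefficients and the `run_query` callback are placeholders, not the paper's parameters.

```python
def utility(retrieved, relevant, a=3.0, b=2.0):
    # Linear utility: reward relevant retrieved documents, penalize the rest.
    # The coefficients stand in for the TREC-6 utility variants.
    rel = sum(1 for doc in retrieved if doc in relevant)
    return a * rel - b * (len(retrieved) - rel)

def select_query(pool, run_query, relevant_training_docs):
    # Keep the pooled query that maximizes utility over the training data;
    # run_query(q) returns the set of training documents q would accept.
    return max(pool, key=lambda q: utility(run_query(q), relevant_training_docs))
```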
Bibtex
@inproceedings{DBLP:conf/trec/WalkerRBJJ97,
author = {Steve Walker and Stephen E. Robertson and Mohand Boughanem and Gareth J. F. Jones and Karen Sparck Jones},
editor = {Ellen M. Voorhees and Donna K. Harman},
title = {Okapi at {TREC-6} Automatic ad hoc, VLC, routing, filtering and {QSDR}},
booktitle = {Proceedings of The Sixth Text REtrieval Conference, {TREC} 1997, Gaithersburg, Maryland, USA, November 19-21, 1997},
series = {{NIST} Special Publication},
volume = {500-240},
pages = {125--136},
publisher = {National Institute of Standards and Technology {(NIST)}},
year = {1997},
url = {http://trec.nist.gov/pubs/trec6/papers/city\_proc\_auto.ps.gz},
timestamp = {Wed, 07 Jul 2021 01:00:00 +0200},
biburl = {https://dblp.org/rec/conf/trec/WalkerRBJJ97.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
Fusion Via Linear Combination for the Routing Problem¶
Christopher C. Vogt, Garrison W. Cottrell
- Participant: UCSD
- Paper: http://trec.nist.gov/pubs/trec6/papers/ucsd.ps.gz
- Runs: UCSDrt6
Abstract
A linear combination of scores from two different IR systems is used for the routing task, with one combination model being trained for each query. Despite a poor selection of component systems, the combination model performs on par with the better of the two systems, learning to ignore the worse system.
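As a rough sketch of the idea (the abstract does not give the paper's training procedure), per-query weights for the two component systems can be fit by least squares against the training judgments; an uninformative component then receives a weight near zero, which is the "learning to ignore the worse system" behaviour reported above.

```python
import numpy as np

def train_weights(s1, s2, relevance):
    # Fit score = w1*s1 + w2*s2 + b against binary relevance judgments
    # for one query's training documents (least squares as a stand-in
    # for whatever trainer the paper used).
    X = np.column_stack([s1, s2, np.ones(len(s1))])
    w, *_ = np.linalg.lstsq(X, np.asarray(relevance, dtype=float), rcond=None)
    return w

def combine(w, s1, s2):
    # Apply the learned per-query combination to new score pairs.
    return np.column_stack([s1, s2, np.ones(len(s1))]) @ w
```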
Bibtex
@inproceedings{DBLP:conf/trec/VogtC97,
author = {Christopher C. Vogt and Garrison W. Cottrell},
editor = {Ellen M. Voorhees and Donna K. Harman},
title = {Fusion Via Linear Combination for the Routing Problem},
booktitle = {Proceedings of The Sixth Text REtrieval Conference, {TREC} 1997, Gaithersburg, Maryland, USA, November 19-21, 1997},
series = {{NIST} Special Publication},
volume = {500-240},
pages = {661--666},
publisher = {National Institute of Standards and Technology {(NIST)}},
year = {1997},
url = {http://trec.nist.gov/pubs/trec6/papers/ucsd.ps.gz},
timestamp = {Wed, 07 Jul 2021 01:00:00 +0200},
biburl = {https://dblp.org/rec/conf/trec/VogtC97.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
Verity at TREC-6: Out-of-the-Box and Beyond¶
Jan O. Pedersen, Craig Silverstein, Christopher C. Vogt
- Participant: Verity
- Paper: http://trec.nist.gov/pubs/trec6/papers/verity-trec6-corrected.ps.gz
- Runs: VrtyRT6
Abstract
The Verity TREC-6 entry focused on the performance of the built-in search facilities of the commercially available Verity engine and explored the impact of simple enhancements. The ad hoc results show that considerable improvements can be achieved through the application of standard and more experimental techniques. The routing results show that respectable performance can be achieved simply through careful parameter tuning.
Bibtex
@inproceedings{DBLP:conf/trec/PedersenSV97,
author = {Jan O. Pedersen and Craig Silverstein and Christopher C. Vogt},
editor = {Ellen M. Voorhees and Donna K. Harman},
title = {Verity at {TREC-6:} Out-of-the-Box and Beyond},
booktitle = {Proceedings of The Sixth Text REtrieval Conference, {TREC} 1997, Gaithersburg, Maryland, USA, November 19-21, 1997},
series = {{NIST} Special Publication},
volume = {500-240},
pages = {259--273},
publisher = {National Institute of Standards and Technology {(NIST)}},
year = {1997},
url = {http://trec.nist.gov/pubs/trec6/papers/verity-trec6-corrected.ps.gz},
timestamp = {Wed, 07 Jul 2021 01:00:00 +0200},
biburl = {https://dblp.org/rec/conf/trec/PedersenSV97.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
Context-Based Statistical Sub-Spaces¶
Gregory B. Newby
- Participant: UNC-N
- Paper: http://trec.nist.gov/pubs/trec6/papers/newby-t6.ps.gz
- Runs: ispr1 | ispr2
Abstract
The technique described in this paper is similar to latent semantic indexing (LSI), although with some variation. Whereas LSI operates by performing a singular value decomposition (SVD) on a large term by document matrix of co-occurrence scores, the technique here operates by identifying eigenvectors and eigenvalues of a term by term matrix of correlation scores (derived from co-occurrence scores). The technique of identifying eigenvectors and eigenvalues from a correlation matrix is known as principal components analysis (PCA). Variations from the previous year's TREC work include the use of sub-documents (paragraphs) and of small sub-matrices consisting only of the terms in a query rather than all terms from the collection.
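The contrast with LSI can be made concrete. A minimal sketch, illustrative only, with NumPy standing in for whatever tooling the paper used: given co-occurrence profiles for just the query's terms, form the term-by-term correlation matrix and take its top eigenvectors.

```python
import numpy as np

def pca_subspace(cooccurrence, k):
    # cooccurrence: one row per query term, giving its co-occurrence
    # counts over some context units (e.g. paragraphs).
    R = np.corrcoef(cooccurrence)        # term-by-term correlation matrix
    vals, vecs = np.linalg.eigh(R)       # symmetric matrix, so eigh applies
    order = np.argsort(vals)[::-1][:k]   # keep the top-k components
    return vals[order], vecs[:, order]
```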
Bibtex
@inproceedings{DBLP:conf/trec/Newby97,
author = {Gregory B. Newby},
editor = {Ellen M. Voorhees and Donna K. Harman},
title = {Context-Based Statistical Sub-Spaces},
booktitle = {Proceedings of The Sixth Text REtrieval Conference, {TREC} 1997, Gaithersburg, Maryland, USA, November 19-21, 1997},
series = {{NIST} Special Publication},
volume = {500-240},
pages = {735--745},
publisher = {National Institute of Standards and Technology {(NIST)}},
year = {1997},
url = {http://trec.nist.gov/pubs/trec6/papers/newby-t6.ps.gz},
timestamp = {Wed, 07 Jul 2021 01:00:00 +0200},
biburl = {https://dblp.org/rec/conf/trec/Newby97.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
ETH TREC-6: Routing, Chinese, Cross-Language and Spoken Document Retrieval¶
Bojidar Mateev, Eugen Munteanu, Paraic Sheridan, Martin Wechsler, Peter Schäuble
- Participant: ETH
- Paper: http://trec.nist.gov/pubs/trec6/papers/ETH_notebook.ps.gz
- Runs: ETH6R1 | ETH6R2
Abstract
ETH Zurich's participation in TREC-6 consists of experiments in the main routing task, both manual and automatic runs in the Chinese retrieval track, cross-language retrieval in each of German, French and English as part of the new cross-language retrieval track, and experiments in speech recognition and retrieval under the new spoken document retrieval track. This year our routing experiments focused on the improvement of the feature selection strategy, on query expansion using similarity thesauri, on the grouping of features and on the combination of different retrieval methods. For Chinese retrieval we continued to rely on character bi-grams for indexing instead of attempting to segment and identify individual words, and we introduced a new manually-constructed stopword list consisting of almost 1,000 Chinese words. Experiments in cross-language retrieval focused heavily on our approach using multilingual similarity thesauri but also included several runs using machine translation technology. Finally, for the spoken document retrieval track our work included the development of a simple speaker-independent phoneme recogniser and some innovations in our probabilistic retrieval functions to compensate for speech recognition errors.
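The bi-gram indexing mentioned for the Chinese runs is simple to illustrate (a sketch, not ETH's code): every overlapping pair of characters becomes an indexing unit, so no word segmentation is needed.

```python
def char_bigrams(text):
    # Index by overlapping character bi-grams instead of segmented words.
    chars = [c for c in text if not c.isspace()]
    return [a + b for a, b in zip(chars, chars[1:])]

# char_bigrams("信息检索") -> ["信息", "息检", "检索"]
```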
Bibtex
@inproceedings{DBLP:conf/trec/MateevMSWS97,
author = {Bojidar Mateev and Eugen Munteanu and Paraic Sheridan and Martin Wechsler and Peter Sch{\"{a}}uble},
editor = {Ellen M. Voorhees and Donna K. Harman},
title = {{ETH} {TREC-6:} Routing, Chinese, Cross-Language and Spoken Document Retrieval},
booktitle = {Proceedings of The Sixth Text REtrieval Conference, {TREC} 1997, Gaithersburg, Maryland, USA, November 19-21, 1997},
series = {{NIST} Special Publication},
volume = {500-240},
pages = {623--635},
publisher = {National Institute of Standards and Technology {(NIST)}},
year = {1997},
url = {http://trec.nist.gov/pubs/trec6/papers/ETH\_notebook.ps.gz},
timestamp = {Wed, 07 Jul 2021 01:00:00 +0200},
biburl = {https://dblp.org/rec/conf/trec/MateevMSWS97.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
CSIRO Routing and Ad-Hoc Experiments at TREC-6¶
Arkadi Kosmynin
- Participant: CSIRO
- Paper: http://trec.nist.gov/pubs/trec6/papers/csiro.ps.gz
- Runs: csiro97r1 | csiro97r2
Abstract
CSIRO stands for Commonwealth Scientific and Industrial Research Organization; it is the Australian Government's main research body. This is the first year CSIRO has taken part in TREC. We became involved in textual information retrieval research as part of our activities in the Resource Discovery Unit at the Research Data Network Co-operative Research Centre. The primary aim of our research in IR is to improve the efficiency of resource discovery systems and networked information retrieval.
Bibtex
@inproceedings{DBLP:conf/trec/Kosmynin97,
author = {Arkadi Kosmynin},
editor = {Ellen M. Voorhees and Donna K. Harman},
title = {{CSIRO} Routing and Ad-Hoc Experiments at {TREC-6}},
booktitle = {Proceedings of The Sixth Text REtrieval Conference, {TREC} 1997, Gaithersburg, Maryland, USA, November 19-21, 1997},
series = {{NIST} Special Publication},
volume = {500-240},
pages = {455--460},
publisher = {National Institute of Standards and Technology {(NIST)}},
year = {1997},
url = {http://trec.nist.gov/pubs/trec6/papers/csiro.ps.gz},
timestamp = {Wed, 07 Jul 2021 01:00:00 +0200},
biburl = {https://dblp.org/rec/conf/trec/Kosmynin97.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
Query Term Expansion based on Paragraphs of the Relevant Documents¶
Kai Ishikawa, Kenji Satoh, Akitoshi Okumura
- Participant: NEC
- Paper: http://trec.nist.gov/pubs/trec6/papers/NEC.ps.gz
- Runs: virtue3
Abstract
Recently, we have studied a query term expansion method that extracts terms co-occurring with the initial query terms in relevant paragraphs. In our method, the paragraphs of the relevant documents are ranked against the initial query, and expansion terms are extracted from the top-ranked paragraphs using term co-occurrence. Ranking paragraphs with the initial query eases the difficulty of locating the relevant text, and restricting co-occurrence statistics to paragraphs rather than whole documents gives an accurate treatment of term co-occurrence at small computational cost. The results of our system on the TREC-6 routing test data, obtained with the expanded queries generated by this method, are compared with the results obtained with the initial queries.
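A minimal sketch of the expansion loop described above (illustrative; the overlap score and cut-offs are placeholders, not NEC's actual parameters):

```python
from collections import Counter

def expand_query(query_terms, paragraphs, top_p=10, top_t=20):
    # paragraphs: the relevant documents' paragraphs, each a list of terms.
    qset = set(query_terms)
    ranked = sorted(paragraphs,
                    key=lambda p: len(qset & set(p)),   # rank by query overlap
                    reverse=True)
    cooc = Counter()
    for para in ranked[:top_p]:                  # top-ranked paragraphs only
        if qset & set(para):                     # co-occurs with a query term
            cooc.update(t for t in para if t not in qset)
    return [t for t, _ in cooc.most_common(top_t)]   # expansion terms
```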
Bibtex
@inproceedings{DBLP:conf/trec/IshikawaSO97,
author = {Kai Ishikawa and Kenji Satoh and Akitoshi Okumura},
editor = {Ellen M. Voorhees and Donna K. Harman},
title = {Query Term Expansion based on Paragraphs of the Relevant Documents},
booktitle = {Proceedings of The Sixth Text REtrieval Conference, {TREC} 1997, Gaithersburg, Maryland, USA, November 19-21, 1997},
series = {{NIST} Special Publication},
volume = {500-240},
pages = {577--584},
publisher = {National Institute of Standards and Technology {(NIST)}},
year = {1997},
url = {http://trec.nist.gov/pubs/trec6/papers/NEC.ps.gz},
timestamp = {Wed, 07 Jul 2021 01:00:00 +0200},
biburl = {https://dblp.org/rec/conf/trec/IshikawaSO97.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
Conceptual Indexing Using Thematic Representation of Texts¶
Boris V. Dobrov, Natalia V. Loukachevitch, Tatyana N. Yudina
- Participant: CIR-Russia
- Paper: http://trec.nist.gov/pubs/trec6/papers/CIR6ROU3.ps.gz
- Runs: cir6rou1
Abstract
We present the thesaurus-based indexing technology developed by the Center for Information Research under the Information System RUSSIA project. The technology exploits basic properties of coherent text. It was initially applied to the automatic processing of Russian official (government) texts; it has now been adapted to process English texts for the TREC-6 routing task.
Bibtex
@inproceedings{DBLP:conf/trec/DobrovLY97,
author = {Boris V. Dobrov and Natalia V. Loukachevitch and Tatyana N. Yudina},
editor = {Ellen M. Voorhees and Donna K. Harman},
title = {Conceptual Indexing Using Thematic Representation of Texts},
booktitle = {Proceedings of The Sixth Text REtrieval Conference, {TREC} 1997, Gaithersburg, Maryland, USA, November 19-21, 1997},
series = {{NIST} Special Publication},
volume = {500-240},
pages = {403--413},
publisher = {National Institute of Standards and Technology {(NIST)}},
year = {1997},
url = {http://trec.nist.gov/pubs/trec6/papers/CIR6ROU3.ps.gz},
timestamp = {Wed, 07 Jul 2021 01:00:00 +0200},
biburl = {https://dblp.org/rec/conf/trec/DobrovLY97.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
Passage-Based Refinement (MultiText Experiments for TREC-6)¶
Gordon V. Cormack, Charles L. A. Clarke, Christopher R. Palmer, Samuel S. L. To
- Participant: Waterloo
- Paper: http://trec.nist.gov/pubs/trec6/papers/waterloo.ps.gz
- Runs: uwmt6r1 | uwmt6r0
Abstract
The MultiText system retrieves passages, rather than entire documents, that are likely to be relevant to a particular topic. For all runs, we used the reciprocal of the length of each passage as an estimate of its likely relevance and ranked accordingly. For the manual adhoc task we explored the limits of user interaction by judging some 13,000 documents based on retrieved passages. For the automatic adhoc task we used retrieved passages as a feedback source for new query terms. For the routing task we estimated probability of relevance from passage length and used this estimate to construct a compound (tiered) query which was used to rank the new data using passage length. For the Chinese track we indexed individual characters rather than segmented words or bigrams and used manually constructed queries and passage-length ranking. For the high precision track we performed judgements on passages using an interface similar to that used for the manual adhoc task. The Very Large Collection run was done on a network of four cheap computers using very simple manually constructed queries and passage-length ranking.
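The passage-length estimate lends itself to a two-line sketch (an illustration of the ranking rule, not the MultiText implementation):

```python
def passage_score(start, end):
    # Shorter passages covering the query are taken as more likely relevant.
    return 1.0 / (end - start + 1)

def rank_docs(best_passage):
    # best_passage: {doc_id: (start, end)} for each document's best passage;
    # documents are ranked by the score of that passage.
    return sorted(best_passage,
                  key=lambda d: passage_score(*best_passage[d]),
                  reverse=True)
```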
Bibtex
@inproceedings{DBLP:conf/trec/CormackCPT97,
author = {Gordon V. Cormack and Charles L. A. Clarke and Christopher R. Palmer and Samuel S. L. To},
editor = {Ellen M. Voorhees and Donna K. Harman},
title = {Passage-Based Refinement (MultiText Experiments for {TREC-6)}},
booktitle = {Proceedings of The Sixth Text REtrieval Conference, {TREC} 1997, Gaithersburg, Maryland, USA, November 19-21, 1997},
series = {{NIST} Special Publication},
volume = {500-240},
pages = {303--319},
publisher = {National Institute of Standards and Technology {(NIST)}},
year = {1997},
url = {http://trec.nist.gov/pubs/trec6/papers/waterloo.ps.gz},
timestamp = {Wed, 07 Jul 2021 01:00:00 +0200},
biburl = {https://dblp.org/rec/conf/trec/CormackCPT97.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
Using Clustering and SuperConcepts Within SMART: TREC 6¶
Chris Buckley, Mandar Mitra, Janet A. Walz, Claire Cardie
- Participant: Cornell
- Paper: http://trec.nist.gov/pubs/trec6/papers/cornell.ps.gz
- Runs: Cor6R2qtc | Cor6R1cc
Abstract
The Smart information retrieval project emphasizes completely automatic approaches to the understanding and retrieval of large quantities of text. We continue our work in TREC 6, performing runs in the routing, ad-hoc, and foreign-language environments, including cross-lingual runs. The major focus this year is on trying to maintain the balance of the query: attempting to ensure that the various aspects of the original query are appropriately addressed, especially while adding expansion terms. Exactly the same procedure is used for foreign-language environments as for English; our tenet is that good information retrieval techniques are more powerful than linguistic knowledge. We also describe an interesting cross-lingual run which assumes that French and English are closely enough related that a query in one language can be run directly on a collection in the other simply by 'correcting' the spelling of the query words. This is quite successful for most queries.
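The spelling-correction run can be sketched as follows; difflib's string similarity is an assumption on my part, and the paper's actual matching rule may differ.

```python
import difflib

def translate_by_spelling(query_terms, english_vocabulary, cutoff=0.75):
    # Map each French query word to its closest English vocabulary entry;
    # cognates such as 'gouvernement' -> 'government' survive the mapping,
    # words with no close English form are simply dropped.
    mapped = []
    for term in query_terms:
        match = difflib.get_close_matches(term, english_vocabulary,
                                          n=1, cutoff=cutoff)
        mapped.extend(match)
    return mapped
```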
Bibtex
@inproceedings{DBLP:conf/trec/BuckleyMWC97,
author = {Chris Buckley and Mandar Mitra and Janet A. Walz and Claire Cardie},
editor = {Ellen M. Voorhees and Donna K. Harman},
title = {Using Clustering and SuperConcepts Within {SMART:} {TREC} 6},
booktitle = {Proceedings of The Sixth Text REtrieval Conference, {TREC} 1997, Gaithersburg, Maryland, USA, November 19-21, 1997},
series = {{NIST} Special Publication},
volume = {500-240},
pages = {107--124},
publisher = {National Institute of Standards and Technology {(NIST)}},
year = {1997},
url = {http://trec.nist.gov/pubs/trec6/papers/cornell.ps.gz},
timestamp = {Wed, 07 Jul 2021 01:00:00 +0200},
biburl = {https://dblp.org/rec/conf/trec/BuckleyMWC97.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
The text categorization system TEKLIS at TREC-6¶
Thomas Brückner
- Participant: Siemens
- Paper: http://trec.nist.gov/pubs/trec6/papers/siemens.ps.gz
- Runs: teklis
Abstract
This short paper documents our participation in the filtering and routing tasks of TREC-6 with the commercial filtering system TEKLIS. TEKLIS is a training-based statistical categorization system which incorporates shallow linguistic processing and fuzzy-set methods. In the following we will present the core technology of TEKLIS, our results on the filtering and routing tasks and a discussion of the insights we gained through our participation.
Bibtex
@inproceedings{DBLP:conf/trec/Bruckner97,
author = {Thomas Br{\"{u}}ckner},
editor = {Ellen M. Voorhees and Donna K. Harman},
title = {The text categorization system {TEKLIS} at {TREC-6}},
booktitle = {Proceedings of The Sixth Text REtrieval Conference, {TREC} 1997, Gaithersburg, Maryland, USA, November 19-21, 1997},
series = {{NIST} Special Publication},
volume = {500-240},
pages = {619--621},
publisher = {National Institute of Standards and Technology {(NIST)}},
year = {1997},
url = {http://trec.nist.gov/pubs/trec6/papers/siemens.ps.gz},
timestamp = {Wed, 07 Jul 2021 01:00:00 +0200},
biburl = {https://dblp.org/rec/conf/trec/Bruckner97.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
Mercure at TREC6¶
Mohand Boughanem, Chantal Soulé-Dupuy
- Participant: IRIT
- Paper: http://trec.nist.gov/pubs/trec6/papers/irit.ps.gz
- Runs: Mercure4
Abstract
We continue our work in TREC, performing runs in the adhoc and routing tasks and part of the cross-language track. The major investigation this year is the modification of the weighting schemes to take document length into account. We also experiment with the high-precision procedure in the automatic adhoc environment by tuning the term-weight parameters.
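The abstract does not give Mercure's formula; as an indication of what "taking document length into account" typically looks like, here is a standard pivoted length normalization of a tf-idf weight (an assumption for illustration, not the paper's scheme):

```python
import math

def weight(tf, df, n_docs, doc_len, avg_doc_len, slope=0.25):
    # Illustrative length-normalized tf-idf, not Mercure's actual formula:
    # long documents have their term frequencies damped relative to the
    # average document length ("pivoted" normalization).
    norm = (1.0 - slope) + slope * (doc_len / avg_doc_len)
    return (tf / norm) * math.log(n_docs / df)
```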
Bibtex
@inproceedings{DBLP:conf/trec/BoughanemS97,
author = {Mohand Boughanem and Chantal Soul{\'{e}}{-}Dupuy},
editor = {Ellen M. Voorhees and Donna K. Harman},
title = {Mercure at {TREC6}},
booktitle = {Proceedings of The Sixth Text REtrieval Conference, {TREC} 1997, Gaithersburg, Maryland, USA, November 19-21, 1997},
series = {{NIST} Special Publication},
volume = {500-240},
pages = {321--328},
publisher = {National Institute of Standards and Technology {(NIST)}},
year = {1997},
url = {http://trec.nist.gov/pubs/trec6/papers/irit.ps.gz},
timestamp = {Wed, 07 Jul 2021 01:00:00 +0200},
biburl = {https://dblp.org/rec/conf/trec/BoughanemS97.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
Application of Logical Analysis of Data to the TREC-6 Routing Task¶
Endre Boros, Paul B. Kantor, Jung Jin Lee, Kwong Bor Ng, Di Zhao
- Participant: RutgersK
- Paper: http://trec.nist.gov/pubs/trec6/papers/rutLAD.ps.gz
- Runs: rutLADc1 | rutLADw1
Abstract
Our approach to TREC-6 has explored the possibility of building complex Boolean expressions which represent the classificatory information present in the training data. The positive (i.e., judged relevant) and negative (i.e., judged not relevant) documents are studied separately, using Church's measure of 'non-Poissonicity' (Church & Gale, 1995) to identify promising terms for classification. In the official runs, statistics are produced using the MG search engine (Witten, Moffat & Bell, 1994), and the terms are in fact stems, rather than complete terms. The top 25 terms selected from the positive and negative examples are merged to form a list of no more than 50 terms. The MG retrieval system is used (massively) to transform every judged document into a Boolean vector with one component for each distinct classification term. The RUTCOR LAD program (Boros, Hammer, Ibaraki, Kogan, Mayoraz & Muchnik, 1996) is used (twice for each topic), with several modifications, to search exhaustively for Boolean prime implicants which characterize the positive and the negative examples. Due to computer speed limitations, we limited the search in our official submissions to terms of order three (i.e., terms such as ABC', where C' denotes the absence of term C). Each pattern which matches some positive (respectively, negative) examples is given a weight determined by the number of examples that it matches.
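A brute-force sketch of the degree-3 pattern search (illustrative only; the RUTCOR LAD program computes prime implicants far more cleverly): each pattern is a conjunction of up to three literals, kept if it matches some positive examples and no negative ones, and weighted by its positive coverage.

```python
from itertools import combinations, product

def find_patterns(pos_vectors, neg_vectors, n_terms, max_degree=3):
    # Vectors are tuples of 0/1 over the (at most 50) classification terms.
    patterns = []
    for degree in range(1, max_degree + 1):
        for idxs in combinations(range(n_terms), degree):
            for signs in product((1, 0), repeat=degree):
                literals = tuple(zip(idxs, signs))   # e.g. A and B and not C
                matches = lambda v: all(v[i] == s for i, s in literals)
                coverage = sum(map(matches, pos_vectors))
                if coverage and not any(map(matches, neg_vectors)):
                    patterns.append((literals, coverage))  # weight = coverage
    return patterns
```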
Bibtex
@inproceedings{DBLP:conf/trec/BorosKLNZ97,
author = {Endre Boros and Paul B. Kantor and Jung Jin Lee and Kwong Bor Ng and Di Zhao},
editor = {Ellen M. Voorhees and Donna K. Harman},
title = {Application of Logical Analysis of Data to the {TREC-6} Routing Task},
booktitle = {Proceedings of The Sixth Text REtrieval Conference, {TREC} 1997, Gaithersburg, Maryland, USA, November 19-21, 1997},
series = {{NIST} Special Publication},
volume = {500-240},
pages = {611--617},
publisher = {National Institute of Standards and Technology {(NIST)}},
year = {1997},
url = {http://trec.nist.gov/pubs/trec6/papers/rutLAD.ps.gz},
timestamp = {Wed, 07 Jul 2021 01:00:00 +0200},
biburl = {https://dblp.org/rec/conf/trec/BorosKLNZ97.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
Using Information Extraction to Improve Document Retrieval¶
John Bear, David J. Israel, Jeff Petit, David L. Martin
- Participant: SRI
- Paper: http://trec.nist.gov/pubs/trec6/papers/sri.ps.gz
- Runs: srige1
Abstract
We describe an approach to applying a particular kind of Natural Language Processing (NLP) system to the TREC routing task in Information Retrieval (IR). Rather than attempting to use NLP techniques in indexing documents in a corpus, we adapted an information extraction (IE) system to act as a post-filter on the output of an IR system. The IE system was configured to score each of the top 2000 documents as determined by an IR system and on the basis of that score to rerank those 2000 documents. One aim was to improve precision on routing tasks. Another was to make it easier to write IE grammars for multiple topics.
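The post-filter arrangement reduces to a one-liner (a sketch; `ie_score` is a hypothetical callback into the topic's IE grammar):

```python
def rerank_with_ie(ir_top_docs, ie_score):
    # Score each of the IR system's top 2000 documents with the IE system
    # and re-order them by that score.
    return sorted(ir_top_docs[:2000], key=ie_score, reverse=True)
```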
Bibtex
@inproceedings{DBLP:conf/trec/BearIPM97,
author = {John Bear and David J. Israel and Jeff Petit and David L. Martin},
editor = {Ellen M. Voorhees and Donna K. Harman},
title = {Using Information Extraction to Improve Document Retrieval},
booktitle = {Proceedings of The Sixth Text REtrieval Conference, {TREC} 1997, Gaithersburg, Maryland, USA, November 19-21, 1997},
series = {{NIST} Special Publication},
volume = {500-240},
pages = {367--377},
publisher = {National Institute of Standards and Technology {(NIST)}},
year = {1997},
url = {http://trec.nist.gov/pubs/trec6/papers/sri.ps.gz},
timestamp = {Wed, 07 Jul 2021 01:00:00 +0200},
biburl = {https://dblp.org/rec/conf/trec/BearIPM97.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
Daimler Benz Research: System and Experiments Routing and Filtering¶
Thomas Bayer, Heike Mogg-Schneider, Ingrid Renz, Hartmut Schäfer
- Participant: DBenz
- Paper: http://trec.nist.gov/pubs/trec6/papers/dbulm.ps.gz
- Runs: dbulm1
Abstract
The retrieval approach is based on vector representations (bags of character strings), on dimension reduction (LSI, latent semantic indexing) and on statistical machine learning techniques at all processing levels. Two phases are distinguished: the adaptation phase, based on training samples (texts), and the application phase, in which each text is mapped to one or more categories (classes). The adaptation process is corpus-dependent and automatic and, hence, domain- and language-independent. The main idea of this approach is to generate different sets of simple features which represent different views of the texts. For each text to be filtered/routed, different feature vectors are generated and classified into a decision vector which contains estimates of class-membership probabilities. In the following step, these decision vectors are regarded as feature vectors and fed to another classifier that combines this set of decision vectors into the final decision.
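A minimal sketch of the two-level arrangement (logistic regression is a stand-in for whichever second-level classifier the paper used):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_combiner(first_level_probas, labels):
    # first_level_probas: one (n_texts, n_classes) array of class-membership
    # estimates per feature view; stacked, they form the second-level
    # feature vectors that the final classifier combines.
    X = np.hstack(first_level_probas)
    return LogisticRegression(max_iter=1000).fit(X, labels)
```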
Bibtex
@inproceedings{DBLP:conf/trec/BayerMRS97,
author = {Thomas Bayer and Heike Mogg{-}Schneider and Ingrid Renz and Hartmut Sch{\"{a}}fer},
editor = {Ellen M. Voorhees and Donna K. Harman},
title = {Daimler Benz Research: System and Experiments Routing and Filtering},
booktitle = {Proceedings of The Sixth Text REtrieval Conference, {TREC} 1997, Gaithersburg, Maryland, USA, November 19-21, 1997},
series = {{NIST} Special Publication},
volume = {500-240},
pages = {329--346},
publisher = {National Institute of Standards and Technology {(NIST)}},
year = {1997},
url = {http://trec.nist.gov/pubs/trec6/papers/dbulm.ps.gz},
timestamp = {Wed, 07 Jul 2021 01:00:00 +0200},
biburl = {https://dblp.org/rec/conf/trec/BayerMRS97.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
INQUERY Does Battle With TREC-6¶
James Allan, James P. Callan, W. Bruce Croft, Lisa Ballesteros, Donald Byrd, Russell C. Swan, Jinxi Xu
- Participant: UMass
- Paper: http://trec.nist.gov/pubs/trec6/papers/umass-trec6.ps.gz
- Runs: INQ403 | INQ404
Abstract
This year the Center for Intelligent Information Retrieval (CIIR) at the University of Massachusetts participated in eight of the ten tracks that were part of the TREC-6 workshop. We started with the two required tracks, ad-hoc and routing, but then included VLC, Filtering, Chinese, Cross-language, SDR, and Interactive. We omitted NLP and High Precision for want of time and energy. With so many tracks involved, it is nearly inevitable that something will go wrong. Despite our best efforts at verifying all aspects of each track before, during, and after the experiments, we once again made mistakes that were minor in scope, but major in consequence. Those mistakes affected our results in Ad-hoc and Routing, as well as the dependent tracks of VLC and Filtering. The details of the mistakes are presented in each track's discussion, along with information comparing the submitted runs to the corrected runs. Unfortunately, those corrected runs are not included in the TREC-6 summary information. The remainder of this report covers our approach to each of the tracks as well as some experimental results and analysis. We start with an overview of the major tools that were used across all tracks. The track descriptions are generally broken into approach, results, and analysis sections, though some tracks require a different description.
Bibtex
@inproceedings{DBLP:conf/trec/AllanCCBBSX97,
author = {James Allan and James P. Callan and W. Bruce Croft and Lisa Ballesteros and Donald Byrd and Russell C. Swan and Jinxi Xu},
editor = {Ellen M. Voorhees and Donna K. Harman},
title = {{INQUERY} Does Battle With {TREC-6}},
booktitle = {Proceedings of The Sixth Text REtrieval Conference, {TREC} 1997, Gaithersburg, Maryland, USA, November 19-21, 1997},
series = {{NIST} Special Publication},
volume = {500-240},
pages = {169--206},
publisher = {National Institute of Standards and Technology {(NIST)}},
year = {1997},
url = {http://trec.nist.gov/pubs/trec6/papers/umass-trec6.ps.gz},
timestamp = {Wed, 07 Jul 2021 01:00:00 +0200},
biburl = {https://dblp.org/rec/conf/trec/AllanCCBBSX97.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}