Proceedings - Medical 2011

Search for Clinical Records: RMIT at Medical TREC

Iman Amini, Mark Sanderson, David Martínez, Xiaodong Li

Abstract

We combine several techniques to participate in Medical TREC 2011, and we then decompose our combined methodology to gain a thorough understanding of the effect of each individual technique. In this paper we focus on Information Extraction and Expansion to find the best setting for an ideal IR system. Results suggest that Information Expansion is a key strategy in finding relevant reports for a medical query.

Bibtex
@inproceedings{DBLP:conf/trec/AminiSML11,
    author = {Iman Amini and Mark Sanderson and David Mart{\'{\i}}nez and Xiaodong Li},
    editor = {Ellen M. Voorhees and Lori P. Buckland},
    title = {Search for Clinical Records: {RMIT} at Medical {TREC}},
    booktitle = {Proceedings of The Twentieth Text REtrieval Conference, {TREC} 2011, Gaithersburg, Maryland, USA, November 15-18, 2011},
    series = {{NIST} Special Publication},
    volume = {500-296},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2011},
    url = {http://trec.nist.gov/pubs/trec20/papers/RMIT.medical.update.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/AminiSML11.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Identifying Patients for Clinical Studies from Electronic Health Records: TREC Medical Records Track at OHSU

Steven Bedrick, Kyle H. Ambert, Aaron M. Cohen, William R. Hersh

Abstract

The task of the TREC 2011 Medical Records Track consisted of searching electronic health record (EHR) documents in order to identify patients matching a set of clinical criteria, a use case that might be part of the preparation of a quality report or to develop a cohort for a clinical trial. The task's various topics each represented a different case definition, with the topics varying widely in terms of detail and linguistic complexity. This use case is one of a larger group that represent the “secondary use” of data in EHRs [1] that facilitate clinical research, quality improvement, and other aspects of a health care system that can “learn” from its data and outcomes [2]. It is made possible by the large US government investment in EHR adoption that has occurred since 2009 [3]. [...]

Bibtex
@inproceedings{DBLP:conf/trec/BedrickACH11,
    author = {Steven Bedrick and Kyle H. Ambert and Aaron M. Cohen and William R. Hersh},
    editor = {Ellen M. Voorhees and Lori P. Buckland},
    title = {Identifying Patients for Clinical Studies from Electronic Health Records: {TREC} Medical Records Track at {OHSU}},
    booktitle = {Proceedings of The Twentieth Text REtrieval Conference, {TREC} 2011, Gaithersburg, Maryland, USA, November 15-18, 2011},
    series = {{NIST} Special Publication},
    volume = {500-296},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2011},
    url = {http://trec.nist.gov/pubs/trec20/papers/OHSU.medical.update.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/BedrickACH11.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

UCD IIRG at TREC 2011 Medical Track

James Cogley, Nicola Stokes, John Dunnion, Joe Carthy

Abstract

In this paper, we present several approaches to the retrieval of medical visits in response to user queries on patient demographics. A visit is comprised of one or more medical reports. Given a data collection of medical reports, TREC Medical Track participants had the opportunity either to preprocess the documents, concatenating reports into visits, or to post-process, retrieving reports and then developing a method to create a ranking of visits from the retrieved reports. This paper outlines attempts at both approaches in order to determine the influence of the disparity of document lengths in the collection. For both approaches, query expansion and concept re-ranking are applied. Concept re-ranking identifies the number of unique concepts from an expanded query contained in a document, and boosts the rank of documents which contain more unique concepts.
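The concept re-ranking step described above can be sketched as a simple score boost. The function and weights below are illustrative only, not the authors' implementation: the paper matches UMLS concepts, while this toy version matches surface strings.

```python
# Illustrative sketch of concept re-ranking: documents mentioning more unique
# expanded-query concepts are boosted ahead of documents with equal base scores.
# The boost weight and string matching are simplifying assumptions.

def concept_rerank(ranked_docs, expanded_concepts, boost=0.1):
    """ranked_docs: list of (doc_id, score, text); returns a re-sorted list.

    Each document's score grows by `boost` for every unique expanded-query
    concept found in its text.
    """
    rescored = []
    for doc_id, score, text in ranked_docs:
        lower = text.lower()
        hits = sum(1 for c in expanded_concepts if c.lower() in lower)
        rescored.append((doc_id, score + boost * hits))
    return sorted(rescored, key=lambda p: p[1], reverse=True)


docs = [("d1", 1.0, "patient with atrial fibrillation"),
        ("d2", 1.0, "atrial fibrillation treated with warfarin, an anticoagulant")]
out = concept_rerank(docs, ["atrial fibrillation", "warfarin", "anticoagulant"])
# d2 matches three concepts, d1 only one, so d2 moves to the top.
```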

Bibtex
@inproceedings{DBLP:conf/trec/CogleySDC11,
    author = {James Cogley and Nicola Stokes and John Dunnion and Joe Carthy},
    editor = {Ellen M. Voorhees and Lori P. Buckland},
    title = {{UCD} {IIRG} at {TREC} 2011 Medical Track},
    booktitle = {Proceedings of The Twentieth Text REtrieval Conference, {TREC} 2011, Gaithersburg, Maryland, USA, November 15-18, 2011},
    series = {{NIST} Special Publication},
    volume = {500-296},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2011},
    url = {http://trec.nist.gov/pubs/trec20/papers/UCDSI.med.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/CogleySDC11.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

MetaMap is a Superior Baseline to a Standard Document Retrieval Engine for the Task of Finding Patient Cohorts in Clinical Free Text

K. Bretonnel Cohen, Tom Christiansen, Lawrence E. Hunter

Abstract

The goal of this work was to establish a reasonable baseline for research in patient cohort retrieval from clinical free text. Much recent work has used Lucene for this purpose. Our approach was to use MetaMap alone. We found that although many TREC 2011 Electronic Medical Records track participants found it difficult to beat a Lucene baseline, our MetaMap-based baseline did outperform a number of Lucene runs. We propose that MetaMap is a more valid baseline than Lucene, providing essential concept extraction, and that failure to make use of this industry-standard tool results in an unfairly low baseline for evaluation of system outputs.

Bibtex
@inproceedings{DBLP:conf/trec/CohenCH11,
    author = {K. Bretonnel Cohen and Tom Christiansen and Lawrence E. Hunter},
    editor = {Ellen M. Voorhees and Lori P. Buckland},
    title = {MetaMap is a Superior Baseline to a Standard Document Retrieval Engine for the Task of Finding Patient Cohorts in Clinical Free Text},
    booktitle = {Proceedings of The Twentieth Text REtrieval Conference, {TREC} 2011, Gaithersburg, Maryland, USA, November 15-18, 2011},
    series = {{NIST} Special Publication},
    volume = {500-296},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2011},
    url = {http://trec.nist.gov/pubs/trec20/papers/UCSOM\_BTMG.medical.new.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/CohenCH11.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Medical-Miner at TREC 2011 Medical Records Track

Juan Manuel Córdoba, Manuel J. Maña López, Noa P. Cruz Díaz, Jacinto Mata, Fernando Aparicio, Manuel de Buenaga Rodríguez, Daniel Glez-Peña, Florentino Fdez-Riverola

Abstract

This paper presents work done in the Medical-Miner project for the TREC 2011 Medical Records Track. The paper proposes four models for medical information retrieval based on a Lucene index approach. Our retrieval engine used a Lucene index scheme with traditional stopping and stemming, enhanced with entity recognition software applied to query terms. Our aim in this first participation is to set up a broader project involving the development of a configurable Apache Lucene-based framework that allows the rapid development of medical search facilities. Results around the track median have been achieved. In this exploratory track, we think that these results are a good beginning and encourage us towards future developments.

Bibtex
@inproceedings{DBLP:conf/trec/CordobaLDVARGF11,
    author = {Juan Manuel C{\'{o}}rdoba and Manuel J. Ma{\~{n}}a L{\'{o}}pez and Noa P. Cruz D{\'{\i}}az and Jacinto Mata and Fernando Aparicio and Manuel de Buenaga Rodr{\'{\i}}guez and Daniel Glez{-}Pe{\~{n}}a and Florentino Fdez{-}Riverola},
    editor = {Ellen M. Voorhees and Lori P. Buckland},
    title = {Medical-Miner at {TREC} 2011 Medical Records Track},
    booktitle = {Proceedings of The Twentieth Text REtrieval Conference, {TREC} 2011, Gaithersburg, Maryland, USA, November 15-18, 2011},
    series = {{NIST} Special Publication},
    volume = {500-296},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2011},
    url = {http://trec.nist.gov/pubs/trec20/papers/Medicalminer.medical.update.pdf},
    timestamp = {Fri, 24 Jul 2020 01:00:00 +0200},
    biburl = {https://dblp.org/rec/conf/trec/CordobaLDVARGF11.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

York University at TREC 2011: Medical Records Track

Mariam Daoud, Dawid Kasperowicz, Jun Miao, Jimmy X. Huang

Abstract

In this paper, we present our participation in the Medical Records Track of TREC 2011. The goal of this track is to develop quick search and retrieval tools that are useful to physicians for finding patients that have similar diseases and/or treatments. To achieve this goal, we propose query expansion and semantic matching models using semantic medical ontologies for medical data retrieval. The query expansion utilizes a medical disease dictionary that provides different possible reformulations given the query disease keywords. For the semantic matching model, we employed BioLabler, a medical annotation tool that allows indexing of queries and documents with UMLS concepts of our choosing. The matching model ranks the documents that contain the query concepts according to the concepts' scores in the document. We also evaluate a traditional weighting model (BM25), query expansion using relevance feedback under Rocchio's feedback framework, and the impact on retrieval performance of genre and age filtering, proximity, and co-occurrence between disease keywords and procedure/intervention keywords.
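The Rocchio relevance-feedback update mentioned in the abstract has a standard form, q' = α·q + β·mean(relevant) − γ·mean(non-relevant). A minimal sketch over term-weight vectors, with conventional textbook parameter values rather than the authors' settings:

```python
from collections import Counter

def rocchio(query_vec, rel_docs, nonrel_docs, alpha=1.0, beta=0.75, gamma=0.15):
    """Classic Rocchio query update over term->weight Counters.

    q' = alpha*q + beta*mean(rel_docs) - gamma*mean(nonrel_docs);
    terms whose weight drops to zero or below are discarded.
    """
    updated = Counter()
    for term, w in query_vec.items():
        updated[term] += alpha * w
    for doc in rel_docs:
        for term, w in doc.items():
            updated[term] += beta * w / len(rel_docs)
    for doc in nonrel_docs:
        for term, w in doc.items():
            updated[term] -= gamma * w / len(nonrel_docs)
    return {t: w for t, w in updated.items() if w > 0}


q = Counter({"hearing": 1.0})
expanded = rocchio(q, [Counter({"hearing": 1.0, "loss": 1.0})], [])
# "loss" enters the query via the relevant document's feedback.
```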

Bibtex
@inproceedings{DBLP:conf/trec/DaoudKMH11,
    author = {Mariam Daoud and Dawid Kasperowicz and Jun Miao and Jimmy X. Huang},
    editor = {Ellen M. Voorhees and Lori P. Buckland},
    title = {York University at {TREC} 2011: Medical Records Track},
    booktitle = {Proceedings of The Twentieth Text REtrieval Conference, {TREC} 2011, Gaithersburg, Maryland, USA, November 15-18, 2011},
    series = {{NIST} Special Publication},
    volume = {500-296},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2011},
    url = {http://trec.nist.gov/pubs/trec20/papers/york.medical.pdf},
    timestamp = {Sun, 02 Oct 2022 01:00:00 +0200},
    biburl = {https://dblp.org/rec/conf/trec/DaoudKMH11.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

A Knowledge-Based Approach to Medical Records Retrieval

Dina Demner-Fushman, Swapna Abhyankar, Antonio Jimeno-Yepes, Russell F. Loane, Bastien Rance, François-Michel Lang, Nicholas C. Ide, Emilia Apostolova, Alan R. Aronson

Abstract

The NLM LHC team approached the cohort selection task of the 2011 Medical Records Track as a question answering problem. We developed 60 training topics and then manually converted those topics to question frames. We started with the evidence-based medicine well-formed question frame and expanded it to explicitly capture temporal and causal relations. We then implemented a syntactic-semantic method for extracting the question frames from the free text topics. Based on the clinical documentation standards and knowledge of the clinical documentation structure, we split each report type into sections corresponding to different categories of clinical content, with the result that each section contained a specific class of data. We then ranked each document section according to its likelihood of containing answers to specific question frame slots. For example, if a question concerns medications prior to admission, the answers should be found in the Medications on Admission and the Medical History sections. In addition, we split each section into Positive (containing asserted findings, problems, and interventions), Negative (in which findings are negated) and Possible (that includes all uncertain statements). After structuring the questions and the documents, we developed algorithms for expressing question frames in the languages of the two search engines used for retrieval: Essie and Lucene. In addition to the UMLS synonymy-based query expansion built into Essie and implemented externally for Lucene, we expanded the terms in the documents with their ancestors and children from the MeSH hierarchy. We also expanded query terms for recognized drug names using RxNorm and Google searches. In addition to the automatically generated baseline and expanded queries that we ran against the original and the structured documents, we used the Essie user interface for manual query generation. 
During this process, we determined that a third of the automatically generated question frames, although technically correct, needed significant modifications due to different sub-languages used in the documents and in the queries. The manually created queries were used to search the collection with each search engine. Our manual queries submitted to Essie significantly outperformed all of our other runs (achieving 0.73 P@10, 0.66 bpref, and 0.49 R-prec). Interestingly, the best automatic run for Lucene was the baseline run (P@10 = 0.44, bpref = 0.47, R-prec = 0.33) that used the topics “as is” to search the original documents. The results for this run are not significantly different from the manual Lucene (P@10 = 0.51, bpref = 0.51, R-prec = 0.35) and the automatic Essie (P@10 = 0.49, bpref = 0.48, R-prec = 0.33) runs.
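The Positive/Negative/Possible split of section content described above can be approximated with simple cue-word rules. The patterns below are a naive illustration only, far simpler than the paper's syntactic-semantic method:

```python
import re

# Cue lists are illustrative, not the NLM team's actual lexicon.
NEGATION = re.compile(r"\b(no|denies|negative for|without)\b", re.I)
UNCERTAIN = re.compile(r"\b(possible|probable|suspected|may have)\b", re.I)

def classify_statement(sentence):
    """Naive stand-in for the Positive/Negative/Possible split:
    uncertainty cues win over negation cues; everything else is Positive."""
    if UNCERTAIN.search(sentence):
        return "Possible"
    if NEGATION.search(sentence):
        return "Negative"
    return "Positive"
```

Applied to report sections, such a classifier would route "Patient denies chest pain" to the Negative partition and "Possible pneumonia" to the Possible partition.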

Bibtex
@inproceedings{DBLP:conf/trec/Demner-FushmanAJLRLIAA11,
    author = {Dina Demner{-}Fushman and Swapna Abhyankar and Antonio Jimeno{-}Yepes and Russell F. Loane and Bastien Rance and Fran{\c{c}}ois{-}Michel Lang and Nicholas C. Ide and Emilia Apostolova and Alan R. Aronson},
    editor = {Ellen M. Voorhees and Lori P. Buckland},
    title = {A Knowledge-Based Approach to Medical Records Retrieval},
    booktitle = {Proceedings of The Twentieth Text REtrieval Conference, {TREC} 2011, Gaithersburg, Maryland, USA, November 15-18, 2011},
    series = {{NIST} Special Publication},
    volume = {500-296},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2011},
    url = {http://trec.nist.gov/pubs/trec20/papers/NLM.medical.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/Demner-FushmanAJLRLIAA11.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

IRIT at TREC 2011: Evaluation of Query Expansion Techniques for Medical Record Retrieval

Duy Dinh, Lynda Tamine

Abstract

In TREC 2011, we are motivated to participate in the medical record retrieval task, namely TRECMed. Our research focused on the evaluation of term weighting models and query expansion techniques within the medical record retrieval task. We compared the performance of different state-of-the-art term weighting models for retrieving patient records that might best suit the clinical information need. Afterwards, we evaluate different state-of-the-art query expansion (QE) techniques within an IR framework. We describe the IR system architecture and how we carried out the TREC experiments, and we present effectiveness results.

Bibtex
@inproceedings{DBLP:conf/trec/DinhT11,
    author = {Duy Dinh and Lynda Tamine},
    editor = {Ellen M. Voorhees and Lori P. Buckland},
    title = {{IRIT} at {TREC} 2011: Evaluation of Query Expansion Techniques for Medical Record Retrieval},
    booktitle = {Proceedings of The Twentieth Text REtrieval Conference, {TREC} 2011, Gaithersburg, Maryland, USA, November 15-18, 2011},
    series = {{NIST} Special Publication},
    volume = {500-296},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2011},
    url = {http://trec.nist.gov/pubs/trec20/papers/IRIT.medical.update.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/DinhT11.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Concept-Centric Indexing and Retrieval on Medical Text

David Eichmann

Abstract

The NIH Clinical and Translational Science Award (CTSA) program has resulted in the formation of new research interactions for many IR and NLP research groups. Research access to large-scale clinical data is proving to be a critical component of the overall goals of the CTSA. While much of the clinical record is tabular and structured, substantial amounts of pertinent information reside in unstructured text attached to those structured records. This is particularly true for research subject cohort identification, where the inclusion and exclusion criteria for a given study (e.g., family history, quality of life assessments, etc.) may not well align with the data captured in a typical clinical encounter. The TREC Medical Record track provides an excellent means to drive innovation in clinical data retrieval, particularly for unstructured elements of the electronic medical record.

Bibtex
@inproceedings{DBLP:conf/trec/Eichmann11,
    author = {David Eichmann},
    editor = {Ellen M. Voorhees and Lori P. Buckland},
    title = {Concept-Centric Indexing and Retrieval on Medical Text},
    booktitle = {Proceedings of The Twentieth Text REtrieval Conference, {TREC} 2011, Gaithersburg, Maryland, USA, November 15-18, 2011},
    series = {{NIST} Special Publication},
    volume = {500-296},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2011},
    url = {http://trec.nist.gov/pubs/trec20/papers/UI\_ICTS.medical.update.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/Eichmann11.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

BiTeM Group Report for TREC Medical Records Track 2011

Julien Gobeill, Arnaud Gaudinat, Patrick Ruch, Emilie Pasche, Douglas Teodoro, Dina Vishnyakova

Abstract

The BiTeM group participated in the first TREC Medical Records Track in 2011, relying on a strong background in medical records processing and medical terminologies. For this campaign, we submitted a baseline run, computed with a simple free-text index in the Terrier platform, which achieved fair results (0.468 for P10). We also performed automatic text categorization on medical records and built additional inter-lingua representations in MeSH and SNOMED-CT concepts. Combined with the text index, these terminological representations led to a slight improvement of the top precision (+5% for Mean Reciprocal Rank). But the most interesting aspect is the analysis of each representation's contribution to the coverage of the correct answers. The text representation and the additional terminological representations bring different, and ultimately complementary, views of the problem: while 40% of the official relevant visits were retrieved by our text index, an additional 15% were retrieved only with the terminological representations, so that 55% (more than half) of the relevant visits were retrieved across all representations. Finally, an innovative re-ranking strategy was designed, capitalizing on MeSH disorder concepts mapped onto queries and their UMLS-equivalent ICD-9 codes: visits that shared such an ICD-9 discharge code were boosted. This strategy led to another 10% improvement in top precision. Unfortunately, any deeper conclusion based on the official results is impossible to draw due to the widespread use of Lucene and the pooling-based evaluation method: in our baseline text run, only 52% of our top 50 retrieved documents were judged, against 77% for another participant's baseline text run that used Lucene. Official metrics focused on precision are thus difficult to interpret.
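The ICD-9 boosting strategy described above amounts to raising the score of visits whose discharge codes intersect the codes derived from the query. A minimal sketch with an illustrative boost factor (the abstract does not give the actual weighting):

```python
def boost_by_icd9(ranked_visits, query_icd9, boost=1.1):
    """ranked_visits: list of (visit_id, score, discharge_codes).

    Visits whose discharge ICD-9 codes overlap the query-derived codes get
    their score multiplied by `boost` (factor chosen for illustration only).
    """
    rescored = [(vid, score * boost if codes & query_icd9 else score)
                for vid, score, codes in ranked_visits]
    return sorted(rescored, key=lambda p: p[1], reverse=True)


visits = [("v1", 1.00, {"428.0"}),   # heart failure discharge code
          ("v2", 0.95, {"401.9"})]   # hypertension discharge code
out = boost_by_icd9(visits, {"401.9"})
# v2's code matches the query-derived set, so it overtakes v1.
```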

Bibtex
@inproceedings{DBLP:conf/trec/GobeillGRPTV11a,
    author = {Julien Gobeill and Arnaud Gaudinat and Patrick Ruch and Emilie Pasche and Douglas Teodoro and Dina Vishnyakova},
    editor = {Ellen M. Voorhees and Lori P. Buckland},
    title = {BiTeM Group Report for {TREC} Medical Records Track 2011},
    booktitle = {Proceedings of The Twentieth Text REtrieval Conference, {TREC} 2011, Gaithersburg, Maryland, USA, November 15-18, 2011},
    series = {{NIST} Special Publication},
    volume = {500-296},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2011},
    url = {http://trec.nist.gov/pubs/trec20/papers/BiTeM.medical.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/GobeillGRPTV11a.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Cohort Shepherd: Discovering Cohort Traits from Hospital Visits

Travis R. Goodwin, Bryan Rink, Kirk Roberts, Sanda M. Harabagiu

Abstract

This paper describes the system created by the University of Texas at Dallas for content-based medical record retrieval, submitted to the TREC 2011 Medical Records Track. Our system builds a query by extracting keywords from a given topic using a Wikipedia-based approach; regular expressions extract age, gender, and negation requirements. Each query is then expanded for retrieval by relying on UMLS, SNOMED, Wikipedia, and PubMed co-occurrence data. Four runs were submitted: two based on Lucene with varying scoring methods, and two based on a hybrid approach with varying negation detection techniques. Our highest scoring submission achieved a MAP score of 40.8.
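The regular-expression extraction of age and gender requirements might look like the following toy patterns; these are illustrative only, since the abstract does not give the paper's actual expressions:

```python
import re

# Illustrative patterns for demographic requirements in free-text topics.
AGE = re.compile(r"(\d{1,3})[- ]year[- ]old", re.I)
GENDER = re.compile(r"\b(male|female|man|woman)s?\b", re.I)

def extract_demographics(topic):
    """Pull age and gender requirements out of a free-text topic.

    Returns None for fields the topic does not constrain.
    """
    age = AGE.search(topic)
    gender = GENDER.search(topic)
    return {"age": int(age.group(1)) if age else None,
            "gender": gender.group(1).lower() if gender else None}


extract_demographics("65-year-old females with osteoporosis")
# -> {"age": 65, "gender": "female"}
```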

Bibtex
@inproceedings{DBLP:conf/trec/GoodwinRRH11,
    author = {Travis R. Goodwin and Bryan Rink and Kirk Roberts and Sanda M. Harabagiu},
    editor = {Ellen M. Voorhees and Lori P. Buckland},
    title = {Cohort Shepherd: Discovering Cohort Traits from Hospital Visits},
    booktitle = {Proceedings of The Twentieth Text REtrieval Conference, {TREC} 2011, Gaithersburg, Maryland, USA, November 15-18, 2011},
    series = {{NIST} Special Publication},
    volume = {500-296},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2011},
    url = {http://trec.nist.gov/pubs/trec20/papers/UTD\_HLT.medical.update.pdf},
    timestamp = {Mon, 19 Apr 2021 01:00:00 +0200},
    biburl = {https://dblp.org/rec/conf/trec/GoodwinRRH11.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

A Semantic Platform for Information Retrieval from E-Health Records

Harsha Gurulingappa, Bernd Müller, Martin Hofmann-Apitius, Juliane Fluck

Abstract

Electronic patient health records encompass valuable information about a patient's medical problems, diagnoses, and treatments, including their outcomes. However, a problem for medical professionals is the ability to efficiently access information that is documented in the form of free text. Therefore, the work presented here describes an information retrieval platform for efficient processing of e-health records. The system offers facilities for keyword searches, semantic searches, and ontological searches. An open evaluation during TRECMED 2011 demonstrated competitive results.

Bibtex
@inproceedings{DBLP:conf/trec/GurulingappaMHFGH11a,
    author = {Harsha Gurulingappa and Bernd M{\"{u}}ller and Martin Hofmann{-}Apitius and Juliane Fluck},
    editor = {Ellen M. Voorhees and Lori P. Buckland},
    title = {A Semantic Platform for Information Retrieval from E-Health Records},
    booktitle = {Proceedings of The Twentieth Text REtrieval Conference, {TREC} 2011, Gaithersburg, Maryland, USA, November 15-18, 2011},
    series = {{NIST} Special Publication},
    volume = {500-296},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2011},
    url = {http://trec.nist.gov/pubs/trec20/papers/Fraunhofer-SCAI.med.update.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/GurulingappaMHFGH11a.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Search for Medical Records: NICTA at TREC 2011 Medical Track

Sarvnaz Karimi, David Martínez, Sumukh Ghodke, Lawrence Cavedon, Hanna Suominen, Lumin Zhang

Abstract

NICTA (National ICT Australia) participated in the Medical Records track of TREC 2011 with seven automatic runs. The main techniques used in our submissions involved Boolean retrieval for filtering, query transformation, and query expansion. Our best run ranks higher than the median of all systems for this track, standing at rank seven among the 109 automatic runs submitted by the 29 participating groups.

Bibtex
@inproceedings{DBLP:conf/trec/KarimiMGCSZ11,
    author = {Sarvnaz Karimi and David Mart{\'{\i}}nez and Sumukh Ghodke and Lawrence Cavedon and Hanna Suominen and Lumin Zhang},
    editor = {Ellen M. Voorhees and Lori P. Buckland},
    title = {Search for Medical Records: {NICTA} at {TREC} 2011 Medical Track},
    booktitle = {Proceedings of The Twentieth Text REtrieval Conference, {TREC} 2011, Gaithersburg, Maryland, USA, November 15-18, 2011},
    series = {{NIST} Special Publication},
    volume = {500-296},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2011},
    url = {http://trec.nist.gov/pubs/trec20/papers/NICTA\_BIOTALA.med.update2.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/KarimiMGCSZ11.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Cengage Learning at TREC 2011 Medical Track

Benjamin King, Lijun Wang, Ivan Provalov, Jerry Zhou

Abstract

This paper details Cengage Learning's submissions for this year's TREC medical track. The techniques we used fall roughly into two categories: information extraction and query expansion. From both the queries and the medical reports, we extracted limiting attributes, such as age, race, and gender, and labeled terms appearing in the Unified Medical Language System (UMLS). We also used three different techniques of query expansion: UMLS related terms, terms from a network built from UMLS, and terms from our medical reference encyclopedias. We submitted four different runs varying only in their methods of query expansion.

Bibtex
@inproceedings{DBLP:conf/trec/KingWPZ11,
    author = {Benjamin King and Lijun Wang and Ivan Provalov and Jerry Zhou},
    editor = {Ellen M. Voorhees and Lori P. Buckland},
    title = {Cengage Learning at {TREC} 2011 Medical Track},
    booktitle = {Proceedings of The Twentieth Text REtrieval Conference, {TREC} 2011, Gaithersburg, Maryland, USA, November 15-18, 2011},
    series = {{NIST} Special Publication},
    volume = {500-296},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2011},
    url = {http://trec.nist.gov/pubs/trec20/papers/Cengage.medical.update.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/KingWPZ11.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

AEHRC & QUT at TREC 2011 Medical Track: A Concept-Based Information Retrieval Approach

Bevan Koopman, Michael Lawley, Peter Bruza, Laurianne Sitbon

Abstract

The Australian e-Health Research Centre and Queensland University of Technology recently participated in the TREC 2011 Medical Records Track. This paper reports on our methods, results and experience using a concept-based information retrieval approach. Our concept-based approach is intended to overcome specific challenges we identify in searching medical records. Queries and documents are transformed from their term-based originals into medical concepts as defined by the SNOMED-CT ontology. Results show our concept-based approach performed above the median in all three performance metrics: bpref (+12%), R-prec (+18%) and Prec@10 (+6%).
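The core of a concept-based approach — collapsing synonymous phrasings onto shared concept identifiers before matching — can be illustrated with a tiny hand-made dictionary. Real systems map via the full SNOMED-CT ontology; the mapping below is a toy assumption:

```python
# Toy surface-phrase -> concept-ID dictionary (illustrative, not SNOMED-CT itself).
CONCEPTS = {
    "heart attack": "SCT:22298006",
    "myocardial infarction": "SCT:22298006",
    "hypertension": "SCT:38341003",
}

def to_concepts(text):
    """Map known surface phrases in `text` to concept IDs, so synonymous
    phrasings collapse onto the same identifier before query-document matching."""
    lower = text.lower()
    return {cid for phrase, cid in CONCEPTS.items() if phrase in lower}


to_concepts("Patient admitted with heart attack")
# yields the same concept set as to_concepts("acute myocardial infarction noted")
```

The payoff is that a query saying "heart attack" now matches a record saying "myocardial infarction", which plain term matching misses.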

Bibtex
@inproceedings{DBLP:conf/trec/KoopmanLBS11,
    author = {Bevan Koopman and Michael Lawley and Peter Bruza and Laurianne Sitbon},
    editor = {Ellen M. Voorhees and Lori P. Buckland},
    title = {{AEHRC} {\&} {QUT} at {TREC} 2011 Medical Track: {A} Concept-Based Information Retrieval Approach},
    booktitle = {Proceedings of The Twentieth Text REtrieval Conference, {TREC} 2011, Gaithersburg, Maryland, USA, November 15-18, 2011},
    series = {{NIST} Special Publication},
    volume = {500-296},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2011},
    url = {http://trec.nist.gov/pubs/trec20/papers/AEHRC.medical.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/KoopmanLBS11.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

University of Glasgow at Medical Records Track: Experiments with Terrier

Nut Limsopatham, Craig Macdonald, Iadh Ounis, Graham McDonald, Matt-Mouley Bouamrane

Abstract

In our participation in the TREC 2011 Medical Records track, we investigate (1) novel voting-based approaches for identifying relevant patient visits from an aggregate of relevant medical records, (2) the effective handling of negated language in records and queries, and (3) the adoption of medical-domain ontologies for improving the representation of queries, all within the context of our Terrier information retrieval platform.
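A voting-based aggregation of record scores into visit scores can be sketched as follows; the CombMNZ-style rule shown is one plausible instance of the idea, not necessarily the exact voting technique the paper evaluates:

```python
from collections import defaultdict

def visits_by_voting(record_ranking, record_to_visit):
    """Each retrieved record 'votes' for its visit; a visit's score is the
    sum of its records' scores times the number of voting records
    (a CombMNZ-style aggregate, shown as one plausible voting rule)."""
    totals = defaultdict(float)
    votes = defaultdict(int)
    for rec_id, score in record_ranking:
        vid = record_to_visit[rec_id]
        totals[vid] += score
        votes[vid] += 1
    scored = {v: totals[v] * votes[v] for v in totals}
    return sorted(scored.items(), key=lambda p: p[1], reverse=True)


ranking = [("r1", 1.0), ("r2", 0.8), ("r3", 0.9)]
mapping = {"r1": "v1", "r2": "v1", "r3": "v2"}
visits_by_voting(ranking, mapping)
# v1 gathers two votes and a higher total, so it outranks v2.
```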

Bibtex
@inproceedings{DBLP:conf/trec/LimsopathamMOMB11,
    author = {Nut Limsopatham and Craig Macdonald and Iadh Ounis and Graham McDonald and Matt{-}Mouley Bouamrane},
    editor = {Ellen M. Voorhees and Lori P. Buckland},
    title = {University of Glasgow at Medical Records Track: Experiments with Terrier},
    booktitle = {Proceedings of The Twentieth Text REtrieval Conference, {TREC} 2011, Gaithersburg, Maryland, USA, November 15-18, 2011},
    series = {{NIST} Special Publication},
    volume = {500-296},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2011},
    url = {http://trec.nist.gov/pubs/trec20/papers/uogTr.medical.update.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/LimsopathamMOMB11.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

DEMIR at TREC Medical 2011: Power of Term Phrases in Medical Text Retrieval

Okan Ozturkmenoglu, Adil Alpkocak

Abstract

This paper presents the details of the participation of the DEMIR (Dokuz Eylul University Multimedia Information Retrieval) research team in the TREC 2011 Medical Records track. In this study, our aim is to index and retrieve medical terms and term phrases in medical text archives. We searched for medical terms and term phrases using UMLS, a medical metathesaurus. We evaluated the effects of terms and term phrases on the retrieval system in the TREC 2011 Medical Records track, considering terms and term phrases as medical entities. We improved results by examining different weighting schemes for the retrieved data.

Bibtex
@inproceedings{DBLP:conf/trec/OzturkmenogluA11,
    author = {Okan Ozturkmenoglu and Adil Alpkocak},
    editor = {Ellen M. Voorhees and Lori P. Buckland},
    title = {{DEMIR} at {TREC} Medical 2011: Power of Term Phrases in Medical Text Retrieval},
    booktitle = {Proceedings of The Twentieth Text REtrieval Conference, {TREC} 2011, Gaithersburg, Maryland, USA, November 15-18, 2011},
    series = {{NIST} Special Publication},
    volume = {500-296},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2011},
    url = {http://trec.nist.gov/pubs/trec20/papers/DEUCENG.medical.update.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/OzturkmenogluA11.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

DutchHatTrick: Semantic Query Modeling, ConText, Section Detection, and Match Score Maximization

Martijn J. Schuemie, Dolf Trieschnigg, Edgar Meij

Abstract

This report discusses the collaborative work of the ErasmusMC, University of Twente, and the University of Amsterdam on the TREC 2011 Medical track. Here, the task is to retrieve patient visits from the University of Pittsburgh NLP Repository for 35 topics. The repository consists of 101,711 patient reports, and a patient visit was recorded in one or more reports. [...]

Bibtex
@inproceedings{DBLP:conf/trec/SchuemieTM11,
    author = {Martijn J. Schuemie and Dolf Trieschnigg and Edgar Meij},
    editor = {Ellen M. Voorhees and Lori P. Buckland},
    title = {DutchHatTrick: Semantic Query Modeling, ConText, Section Detection, and Match Score Maximization},
    booktitle = {Proceedings of The Twentieth Text REtrieval Conference, {TREC} 2011, Gaithersburg, Maryland, USA, November 15-18, 2011},
    series = {{NIST} Special Publication},
    volume = {500-296},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2011},
    url = {http://trec.nist.gov/pubs/trec20/papers/DutchHatTrick.med.update.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/SchuemieTM11.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Finding Patient Visits in EMR Using LUXID®

Luca Toldo, Alexander Scheer

Abstract

INTRODUCTION: Free-text sections of Electronic Medical Records (EMR) contain information that cannot be appropriately captured in structured forms. Several studies have shown the potential utility of mining EMR free text to identify adverse events (e.g. EU-PSIP, EU-ALERT), and large public-private research projects (e.g. IMI-EHR4CR, CLOUD4HEALTH) aim to mine it further, e.g. for clinical trial optimisation and pharmacovigilance purposes. AIM: The purpose of this work was to assess the performance of LUXID®, an off-the-shelf commercial natural language processing system, using the dictionary- and rule-based Medical Entity Relationships Skill Cartridge® and KNIME as the automation workflow engine for result combination and formatting, on the University of Pittsburgh BLULab NLP Repository benchmark, in the context of the TREC 2011 Medical Records Retrieval Track (TREC-MED2011). RESULTS: The system described here achieved the best score for one of the 34 queries (defined as query 111) and ranked seventh to eighth overall (according to the scoring used) in the manual track of TREC-MED2011. More than 80% of the TREC-MED2011 queries could be appropriately processed automatically. The performance of manually interpreted queries did not differ substantially from that of automatically processed ones. More than 60% of the queries submitted by our system delivered performance at or above the median of all participants. The very high precision of the system, which in certain cases delivered a very low number of hits, correlated statistically with overall performance. CONCLUSIONS: Initial results and an error analysis are reported, and strategies for improving the system are outlined, fully supporting the appropriateness of using this technology to identify patients matching inclusion/exclusion criteria from unstructured EMR plain text.

Bibtex
@inproceedings{DBLP:conf/trec/ToldoS11,
    author = {Luca Toldo and Alexander Scheer},
    editor = {Ellen M. Voorhees and Lori P. Buckland},
    title = {Finding Patient Visits in {EMR} Using LUXID{\textregistered}},
    booktitle = {Proceedings of The Twentieth Text REtrieval Conference, {TREC} 2011, Gaithersburg, Maryland, USA, November 15-18, 2011},
    series = {{NIST} Special Publication},
    volume = {500-296},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2011},
    url = {http://trec.nist.gov/pubs/trec20/papers/merckkgaa.medical.update.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/ToldoS11.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

An Exploration of New Ranking Strategies for Medical Record Tracks

Hao Wu, Hui Fang

Abstract

We report our system and experiments at the 2011 Medical Records Track. Our goal is to return the most relevant visits for a query. In particular, we start with an axiomatic retrieval model and combine it with an aspect-based term proximity strategy to improve retrieval performance. We also propose a “disease diversity” strategy based on the assumption that most documents contain information related to only one main disease. Query expansion using external resources has also been studied.
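A term-proximity component of the kind the abstract mentions can be sketched as a score boost that decays with the smallest token window covering all query terms. The function names and the `alpha` parameter below are hypothetical; the paper's actual aspect-based formulation is more involved.

```python
# Hypothetical sketch of a term-proximity boost: tighter co-occurrence
# of the query terms in a document earns a larger bonus.
from itertools import product

def min_cover_span(positions):
    """positions: one list of token offsets per query term.
    Returns the size of the smallest window covering at least one
    occurrence of every term (brute force, fine for short queries)."""
    best = None
    for combo in product(*positions):
        span = max(combo) - min(combo) + 1
        if best is None or span < best:
            best = span
    return best

def proximity_boost(positions, alpha=0.5):
    """Additive bonus that decays with the minimal covering span."""
    return alpha / min_cover_span(positions)

# "heart" occurs at tokens 3 and 20, "failure" at token 21:
print(min_cover_span([[3, 20], [21]]))  # 2
```

In practice such a bonus would be added to (or multiplied into) the base axiomatic retrieval score for each document before ranking.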

Bibtex
@inproceedings{DBLP:conf/trec/WuF11,
    author = {Hao Wu and Hui Fang},
    editor = {Ellen M. Voorhees and Lori P. Buckland},
    title = {An Exploration of New Ranking Strategies for Medical Record Tracks},
    booktitle = {Proceedings of The Twentieth Text REtrieval Conference, {TREC} 2011, Gaithersburg, Maryland, USA, November 15-18, 2011},
    series = {{NIST} Special Publication},
    volume = {500-296},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2011},
    url = {http://trec.nist.gov/pubs/trec20/papers/Udel\_Fang.medical.update.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/WuF11.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

PRIS at TREC 2011 Medical Record Track

Jiayue Zhang, Xueneng Lin, Yang Zou, Shuai Zhu, Jing Xiao, Weiran Xu, Guang Chen, Jun Guo

Abstract

Our approach to the Medical Records Track is described in this paper. For ad hoc retrieval, Indri and Xapian are used for indexing, searching, and initial query expansion. The main query expansion is achieved using LSI (latent semantic indexing). The evaluation results show that the performance of our system is above average.
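LSI-based query expansion of the kind mentioned above can be sketched with a truncated SVD of a term-document matrix: project terms into a low-rank latent space and expand the query with its nearest neighbours there. The vocabulary and counts below are toy data, not from the paper.

```python
import numpy as np

# Toy term-document count matrix: rows are terms, columns are documents.
terms = ["heart", "failure", "cardiac", "cough", "fever"]
A = np.array([
    [2, 0, 1, 0],   # heart
    [1, 0, 1, 0],   # failure
    [1, 0, 2, 0],   # cardiac
    [0, 1, 0, 1],   # cough
    [0, 2, 0, 1],   # fever
], dtype=float)

# Rank-k SVD: keep only the strongest latent dimensions.
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
term_vecs = U[:, :k] * s[:k]          # term coordinates in latent space

def expand(query_terms, n=2):
    """Return the n terms closest (cosine) to the query's latent vector."""
    idx = [terms.index(t) for t in query_terms]
    q = term_vecs[idx].mean(axis=0)
    sims = term_vecs @ q / (
        np.linalg.norm(term_vecs, axis=1) * np.linalg.norm(q) + 1e-12)
    order = np.argsort(-sims)
    return [terms[i] for i in order if terms[i] not in query_terms][:n]

print(expand(["heart"]))  # 'cardiac' and 'failure' on this toy data
```

On this toy matrix the cardiology and respiratory terms occupy disjoint document columns, so the latent space cleanly separates them and "heart" expands to its cluster-mates rather than to "cough" or "fever".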

Bibtex
@inproceedings{DBLP:conf/trec/ZhangLZZXXCG11,
    author = {Jiayue Zhang and Xueneng Lin and Yang Zou and Shuai Zhu and Jing Xiao and Weiran Xu and Guang Chen and Jun Guo},
    editor = {Ellen M. Voorhees and Lori P. Buckland},
    title = {{PRIS} at {TREC} 2011 Medical Record Track},
    booktitle = {Proceedings of The Twentieth Text REtrieval Conference, {TREC} 2011, Gaithersburg, Maryland, USA, November 15-18, 2011},
    series = {{NIST} Special Publication},
    volume = {500-296},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2011},
    url = {http://trec.nist.gov/pubs/trec20/papers/PRIS.medical.pdf},
    timestamp = {Tue, 17 Nov 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/ZhangLZZXXCG11.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Using Multiple External Collections for Query Expansion

Dongqing Zhu, Ben Carterette

Abstract

For the 2011 Medical Records Track, we used several external collections for query expansion and explored three main research questions.

First, we investigated the possibility of using query sessions from PubMed query logs to improve the estimation of a relevance model. In a typical search scenario, a user may submit multiple queries before she actually finds satisfactory search results. These closely related queries form a single query session representing a single information need. By finding query sessions relevant to a Medical Track topic, we can incorporate into our relevance model useful query terms that reflect real information needs more or less related to that topic.

Second, we explored how the size and quality of external collections affect the effectiveness of query expansion. More specifically, we used TREC 2007 Genomics Track data and ImageCLEF 2009 Medical Retrieval data. The former collection is more genomics-related and larger, while the latter is more medical-related and much smaller. Intuitively, a larger external collection is more likely to contain good expansion terms; however, quality (in terms of the concepts shared between the target collection and an external collection) can be an important factor as well. This allowed us to carry out a pilot study on the relationship between collection quality, collection size, and the effect on query expansion.

Third, we used a mixture of external collections for query expansion. In particular, we explored methods that can adaptively combine evidence from multiple collections for different topics. Usually, the weights for a mixture relevance model are determined via training on a test collection, and thus are fixed across all topics. If we could estimate the concept overlap of a topic with the external collections and assign the mixture-model weights accordingly, the system could adapt to topics and might achieve better performance. That is the motivation for this third research direction.

We first describe our retrieval models and systems in Sections 2 and 3. Then in Section 4 we show and compare the official TREC evaluation results of our submissions, and further analyze our retrieval system's performance on the test collection. Following that, we discuss the above research questions in Section 5. We conclude in Section 6.
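The per-topic mixture described above can be sketched as a convex combination of collection-specific relevance models, with weights proportional to an assumed topic-collection overlap score. All names and numbers below are illustrative, not from the paper.

```python
# Hypothetical sketch of a mixture relevance model over external
# collections, with per-topic weights derived from overlap scores.

def mix_relevance_models(models, overlaps):
    """models: one dict per collection, term -> P(term | relevance model).
    overlaps: per-topic affinity score for each collection.
    Returns the weighted mixture distribution over terms."""
    total = sum(overlaps)
    weights = [o / total for o in overlaps]   # normalise to sum to 1
    mixed = {}
    for w, model in zip(weights, models):
        for term, p in model.items():
            mixed[term] = mixed.get(term, 0.0) + w * p
    return mixed

# Toy collection models and a topic judged 3x closer to the medical one:
genomics = {"gene": 0.6, "cancer": 0.4}
clef_med = {"cancer": 0.5, "tumor": 0.5}
mixed = mix_relevance_models([genomics, clef_med], overlaps=[1.0, 3.0])
# cancer: 0.25 * 0.4 + 0.75 * 0.5 = 0.475
```

Because the weights vary with the topic's estimated overlap rather than being trained once and fixed, a genomics-flavoured topic would pull expansion terms mostly from the Genomics collection and a clinical topic mostly from the medical one.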

Bibtex
@inproceedings{DBLP:conf/trec/ZhuC11,
    author = {Dongqing Zhu and Ben Carterette},
    editor = {Ellen M. Voorhees and Lori P. Buckland},
    title = {Using Multiple External Collections for Query Expansion},
    booktitle = {Proceedings of The Twentieth Text REtrieval Conference, {TREC} 2011, Gaithersburg, Maryland, USA, November 15-18, 2011},
    series = {{NIST} Special Publication},
    volume = {500-296},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2011},
    url = {http://trec.nist.gov/pubs/trec20/papers/udel.medical.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/ZhuC11.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

Empirical Ontologies for Cohort Identification

Stephen T. Wu, Kavishwar B. Wagholikar, Sunghwan Sohn, Vinod Kaggal, Hongfang Liu

Abstract

The growth of patient data stored in Electronic Medical Records (EMR) has greatly expanded the potential for the evidence-based improvement of clinical practice. The proper re-use of this clinical information, however, does not replace basic research techniques — it augments them. The Text REtrieval Conference 2011 Medical Records Track explored how information retrieval may support clinical research by providing an efficient means to identify cohorts for clinical studies. Mayo Clinic NLP's submission to the TREC Medical Records track attempts information retrieval at a semantic level, combining two disparate means of computing clinical semantics. Substantial effort has gone into the development of precise semantic specification of concepts in medical ontologies and terminologies [1, ?]. But human clinicians do not generate clinical text by referring to such resources, and ontology creators do not base their terminology design on clinical text — so the distribution of ontology concepts in actual clinical texts may differ greatly. Therefore, in representing clinical reports for cohort identification, we advocate for a model that makes use of expert knowledge, is empirically validated, and considers context. This is accomplished through a new framework: empirical ontologies. Patient cohort identification is thus a practical use case for the techniques in our recent work on clinical concept frequency comparisons [2, 3]. The rest of this paper describes the TREC 2011 Medical Records task, describes Mayo Clinic's run submissions, and reports evaluation results with subsequent discussion.

Bibtex
@inproceedings{DBLP:conf/trec/WuWSKL11,
    author = {Stephen T. Wu and Kavishwar B. Wagholikar and Sunghwan Sohn and Vinod Kaggal and Hongfang Liu},
    editor = {Ellen M. Voorhees and Lori P. Buckland},
    title = {Empirical Ontologies for Cohort Identification},
    booktitle = {Proceedings of The Twentieth Text REtrieval Conference, {TREC} 2011, Gaithersburg, Maryland, USA, November 15-18, 2011},
    series = {{NIST} Special Publication},
    volume = {500-296},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2011},
    url = {http://trec.nist.gov/pubs/trec20/papers/Mayo.update.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/WuWSKL11.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}

The University of Iowa at TREC 2011: Microblogs, Medical Records and Crowdsourcing

Sanmitra Bhattacharya, Christopher G. Harris, Yelena Mejova, Chao Yang, Padmini Srinivasan

Abstract

The Text Retrieval and Text Mining group at the University of Iowa participated in three tracks, all new tracks introduced this year: Microblog, Medical Records (Med) and Crowdsourcing. Details of our strategies are provided in this paper. Overall our effort has been fruitful in that we have been able to understand more about the nature of medical records and Twitter messages, and also the merits and challenges of working with games as a framework for gathering relevance judgments.

Bibtex
@inproceedings{DBLP:conf/trec/Bhattacharya0MYS11,
    author = {Sanmitra Bhattacharya and Christopher G. Harris and Yelena Mejova and Chao Yang and Padmini Srinivasan},
    editor = {Ellen M. Voorhees and Lori P. Buckland},
    title = {The University of Iowa at {TREC} 2011: Microblogs, Medical Records and Crowdsourcing},
    booktitle = {Proceedings of The Twentieth Text REtrieval Conference, {TREC} 2011, Gaithersburg, Maryland, USA, November 15-18, 2011},
    series = {{NIST} Special Publication},
    volume = {500-296},
    publisher = {National Institute of Standards and Technology {(NIST)}},
    year = {2011},
    url = {http://trec.nist.gov/pubs/trec20/papers/UIowaS.microblog.medical.crowdsourcing.update.pdf},
    timestamp = {Thu, 12 Mar 2020 00:00:00 +0100},
    biburl = {https://dblp.org/rec/conf/trec/Bhattacharya0MYS11.bib},
    bibsource = {dblp computer science bibliography, https://dblp.org}
}