Master of Science (MS)
In this paper, a variety of lexical expansion approaches are evaluated using the Medpedia corpus and MiPACQ queries in order to improve the retrieval performance of the MiPACQ system. The heart of the MiPACQ system is a document reranking component, which operates on the results returned by a baseline information retrieval (IR) system. However, the baseline IR system used in MiPACQ has poor paragraph-level recall, which limits the reranker's overall performance. To address this limitation, this paper outlines three broad term expansion approaches intended to increase recall over the baseline Lucene retrieval system without introducing a significant amount of noise. Two of the three approaches rely only on the corpus being indexed, while the third requires a domain-specific ontology to expand query terms. First, automatic thesaurus generation based on term co-occurrence is evaluated as an expansion method alongside other co-occurrence-based expansion methods. Next, a resource-based approach that uses the UMLS Metathesaurus for expansion is used to evaluate knowledge-rich expansion methods. Finally, latent semantic indexing is evaluated as an alternative to the baseline vector space retrieval model. These methods are compared and tuned, and the best-performing method is recommended to the MiPACQ authors to improve question-answering results.
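To make the idea of corpus-only, co-occurrence-based query expansion concrete, here is a minimal Python sketch. It is not the method implemented in this work. It assumes that co-occurrence is counted at the paragraph level, that term association is scored with the Dice coefficient, and that a tiny stopword list is adequate. The function names, parameters (`k`, `min_score`) and the toy corpus are hypothetical stand-ins, not Medpedia data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical stopword list; a real system would use a fuller list.
STOPWORDS = {"a", "an", "and", "the", "of", "is", "for", "that", "to", "in"}


def tokenize(text):
    """Lowercase, strip simple punctuation, and drop stopwords (a simplifying assumption)."""
    tokens = (t.strip(".,;:?!()").lower() for t in text.split())
    return [t for t in tokens if t and t not in STOPWORDS]


def build_cooccurrence(paragraphs):
    """Count how many paragraphs contain each term and each unordered term pair."""
    term_df = Counter()
    pair_df = Counter()
    for para in paragraphs:
        terms = sorted(set(tokenize(para)))
        term_df.update(terms)
        # combinations() over a sorted list yields pairs in canonical (a < b) order
        pair_df.update(combinations(terms, 2))
    return term_df, pair_df


def dice(a, b, term_df, pair_df):
    """Dice coefficient: 2 * df(a, b) / (df(a) + df(b))."""
    key = (a, b) if a < b else (b, a)
    joint = pair_df.get(key, 0)
    denom = term_df[a] + term_df[b]
    return 2.0 * joint / denom if denom else 0.0


def expand_query(query, paragraphs, k=2, min_score=0.5):
    """Append up to k strongly associated corpus terms for each query term."""
    term_df, pair_df = build_cooccurrence(paragraphs)
    query_terms = tokenize(query)
    expanded = list(query_terms)
    for q in query_terms:
        if q not in term_df:
            continue  # terms unseen in the corpus cannot be expanded
        candidates = [
            (dice(q, t, term_df, pair_df), t)
            for t in term_df
            if t != q and t not in expanded
        ]
        # Highest score first; ties broken alphabetically for determinism
        candidates.sort(key=lambda st: (-st[0], st[1]))
        for score, term in candidates[:k]:
            if score >= min_score:
                expanded.append(term)
    return expanded


# Toy corpus (hypothetical), one string per indexed paragraph
SAMPLE_PARAGRAPHS = [
    "Aspirin reduces fever and pain.",
    "Aspirin is an analgesic that relieves pain.",
    "Ibuprofen is an analgesic used for pain.",
    "Hypertension raises blood pressure.",
]

print(expand_query("pain", SAMPLE_PARAGRAPHS))  # ['pain', 'analgesic', 'aspirin']
```

The `min_score` threshold is one simple way to limit the noise that expansion can introduce. Raising it trades some recall for precision, which is the same tension the expansion approaches above are meant to balance.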
St. Charles, Timothy Wilson, "A Comparison of Lexical Expansion Methodologies to Improve Medical Question and Answering Systems" (2012). Computer Science Graduate Theses & Dissertations. 44.