Date of Award

Spring 1-1-2012

Document Type

Thesis

Degree Name

Master of Science (MS)

Department

Computer Science

First Advisor

James Martin

Second Advisor

Wayne Ward

Third Advisor

Rodney Nielsen

Abstract

This thesis evaluates a variety of lexical expansion approaches, using the Medpedia corpus and MiPACQ queries, with the goal of improving the MiPACQ system's retrieval performance. The heart of the MiPACQ system is a document reranking component that operates on the results of a baseline information retrieval (IR) system. However, the baseline IR system used in MiPACQ has poor paragraph-level recall, which limits the reranker's overall performance. To address this problem, three broad term expansion approaches are outlined, with the purpose of increasing recall over the baseline Lucene retrieval system without introducing a significant amount of noise. Two of the three approaches rely only on the corpus being indexed, while the other requires a domain-specific ontology to expand query terms. First, automatic thesaurus generation based on co-occurrences is evaluated alongside other co-occurrence-based expansion methods. Next, a resource-based approach that uses the UMLS Metathesaurus for expansion is used to evaluate knowledge-rich expansion methods. Finally, latent semantic indexing is evaluated as an alternative to the baseline vector space retrieval model. These methods are compared and tuned, and the best method is recommended to the MiPACQ authors to improve question-answering results.
