Graduate Thesis Or Dissertation

 

Similarity Analysis on Unstructured Text Using Dependency Trees in Biomedical Domain

https://scholar.colorado.edu/concern/graduate_thesis_or_dissertations/cv43nx17z
Abstract
  • The published biomedical scientific literature discusses most of the known relationships between biomedical entities such as drugs, genes, diseases, and cellular processes. Relationships of the form X (drug) inhibits Y (gene), X (drug) treats Y (disease), and so forth are scattered in unstructured form across millions of articles. Sentences like “X decreases Y”, “Y is decreased by X”, and “X reduces Y’s effect” represent the same underlying relationship (decrease) between X and Y despite their different sentence structures. Identifying such similarities is critical to applications in natural language processing and information retrieval, including question answering [1], relationship analysis [2], and semantic search [3]. However, identifying these relationships in a vast corpus of unstructured data is a complex task that involves techniques from data mining, machine learning, and natural language processing. We found that existing methods such as EBC [2] have inherent drawbacks in scaling to larger datasets and in using full-text bodies for analysis. Motivated by this need, this thesis focuses on scalable similarity analysis of unstructured full-text bodies using entities from different ontologies. We devised a new method, Mengsim, a dependency-parse-based similarity detection technique that finds similar relationships between semantic concepts across differently structured sentences such as the examples above. Mengsim relies on dependency grammar, which gives the syntactic connections between the words in a sentence [4]. Evaluating Mengsim alongside standard models showed its effectiveness in retrieving similar relationships. We also found that the proposed method can scale to larger datasets. We used concepts from three biomedical ontologies (diseases, drugs, and genes), which shows that the method can extend to multiple ontologies.
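The idea of comparing relationships through dependency structure can be illustrated with a minimal sketch. It is not Mengsim itself, whose algorithm the abstract does not specify. The dependency edges are written by hand rather than produced by a parser, and the lemma table, the function-word list, the path-based signature, and the Jaccard score are all assumptions chosen for illustration.

```python
from collections import deque

# Hand-written (head, relation, dependent) edges; an assumption standing in
# for the output of a real dependency parser.
ACTIVE = [("decreases", "nsubj", "X"), ("decreases", "dobj", "Y")]
PASSIVE = [
    ("decreased", "nsubjpass", "Y"),
    ("decreased", "auxpass", "is"),
    ("decreased", "agent", "by"),
    ("by", "pobj", "X"),
]
OTHER = [("increases", "nsubj", "X"), ("increases", "dobj", "Y")]

# Toy lemma table and function-word list (illustrative only).
LEMMAS = {"decreases": "decrease", "decreased": "decrease",
          "increases": "increase", "is": "be"}
FUNCTION_WORDS = {"by", "be"}


def shortest_path(edges, source, target):
    """Breadth-first search over the dependency tree, treated as undirected."""
    graph = {}
    for head, _relation, dependent in edges:
        graph.setdefault(head, []).append(dependent)
        graph.setdefault(dependent, []).append(head)
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for neighbor in graph.get(path[-1], []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


def relation_signature(edges, entity_a, entity_b):
    """Lemmatized content words on the path between the two entities."""
    path = shortest_path(edges, entity_a, entity_b)
    if path is None:
        return frozenset()
    lemmas = (LEMMAS.get(word, word) for word in path[1:-1])
    return frozenset(lemma for lemma in lemmas if lemma not in FUNCTION_WORDS)


def jaccard(first, second):
    """Set overlap in [0, 1]; two empty signatures count as dissimilar."""
    union = first | second
    return len(first & second) / len(union) if union else 0.0


active_sig = relation_signature(ACTIVE, "X", "Y")
passive_sig = relation_signature(PASSIVE, "X", "Y")
other_sig = relation_signature(OTHER, "X", "Y")

print(jaccard(active_sig, passive_sig))  # same underlying relationship
print(jaccard(active_sig, other_sig))    # different relationship
```

In this sketch the active and passive sentences reduce to the same signature because the path between X and Y passes through a form of the same verb in both trees. The signature ignores which entity is the agent, so it would not distinguish “X decreases Y” from “Y decreases X”.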
Date Issued
  • 2017
Last Modified
  • 2019-11-18