Graduate Thesis Or Dissertation
Similarity Analysis on Unstructured Text Using Dependency Trees in Biomedical Domain Public Deposited
https://scholar.colorado.edu/concern/graduate_thesis_or_dissertations/cv43nx17z
- Abstract
- The published biomedical scientific literature discusses most of the relationships between biomedical entities such as drugs, genes, diseases, and cellular processes. Relationships of the form X (drug) inhibits Y (gene), X (drug) treats Y (disease), and so forth are scattered in unstructured form across millions of articles. Sentences like “X decreases Y”, “Y is decreased by X”, and “X reduces Y’s effect” represent the same underlying relationship (decrease) between X and Y despite their different sentence structures. Identifying such similarities between relationships is critical to various applications in natural language processing and information retrieval, including question answering [1], relationship analysis [2], and semantic search [3]. However, identifying these relationships in the vast corpus of unstructured data is a complex task that involves techniques such as data mining, machine learning, and natural language processing. We found that existing methods such as EBC [2] have inherent drawbacks in scaling to larger datasets and in using full-text bodies for analysis. Motivated by this need, this thesis focuses on scalable similarity analysis of unstructured full-text bodies using entities from different ontologies. We devised a new method, Mengsim, a dependency-parse-based similarity detection technique that finds similar relationships between semantic concepts in sentences like “X decreases Y”, “Y is decreased by X”, and “X reduces Y’s effect”. Mengsim relies on dependency grammar, which gives the syntactic connections between words in a sentence [4]. An evaluation of Mengsim alongside standard models showed its effectiveness in retrieving similar relationships. We also found that the proposed method can scale to larger datasets. We used concepts from three biomedical ontologies (diseases, drugs, and genes) in our methods, which shows the ability to scale to multiple ontologies.
- Creator
- Date Issued
- 2017
- Academic Affiliation
- Advisor
- Committee Member
- Degree Grantor
- Commencement Year
- Subject
- Last Modified
- 2019-11-18
- Resource Type
- Rights Statement
- Language
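
The abstract's example sentences “X decreases Y” and “Y is decreased by X” can be used to illustrate the general idea of comparing relationships through dependency structure. The following is a minimal sketch only, not Mengsim's actual algorithm, which the abstract does not specify. It assumes hand-written, simplified dependency parses, a toy lemma table, and a toy function-word list (all hypothetical names and data); it extracts the path between X and Y in each tree and compares the content-word lemmas on those paths with Jaccard similarity.

```python
from collections import deque

# Hand-written, simplified dependency parses (an assumption for this sketch;
# a real system would obtain them from a dependency parser).
# Each edge is (head, dependent).
PARSES = {
    "X decreases Y": [("decreases", "X"), ("decreases", "Y")],
    "Y is decreased by X": [("decreased", "Y"), ("decreased", "is"),
                            ("decreased", "by"), ("by", "X")],
    "X treats Y": [("treats", "X"), ("treats", "Y")],
}

# Toy lemma table and function-word list, both hypothetical.
LEMMAS = {"decreases": "decrease", "decreased": "decrease", "treats": "treat"}
FUNCTION_WORDS = {"by", "is"}


def path_between(edges, source="X", target="Y"):
    """Return the tokens on the undirected tree path from source to target."""
    neighbours = {}
    for head, dep in edges:
        neighbours.setdefault(head, []).append(dep)
        neighbours.setdefault(dep, []).append(head)
    # Breadth-first search, remembering how each token was reached.
    previous = {source: None}
    queue = deque([source])
    while queue:
        token = queue.popleft()
        if token == target:
            break
        for nxt in neighbours.get(token, []):
            if nxt not in previous:
                previous[nxt] = token
                queue.append(nxt)
    if target not in previous:
        return []
    # Walk back from target to source, then reverse.
    path = []
    token = target
    while token is not None:
        path.append(token)
        token = previous[token]
    return path[::-1]


def signature(sentence):
    """Content-word lemmas strictly between X and Y on the dependency path."""
    inner = path_between(PARSES[sentence])[1:-1]
    return {LEMMAS.get(t, t) for t in inner if t not in FUNCTION_WORDS}


def similarity(a, b):
    """Jaccard similarity of the two sentences' path signatures."""
    sa, sb = signature(a), signature(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


if __name__ == "__main__":
    print(similarity("X decreases Y", "Y is decreased by X"))
    print(similarity("X decreases Y", "X treats Y"))
```

In this toy setup the active and passive sentences share the same path signature, while “X treats Y” does not. Handling a paraphrase such as “X reduces Y’s effect” would additionally require some notion of word-level similarity between “reduce” and “decrease”, which this sketch does not attempt.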
Relationships
Items
Title | Date Uploaded | Visibility | Actions |
---|---|---|---|
similarityAnalysisOnUnstructuredTextUsingDependencyTreesI.pdf | 2019-11-18 | Public | Download |