Undergraduate Honors Thesis

 

HAIL HYDRA: Named Entity Resolution, Extraction, and Linking of Lexically Similar Names

https://scholar.colorado.edu/concern/undergraduate_honors_theses/x920fx40j
Abstract
  • Words, words, words (Hamlet 2.2 18). Characters and ideas in text are represented by names. A casual reader would have no trouble understanding that passing references to Mr. Holmes, Mr. Sherlock Holmes, Sherlock Holmes, and Holmes all trace back to the world’s most famous detective. Names are often shortened or rearranged with common abbreviations or elaborate titles. Each version of a character’s name can be understood as a single head on a multi-headed hydra, all tracing back to the same body. To generate the most accurate named entities possible, raw text analysis requires more literary context about how English is structured and how the words in a sentence interact. Many intelligent-dependency parsers and natural language processing systems study text without accounting for how dynamic language can be. This thesis considers the entire body of a piece of literature to identify and relate entities within the same text, regardless of how fluid the exact reference to an entity may be. Once an entity has been identified, lexically similar names that refer to the same character can be linked together to form a global named entity that represents every form of that entity referenced in the text. By using raw text as opposed to a labeled corpus, this thesis will generate named entities directly from the text.
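The linking idea in the abstract can be illustrated with a minimal sketch. This is not the thesis's method: the title list, the function names `name_tokens` and `link_mentions`, and the subset-matching rule are all assumptions made for illustration. The sketch strips titles from each mention and attaches a shorter name to a longer one when every remaining token of the shorter name appears in the longer name.

```python
# Hypothetical title list, chosen for illustration only.
TITLES = {"mr", "mrs", "ms", "miss", "dr", "sir", "lady", "prof"}


def name_tokens(mention):
    """Lowercase a mention, strip trailing punctuation, and drop titles."""
    tokens = [t.strip(".,").lower() for t in mention.split()]
    return frozenset(t for t in tokens if t and t not in TITLES)


def link_mentions(mentions):
    """Greedily group mentions whose tokens are a subset of an existing entity.

    Longer names are processed first so that full names seed the entities
    and shorter variants attach to them.
    """
    entities = []  # each entity: {"tokens": set of str, "mentions": list of str}
    ordered = sorted(mentions, key=lambda m: len(name_tokens(m)), reverse=True)
    for mention in ordered:
        tokens = name_tokens(mention)
        if not tokens:
            continue  # the mention was only a title, nothing to link
        for entity in entities:
            if tokens <= entity["tokens"]:
                entity["mentions"].append(mention)
                break
        else:
            # No existing entity covers these tokens, so start a new one.
            entities.append({"tokens": set(tokens), "mentions": [mention]})
    return entities


mentions = ["Mr. Holmes", "Mr. Sherlock Holmes", "Sherlock Holmes", "Holmes", "Dr. Watson"]
entities = link_mentions(mentions)
for entity in entities:
    print(sorted(entity["tokens"]), entity["mentions"])
```

With these five mentions, the four Holmes variants fall under one entity and Dr. Watson forms a second. A rule this simple cannot separate characters who share a surname (a bare "Holmes" would attach to whichever full name was seen first), which is one reason wider context from the whole text matters.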
Date Awarded
  • 2018-01-01
Last Modified
  • 2019-12-02