Type of Thesis
Dr. Chenhao Tan
Words, words, words (Hamlet 2.2 18)
Characters and ideas in text are represented by names. A casual reader would have no trouble understanding that passing references to Mr. Holmes, Mr. Sherlock Holmes, Sherlock Holmes, and Holmes all trace back to the world's most famous detective. Names are often shortened or rearranged, with common abbreviations or elaborate titles attached. Each version of a character's name can be understood as a single head on a multi-headed hydra, all tracing back to the same body. To generate the most accurate named entities possible, raw text analysis requires more literary context about how English is structured and how the words in a sentence interact. Many intelligent dependency parsers and natural language processing systems study text without accounting for how dynamic language can be. This thesis considers the entire body of a piece of literature in order to identify and relate entities within the same text, regardless of how fluid the exact references to an entity are. Once an entity has been identified, lexically similar names that refer to the same character can be linked together to form a global named entity that represents every form of that entity referenced in the text. By using raw text rather than a labeled corpus, this thesis generates named entities directly from the text.
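The linking idea in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the title list `TITLES`, the helper names `name_tokens` and `link_mentions`, and the token-subset rule for deciding that two names are "lexically similar" are all assumptions chosen for illustration.

```python
# Hypothetical list of honorifics to ignore when comparing names (an assumption).
TITLES = {"mr", "mrs", "ms", "miss", "dr", "sir", "lady", "lord"}


def name_tokens(mention):
    """Lowercase a mention, strip trailing punctuation, and drop titles."""
    words = [w.strip(".,").lower() for w in mention.split()]
    return frozenset(w for w in words if w and w not in TITLES)


def link_mentions(mentions):
    """Group mentions whose tokens are a subset of a fuller name's tokens.

    Each group stands in for one "global" entity (the hydra's body);
    its mentions are the individual heads.
    """
    groups = []
    # Visit fuller names first so they seed the groups; ties are broken
    # alphabetically to keep the result deterministic.
    ordered = sorted(set(mentions), key=lambda m: (-len(name_tokens(m)), m))
    for mention in ordered:
        tokens = name_tokens(mention)
        if not tokens:
            continue  # the mention consisted only of titles
        for group in groups:
            if tokens <= group["tokens"]:
                group["mentions"].append(mention)
                break
        else:
            # No existing entity covers this name, so start a new one.
            groups.append({"tokens": set(tokens), "mentions": [mention]})
    return groups


mentions = [
    "Mr. Holmes", "Mr. Sherlock Holmes", "Sherlock Holmes",
    "Holmes", "Dr. Watson", "Watson",
]
for group in link_mentions(mentions):
    print(sorted(group["mentions"]))
```

With these sample mentions, the four Holmes variants fall into one group and the two Watson variants into another. The subset rule is deliberately naive: a bare surname such as "Holmes" attaches to the first fuller name that contains it, so a text with two characters sharing a surname would need the additional sentence-level context the abstract describes.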
Schneck, Cora, "HAIL HYDRA: Named Entity Resolution, Extraction, and Linking of Lexically Similar Names" (2018). Undergraduate Honors Theses. 1566.
American Literature Commons, Children's and Young Adult Literature Commons, Databases and Information Systems Commons, Digital Humanities Commons, Literature in English, British Isles Commons, Literature in English, North America Commons, Other Computer Sciences Commons