Graduate Thesis Or Dissertation

 

Leveraging Semantic Similarity in Parallel Corpora for Natural Language Processing

https://scholar.colorado.edu/concern/graduate_thesis_or_dissertations/kk91fk93k
Abstract
  • This thesis introduces a word-alignment-based approach for mapping PropBank predicate-argument structures across languages. We used Chinese and English as the language pair for the study and found that our approach reliably predicted alignments of predicates and of their arguments in parallel sentence pairs, even when given incorrect word alignments as input. Furthermore, by modeling the predicate-to-predicate and argument-to-argument probabilities over a large unannotated parallel corpus and using an expectation-maximization (EM) based approach, we further improved the alignment performance of the system. As part of the semantic mapping system, we also developed both a Chinese and an English constituent-based semantic role labeler (SRL) for arguments of verbal and non-verbal predicates. By building topic-model-based selectional preferences for each language, and by using techniques such as support predicate identification and multi-stage classification, we achieved what we believe is state-of-the-art Chinese SRL performance and one of the best English SRL performances using a single constituent tree as input. Moreover, by improving the performance of Chinese nominal SRL by over 2 F points, we demonstrated that selectional preferences can significantly improve SRL when the argument candidates are not well constrained by syntax. As a case study, we successfully applied our semantic mapping system to aligning Chinese dropped pronouns to English text, as a way of understanding when replacement words would be required in English in place of the dropped pronouns during Chinese-English machine translation. In the process, we also demonstrated that semantic knowledge is essential for recovering Chinese dropped pronouns by producing a state-of-the-art SRL-enhanced Chinese empty category recovery system.
Date Issued
  • 2015
Last Modified
  • 2019-11-14