Doctor of Philosophy (PhD)
Wayne H. Ward
This thesis introduces a word-alignment-based approach for mapping PropBank predicate-argument structures across languages. Using Chinese and English as the language pair, we found that our approach reliably predicts alignments between predicates and their arguments in parallel sentence pairs, even when given incorrect word alignment input. Furthermore, by modeling the predicate-to-predicate and argument-to-argument probabilities over a large unannotated parallel corpus with an expectation maximization (EM) based approach, we were able to further improve the alignment performance of the system.
As part of the semantic mapping system, we also developed constituent-based semantic role labelers (SRL) for both Chinese and English, covering arguments of verbal and non-verbal predicates. By building topic-model-based selectional preferences for each language, and by using techniques such as support predicate identification and multi-stage classification, we achieved what we believe is state-of-the-art Chinese SRL performance and one of the best English SRL performances using a single constituent tree input. Moreover, by improving the performance of Chinese nominal SRL by over 2 F points, we demonstrated that selectional preferences can significantly improve SRL when the argument candidates are not well constrained by syntax.
As a case study, we successfully applied our semantic mapping system to aligning Chinese dropped pronouns to English text, as a way of understanding when replacement words are required in English in place of the dropped pronouns during Chinese-English machine translation. In the process, we also demonstrated that semantic knowledge is essential for recovering Chinese dropped pronouns by producing a state-of-the-art SRL-enhanced Chinese empty category recovery system.
Wu, Shumin, "Leveraging Semantic Similarity in Parallel Corpora for Natural Language Processing" (2015). Computer Science Graduate Theses & Dissertations. 97.