Date of Award

Spring 1-1-2014

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Linguistics

First Advisor

Martha Palmer

Second Advisor

Annie Zaenen

Third Advisor

Laura A. Michaelis

Fourth Advisor

James H. Martin

Fifth Advisor

Bhuvana Narasimhan

Abstract

Advances in syntactic parsing and semantic role labeling have been a boon to Natural Language Processing. However, parsers and semantic role labelers perform poorly on sentences that do not conform to expected syntax-semantics patterns. For example, in the sentence "The crowd laughed the clown off the stage", laugh, a verb of non-verbal communication, is coerced into the semantics of a caused motion construction (CMC). It gains a motion entailment that is atypical given its inherent lexical semantics. Accurate semantic role labeling for such sentences requires NLP classifiers that can identify these coerced usages in data. Beyond accurate semantic role labels, such sentences also require a semantic interpretation whose representation includes the semantics of the CMC.

This thesis focuses on the definition, identification, and representation of CMCs. We build on work in Construction Grammar to develop the semantic types and varieties of CMCs used for corpus annotation. Using the annotated corpus as training and test data, we train automatic CMC classifiers and demonstrate that CMCs can be reliably identified in corpus data. Furthermore, we develop a new set of semantic predicates in VerbNet for representing CMCs. These predicates not only allow CMC sentences to be represented but also give VerbNet a more consistent, explicit representation of paths of motion. Finally, we demonstrate that the CMC representation can supply the proper semantic representation for a sentence even when the verb itself does not carry caused-motion semantics.

The overall contribution of this work is to establish the processes involved in identifying and representing constructions in an empirical setting. This work assesses the steps needed to define and annotate constructions in a corpus, train classifiers for constructions, and represent the semantics of constructions through VerbNet predicates. While we have focused on the identification and representation of caused motion constructions, a similar corpus-driven study can be conducted for other constructions whose sentence-level representations cannot be derived from the semantics of the verb alone.
