Doctor of Philosophy (PhD)
James H. Martin
Previous research in natural language processing in support of information extraction for Crisis Informatics has exploited a variety of linguistic features to characterize the semantics of Twitter communications produced during hazard situations. Studies from Project EPIC (Empowering the Public with Information in Crisis), an interdisciplinary research effort funded by the National Science Foundation and housed in the Department of Computer Science at the University of Colorado at Boulder, have pursued the annotation and extraction of named entities (Corvey et al. 2012; Verma et al. 2011), semantic roles (Corvey et al. 2012), and the tweet-level attributes of linguistic register, subjectivity, and personal or impersonal style (Verma et al. 2011; Corvey et al. 2012). These tweet-level features have been applied in the classification of a key behavioral attribute, Situational Awareness (Verma et al. 2011; Corvey et al. 2012). However, pragmatic features pertaining to a user's perceived confidence in and ownership of the hazard information presented on Twitter have yet to be explored. I propose an information extraction system targeting key pragmatic features, centered on the concepts of linguistic Evidentiality (Aikhenvald 2004; Fox 2001; Chafe 1986), Territory of Information (Kamio 1997), and Speech Act Theory (Austin 1962; Searle 1969, 1976, 1979). The system aims to improve information retrieval by refining the characterization of a tweet's relevance to Situational Awareness. This thesis discusses theoretical motivations and background, presents the results of a series of experiments testing the utility of the proposed pragmatic annotations, and engages key theoretical questions motivated by these experimental results.
Corvey, William, "Leveraging Pragmatic Features for Microblogged Information Extraction During Crises" (2013). Linguistics Graduate Theses & Dissertations. 26.