Date of Award

Spring 1-1-2016

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Computer Science

First Advisor

Kenneth M. Anderson

Second Advisor

Judith Stafford

Third Advisor

Richard Han

Fourth Advisor

Qin Lv

Fifth Advisor

Gita Alaghband

Abstract

Every day, enormous amounts of data are generated by a wide variety of computational systems. This data needs to be collected, stored, and analyzed to generate insights and information useful to the organizations performing this work. Typical workflows include interpreting consumer behavior, recommending products, predicting future trends, and even supporting emergency management before, during, and after mass emergency events. In the emergency management space, a new area of study, crisis informatics, examines how members of the public make use of social media during times of disaster. Crisis informatics software aims to collect and analyze the large amount of information generated on social media during times of mass emergency. In general, current crisis informatics software is focused on the batch processing of crisis data after an event has transitioned out of the immediate response and recovery phases. Now, there is a need to collect and analyze crisis data in real time as it streams in during the crisis event itself.

This thesis examines the software architectures, techniques, frameworks, and middleware needed to augment crisis informatics software that uses batch processing techniques for data analysis with techniques that incrementally process, store, and analyze data as it arrives. This work responds to the needs of analysts who require both real-time data analytics and efficient batch data processing to comprehensively analyze a mass emergency event. The techniques developed to achieve these goals have been implemented in a system called the Incremental Data Collection and Analytics Platform (IDCAP). This platform enables a comprehensive evaluation of the utility of these techniques. The system provides the following features: incremental collection and indexing of social media data in real time; support for real-time analytics at interactive speeds; highly concurrent batch data processing supported by a novel data model; and a front-end web client, known as the IDCA App, that allows an analyst to manage IDCAP resources, monitor incoming data in real time, and perform incremental queries on top of large datasets.
