Date of Award

Spring 1-1-2016

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Computer Science

First Advisor

Kenneth M. Anderson

Second Advisor

Richard Han

Third Advisor

Tom Yeh

Fourth Advisor

Qin Lv

Fifth Advisor

Kai Larsen

Abstract

There is high demand in both industrial and academic settings for techniques and tools to process and analyze large sets of streaming data. While existing work in this area has focused on a wide range of issues, including persistence technologies, advanced analysis tools, and functional web interfaces, I focus on query support. In particular, I focus on giving analysts flexibility in the types of queries they can make on large data sets, both in real time and over historical data. I am building a lightweight service-based framework, EPIC Real-Time, that manages a set of queries that can be applied to user-initiated data analysis events (such as studying tweets generated during a disaster). My prototype combines stream processing and batch processing techniques, inspired by the approach embodied in the Lambda Architecture. I investigate a core set of query types that can answer the wide range of questions asked by analysts who study crisis events. For this research, I design and develop a flexible set of real-time analytical tools that will allow analysts to ask new types of questions as they move their research activity from analysis after a crisis to analysis during an event. This will enable them to monitor online social behaviors and capture interesting interactions in real time across the various phases of a disaster. In this dissertation, I present a prototype implementation of EPIC Real-Time that makes use of message-driven and reactive programming techniques. I also present an evaluation of how efficiently the real-time and batch-oriented queries perform and how well these queries meet the needs of Project EPIC analysts, and I provide insight into how EPIC Real-Time performs along a number of non-functional requirements important for big data, such as performance, usability, scalability, and reliability.

Figures.zip (4015 kB)
Figures
