Graduate Thesis Or Dissertation

 

Streaming Temporal Graphs Public Deposited

Downloadable Content

Download PDF
https://scholar.colorado.edu/concern/graduate_thesis_or_dissertations/3r074v17m
Abstract
  • We provide a domain specific language called the Streaming Analytics Language (SAL) to write concise but expressive analyses of streaming temporal graphs. We target problems where the data comes as an infinite stream and where the volume is prohibitive, requiring a single pass over the data and tight spatial and temporal complexity constraints. Also, each item in the stream can be thought of as an edge in a graph, and each edge has an associated timestamp and duration.A real-world problem that is a streaming temporal graph is cyber security data. Machines communicate with each other within a network, forming a streaming sequence of edges with temporal information. As such, we elucidate the value of SAL by applying it to a large range of cyber-related problems. With a combination of vertex-centric computations that create features per vertex, and subgraph matching to find communication patterns of interest, we cover a wide spectrum of important cyber use cases. As an example, we discuss Verizon’s Data Breach Investigations Report, and show how SAL can be used to capture most of the nine different categories of cyber breaches. Also, we apply SAL to discovering botnet activity within network traffic in 13 different scenarios, with an average area under the curve (AUC) of the receiver operating characteristic (ROC) of 0.87.Besides SAL as a language, as another contribution we present an implementation we call the Streaming Analytics Machine (SAM). With SAM, we can run SAL programs in parallel on a cluster, achieving rates of a million netflows per second, and scaling to 128 nodes or 2560 cores. We compare SAM to another streaming framework, Apache Flink, and find that Flink cannot scale past 32 nodes for the problem of finding triangles (a subgraph of three interconnected nodes) within the streaming graph. Also, SAM excels when the subgraphs are frequent, continuing to find the expected number of subgraphs, while Flink performance degrades and under-reports. Together, SAL and SAM provide an expressive and scalable infrastructure for performing analyses on streaming temporal graphs.
Creator
Date Issued
  • 2019
Academic Affiliation
Advisor
Committee Member
Degree Grantor
Commencement Year
Subject
Last Modified
  • 2019-11-14
Resource Type
Rights Statement
Language

Relationships

Items