Date of Award

Spring 1-1-2018

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

First Advisor

Dirk Grunwald

Second Advisor

Eric Keller

Third Advisor

Qin Lv

Fourth Advisor

Shivakant Mishra

Fifth Advisor

Sriram Sankaranarayanan

Abstract

With the large volumes of geo-tagged data collected by various applications, spatial query processing has become essential. Query engines depend on efficient indexes to expedite that processing, and such indexes face three main challenges: scaling out to accommodate large volumes of spatial data, supporting transactional primitives for strong consistency guarantees, and adapting to highly dynamic workloads. This thesis proposes a paradigm for building scalable, transactional, and efficient spatial indexes that significantly reduces the development effort of designing and comparing multiple spatial indexes.

This thesis first introduces DTranx, a distributed and transactional key-value store used to persist the spatial indexes. DTranx follows the SEDA architecture to exploit the high concurrency of multi-core environments, and it adopts a hybrid of optimistic concurrency control and the two-phase commit protocol to narrow the critical sections of distributed locking during transaction commits. Moreover, DTranx integrates a persistent-memory-based write-ahead log to reduce durability overhead, together with a garbage collection mechanism that does not interfere with normal transactions. To maintain high throughput for search workloads while the database is continuously updated, snapshot transactions are introduced.
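As an illustration of the commit scheme described above, the following minimal C++ sketch executes a transaction optimistically against buffered read and write sets and holds locks only across a short prepare/commit window; all names (Shard, Txn, Prepare, and so on) are hypothetical and do not reflect DTranx's actual code, which distributes the prepare and commit phases across servers.

// Hypothetical sketch of hybrid OCC + two-phase commit: the transaction runs
// lock-free against local read/write sets; locks cover only validation + apply.
#include <cstdint>
#include <iostream>
#include <map>
#include <mutex>
#include <string>

struct Versioned { std::string value; uint64_t version = 0; };

// One storage shard; a real deployment would place shards on separate servers.
class Shard {
public:
    Versioned Read(const std::string& k) {
        std::lock_guard<std::mutex> g(mu_);
        return kv_[k];
    }
    // Phase 1 (prepare): take the shard lock, validate read-set versions.
    bool Prepare(const std::map<std::string, uint64_t>& reads) {
        mu_.lock();  // the narrow critical section begins here
        for (const auto& [k, ver] : reads)
            if (kv_[k].version != ver) { mu_.unlock(); return false; }  // conflict
        return true;  // lock stays held until Commit or Abort
    }
    // Phase 2 (commit): apply buffered writes, bump versions, release the lock.
    void Commit(const std::map<std::string, std::string>& writes) {
        for (const auto& [k, v] : writes) { kv_[k].value = v; kv_[k].version++; }
        mu_.unlock();
    }
    void Abort() { mu_.unlock(); }  // release without applying
private:
    std::mutex mu_;
    std::map<std::string, Versioned> kv_;
};

// Optimistic transaction: reads record versions, writes are buffered locally.
class Txn {
public:
    explicit Txn(Shard& s) : shard_(s) {}
    std::string Get(const std::string& k) {
        Versioned v = shard_.Read(k);
        readSet_[k] = v.version;  // remember version for later validation
        return v.value;
    }
    void Put(const std::string& k, const std::string& v) { writeSet_[k] = v; }
    bool CommitTxn() {
        if (!shard_.Prepare(readSet_)) return false;  // validation failed: abort
        shard_.Commit(writeSet_);
        return true;
    }
private:
    Shard& shard_;
    std::map<std::string, uint64_t> readSet_;
    std::map<std::string, std::string> writeSet_;
};

int main() {
    Shard shard;
    Txn t(shard);
    t.Get("x");        // optimistic read, no lock held afterwards
    t.Put("x", "42");  // buffered write
    std::cout << (t.CommitTxn() ? "committed" : "aborted") << "\n";
}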

Then, the paradigm is presented: a set of intuitive APIs and a Mempool runtime that together reduce development efforts. Mempool transparently synchronizes the local state of data structures with DTranx, handling two critical tasks: address translation and transparent server synchronization, the latter comprising transaction construction and data synchronization. Furthermore, a dynamic partitioning strategy is integrated into DTranx to generate partitioning and replication plans that reduce inter-server communication and balance resource usage.
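To convey the flavor of this paradigm, here is a small C++ sketch in which ordinary in-memory data-structure code runs unchanged while a Mempool-like runtime tracks dirty objects, translates local pointers to store-wide identifiers, and flushes changes as one transaction; the names (MempoolRuntime, GlobalId, MarkDirty) and the stubbed-out synchronization are illustrative assumptions, not the thesis's actual API.

// Hypothetical sketch of a Mempool-style runtime: address translation via
// GlobalId and transparent synchronization of dirty objects to the store.
#include <cstdint>
#include <iostream>
#include <map>
#include <memory>
#include <set>

using GlobalId = uint64_t;  // store-wide identifier standing in for a pointer

struct BTreeNode {
    GlobalId id;             // address translation: id <-> local object
    std::map<int, int> entries;
};

class MempoolRuntime {
public:
    BTreeNode* Allocate() {
        GlobalId id = nextId_++;
        auto node = std::make_unique<BTreeNode>();
        node->id = id;
        BTreeNode* raw = node.get();
        local_[id] = std::move(node);
        return raw;
    }
    // Mark an object dirty; it will be shipped to the store on Sync().
    void MarkDirty(const BTreeNode* n) { dirty_.insert(n->id); }
    // Transparent server synchronization: serialize dirty nodes and submit
    // them as a single transaction (stubbed out here as console output).
    void Sync() {
        for (GlobalId id : dirty_)
            std::cout << "txn write: node " << id << " ("
                      << local_[id]->entries.size() << " entries)\n";
        dirty_.clear();
    }
private:
    GlobalId nextId_ = 1;
    std::map<GlobalId, std::unique_ptr<BTreeNode>> local_;
    std::set<GlobalId> dirty_;
};

int main() {
    MempoolRuntime pool;
    BTreeNode* root = pool.Allocate();
    root->entries[7] = 70;  // ordinary single-threaded data-structure code
    pool.MarkDirty(root);
    pool.Sync();            // runtime builds and submits the transaction
}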

Lastly, the single-threaded BTree and RTree data structures were converted into distributed versions within two weeks. The distributed BTree and RTree achieve throughputs of 253.07 kops/sec and 77.83 kops/sec, respectively, for pure search operations on a 25-server cluster.
