Date of Award

Spring 5-2017

Document Type


Degree Name

Doctor of Philosophy (PhD)

First Advisor

Robin Dowell

Second Advisor

Aaron Clauset

Third Advisor

Michael Mozer

Fourth Advisor

Elizabeth Bradley

Fifth Advisor

Katerina Kechris


Seventy-six percent of disease-associated variants occur in non-genic sites of open chromatin, suggesting that the regulation of gene expression plays a crucial role in human health. Nucleosome-free and flanked by chromatin modifications, these regulatory loci are optimal platforms for transcription factor binding and, in fact, recruit RNA polymerase. The subsequent transcription of these sites is an unintuitive discovery, as these regulatory loci do not harbor an open reading frame.

The role these enhancer RNAs (eRNAs) play in downstream gene regulation remains an open and exciting question. However, fast RNA degradation rates challenge eRNA identification, requiring non-traditional sequencing technologies. Global Run-on followed by sequencing (GRO-seq) detects non-genic transcription and thus, in theory, eRNA presence. Yet GRO-seq is not without noise and bias; predictive modeling of both the sequencing error and the stochastic nature of RNA polymerase itself is required to discover enhancer RNA transcripts.

In short, this thesis asks: what regulates eRNA transcription? To answer this question, I first develop two novel probabilistic models to determine eRNA location without bias. The first is a regression method that quickly identifies all transcribed regions in GRO-seq data. The second, based on the known enzymatic stages of RNA polymerase, is a latent variable model that infers the precise location of eRNA initiation. With the relevant technology developed, I undertake a massive data integration project and show strong contextual relationships between TF-binding events, epigenetics, and eRNA transcription. I conclude by showing that enhancer RNAs can quantify transcription factor activity without bias and predict cell type.