Date of Award

Spring 1-1-2014

Document Type


Degree Name

Doctor of Philosophy (PhD)



First Advisor

Matthew C. Keller

Second Advisor

John Hewitt

Third Advisor

Charles Judd

Fourth Advisor

Matt McQueen

Fifth Advisor

Soo Rhee


The last several years have produced strong evidence that common genetic variants together explain a substantial proportion of risk for most of the common diseases that have been investigated. Within the next several years, sequence data will be harnessed to give similar estimates of the proportion of variance in diseases attributable to rare genetic variants. In the meantime, it is difficult to gauge the extent to which molecularly derived heritabilities attributable to the additive effects of genes are being underestimated, and to disentangle underestimation due to genetic heterogeneity from underestimation due to imperfect non-random association, known as linkage disequilibrium (LD), between unmeasured causal variants (CVs, especially rare ones) and measured single-nucleotide polymorphisms (SNPs). Similarly, when estimating genetic correlations between traits in two populations, it is difficult to tell whether correlations below one reflect differences in the CVs themselves or differences in LD between the populations. Although we know that common CVs are important predictors of disease, we do not know the age of those variants or, in particular, whether they are shared between distantly diverged populations. This information has implications for understanding evolutionary fitness, generalizing findings across populations, and informing models of the functional pathways involved in specific diseases.

In my first study, I use SNP data to estimate the heritability of schizophrenia in two datasets of European descent (ED) individuals, and consider whether effects are preferentially predicted in specific chromosomes, by particular minor allele frequency (MAF) classes, and according to gene function. In my second study, I repeat this analysis for a dataset of African descent (AD) individuals and estimate cross-ethnicity genetic correlations between schizophrenia in ADs and in the two previously analyzed ED datasets. To correct for biases between datasets, I compare these estimates with the within-ethnicity genetic correlation between the two ED datasets. To explore whether lower cross-ethnicity genetic correlations imply differences in CVs or in background LD, in my third study I use an extensive simulation design to investigate the likelihood that, given completely overlapping CVs and effect sizes, differences in CV MAF and background LD between ADs and EDs will produce lower cross-ethnicity than within-ethnicity genetic correlations. Because of the long divergence time between ADs and EDs, cross-ethnicity genetic correlations between these populations provide a meaningful lower bound on the SNP-derived genetic correlations between traits when all CVs and their effects are exactly shared between ethnicities.

Included in

Genetics Commons