Date of Award

Spring 1-1-2012

Document Type


Degree Name

Doctor of Philosophy (PhD)


Computer Science

First Advisor

Michael C. Mozer

Second Advisor

Rob Knight

Third Advisor

Michael C. Mozer


Human-associated microbial communities have been implicated in a variety of chronic diseases, including inflammatory bowel diseases, obesity, and autoimmune disorders such as diabetes. Environmental communities are also important for bioconversion of waste products in biofuel production. However, microbiomes are highly complex systems involving mutualism and competition among many constituent organisms, and fundamental computational challenges remain in modeling pathogenicity and community-wide responses to perturbations [1, 2]. In this thesis we discuss several computational and statistical approaches to predictive modeling of microbiome behavior using high-throughput metagenomic and transcriptomic sequencing data, including models that leverage biological structures such as phylogenies and gene ontologies to help extract features and constrain model complexity. We also demonstrate several applications of these approaches to real biological problems.

We successfully apply predictive modeling to new studies of human-associated and environmental microbial communities in several interdisciplinary collaborations with colleagues at numerous institutions around the world. These include a prominent study of the species and genes present in diverse mammalian gut communities, a study of the effects of yogurt consumption on gut microbial taxa and gene expression (i.e., transcriptomics) in humans and mice, and a large cross-sectional global survey of the human gut microbiota in varied populations. We also develop SourceTracker, a Bayesian approach to predictive modeling of mixtures of microbial communities [3], with important applications in forensics, pollution studies, public health, and detection of sample contamination.

This dissertation introduces predictive modeling of human-associated and environmental microbial communities, increasing our ability to understand the diversity and distribution of the human microbiota, and especially the systematic changes that occur in different physiological and disease states. We expect this type of predictive modeling to have far-reaching effects on health and disease [4].