Date of Award

Spring 1-1-2015

Document Type


Degree Name

Doctor of Philosophy (PhD)


Computer Science

First Advisor

Rob Knight

Second Advisor

Robin Dowell

Third Advisor

Nikolaus Correll

Fourth Advisor

Ken Anderson

Fifth Advisor

Ken Krauter


The research objective of this thesis is to measure the extent of microbial diversity associated with the human large intestine to an accuracy within the limits of the V4 region of the 16S rRNA gene at 97% similarity. This gene has become a powerful tool in assessing microbiome composition, and in recent years, a significant amount of research has shown an intimate relationship between the microbiome and human health. Unlike the human genome, the bulk of which is shared across the human population, the human microbiome has no common component. Instead, a range of configurations has been observed, with factors such as age and BMI strongly associated with the differences among them. To date, however, no project has aimed to scope the range of microbiome configurations, and thus we have no concept of what it means to be healthy from a microbial perspective. International efforts such as the American Gut Project will not only help us to understand more about our microbial constituents, but also pave the way toward understanding how these communities can be manipulated for the benefit of human health.

This thesis first provides background on the microbiome through a commentary on the history of 16S and a review of microbiome research. The next series of chapters builds the case for large-scale microbiome studies, leading up to the American Gut Project. The second half of the thesis emphasizes the computational difficulties of the research and the specific contributions made to the processing and analysis of sequence data that enable insight into the microbiome. These contributions include a file format recognized as a standard by the Genomic Standards Consortium, a novel method for transferring taxonomies to aid taxonomic curation, and a practical biological example of the use of reproducible and executable IPython Notebooks. Last, the thesis discusses a software tool that has been useful in the analysis of next-generation sequence data, along with a few microbiome analyses.