Date of Award
Doctor of Philosophy (PhD)
Improvements in sequencing technologies have shifted the foundations of biology, ecology, and health. Traditionally, these sciences dealt with small amounts of data that could be analyzed using simple methods and computational tools. Today, they are confronted with massive numbers of sequences across thousands of samples. These sequences represent the DNA of microorganisms that inhabit diverse environments, from soils and oceans to the human body. Additionally, recent studies are moving from simple snapshots to spatial and temporal datasets in order to study the distribution of these microbial guests. These larger studies reveal a lack of computational methods and resources, which researchers must work around to understand the intrinsic patterns in their new sequence-based studies. In this dissertation, I present new computational tools, methods, and visualizations that allow microbiologists to make sense of these massive studies, along with the interesting results concerning human health that can be obtained from microbial ecology studies. I also present a cloud computing method for combining these larger studies, which has already produced potentially important health insights into the temporal development of infants. Finally, I describe a new software tool that allows microbial ecology researchers to design and statistically power future studies based on previously published studies. These novel components not only demonstrate the future of microbial computational biology, but also show the kind of medical and ecological advances we can observe by combining computational tools with new sequencing technologies.
Gonzalez Pena, Antonio, "Tools, Methods and Visualizations to Elucidate Spatial and Temporal Patterns in Microbial Community Studies with Billions of Samples" (2012). Computer Science Graduate Theses & Dissertations. 50.