Dr. Rob Knight
Hundreds of studies have addressed whether the presence or absence of certain bacteria is linked with a particular phenotype. However, it is plausible that the causative agent (or the consequence) of a given phenotype is not a single type of microbe but a group of microbes, perhaps in specific combinations. Rule Induction is a commonly used machine learning method that infers structure within observational data and builds rules to represent that structure. In this thesis I apply Rule Induction to infer co-occurrence patterns in microbial data. First, I benchmark the methods within Rule Induction to assess how rule generation depends on several parameters, such as table density, support, and confidence. I then subsample the data over multiple iterations to assess how robust the resulting rules are to variation due to sampling. Next, I examine how different biological variables affect the rules produced. I compare 16S rRNA regions, specifically V1-3 and V3-5. I compare sequencing technologies, specifically 454 and Illumina. Finally, I compare samples over time, specifically over a time frame of 400 days. Across all these comparisons I aim to understand the differences but, more importantly, what is conserved in the generated rules when the samples are stratified by these variables. Finally, I explore Rule Induction using two microbial datasets and compare the rules to already-known associations. The first dataset I interpret identifies a correlation between HIV and the gut microbiome. The second dataset distinguishes the gut microbiome across varying geographical locations. I link the rules produced from each dataset with taxonomic information and consolidate those rules to reveal the underlying structure within the biological data.
Thurimella, Kumar, "Using Rule Induction to Elucidate Co-Occurrence Patterns in Microbial Data" (2013). Undergraduate Honors Theses. 499.