Graduate Thesis Or Dissertation

 

Dissimilarity and Optimal Sampling in Urn Ensembles

https://scholar.colorado.edu/concern/graduate_thesis_or_dissertations/c247ds08x
Abstract
  • We study an ensemble of urns with unknown compositions inferred from initial samples with replacement from each urn. This model fits diverse situations. For instance, in microbial ecology studies each urn represents an environment, each ball within an urn corresponds to an individual bacterium, and a ball's color represents its taxonomic label. In a different context, each urn could represent a random RNA pool and each colored ball a possible solution to a particular binding site problem over that pool. The main parameter of this study is dissimilarity, which we define as the probability that a draw from one urn is not seen in a sample of size k from a possibly different urn. We estimate this parameter with a U-statistic, which we show to be the uniformly minimum-variance unbiased estimator (UMVUE) of dissimilarity over a range of k determined by the initial sample sizes. Furthermore, despite the non-Markovian nature of our estimator when applied sequentially over k, we provide conditions that guarantee uniformly consistent estimates of variances via a jackknife method, and show uniform convergence in probability as well as approximately normal marginal distributions. We apply our U-statistics and a restricted exponential regression to extrapolate dissimilarity over a range beyond that determined by the initial sample sizes, which we use to identify an allocation of draws for subsequent sampling that minimizes a measure of pairwise dissimilarities over the whole ensemble. This is motivated by the challenge faced by microbiome projects worldwide to effectively allocate additional samples for a more robust and reliable estimation of UniFrac distances between pairs of environments. Similar methods are applied to measures of sample quality of the ensemble derived from alpha-diversity and coverage.
We test our methods against simulated data, where we compare optimal and inferred draw allocations when considering these three measures, and analyze 16S ribosomal RNA data from the Human Microbiome Project.
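The dissimilarity defined in the abstract can be illustrated with a short sketch. If urn x has color proportions p and urn y has proportions q, the definition implies D(k) = sum over colors c of p_c (1 - q_c)^k, since draws are made with replacement. The code below is a minimal illustration under stated assumptions, not the thesis's implementation: the function names `true_dissimilarity` and `estimate_dissimilarity` are hypothetical, and the estimator shown is one natural U-statistic consistent with the definition (average, over every observed draw from x and every k-subset of the sample from y, of the indicator that the draw's color is absent from the subset). The exact form used in the thesis may differ.

```python
from collections import Counter
from math import comb


def true_dissimilarity(p, q, k):
    """Exact D(k) = sum_c p[c] * (1 - q[c])**k for known urn compositions.

    p and q map color -> probability; colors missing from q have probability 0.
    """
    return sum(pc * (1.0 - q.get(c, 0.0)) ** k for c, pc in p.items())


def estimate_dissimilarity(sample_x, sample_y, k):
    """U-statistic-style estimate of D(k) from two observed samples.

    Assumed form (not taken from the thesis): for a draw of color c from urn x,
    the number of k-subsets of sample_y that avoid c is C(m - count_y[c], k),
    so averaging over draws and subsets needs only color counts.
    Requires 0 <= k <= len(sample_y).
    """
    n, m = len(sample_x), len(sample_y)
    if n == 0:
        raise ValueError("sample_x must be non-empty")
    if not 0 <= k <= m:
        raise ValueError("k must satisfy 0 <= k <= len(sample_y)")
    counts_x = Counter(sample_x)
    counts_y = Counter(sample_y)  # Counter returns 0 for unseen colors
    avoiding = sum(cx * comb(m - counts_y[c], k) for c, cx in counts_x.items())
    return avoiding / (n * comb(m, k))


if __name__ == "__main__":
    # Toy data: colors stand in for taxonomic labels.
    x = ["a", "a", "b"]
    y = ["a", "b", "b", "c"]
    print(estimate_dissimilarity(x, y, k=2))
```

The restriction k <= len(sample_y) in this sketch reflects the abstract's remark that the unbiased estimator is available only over a range of k determined by the initial sample sizes; larger k requires extrapolation.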
Date Issued
  • 2012
Last Modified
  • 2019-11-15
