Master of Science (MS)
Jem N. Corcoran
Debra S. Goldberg
Bayesian networks are graphical models that encode conditional probability relationships among multiple random variables. Because they can model many variables at once, their applications span computational biology, bioinformatics, medicine, image processing, decision support systems, and engineering. Recovering Bayesian network structure from data is an extremely valuable tool for determining conditional relationships in multivariate data sets; however, existing recovery algorithms require either discrete or Gaussian data. Non-Gaussian continuous data are usually discretized in an ad hoc, careless manner that is highly likely to destroy the precise conditional dependencies we set out to recover. We explore the effectiveness of a method due to Friedman and Goldszmidt (1996) that is based on an information-theoretic metric known as "description length". While theoretically interesting, their proposed search strategy for an optimal discretization is infeasible for even moderately sized data sets on the smallest networks. We introduce a new search strategy based on a "top-down" approach utilizing a local description length metric that can be implemented with significant time savings.
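The abstract does not spell out the local description length metric itself, so the sketch below only illustrates the general shape of a top-down, MDL-guided discretization search. It substitutes a different, well-known criterion: Fayyad and Irani's MDL stopping rule for supervised discretization, which scores a continuous variable against a single discrete variable rather than against a Bayesian network's Markov blanket as in Friedman and Goldszmidt. All function names (`entropy`, `best_cut`, `mdlp_accepts`, `top_down_cuts`, `discretize`) are hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter

# Illustrative sketch only: top-down discretization using Fayyad-Irani's MDL
# stopping rule as a stand-in for the thesis's local description length metric.


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_cut(xs, ys):
    """Return (gain, i) for the boundary with the highest information gain.

    xs must be sorted; index i places the cut between xs[i-1] and xs[i].
    Returns None if every value in xs is identical (no admissible cut).
    """
    n = len(ys)
    base = entropy(ys)
    best = None
    for i in range(1, n):
        if xs[i] == xs[i - 1]:
            continue  # cannot cut between equal values
        left, right = ys[:i], ys[i:]
        gain = base - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)
        if best is None or gain > best[0]:
            best = (gain, i)
    return best


def mdlp_accepts(ys, i, gain):
    """Accept the cut only if it shortens the total description length."""
    n = len(ys)
    left, right = ys[:i], ys[i:]
    k, k1, k2 = len(set(ys)), len(set(left)), len(set(right))
    delta = math.log2(3 ** k - 2) - (
        k * entropy(ys) - k1 * entropy(left) - k2 * entropy(right)
    )
    return gain > (math.log2(n - 1) + delta) / n


def top_down_cuts(xs, ys):
    """Recursively split a sorted interval, returning accepted cut points in order."""
    found = best_cut(xs, ys)
    if found is None:
        return []
    gain, i = found
    if not mdlp_accepts(ys, i, gain):
        return []  # splitting further would not pay for itself
    cut = (xs[i - 1] + xs[i]) / 2
    return top_down_cuts(xs[:i], ys[:i]) + [cut] + top_down_cuts(xs[i:], ys[i:])


def discretize(values, labels):
    """Sort (value, label) pairs by value and run the top-down search."""
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    return top_down_cuts([p[0] for p in pairs], [p[1] for p in pairs])
```

For example, `discretize(list(range(20)), ["a"] * 10 + ["b"] * 10)` returns `[9.5]`. The top-down design evaluates one split per interval and stops as soon as a split fails to shorten the description, rather than searching over all discretizations at once. That recursive structure is what gives top-down strategies their speed advantage.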
Levine, Nicholas D., "Using Minimum Description Length for Discretization Classification of Data Modeled by Bayesian Networks" (2011). Applied Mathematics Graduate Theses & Dissertations. 21.