Date of Award

Spring 1-1-2011

Document Type

Thesis

Degree Name

Master of Science (MS)

Department

Applied Mathematics

First Advisor

Jem N. Corcoran

Second Advisor

Debra S. Goldberg

Third Advisor

Anne Dougherty

Abstract

Bayesian networks are graphical models that encode conditional probability relationships among multiple random variables. Able to model many variables at once, they have applications in computational biology, bioinformatics, medicine, image processing, decision support systems, and engineering. Recovery of Bayesian network structure from data is an extremely valuable tool for determining conditional relationships in multivariate data sets; however, existing recovery algorithms require either discrete or Gaussian data. Non-Gaussian continuous data is normally discretized in an ad hoc and careless manner that is highly likely to destroy the very conditional dependencies we are out to recover. We explore the effectiveness of a method due to Friedman and Goldszmidt (1996) that is based on a metric from information theory known as "description length". While theoretically interesting, their proposed search strategy for an optimal discretization is infeasible for even moderately sized data sets on the smallest of networks. We introduce a new search strategy based on a "top-down" approach utilizing a local description length metric that can be implemented with significant time savings.
