Date of Award

Spring 4-24-2018

Document Type

Thesis

Degree Name

Master of Science (MS)

Department

Applied Mathematics

First Advisor

Jem Corcoran

Abstract

A Bayesian network is a graphical model that can be used to represent potentially complex relationships between a large number of random variables. Given data consisting of realizations of the nodes (variables) of a network with unknown structure, much work has been done in recent years to recover the directed edges that describe the joint behavior of the nodes. The underlying assumption for such recovery algorithms is that the data are either from discrete or Gaussian distributions. In the event that neither hold, structure recovery algorithms are usually run after a discretization of the data. Unfortunately, if the discretization is not performed in a thoughtful way, the very structure that one is trying to recover may be obscured.

In this thesis, we extend the work of Friedman and Goldszmidt [Proceedings of the Thirteenth International Conference on Machine Learning, pp. 157–165, (1996)] where a principled approach was developed based on the idea of minimum description length. Although their approach gives a scoring mechanism that appears to be able to recover a good discretization, actually finding high-scoring discretizations is difficult due to the large number of possibilities that need to be checked. We propose and implement a Monte Carlo search procedure based on a birth-and-death process to maximize discretization score. This allows for different discretization thresholds to be inserted into and removed from the data, creating a random walk around the ``discretization space" in a way that ensures visitation to high-scoring discretizations.

Share

COinS