Date of Award
Master of Science (MS)
A Bayesian network is a graphical model that can be used to represent potentially complex relationships between a large number of random variables. Given data consisting of realizations of the nodes (variables) of a network with unknown structure, much work has been done in recent years to recover the directed edges that describe the joint behavior of the nodes. The underlying assumption for such recovery algorithms is that the data are either from discrete or Gaussian distributions. In the event that neither hold, structure recovery algorithms are usually run after a discretization of the data. Unfortunately, if the discretization is not performed in a thoughtful way, the very structure that one is trying to recover may be obscured.
In this thesis, we extend the work of Friedman and Goldszmidt [Proceedings of the Thirteenth International Conference on Machine Learning, pp. 157–165, (1996)] where a principled approach was developed based on the idea of minimum description length. Although their approach gives a scoring mechanism that appears to be able to recover a good discretization, actually finding high-scoring discretizations is difficult due to the large number of possibilities that need to be checked. We propose and implement a Monte Carlo search procedure based on a birth-and-death process to maximize discretization score. This allows for different discretization thresholds to be inserted into and removed from the data, creating a random walk around the ``discretization space" in a way that ensures visitation to high-scoring discretizations.
Woytarowicz, Nicole, "Discretizing Continuous Attributes of a Bayesian Network With a Birth and Death Process Based on Minimum Description Length" (2018). Applied Mathematics Graduate Theses & Dissertations. 98.