An Efficient Search Strategy for Aggregation and Discretization of Attributes of Bayesian Networks Using Minimum Description Length

Tran, Dai Daniel

Graduate Thesis Or Dissertation

Citeable URL: https://scholar.colorado.edu/concern/graduate_thesis_or_dissertations/5h73pw37t

Abstract

Bayesian networks are widely regarded as powerful tools for modeling risk assessment, uncertainty, and decision making. They have been extensively employed to develop decision support systems in a variety of domains, including medical diagnosis, risk assessment and management, human cognition, industrial process and procurement, pavement and bridge management, and system reliability. Bayesian networks are convenient graphical expressions for high-dimensional probability distributions and are used to represent complex relationships among a large number of random variables. A Bayesian network is a directed acyclic graph consisting of nodes, which represent random variables, and arrows, which correspond to probabilistic dependencies between them. The ability to recover Bayesian network structures from data is critical to enhancing their application in modeling real-world phenomena, and much research effort has been devoted to identifying the specific network structure. However, most Bayesian network learning procedures rest on one of two assumptions: (1) that the data are discrete, or (2) that the data are continuous and either follow a Gaussian distribution or are otherwise discretized before recovery. Discretization of data in the continuous non-Gaussian case is often done in an ad hoc manner that destroys the conditional relationships among variables, so subsequent network recovery algorithms are unable to retrieve the correct network. Friedman and Goldszmidt [11] suggest an approach based on the minimum description length principle that chooses a discretization which preserves the information in the original data set; however, it is difficult, if not impossible, to implement for even moderately sized networks. This thesis explores the structure of the minimum description length formulation they developed and then provides an alternative, efficient search strategy that allows one to use the Friedman and Goldszmidt approach in practice.

Creator