Graduate Thesis Or Dissertation

 

Using Minimum Description Length for Discretization Classification of Data Modeled by Bayesian Networks

https://scholar.colorado.edu/concern/graduate_thesis_or_dissertations/df65v821b
Abstract
  • Bayesian networks are graphical models that encode conditional probability relationships among multiple random variables. Because they can model many variables at once, their applications extend through computational biology, bioinformatics, medicine, image processing, decision support systems, and engineering. Recovery of Bayesian network structure from data is an extremely valuable tool for determining conditional relationships in multivariate data sets; however, existing recovery algorithms require either discrete or Gaussian data. Non-Gaussian continuous data is normally discretized in an ad hoc and careless manner that is highly likely to destroy the precise conditional dependencies we are out to recover. We explore the effectiveness of a method due to Friedman and Goldszmidt (1996) that is based on a metric from information theory known as "description length". While theoretically interesting, their proposed search strategy for an optimal discretization is infeasible for even moderately sized data sets on the smallest of networks. We introduce a new search strategy based on a "top-down" approach utilizing a local description length metric that can be implemented with significant time savings.
Creator
Date Issued
  • 2011
Academic Affiliation
Advisor
Committee Member
Degree Grantor
Commencement Year
Last Modified
  • 2019-11-17
Resource Type
Rights Statement
Language

Relationships

Items