Graduate Thesis Or Dissertation

 

Optimizing Data Pre-Processing Transformations with Reinforcement Learning

https://scholar.colorado.edu/concern/graduate_thesis_or_dissertations/cf95jd01d
Abstract
  • In this work, we use Reinforcement Learning (RL) to optimize the data pre-processing transformations in a machine learning pipeline, given a dataset X, a child algorithm f(·), and an action space A. Inspired by "Effective data pre-processing for AutoML" [4] and Learn2Clean [1], we construct a model that (1) does not specify the structure of the data pre-processing pipeline in advance and (2) does not depend on transformation-specific rules or on empirical calculations across multiple datasets. Using a simple policy optimization scheme with Naive Bayes (NB) as the child algorithm, we obtain results comparable to those of [4] on multiple datasets from the OpenML-CC18 benchmark suite. We achieve this by searching only for an optimal order of actions sampled from the action space A, keeping the parameters of each action at their default values. We hope that this model can serve as a basis for future projects, including studying how such data transformations affect the manifold structure and adding a conditional component to the model to make it more efficient.
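The idea of learning an order of pre-processing actions by policy optimization can be illustrated with a minimal sketch. It is not the thesis's model: it assumes a REINFORCE-style update with a moving-average baseline, a fixed number of pipeline slots with a "noop" action standing in for variable length, and a toy reward in place of the child algorithm's score. The names reinforce_order, toy_reward, and ACTIONS are hypothetical.

```python
import math
import random


def softmax(logits):
    """Convert a list of logits into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_order(actions, reward_fn, n_steps, episodes=3000, lr=0.2, seed=0):
    """Learn which action to place in each pipeline slot.

    Assumption: one independent softmax policy per slot, updated with the
    REINFORCE gradient and a moving-average baseline.
    """
    rng = random.Random(seed)
    logits = [[0.0] * len(actions) for _ in range(n_steps)]
    baseline = 0.0
    for _ in range(episodes):
        # Sample one action index per slot from the current policy.
        probs = [softmax(row) for row in logits]
        chosen = [rng.choices(range(len(actions)), weights=p)[0] for p in probs]
        reward = reward_fn([actions[i] for i in chosen])
        advantage = reward - baseline
        baseline += 0.1 * (reward - baseline)
        # Gradient of log-softmax: indicator(chosen action) minus probability.
        for t, picked in enumerate(chosen):
            for a in range(len(actions)):
                indicator = 1.0 if a == picked else 0.0
                logits[t][a] += lr * advantage * (indicator - probs[t][a])
    best = [actions[row.index(max(row))] for row in logits]
    return best, logits


def toy_reward(pipeline):
    """Hypothetical stand-in for the child algorithm's validation score."""
    return 0.5 * (pipeline[0] == "impute") + 0.5 * (pipeline[1] == "scale")


ACTIONS = ["impute", "scale", "noop"]  # illustrative action space A
best, logits = reinforce_order(ACTIONS, toy_reward, n_steps=2)
print(best)
```

In a realistic setting, reward_fn would apply the sampled transformations (with default parameters) to the dataset X and return a validation score of the child algorithm f(·), for example the cross-validated accuracy of a Naive Bayes classifier. That wiring is left out here to keep the sketch self-contained.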
Date Issued
  • 2022-08-02
Last Modified
  • 2022-09-13