Graduate Thesis Or Dissertation
Optimizing Data Pre-Processing Transformations with Reinforcement Learning
https://scholar.colorado.edu/concern/graduate_thesis_or_dissertations/cf95jd01d
- Abstract
- In this work, we use Reinforcement Learning (RL) to optimize the data pre-processing transformations in a machine learning pipeline, given a dataset X, a child algorithm f(·), and an action space A. Inspired by Effective data pre-processing for AutoML [4] and Learn2Clean [1], we construct a model that 1) does not specify a data pre-processing pipeline structure in advance and 2) does not depend on transformation-specific rules or empirical calculations across multiple datasets. Using a simple policy optimization scheme, we produce results comparable to [4] across multiple datasets from the OpenML-CC18 benchmark suite, with Naive Bayes (NB) as our child algorithm. We accomplish this solely by finding an optimal order of actions sampled from the action space A, with the parameters of each action kept at their default values. We hope that this model can serve as a basis for future projects, including the study of how such data transformations affect the manifold structure and the addition of a conditional aspect to our model to make it more efficient.
- Date Issued
- 2022-08-02
- Last Modified
- 2022-09-13
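
The ordering search described in the abstract can be illustrated with a small, self-contained sketch. Everything below is an assumption made for illustration, not the thesis's implementation: the three transformations standing in for the action space A, the extra `stop` action, the per-step softmax policy trained with a REINFORCE-style update, the hand-written Gaussian Naive Bayes standing in for the child algorithm f(·), and the synthetic dataset. The reward is training accuracy on the toy data, chosen for brevity rather than as a sound evaluation protocol.

```python
import math
import random

# Hypothetical action space A: parameter-free, column-wise transformations.
def standardize(col):
    mean = sum(col) / len(col)
    std = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col)) or 1.0
    return [(v - mean) / std for v in col]

def min_max(col):
    lo, hi = min(col), max(col)
    span = (hi - lo) or 1.0
    return [(v - lo) / span for v in col]

def log_shift(col):
    lo = min(col)  # shift so every value is >= 0 before log1p
    return [math.log1p(v - lo) for v in col]

ACTIONS = {"standardize": standardize, "min_max": min_max, "log_shift": log_shift}

def apply_pipeline(X, names):
    """Apply the named transformations, in order, to every column of X."""
    cols = [list(col) for col in zip(*X)]
    for name in names:
        cols = [ACTIONS[name](col) for col in cols]
    return [list(row) for row in zip(*cols)]

def gaussian_nb_accuracy(X, y):
    """Training accuracy of a tiny Gaussian Naive Bayes (stand-in for f)."""
    classes = sorted(set(y))
    stats = {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        cols = list(zip(*rows))
        means = [sum(col) / len(col) for col in cols]
        variances = [sum((v - m) ** 2 for v in col) / len(col) + 1e-9
                     for col, m in zip(cols, means)]
        stats[c] = (math.log(len(rows) / len(X)), means, variances)

    def log_posterior(x, c):
        prior, means, variances = stats[c]
        return prior + sum(-0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
                           for xi, m, v in zip(x, means, variances))

    hits = sum(max(classes, key=lambda c: log_posterior(x, c)) == label
               for x, label in zip(X, y))
    return hits / len(X)

def softmax(logits):
    top = max(logits)
    exps = [math.exp(value - top) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]

def train_policy(X, y, max_steps=3, episodes=200, lr=0.5, seed=0):
    """REINFORCE over one softmax per pipeline position (an assumed scheme)."""
    rng = random.Random(seed)
    choices = list(ACTIONS) + ["stop"]
    logits = [[0.0] * len(choices) for _ in range(max_steps)]
    baseline = 0.0
    for _ in range(episodes):
        trajectory = []
        for t in range(max_steps):
            probs = softmax(logits[t])
            a = rng.choices(range(len(choices)), weights=probs)[0]
            trajectory.append((t, a, probs))
            if choices[a] == "stop":
                break
        names = [choices[a] for _, a, _ in trajectory if choices[a] != "stop"]
        reward = gaussian_nb_accuracy(apply_pipeline(X, names), y)
        advantage = reward - baseline
        baseline += 0.1 * (reward - baseline)  # running-average baseline
        for t, a, probs in trajectory:
            for k in range(len(choices)):
                # Gradient of log softmax: one-hot(a) minus probabilities.
                logits[t][k] += lr * advantage * ((1.0 if k == a else 0.0) - probs[k])
    greedy = []
    for t in range(max_steps):
        a = max(range(len(choices)), key=lambda k: logits[t][k])
        if choices[a] == "stop":
            break
        greedy.append(choices[a])
    return greedy, logits

def make_toy_data(n=60, seed=1):
    """Synthetic two-class data with one skewed and one Gaussian feature."""
    rng = random.Random(seed)
    X = [[rng.lognormvariate(i % 2, 1.0), rng.gauss(i % 2, 1.0)] for i in range(n)]
    y = [i % 2 for i in range(n)]
    return X, y
```

Calling `train_policy(*make_toy_data())` returns the greedy action sequence together with the learned logits. As in the abstract, only the order of actions is searched and each transformation keeps fixed behavior; a more faithful experiment would score pipelines on held-out data from a benchmark suite rather than on training accuracy.
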
Items
Title | Date Uploaded | Visibility |
---|---|---|
Finley_colorado_0051N_17967.pdf | 2022-09-13 | Public |
Thesis_Approval_Form.pdf | 2022-09-13 | Public |