Graduate Thesis Or Dissertation

 

A Temporal Difference Learning Approach to Network Revenue Management Öffentlichkeit Deposited

Herunterladbarer Inhalt

PDF Herunterladen
https://scholar.colorado.edu/concern/graduate_thesis_or_dissertations/xp68kh54c
Abstract
  • The network revenue management (NRM) problem has been well-studied in the literature, with the largemajority of research focusing on the approximate linear programming (ALP) approach. The ALP approach has been shown to work quite well for many problem instances. The performance of the ALP approach depends critically on the approximation architecture. However, even separable piecewise linear (SPL) approx- imation, widely considered the strongest approximation, cannot fully account for the strong network effects that can exist in the NRM problem. This limitation is mainly due to the nature of the ALP approach. In this research, we explore simulation-based reinforcement learning methods that are more flexible and can utilize a broader class of approximation architectures beyond those viable for the ALP approach. Our primary focus is the widely used temporal difference (TD) learning algorithm. We develop two distinct adaptations of the TD learning algorithm that are tailored to fit the structure of the NRM problem and test the algorithms using various approximation architectures and initial policies. We introduce a novel eligibility trace which we call the salience trace that significantly increases TD learning performance for approximations that use binary features. We demonstrate via experiments that the TD learning algorithms do not rely on commercial mathematical programming solvers and can lift the policy performance to near optimal even when the initial policy is a naive one, which accepts a request whenever feasible. Furthermore, we show that the TD learning algorithms can adopt a recently proposed novel approximation architecture (Zhang et al. 2021b), which can better capture the network effects and provide smaller approximation errors than the SPL approximation, but is intractable for the ALP approach. Over a set of 40 problem instances, heuristic policies based on the learned value function approximations generate expected revenues that are competitive with the bid-price policy derived from the ALP approach in 60% of instances.

Creator
Date Issued
  • 2022-05-18
Academic Affiliation
Advisor
Committee Member
Degree Grantor
Commencement Year
Subject
Publisher
Zuletzt geändert
  • 2022-12-13
Resource Type
Urheberrechts-Erklärung
Language

Beziehungen

Artikel