Learning for exploration/exploitation in reinforcement learning

Castronovo, Michaël

Master’s dissertation (Dissertations and theses)

2012

Permalink
https://hdl.handle.net/2268/131885

Files

Full Text

main.pdf

Author postprint (614.37 kB)

Annexes

main.pdf

Publisher postprint (188.19 kB)

All documents in ORBi are protected by a user license.

Details

Keywords :

Reinforcement Learning; Exploration/Exploitation dilemma; Formula discovery

Abstract :

[en] We consider the problem of learning high-performance Exploration/Exploitation (E/E) strategies for finite Markov Decision Processes (MDPs) when the MDP to be controlled is assumed to be drawn from a known probability distribution p_M(·). The performance criterion is the sum of discounted rewards collected by the E/E strategy over an infinite-length trajectory. We propose an approach for solving this problem that works by considering a rich set of candidate E/E strategies and by looking for the one that gives the best average performance on MDPs drawn according to p_M(·). As candidate E/E strategies, we consider index-based strategies parametrized by small formulas combining variables that include the estimated reward function, the number of times each transition has occurred and the optimal value functions V̂ and Q̂ of the estimated MDP (obtained through value iteration). The search for the best formula is formalized as a multi-armed bandit problem, each arm being associated with a formula. We experimentally compare the performance of the approach with R-max as well as with ε-Greedy strategies, and the results are promising.
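
The search described in the abstract can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: the toy prior over MDPs (`sample_mdp`), the four candidate formulas, the discount factor, the truncation horizon, and the use of UCB1 as the bandit algorithm are all assumptions made for illustration, since the abstract does not specify them. Each arm is a formula; pulling an arm draws an MDP from the prior, runs the index-based strategy induced by that formula, and returns the (truncated, normalised) discounted sum of rewards.

```python
import math
import random

GAMMA = 0.9    # discount factor (assumed value)
HORIZON = 30   # truncation of the infinite trajectory (assumption)

# Hypothetical candidate formulas over (r_hat, n, Q_hat) of a state-action pair.
FORMULAS = {
    "Q_hat": lambda r, n, q: q,
    "r_hat": lambda r, n, q: r,
    "Q_hat + 1/(n+1)": lambda r, n, q: q + 1.0 / (n + 1),
    "-n": lambda r, n, q: -n,
}

def sample_mdp(rng, n_states=3, n_actions=2):
    """Draw a toy MDP from a hypothetical prior: random transitions, rewards in [0, 1]."""
    P, R = {}, {}
    for s in range(n_states):
        for a in range(n_actions):
            w = [rng.random() for _ in range(n_states)]
            total = sum(w)
            P[s, a] = [x / total for x in w]
            R[s, a] = rng.random()
    return n_states, n_actions, P, R

def value_iteration(n_states, n_actions, P_hat, R_hat, sweeps=20):
    """Approximate the optimal Q-values of the estimated MDP with a fixed number of sweeps."""
    Q = {(s, a): 0.0 for s in range(n_states) for a in range(n_actions)}
    for _ in range(sweeps):
        V = [max(Q[s, a] for a in range(n_actions)) for s in range(n_states)]
        Q = {(s, a): R_hat[s, a] + GAMMA * sum(p * v for p, v in zip(P_hat[s, a], V))
             for s in range(n_states) for a in range(n_actions)}
    return Q

def run_strategy(formula, mdp, rng):
    """Play the index-based strategy induced by `formula`; return the normalised discounted return."""
    n_states, n_actions, P, R = mdp
    visits = {sa: 0 for sa in P}
    trans = {sa: [0] * n_states for sa in P}
    r_hat = {sa: 0.0 for sa in P}   # estimated rewards, 0 until a pair is visited
    s, ret = 0, 0.0
    for t in range(HORIZON):
        # Estimated transition model: empirical frequencies, uniform for unvisited pairs.
        P_hat = {sa: ([c / visits[sa] for c in trans[sa]] if visits[sa]
                      else [1.0 / n_states] * n_states) for sa in P}
        Q_hat = value_iteration(n_states, n_actions, P_hat, r_hat)
        # Index-based choice: take the action with the largest formula value.
        a = max(range(n_actions),
                key=lambda b: formula(r_hat[s, b], visits[s, b], Q_hat[s, b]))
        r = R[s, a]
        s_next = rng.choices(range(n_states), weights=P[s, a])[0]
        visits[s, a] += 1
        trans[s, a][s_next] += 1
        r_hat[s, a] = r             # rewards are deterministic in this toy model
        ret += (GAMMA ** t) * r
        s = s_next
    return (1 - GAMMA) * ret        # rescaled to [0, 1] for UCB1

def search_best_formula(formulas, budget, rng):
    """UCB1 over formulas: one pull = one sampled MDP played with that formula."""
    names = list(formulas)
    counts = {name: 0 for name in names}
    means = {name: 0.0 for name in names}
    for t in range(1, budget + 1):
        untried = [name for name in names if counts[name] == 0]
        if untried:
            arm = untried[0]
        else:
            arm = max(names, key=lambda k: means[k]
                      + math.sqrt(2.0 * math.log(t) / counts[k]))
        payoff = run_strategy(formulas[arm], sample_mdp(rng), rng)
        counts[arm] += 1
        means[arm] += (payoff - means[arm]) / counts[arm]   # running average
    best = max(names, key=lambda k: means[k])
    return best, means, counts
```

Calling `search_best_formula(FORMULAS, budget=200, rng=random.Random(0))` returns the name of the formula with the highest empirical mean return, together with the per-formula means and pull counts. Which formula comes out ahead depends entirely on the assumed prior and formula set, so the result says nothing about the dissertation's experiments.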

Disciplines :

Computer science

Author, co-author :

Castronovo, Michaël ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Smart grids

Language :

English

Title :

Learning for exploration/exploitation in reinforcement learning

Defense date :

June 2012

Number of pages :

Institution :

ULiège - Université de Liège

Degree :

Master en sciences informatiques, à finalité approfondie

Promotor :

Ernst, Damien ; Université de Liège - ULiège > Montefiore Institute of Electrical Engineering and Computer Science

Louveaux, Quentin ; Université de Liège - ULiège > Montefiore Institute of Electrical Engineering and Computer Science

Wehenkel, Louis ; Université de Liège - ULiège > Montefiore Institute of Electrical Engineering and Computer Science

Geurts, Pierre ; Université de Liège - ULiège > Montefiore Institute of Electrical Engineering and Computer Science

Available on ORBi :

since 09 October 2012

Statistics

Number of views

144 (18 by ULiège)

Number of downloads

213 (15 by ULiège)
