Learning exploration/exploitation strategies for single trajectory reinforcement learning

Castronovo, Michaël; Maes, Francis; Fonteneau, Raphaël; Ernst, Damien

Paper published in a book (Scientific congresses and symposiums)

2012 • In Proceedings of the 10th European Workshop on Reinforcement Learning (EWRL 2012)

Peer reviewed

Permalink
https://hdl.handle.net/2268/127985

Files

Full Text

castronovo12a.pdf

Publisher postprint (270.86 kB)

All documents in ORBi are protected by a user license.

Details

Keywords :

reinforcement learning; Exploration/Exploitation dilemma; formula discovery

Abstract :

[en] We consider the problem of learning high-performance Exploration/Exploitation (E/E) strategies for finite Markov Decision Processes (MDPs) when the MDP to be controlled is assumed to be drawn from a known probability distribution p_M(·). The performance criterion is the sum of discounted rewards collected by the E/E strategy over an infinite-length trajectory. We propose an approach for solving this problem that considers a rich set of candidate E/E strategies and looks for the one that gives the best average performance on MDPs drawn according to p_M(·). As candidate E/E strategies, we consider index-based strategies parametrized by small formulas combining variables that include the estimated reward function, the number of times each transition has occurred, and the optimal value functions V and Q of the estimated MDP (obtained through value iteration). The search for the best formula is formalized as a multi-armed bandit problem, each arm being associated with a formula. We experimentally compare the performance of the approach with R-max as well as with ε-Greedy strategies, and the results are promising.
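
To make the notion of an index-based E/E strategy concrete, the following is a minimal Python sketch, not the authors' implementation. It estimates an MDP from transition counts, runs value iteration on the estimate, and picks the action maximizing a small formula over the resulting variables. The function names, the uniform prior for unvisited state-action pairs, the discount factor 0.95, and the particular formula (estimated Q plus a count-based bonus) are all assumptions made for illustration; the paper searches over a set of candidate formulas rather than fixing one.

```python
import numpy as np


def estimate_model(counts, reward_sums):
    """Estimate transitions and rewards from counts[s, a, s'] and reward_sums[s, a]."""
    n_sa = counts.sum(axis=2)                 # visits of each (s, a) pair
    n_states = counts.shape[0]
    safe_n = np.maximum(n_sa, 1.0)            # avoid division by zero
    # Assumption: unvisited (s, a) pairs get a uniform transition guess.
    P_hat = np.where(n_sa[:, :, None] > 0,
                     counts / safe_n[:, :, None],
                     1.0 / n_states)
    R_hat = reward_sums / safe_n              # zero reward where unvisited
    return P_hat, R_hat, n_sa


def value_iteration(P, R, gamma=0.95, tol=1e-8, max_iter=10_000):
    """Return (V, Q) of a finite MDP with P[s, a, s'] and R[s, a]."""
    V = np.zeros(P.shape[0])
    for _ in range(max_iter):
        Q = R + gamma * (P @ V)               # Bellman backup for every (s, a)
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    return V, R + gamma * (P @ V)


def example_formula(q, r_hat, n):
    """Hypothetical formula: estimated Q-value plus a count-based bonus."""
    return q + 1.0 / np.sqrt(n + 1.0)


def select_action(state, counts, reward_sums, formula, gamma=0.95):
    """Index-based choice: evaluate the formula per action, take the argmax."""
    P_hat, R_hat, n_sa = estimate_model(counts, reward_sums)
    _, Q = value_iteration(P_hat, R_hat, gamma)
    index = formula(Q[state], R_hat[state], n_sa[state])
    return int(np.argmax(index))


# Usage: 2 states, 2 actions; action 0 was tried 10 times in state 0
# with zero reward, action 1 was never tried.
counts = np.zeros((2, 2, 2))
counts[0, 0, 0] = 10.0
reward_sums = np.zeros((2, 2))
chosen = select_action(0, counts, reward_sums, example_formula)
```

In this usage example every estimated Q-value is zero, so the count-based bonus alone decides and the untried action is preferred. Swapping `example_formula` for another function of the same variables yields a different candidate strategy, which is the kind of object a formula search would compare.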

Disciplines :

Computer science

Author, co-author :

Castronovo, Michaël ; Université de Liège - ULiège > 2e an. master sc. infor., fin. appr.

Maes, Francis ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation

Fonteneau, Raphaël ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation

Ernst, Damien ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Smart grids

Language :

English

Title :

Learning exploration/exploitation strategies for single trajectory reinforcement learning

Publication date :

2012

Event name :

10th European Workshop on Reinforcement Learning (EWRL 2012)

Event place :

Edinburgh, United Kingdom

Event date :

June 30-July 1, 2012

Audience :

International

Main work title :

Proceedings of the 10th European Workshop on Reinforcement Learning (EWRL 2012)

Collection name :

JMLR Workshop and Conference Proceedings 24

Pages :

1-9

Peer reviewed :

Peer reviewed

Additional URL :

http://jmlr.csail.mit.edu/proceedings/papers/v24/

Funders :

F.R.S.-FNRS - Fonds de la Recherche Scientifique [BE]

Available on ORBi :

since 26 July 2012

Statistics

Number of views

344 (36 by ULiège)

Number of downloads

211 (15 by ULiège)
