Reference: Learning for exploration/exploitation in reinforcement learning
Dissertations and theses: Master's dissertation
Engineering, computing & technology: Computer science
http://hdl.handle.net/2268/131885
English
Castronovo, Michaël (Université de Liège - ULg, Department of Electricity, Electronics and Computer Science (Montefiore Institute), Smart grids)
Jun-2012
Université de Liège, Liège, Belgium
Master in Computer Science, research focus
Number of pages: 51
Ernst, Damien
Louveaux, Quentin
Wehenkel, Louis
Geurts, Pierre
Keywords: Reinforcement learning; Exploration/Exploitation dilemma; Formula discovery
Abstract: We consider the problem of learning high-performance Exploration/Exploitation (E/E) strategies for finite Markov Decision Processes (MDPs) when the MDP to be controlled is assumed to be drawn from a known probability distribution p_M(·). The performance criterion is the sum of discounted rewards collected by the E/E strategy over an infinite-length trajectory. We propose an approach that considers a rich set of candidate E/E strategies and searches for the one that gives the best average performance on MDPs drawn according to p_M(·). As candidate E/E strategies, we consider index-based strategies parametrized by small formulas combining variables that include the estimated reward function, the number of times each transition has occurred, and the optimal value functions V̂ and Q̂ of the estimated MDP (obtained through value iteration). The search for the best formula is formalized as a multi-armed bandit problem, each arm being associated with a formula. We experimentally compare the performance of the approach with R-max and with ε-Greedy strategies, and the results are promising.
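The abstract does not specify the bandit algorithm, the formula grammar, or the prior p_M(·), so the following Python sketch is only an illustration under stated assumptions, not the thesis's implementation. It assumes: UCB1 as the bandit algorithm over formulas; a small hand-written set of candidate formulas over the estimated reward r̂(s,a), the visit count n(s,a) (used here instead of per-transition counts, for brevity) and Q̂(s,a); a hypothetical `draw_mdp` function standing in for p_M(·) that draws random 3-state, 2-action MDPs; unvisited state-action pairs estimated with uniform transitions and zero reward; and a truncated horizon in place of the infinite trajectory. All function names are hypothetical.

```python
import math
import random

def value_iteration(P, R, gamma, iters):
    """Synchronous value iteration on an (estimated) MDP; returns (V, Q)."""
    n_s, n_a = len(P), len(P[0])
    V = [0.0] * n_s
    Q = [[0.0] * n_a for _ in range(n_s)]
    for _ in range(iters):
        for s in range(n_s):
            for a in range(n_a):
                Q[s][a] = R[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(n_s))
        V = [max(Q[s]) for s in range(n_s)]
    return V, Q

def draw_mdp(rng, n_states=3, n_actions=2):
    """Hypothetical stand-in for p_M(.): random transitions, rewards in [0, 1)."""
    P = []
    for _ in range(n_states):
        row = []
        for _ in range(n_actions):
            w = [rng.random() for _ in range(n_states)]
            total = sum(w)
            row.append([x / total for x in w])
        P.append(row)
    R = [[rng.random() for _ in range(n_actions)] for _ in range(n_states)]
    return P, R

def run_strategy(formula, mdp, rng, gamma=0.9, horizon=30, vi_iters=30):
    """Index-based E/E strategy: act greedily w.r.t. formula(r_hat, n, q_hat)."""
    P_true, R_true = mdp
    n_s, n_a = len(P_true), len(P_true[0])
    trans = [[[0] * n_s for _ in range(n_a)] for _ in range(n_s)]
    visits = [[0] * n_a for _ in range(n_s)]
    r_sum = [[0.0] * n_a for _ in range(n_s)]
    s, ret = 0, 0.0
    for t in range(horizon):  # truncated horizon (assumption)
        # Estimated MDP from history; unvisited pairs: uniform transitions, zero reward.
        P_hat = [[[trans[x][b][y] / visits[x][b] if visits[x][b] else 1.0 / n_s
                   for y in range(n_s)] for b in range(n_a)] for x in range(n_s)]
        R_hat = [[r_sum[x][b] / visits[x][b] if visits[x][b] else 0.0
                  for b in range(n_a)] for x in range(n_s)]
        _, Q_hat = value_iteration(P_hat, R_hat, gamma, vi_iters)
        a = max(range(n_a), key=lambda b: formula(R_hat[s][b], visits[s][b], Q_hat[s][b]))
        r = R_true[s][a]
        s_next = rng.choices(range(n_s), weights=P_true[s][a])[0]
        trans[s][a][s_next] += 1
        visits[s][a] += 1
        r_sum[s][a] += r
        ret += gamma ** t * r
        s = s_next
    return ret

# Hypothetical candidate formulas: index(r_hat, n, q_hat) for one action.
FORMULAS = {
    "Q": lambda r, n, q: q,
    "r": lambda r, n, q: r,
    "Q + 1/(n+1)": lambda r, n, q: q + 1.0 / (n + 1),
    "Q + 1/sqrt(n+1)": lambda r, n, q: q + 1.0 / math.sqrt(n + 1),
}

def ucb1_formula_search(formulas, budget, rng, gamma=0.9):
    """UCB1 over formulas (assumed algorithm); one pull = one MDP drawn and played."""
    k = len(formulas)
    pulls, means = [0] * k, [0.0] * k
    for t in range(budget):
        if t < k:
            i = t  # play every arm once first
        else:
            i = max(range(k), key=lambda j: means[j] + math.sqrt(2 * math.log(t) / pulls[j]))
        ret = run_strategy(formulas[i], draw_mdp(rng), rng, gamma)
        reward = (1 - gamma) * ret  # rescale return into [0, 1) for UCB1
        pulls[i] += 1
        means[i] += (reward - means[i]) / pulls[i]
    best = max(range(k), key=lambda j: means[j])
    return best, means, pulls

if __name__ == "__main__":
    rng = random.Random(0)
    names = list(FORMULAS)
    best, means, pulls = ucb1_formula_search([FORMULAS[n] for n in names], 40, rng)
    print(names[best], [round(m, 3) for m in means], pulls)
```

Rescaling each discounted return by (1 − γ) keeps the arm rewards in [0, 1), which is the range UCB1's confidence bound assumes; the exploration-bonus formulas let the strategy try actions that the zero-reward prior on unvisited pairs would otherwise make a purely greedy Q̂ index ignore.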
Target audience: Researchers; Students
All documents in ORBi are protected by a user license.