Towards min max generalization in reinforcement learning

Fonteneau, Raphaël; Murphy, Susan; Wehenkel, Louis; Ernst, Damien

Contribution to collective works (Parts of books)


2011 • In Filipe, Joaquim; Fred, Ana; Sharp, Bernadette (Eds.) Agents and Artificial Intelligence: International Conference, ICAART 2010, Valencia, Spain, January 2010, Revised Selected Papers

Peer reviewed

Permalink
https://hdl.handle.net/2268/99584


Files

Full Text

towards-min-max-generalisation-RL.pdf

Publisher postprint (463.93 kB)


All documents in ORBi are protected by a user license.


Details

Keywords :

reinforcement learning; generalization

Abstract :

[en] In this paper, we introduce a min max approach for addressing the generalization problem in Reinforcement Learning. The min max approach works by determining a sequence of actions that maximizes the worst return that could possibly be obtained considering any dynamics and reward function compatible with the sample of trajectories and some prior knowledge on the environment. We consider the particular case of deterministic Lipschitz continuous environments over continuous state spaces, finite action spaces, and a finite optimization horizon. We discuss the non-triviality of computing an exact solution of the min max problem even after reformulating it so as to avoid search in function spaces. For addressing this problem, we propose to replace, inside this min max problem, the search for the worst environment given a sequence of actions by an expression that lower bounds the worst return that can be obtained for a given sequence of actions. This lower bound has a tightness that depends on the sample sparsity. From there, we propose an algorithm of polynomial complexity that returns a sequence of actions leading to the maximization of this lower bound. We give a condition on the sample sparsity ensuring that, for a given initial state, the proposed algorithm produces an optimal sequence of actions in open-loop. Our experiments show that this algorithm can lead to more cautious policies than algorithms combining dynamic programming with function approximators.
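The abstract does not state the lower bound or the polynomial-time algorithm explicitly. The sketch below assumes one form such a Lipschitz-based bound can take: the sample is a set of one-step transitions (x, u, r, y), the dynamics and reward are Lipschitz with constants L_f and L_rho, and a chain of T transitions l_0, ..., l_{T-1} starting from x_0 is scored as the sum over t of r^{l_t} - L_Q(T-t) * ||y^{l_{t-1}} - x^{l_t}||, with y^{l_{-1}} = x_0 and L_Q(N) = L_rho * (1 + L_f + ... + L_f^(N-1)). Under that assumption, the best chain (and hence the action sequence it implies) can be found with a Viterbi-style dynamic program in O(T n^2) for n transitions. The formula, the function names `lipschitz_q` and `max_lower_bound`, and the tie-breaking rule are hypothetical illustrations, not the authors' code.

```python
import math
from typing import List, Sequence, Tuple

# One-step transition (x, u, r, y): state, action, reward, successor state.
# ASSUMPTION: this sample layout is illustrative, not taken from the paper.
Transition = Tuple[Tuple[float, ...], int, float, Tuple[float, ...]]


def lipschitz_q(n_steps: int, l_f: float, l_rho: float) -> float:
    """Hypothetical constant L_Q(N) = L_rho * sum_{i=0}^{N-1} L_f**i."""
    return l_rho * sum(l_f ** i for i in range(n_steps))


def max_lower_bound(
    x0: Sequence[float],
    sample: List[Transition],
    horizon: int,
    l_f: float,
    l_rho: float,
) -> Tuple[float, List[int]]:
    """Viterbi-style search over chains of sample transitions.

    Returns the largest value of the ASSUMED lower bound and the action
    sequence read off the maximizing chain. Uses O(horizon * n**2)
    distance evaluations for n transitions. Assumes a non-empty sample
    and horizon >= 1.
    """
    n = len(sample)
    # Step t = 0: each chain starts from the initial state x0.
    lq = lipschitz_q(horizon, l_f, l_rho)
    value = [r - lq * math.dist(x0, x) for (x, _, r, _) in sample]
    back: List[List[int]] = []  # back[t-1][j]: best predecessor of j at step t
    for t in range(1, horizon):
        lq = lipschitz_q(horizon - t, l_f, l_rho)
        new_value, pointers = [], []
        for x_j, _, r_j, _ in sample:
            # Penalize the gap between predecessor successor y_i and x_j.
            scores = [value[i] - lq * math.dist(sample[i][3], x_j) for i in range(n)]
            best_i = max(range(n), key=scores.__getitem__)  # first max wins ties
            pointers.append(best_i)
            new_value.append(r_j + scores[best_i])
        value = new_value
        back.append(pointers)
    # Backtrack from the best transition at the final step.
    j = max(range(n), key=value.__getitem__)
    best = value[j]
    chain = [j]
    for pointers in reversed(back):
        j = pointers[j]
        chain.append(j)
    chain.reverse()
    return best, [sample[k][1] for k in chain]


if __name__ == "__main__":
    sample = [
        ((0.0,), 0, 1.0, (1.0,)),
        ((0.0,), 1, 0.0, (0.0,)),
        ((1.0,), 1, 2.0, (2.0,)),
    ]
    print(max_lower_bound((0.0,), sample, horizon=2, l_f=1.0, l_rho=1.0))
```

With this toy 1-D sample, the chain "transition 0, then transition 2" matches x_0 and then the successor state exactly, so no Lipschitz penalty is paid and the sketch returns a bound of 3.0 with actions [0, 1]. Chains that must "jump" between sample states pay a penalty growing with the jump distance, which is what makes the resulting action sequence cautious in sparsely sampled regions.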

Disciplines :

Electrical & electronics engineering

Author, co-author :

Fonteneau, Raphaël ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation

Murphy, Susan

Wehenkel, Louis ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation

Ernst, Damien ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation

Language :

English

Title :

Towards min max generalization in reinforcement learning

Publication date :

2011

Main work title :

Agents and Artificial Intelligence: International Conference, ICAART 2010, Valencia, Spain, January 2010, Revised Selected Papers

Editor :

Filipe, Joaquim

Fred, Ana

Sharp, Bernadette

Publisher :

Springer

ISBN/EAN :

978-3-642-19889-2

Pages :

61-77

Peer reviewed :

Peer reviewed

Available on ORBi :

since 04 October 2011

Statistics

Number of views

135 (6 by ULiège)

Number of downloads

437 (4 by ULiège)
