Reference : A cautious approach to generalization in reinforcement learning
Scientific congresses and symposiums : Paper published in a book
Engineering, computing & technology : Computer science
http://hdl.handle.net/2268/4637
English
Fonteneau, Raphaël [Université de Liège - ULg > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation]
Murphy, Susan
Wehenkel, Louis [Université de Liège - ULg > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation]
Ernst, Damien [Université de Liège - ULg > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation]
Jan-2010
Proceedings of the 2nd International Conference on Agents and Artificial Intelligence
10
Yes
No
International
092312 1713
2nd International Conference on Agents and Artificial Intelligence
from 22-01-2010 to 24-01-2010
Institute for Systems and Technologies of Information, Control and Communication
Valencia
Spain
[en] Reinforcement Learning ; Prior Knowledge ; Cautious Generalization
[en] For deterministic, Lipschitz continuous environments with continuous state spaces, finite action spaces, and a finite optimization horizon, we propose a polynomial-complexity algorithm that exploits weak prior knowledge about the environment to compute a sequence of actions from a given sample of trajectories and a given initial state. The proposed Viterbi-like algorithm maximizes a recently proposed lower bound on the return that depends on the initial state; to this end, it uses prior knowledge about the environment, provided in the form of upper bounds on its Lipschitz constants. It thereby avoids, in a way that depends on the initial state and on the prior knowledge, those regions of the state space where the sample is too sparse to make safe generalizations. Our experiments show that it can lead to more cautious policies than algorithms combining dynamic programming with function approximators. We also give a condition on the sample sparsity ensuring that, for a given initial state, the proposed algorithm produces an optimal open-loop sequence of actions.
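The abstract does not reproduce the lower bound itself, so the following is only a minimal sketch of what a Viterbi-like maximization over a sample of one-step transitions could look like. It assumes (this is not stated in this record) that the bound for a chain of transitions is the sum of their rewards minus Lipschitz-weighted distances between the end state of one transition and the start state of the next, with weights `L_Q[n] = l_rho * (1 + l_f + ... + l_f**(n-1))`. The names `cautious_action_sequence`, `lipschitz_q_constants`, `l_f`, and `l_rho` are hypothetical.

```python
import math


def lipschitz_q_constants(l_f, l_rho, horizon):
    """Assumed weights: L_Q[n] = l_rho * sum_{i<n} l_f**i, for n = 0..horizon."""
    consts = [0.0]
    geom = 0.0
    for i in range(horizon):
        geom += l_f ** i
        consts.append(l_rho * geom)
    return consts


def cautious_action_sequence(sample, x0, horizon, l_f, l_rho):
    """Maximize the assumed lower bound over chains of sampled transitions.

    sample: list of (x, u, r, y) one-step transitions, x and y tuples of floats.
    l_f, l_rho: upper bounds on the Lipschitz constants of the dynamics and
    of the reward function (the prior knowledge mentioned in the abstract).
    Returns (bound, actions). Runs in O(horizon * len(sample)**2) time.
    """
    n = len(sample)
    l_q = lipschitz_q_constants(l_f, l_rho, horizon)

    # value[i]: best bound of a chain whose latest transition is sample[i].
    value = [r - l_q[horizon] * math.dist(x0, x) for (x, _, r, _) in sample]
    back = []  # back[t - 1][i]: best predecessor of transition i at step t

    for t in range(1, horizon):
        penalty = l_q[horizon - t]
        new_value, pointers = [], []
        for (x_i, _, r_i, _) in sample:
            # Score of reaching x_i from the end state y_j of each transition j.
            scores = [value[j] - penalty * math.dist(sample[j][3], x_i)
                      for j in range(n)]
            best_j = max(range(n), key=scores.__getitem__)
            new_value.append(r_i + scores[best_j])
            pointers.append(best_j)
        value = new_value
        back.append(pointers)

    # Backtrack from the best final transition to recover the action sequence.
    last = max(range(n), key=value.__getitem__)
    bound = value[last]
    chain = [last]
    for pointers in reversed(back):
        chain.append(pointers[chain[-1]])
    chain.reverse()
    return bound, [sample[i][1] for i in chain]
```

Under these assumptions, a chain that jumps to a transition far from where the previous one ended is penalized in proportion to the gap, so sparsely sampled regions are avoided unless their rewards outweigh the penalty.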
Fonds de la Recherche Scientifique (Communauté française de Belgique) - F.R.S.-FNRS ; Fonds pour la formation à la Recherche dans l'Industrie et dans l'Agriculture (Communauté française de Belgique) - FRIA
Researchers ; Professionals ; Students
Best Student Paper Award

File(s) associated to this reference

Fulltext file(s):

File: Fonteneau2010ICAART.pdf; Version: Publisher postprint; Size: 199 kB; Access: Open access

Additional material(s):

File: slides-22January2010@ICAART.pdf; Size: 748.14 kB; Access: Open access
File: BSPACertificateICAART2010.jpg; Size: 606.5 kB; Access: Open access

