A cautious approach to generalization in reinforcement learning

Fonteneau, Raphaël; Murphy, Susan; Wehenkel, Louis; Ernst, Damien

Download

Paper published in a book (Scientific congresses and symposiums)

A cautious approach to generalization in reinforcement learning

Fonteneau, Raphaël; Murphy, Susan; Wehenkel, Louis et al.

2010 • In Proceedings of the 2nd International Conference on Agents and Artificial Intelligence

Peer reviewed

Permalink
https://hdl.handle.net/2268/4637

Files (3)Send to Details Statistics Bibliography Similar publications

Files

Full Text

Fonteneau2010ICAART.pdf

Publisher postprint (203.77 kB)

Download

Annexes

slides-22January2010@ICAART.pdf

Publisher postprint (766.09 kB)

Download

BSPACertificateICAART2010.jpg

Publisher postprint (621.05 kB)

Download

All documents in ORBi are protected by a user license.

Send to

RIS BibTex APA Chicago Permalink X Linkedin

Details

Keywords :

Reinforcement Learning; Prior Knowledge; Cautious Generalization

Abstract :

[en] In the context of a deterministic Lipschitz continuous environment over continuous state spaces, finite action spaces, and a finite optimization horizon, we propose an algorithm of polynomial complexity which exploits weak prior knowledge about its environment for computing from a given sample of trajectories and for a given initial state a sequence of actions. The proposed Viterbi-like algorithm maximizes a recently proposed lower bound on the return depending on the initial state, and uses to this end prior knowledge about the environment provided in the form of upper bounds on its Lipschitz constants. It thereby avoids, in way depending on the initial state and on the prior knowledge, those regions of the state space where the sample is too sparse to make safe generalizations. Our experiments show that it can lead to more cautious policies than algorithms combining dynamic programming with function approximators. We give also a condition on the sample sparsity ensuring that, for a given initial state, the proposed algorithm produces an optimal sequence of actions in open-loop.

Disciplines :

Computer science

Author, co-author :

Fonteneau, Raphaël ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation

Murphy, Susan

Wehenkel, Louis ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation

Ernst, Damien ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation

Language :

English

Title :

A cautious approach to generalization in reinforcement learning

Publication date :

January 2010

Event name :

2nd International Conference on Agents and Artificial Intelligence

Event organizer :

Institute for Systems and Technologies of Information, Control and Communication

Event place :

Valencia, Spain

Event date :

from 22-01-2010 to 24-01-2010

Audience :

International

Main work title :

Proceedings of the 2nd International Conference on Agents and Artificial Intelligence

ISBN/EAN :

092312 1713

Pages :

Peer reviewed :

Peer reviewed

Funders :

F.R.S.-FNRS - Fonds de la Recherche Scientifique [BE]
FRIA - Fonds pour la Formation à la Recherche dans l'Industrie et dans l'Agriculture [BE]

Commentary :

Best Student Paper Award

Available on ORBi :

since 15 February 2010

Statistics

Number of views

265 (26 by ULiège)

Number of downloads

943 (14 by ULiège)

More statistics

Scopus citations^®

Scopus citations^®
without self-citations

Bibliography

Bemporad, A. and Morari, M. (1999). Robust model predictive control: A survey. Robustness in Identification and Control, 245:207-226.
Bertsekas, D. and Tsitsiklis, J. (1996). Neuro-Dynamic Programming. Athena Scientific.
Boyan, J. and Moore, A. (1995). Generalization in reinforcement learning: Safely approximating the value function. In Advances in Neural Information Processing Systems 7, pages 369-376. MIT Press.
Csáji, B. C. and Monostori, L. (2008). Value function based reinforcement learning in changing markovian environments. J. Mach. Learn. Res., 9:1679-1709.
Delage, E. and Mannor, S. (2006). Percentile optimization for Markov decision processes with parameter uncertainty. Operations Research.
Ernst, D. (2005). Selecting concise sets of samples for a reinforcement learning agent. In Proceedings of the Third International Conference on Computational Intelligence, Robotics and Autonomous Systems (CIRAS 2005), page 6.
Ernst, D., Geurts, P., and Wehenkel, L. (2005). Tree-based batch mode reinforcement learning. Journal of Machine Learning Research, 6:503-556.
Ernst, D., Glavic, M., Capitanescu, F., and Wehenkel, L. (April 2009). Reinforcement learning versus model predictive control: a comparison on a power system problem. IEEE Transactions on Systems, Man, and Cybernetics - Part B: Cybernetics, 39:517-529.
Fonteneau, R., Murphy, S., Wehenkel, L., and Ernst, D. (2009). Inferring bounds on the performance of a control policy from a sample of trajectories. In Proceedings of the 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (IEEEADPRL 09), Nashville, TN, USA.
Gordon, G. (1999). Approximate Solutions to Markov Decision Processes. PhD thesis, Carnegie Mellon University.
Ingersoll, J. (1987). Theory of Financial Decision Making. Rowman and Littlefield Publishers, Inc.
Lagoudakis, M. and Parr, R. (2003). Least-squares policy iteration. Jounal of Machine Learning Research, 4:1107-1149.
Mannor, S., Simester, D., Sun, P., and Tsitsiklis, J. (2004). Bias and variance in value function estimation. In Proceedings of the 21st International Conference on Machine Learning.
Murphy, S. (2003). Optimal dynamic treatment regimes. Journal of the Royal Statistical Society, Series B, 65(2):331-366.
Murphy, S. (2005). An experimental design for the development of adaptive treatment strategies. Statistics in Medicine, 24:1455-1481.
Ormoneit, D. and Sen, S. (2002). Kernel-based reinforcement learning. Machine Learning, 49(2-3): 161-178.
Qian, M. and Murphy, S. (2009). Performance guarantee for individualized treatment rules. Submitted.
Riedmiller, M. (2005). Neural fitted Q iteration - first experiences with a data efficient neural reinforcement learning method. In Proceedings of the Sixteenth European Conference on Machine Learning (ECML 2005), pages 317-328.
Sutton, R. (1996). Generalization in reinforcement learning: Successful examples using sparse coding. In Advance in Neural Information Processing Systems 8, pages 1038-1044. MIT Press.
Sutton, R. and Barto, A. (1998). Reinforcement Learning. MIT Press.