Active exploration by searching for experiments that falsify the computed control policy

Fonteneau, Raphaël; Murphy, Susan; Wehenkel, Louis; Ernst, Damien

doi:10.1109/ADPRL.2011.5967364

Paper published in a book (Scientific congresses and symposiums)

2011 • In Proceedings of the 2011 IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL-11)

Peer reviewed

Permalink
https://hdl.handle.net/2268/88934

Files

Full Text

adprl11_Fonteneau_et_al.pdf

Publisher postprint (665.68 kB)

All documents in ORBi are protected by a user license.

Details

Keywords :

reinforcement learning; active learning; sequential decision making

Abstract :

We propose a strategy for experiment selection, in the context of reinforcement learning, based on the idea that the most interesting experiments to carry out at some stage are those that are the most liable to falsify the current hypothesis about the optimal control policy. We cast this idea in a context where a policy learning algorithm and a model identification method are given a priori. Experiments are selected if, using the learnt environment model, they are predicted to yield a revision of the learnt control policy. Algorithms and simulation results are provided for a deterministic system with discrete action space. They show that the proposed approach is promising.
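The selection rule in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the policy learner (a myopic nearest-neighbour greedy rule), the model (an average of the two closest samples), the one-dimensional state, and all names such as `learn_policy`, `predict`, and `falsifying_experiments` are assumptions chosen only to keep the example short. It shows the core loop: predict the outcome of a candidate experiment with the learnt model, add that predicted transition to the samples, relearn the policy, and keep the candidate if the policy changes.

```python
from typing import List, Sequence, Tuple

# A sample is (state x, action u, reward r, next state y).
Sample = Tuple[float, int, float, float]
Experiment = Tuple[float, int]  # (state x, action u) to try next
ACTIONS = (0, 1)                # discrete action space (assumed)


def nearest(samples: Sequence[Sample], x: float, u: int) -> Sample:
    """Return the sample taken with action u whose state is closest to x."""
    same_action = [s for s in samples if s[1] == u]
    return min(same_action, key=lambda s: abs(s[0] - x))


def learn_policy(samples: Sequence[Sample],
                 probe_states: Sequence[float]) -> Tuple[int, ...]:
    """Hypothetical policy learner: at each probe state, pick the action
    whose nearest sample has the highest reward (one-step horizon only)."""
    return tuple(
        max(ACTIONS, key=lambda u: nearest(samples, x, u)[2])
        for x in probe_states
    )


def predict(samples: Sequence[Sample], x: float, u: int) -> Tuple[float, float]:
    """Hypothetical model: average reward and next state of the two
    samples with action u that are closest to x."""
    same_action = sorted((s for s in samples if s[1] == u),
                         key=lambda s: abs(s[0] - x))
    closest = same_action[:2]
    r_hat = sum(s[2] for s in closest) / len(closest)
    y_hat = sum(s[3] for s in closest) / len(closest)
    return r_hat, y_hat


def falsifying_experiments(samples: Sequence[Sample],
                           candidates: Sequence[Experiment],
                           probe_states: Sequence[float]) -> List[Experiment]:
    """Keep the candidates whose model-predicted outcome, once added to the
    samples, makes the policy learner return a different policy."""
    current = learn_policy(samples, probe_states)
    selected = []
    for x, u in candidates:
        r_hat, y_hat = predict(samples, x, u)
        augmented = list(samples) + [(x, u, r_hat, y_hat)]
        if learn_policy(augmented, probe_states) != current:
            selected.append((x, u))
    return selected


# Toy data (made up for illustration).
SAMPLES = [
    (0.0, 0, 1.0, 0.0), (1.0, 0, 1.0, 1.0),  # action 0: reward 1 everywhere
    (0.0, 1, 0.0, 0.0), (1.0, 1, 4.0, 1.0),  # action 1: reward grows with x
]
PROBE_STATES = (0.4,)
CANDIDATES = [(0.5, 0), (0.5, 1), (0.9, 1)]

selected = falsifying_experiments(SAMPLES, CANDIDATES, PROBE_STATES)
print(selected)  # [(0.5, 1)]
```

In this toy setting only the experiment (0.5, 1) is selected: the model predicts a reward of 2.0 there, which would flip the greedy action at the probe state 0.4 from 0 to 1. The other two candidates leave the learnt policy unchanged, so they are predicted to be uninformative.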

Disciplines :

Computer science

Author, co-author :

Fonteneau, Raphaël ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation

Murphy, Susan

Wehenkel, Louis ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation

Ernst, Damien ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation

Language :

English

Title :

Active exploration by searching for experiments that falsify the computed control policy

Publication date :

April 2011

Event name :

2011 IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL-11)

Event place :

Paris, France

Event date :

April 11-15, 2011

Audience :

International

Main work title :

Proceedings of the 2011 IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL-11)

Peer reviewed :

Peer reviewed

Funders :

F.R.S.-FNRS - Fonds de la Recherche Scientifique [BE]

Available on ORBi :

since 14 April 2011

Statistics

Number of views

63 (8 by ULiège)

Number of downloads

227 (6 by ULiège)

