Wehenkel, Louis [Université de Liège - ULg > Dept. of Electricity, Electronics and Computer Science (Montefiore Institute) > Systems and Modeling]
Ernst, Damien [Université de Liège - ULg > Dept. of Electricity, Electronics and Computer Science (Montefiore Institute) > Systems and Modeling]
May-2010
Proceedings of the Workshop on Active Learning and Experimental Design 2010 (in conjunction with AISTATS 2010)
No
International
Workshop on Active Learning and Experimental Design 2010 (in conjunction with AISTATS 2010)
May 16, 2010
Chia Laguna, Sardinia
Italy
[en] reinforcement learning ; optimal control ; sampling strategies
[en] We propose new methods for guiding the generation of informative trajectories when solving discrete-time optimal control problems. These methods exploit recently published results that provide ways of computing bounds on the return of control policies from a set of trajectories.
Fonds pour la formation à la Recherche dans l'Industrie et dans l'Agriculture (Communauté française de Belgique) - FRIA ; Fonds de la Recherche Scientifique (Communauté française de Belgique) - F.R.S.-FNRS
This paper, together with the three papers "Model-free Monte Carlo-like policy evaluation", "Inferring bounds on the performance of a control policy from a sample of trajectories" and "A cautious approach to generalization in reinforcement learning", represents a body of work in batch-mode RL that is based on rebuilding trajectories. This file is a presentation of that body of work.