Reference : Generating informative trajectories by using bounds on the return of control policies
Scientific congresses and symposiums : Paper published in a book
Engineering, computing & technology : Computer science
http://hdl.handle.net/2268/36015
English
Fonteneau, Raphaël [Université de Liège - ULg > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation]
Murphy, Susan
Wehenkel, Louis [Université de Liège - ULg > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation]
Ernst, Damien [Université de Liège - ULg > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation]
May 2010
Proceedings of the Workshop on Active Learning and Experimental Design 2010 (in conjunction with AISTATS 2010)
Yes
No
International
Workshop on Active Learning and Experimental Design 2010 (in conjunction with AISTATS 2010)
May 16, 2010
Chia Laguna, Sardinia
Italy
[en] reinforcement learning ; optimal control ; sampling strategies
[en] We propose new methods for guiding the generation of informative trajectories when solving discrete-time optimal control problems. These methods exploit recently published results on computing bounds on the return of control policies from a set of trajectories.
Fonds pour la formation à la Recherche dans l'Industrie et dans l'Agriculture (Communauté française de Belgique) - FRIA ; Fonds de la Recherche Scientifique (Communauté française de Belgique) - F.R.S.-FNRS
Researchers ; Professionals

File(s) associated to this reference

Fulltext file(s):

File : Fonteneau2010ALED.pdf
Version : Publisher postprint
Size : 130.65 kB
Access : Open access

Additional material(s):

File : NIPS-2010-talk.pdf
Commentary : This paper, together with the three papers "Model-free Monte Carlo-like policy evaluation", "Inferring bounds on the performance of a control policy from a sample of trajectories" and "A cautious approach to generalization in reinforcement learning", represents a body of work in batch-mode RL based on the rebuilding of trajectories. This file is a presentation of that body of work.
Size : 463.17 kB
Access : Open access

All documents in ORBi are protected by a user license.