Reference : Inferring bounds on the performance of a control policy from a sample of trajectories
Scientific congresses and symposiums : Paper published in a book
Engineering, computing & technology : Computer science
http://hdl.handle.net/2268/13667
Inferring bounds on the performance of a control policy from a sample of trajectories
English
Fonteneau, Raphaël [Université de Liège - ULg > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation]
Murphy, Susan
Wehenkel, Louis [Université de Liège - ULg > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation]
Ernst, Damien [Université de Liège - ULg > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation]
2009
Proceedings of the IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL-09)
117-123
Yes
No
International
978-1-4244-2761-1
IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL-09)
March 30 - April 2, 2009
Nashville
USA
[en] reinforcement learning ; model-free ; lower bound on a policy
[fr] performance guarantee
[en] We propose an approach for inferring bounds on the finite-horizon return of a control policy from an off-policy sample of trajectories consisting of state transitions, rewards, and control actions. In this paper, the dynamics, control policy, and reward function are assumed to be deterministic and Lipschitz continuous. Under these assumptions, we derive an algorithm that computes these bounds in time polynomial in the sample size and the length of the optimization horizon, and we characterize the tightness of the bounds in terms of the sample density.
Fonds de la Recherche Scientifique (Communauté française de Belgique) - F.R.S.-FNRS
10.1109/ADPRL.2009.4927534

File(s) associated to this reference

Fulltext file(s):

File | Commentary | Version | Size | Access
bounds-trajectories-adprl.pdf | | Publisher postprint | 249.44 kB | Open access

Additional material(s):

File | Commentary | Size | Access
RL-TU-Delft-2009.pdf | | 292.91 kB | Open access

