Paper published in a book (Scientific congresses and symposiums)
Simultaneous perturbation algorithms for batch off-policy search
Fonteneau, Raphaël; Prashanth L.A.
2014, in Proceedings of the 53rd IEEE Conference on Decision and Control (IEEE CDC 2014)
Peer reviewed
 

Files


Full Text
MCGrad_CDC_CameraReady.pdf
Author preprint (374.1 kB)
Details



Abstract :
We propose novel policy search algorithms for off-policy, batch-mode reinforcement learning (RL) with continuous state and action spaces. Given a batch collection of trajectories, we perform off-line policy evaluation using an algorithm similar to that in [Fonteneau et al. (2010)]. Using this Monte Carlo-like policy evaluator, we perform policy search in a class of parameterized policies. We propose both first-order policy gradient and second-order policy Newton algorithms. All our algorithms incorporate simultaneous perturbation estimates of the gradient and the Hessian of the cost-to-go vector, since the cost-to-go itself is unknown and only biased estimates of it are available. We demonstrate their practicality on a simple problem with a 1-dimensional continuous state space.
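As a rough illustration of the simultaneous perturbation idea mentioned in the abstract, the sketch below estimates a gradient from only two evaluations of a cost function, whatever the dimension of the policy parameter, and uses it in a plain first-order descent loop. It is not the authors' implementation: the names `spsa_gradient`, `policy_search` and `toy_cost` are hypothetical, the quadratic `toy_cost` merely stands in for the batch policy evaluator, the step size and perturbation size are arbitrary assumptions, and the second-order (Hessian) estimate is left out.

```python
import numpy as np


def spsa_gradient(cost, theta, delta, rng):
    """Two-sided simultaneous perturbation estimate of the gradient of cost at theta."""
    # Random direction with independent +1/-1 entries (Rademacher perturbation).
    direction = rng.choice([-1.0, 1.0], size=theta.shape)
    # Only two cost evaluations are needed, regardless of the dimension of theta.
    diff = cost(theta + delta * direction) - cost(theta - delta * direction)
    # Dividing by a +1/-1 entry is the same as multiplying by it.
    return diff / (2.0 * delta) * direction


def policy_search(cost, theta0, steps=500, step_size=0.05, delta=0.1, seed=0):
    """First-order descent on the policy parameter using the estimate above."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    for _ in range(steps):
        theta = theta - step_size * spsa_gradient(cost, theta, delta, rng)
    return theta


def toy_cost(theta):
    # Hypothetical stand-in for the batch policy evaluator (assumption):
    # a quadratic bowl with its minimum at (1, -2).
    return float(np.sum((theta - np.array([1.0, -2.0])) ** 2))


theta_star = policy_search(toy_cost, [0.0, 0.0])
```

In this toy setting the cost is noise-free; with a noisy or biased evaluator, as described in the abstract, one would typically use diminishing step and perturbation sizes rather than the constants assumed here.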
Disciplines :
Computer science
Author, co-author :
Fonteneau, Raphaël ;  Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation
Prashanth L.A.
Language :
English
Title :
Simultaneous perturbation algorithms for batch off-policy search
Publication date :
2014
Event name :
53rd IEEE Conference on Decision and Control (IEEE CDC 2014)
Event date :
December 15-17, 2014
Audience :
International
Main work title :
Proceedings of the 53rd IEEE Conference on Decision and Control (IEEE CDC 2014)
Peer reviewed :
Peer reviewed
Available on ORBi :
since 14 October 2014

Statistics


Number of views
50 (5 by ULiège)
Number of downloads
130 (1 by ULiège)

Scopus citations®
1
Scopus citations® without self-citations
1
