Reference : Exploiting policy knowledge in online least-squares policy iteration: An empirical study
Scientific journals : Article
Engineering, computing & technology : Computer science
Exploiting policy knowledge in online least-squares policy iteration: An empirical study
Busoniu, Lucian
Ernst, Damien [Université de Liège - ULg > Dép. d'électric., électron. et informat. (Inst. Montefiore) > Smart grids]
Babuska, Robert
De Schutter, Bart
Automation, Computers, Applied Mathematics
Yes (verified by ORBi)
[en] reinforcement learning ; prior knowledge ; least-squares policy iteration ; online learning
[en] Reinforcement learning (RL) is a promising paradigm for learning optimal control. Traditional RL works for discrete variables only, so to deal with the continuous variables appearing in control problems, approximate representations of the solution are necessary. The field of approximate RL has tremendously expanded over the last decade, and a wide array of effective algorithms is now available. However, RL is generally envisioned as working without any prior knowledge about the system or the solution, whereas such knowledge is often available and can be exploited to great advantage. Therefore, in this paper we describe a method that exploits prior knowledge to accelerate online least-squares policy iteration (LSPI), a state-of-the-art algorithm for approximate RL. We focus on prior knowledge about the monotonicity of the control policy with respect to the system states. Such monotonic policies are appropriate for important classes of systems appearing in control applications, including for instance nearly linear systems and linear systems with monotonic input nonlinearities. In an empirical evaluation, online LSPI with prior knowledge is shown to learn much faster and more reliably than the original online LSPI.
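To make the notion of a monotonic policy from the abstract concrete, the following is a minimal sketch. It is not the authors' algorithm, only a check of the property on a tabulated policy. It assumes the policy is stored as an array over a state grid; the function name `is_monotonic`, the gain vector `K`, and both example policies are hypothetical and chosen for illustration only.

```python
import numpy as np


def is_monotonic(policy, directions):
    """Check whether a policy tabulated on a state grid is monotonic.

    policy:     array with one axis per state variable; each entry is the
                action chosen at that grid point.
    directions: one entry per axis, +1 for non-decreasing along that state
                variable, -1 for non-increasing.
    """
    policy = np.asarray(policy, dtype=float)
    for axis, direction in enumerate(directions):
        # Differences between neighbouring grid points along this axis,
        # sign-flipped so that a valid step is always >= 0.
        steps = np.diff(policy, axis=axis) * direction
        if np.any(steps < 0):
            return False
    return True


# Hypothetical two-dimensional state grid.
x1 = np.linspace(-2.0, 2.0, 21)
x2 = np.linspace(-2.0, 2.0, 21)
X1, X2 = np.meshgrid(x1, x2, indexing="ij")

# Assumed example: saturated linear state feedback, non-increasing in both states.
K = np.array([1.5, 0.5])  # illustrative gains, not taken from the paper
u_saturated = np.clip(-(K[0] * X1 + K[1] * X2), -1.0, 1.0)

# Assumed counter-example: oscillates along the first state variable.
u_wavy = np.sin(3.0 * X1)

print(is_monotonic(u_saturated, directions=(-1, -1)))  # True
print(is_monotonic(u_wavy, directions=(-1, -1)))       # False
```

The saturated linear feedback is one instance of the "linear systems with monotonic input nonlinearities" case mentioned in the abstract; how the paper itself enforces monotonicity inside online LSPI is described in the full text, not here.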
Fonds de la Recherche Scientifique (Communauté française de Belgique) - F.R.S.-FNRS
Researchers ; Professionals ; Students

File(s) associated to this reference

Fulltext file(s):

Open access
acam10.pdf (Publisher postprint, 733.95 kB)
This paper is a slightly extended version of the paper "Using prior knowledge to accelerate online least-squares policy iteration".


All documents in ORBi are protected by a user license.