Exploiting policy knowledge in online least-squares policy iteration: An empirical study

Busoniu, Lucian; Ernst, Damien; Babuska, Robert; De Schutter, Bart

Article (Scientific journals)

2010 • In Automation, Computers, Applied Mathematics, 19 (4), p. 521-529

Peer Reviewed verified by ORBi

Permalink
https://hdl.handle.net/2268/106014

Files

Full Text

acam10.pdf

Publisher postprint (751.57 kB)

This paper is a slightly extended version of the paper "Using prior knowledge to accelerate online least-squares policy iteration"

Details

Keywords :

reinforcement learning; prior knowledge; least-squares policy iteration; online learning

Abstract :

Reinforcement learning (RL) is a promising paradigm for learning optimal control. Traditional RL works for discrete variables only, so to deal with the continuous variables appearing in control problems, approximate representations of the solution are necessary. The field of approximate RL has tremendously expanded over the last decade, and a wide array of effective algorithms is now available. However, RL is generally envisioned as working without any prior knowledge about the system or the solution, whereas such knowledge is often available and can be exploited to great advantage. Therefore, in this paper we describe a method that exploits prior knowledge to accelerate online least-squares policy iteration (LSPI), a state-of-the-art algorithm for approximate RL. We focus on prior knowledge about the monotonicity of the control policy with respect to the system states. Such monotonic policies are appropriate for important classes of systems appearing in control applications, including for instance nearly linear systems and linear systems with monotonic input nonlinearities. In an empirical evaluation, online LSPI with prior knowledge is shown to learn much faster and more reliably than the original online LSPI.

Disciplines :

Computer science

Author, co-author :

Busoniu, Lucian

Ernst, Damien ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Smart grids

Babuska, Robert

De Schutter, Bart

Language :

English

Title :

Exploiting policy knowledge in online least-squares policy iteration: An empirical study

Publication date :

2010

Journal title :

Automation, Computers, Applied Mathematics

ISSN :

1221-437X

Publisher :

Technical University of Cluj-Napoca. Faculty of Automation and Computer Science, Cluj-Napoca, Romania

Volume :

19

Issue :

4

Pages :

521-529

Peer reviewed :

Peer Reviewed verified by ORBi

Funders :

F.R.S.-FNRS - Fonds de la Recherche Scientifique [BE]

Available on ORBi :

since 22 December 2011
