Approximate dynamic programming with a fuzzy parameterization

Dynamic programming (DP) is a powerful paradigm for general, nonlinear optimal control. Computing exact DP solutions is in general only possible when the process states and the control actions take values in a small discrete set. In practice, it is necessary to approximate the solutions. Therefore, we propose an algorithm for approximate DP that relies on a fuzzy partition of the state space, and on a discretization of the action space. This fuzzy Q-iteration algorithm works for deterministic processes, under the discounted return criterion. We prove that fuzzy Q-iteration asymptotically converges to a solution that lies within a bound of the optimal solution. A bound on the suboptimality of the solution obtained in a finite number of iterations is also derived. Under continuity assumptions on the dynamics and on the reward function, we show that fuzzy Q-iteration is consistent, i.e., that it asymptotically obtains the optimal solution as the approximation accuracy increases. These properties hold both when the parameters of the approximator are updated in a synchronous fashion, and when they are updated asynchronously. The asynchronous algorithm is proven to converge at least as fast as the synchronous one. The performance of fuzzy Q-iteration is illustrated in a two-link manipulator control problem.
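To make the abstract more concrete, the following is a minimal sketch of how a fuzzy Q-iteration of this kind might look for a one-dimensional state. It is an illustration under stated assumptions, not the authors' implementation: the triangular membership functions, the toy dynamics `f`, the reward `rho`, and all function names are hypothetical choices made for this example, and the toy problem is unrelated to the two-link manipulator studied in the paper. Each parameter `theta[i, j]` is attached to fuzzy-set center `x_i` and discrete action `u_j`, and is updated as `rho(x_i, u_j) + gamma * max_j' sum_i' phi_i'(f(x_i, u_j)) * theta[i', j']`.

```python
import numpy as np

def triangular_memberships(x, centers):
    """Membership degrees of scalar x in triangular fuzzy sets (sorted centers)."""
    x = float(np.clip(x, centers[0], centers[-1]))
    phi = np.zeros(len(centers))
    k = int(np.searchsorted(centers, x, side="right")) - 1
    if k >= len(centers) - 1:          # x sits on the last center
        phi[-1] = 1.0
        return phi
    w = (x - centers[k]) / (centers[k + 1] - centers[k])
    phi[k], phi[k + 1] = 1.0 - w, w    # degrees sum to 1
    return phi

def fuzzy_q_iteration(f, rho, centers, actions, gamma,
                      tol=1e-8, max_iter=10_000, asynchronous=False):
    """Iterate the parameter matrix theta (centers x actions) to a fixed point."""
    n, m = len(centers), len(actions)
    # Deterministic model: memberships of every next state, and every reward.
    phi_next = np.array([[triangular_memberships(f(x, u), centers)
                          for u in actions] for x in centers])   # shape (n, m, n)
    reward = np.array([[rho(x, u) for u in actions] for x in centers])
    theta = np.zeros((n, m))
    for it in range(1, max_iter + 1):
        old = theta.copy()
        if asynchronous:
            # In-place sweep: later updates already see the new parameters.
            for i in range(n):
                for j in range(m):
                    theta[i, j] = reward[i, j] + gamma * np.max(phi_next[i, j] @ theta)
        else:
            # Synchronous sweep: every update uses the previous iterate only.
            theta = reward + gamma * np.max(phi_next @ old, axis=2)
        if np.max(np.abs(theta - old)) <= tol:
            break
    return theta, it

def greedy_action(x, theta, centers, actions):
    """Action maximizing the interpolated Q-function at state x."""
    q = triangular_memberships(x, centers) @ theta
    return actions[int(np.argmax(q))]

# Hypothetical toy problem: drive a scalar state toward the origin.
centers = np.linspace(-1.0, 1.0, 11)
actions = np.array([-1.0, 0.0, 1.0])
f = lambda x, u: float(np.clip(x + 0.1 * u, -1.0, 1.0))   # assumed dynamics
rho = lambda x, u: -x ** 2                                 # assumed reward
gamma = 0.9

theta_sync, iters_sync = fuzzy_q_iteration(f, rho, centers, actions, gamma)
theta_async, iters_async = fuzzy_q_iteration(f, rho, centers, actions, gamma,
                                             asynchronous=True)
```

In this sketch the synchronous and asynchronous variants differ only in whether an update reads the previous iterate or the partially updated one; both are run to the same tolerance so that their resulting parameters can be compared.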

Disciplines :

Computer science

Author, co-author :

Busoniu, Lucian

Ernst, Damien ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation

De Schutter, Bart

Babuska, Robert

Language :

English

Title :

Approximate dynamic programming with a fuzzy parameterization

Publication date :

May 2010

Journal title :

Automatica

ISSN :

0005-1098

Publisher :

Pergamon Press - An Imprint of Elsevier Science, Oxford, United Kingdom

Volume :

Issue :

Pages :

804-814

Peer reviewed :

Peer Reviewed verified by ORBi

Additional URL :

http://www.montefiore.ulg.ac.be/~ernst

Funders :

F.R.S.-FNRS - Fonds de la Recherche Scientifique [BE]

Available on ORBi :

since 16 April 2010

Statistics

Number of views

108 (12 by ULiège)

Number of downloads

298 (3 by ULiège)

