Policy search in a space of simple closed-form formulas: towards interpretability of reinforcement learning

Maes, Francis; Fonteneau, Raphaël; Wehenkel, Louis; Ernst, Damien

doi:10.1007/978-3-642-33492-4_6

Download

Paper published in a book (Scientific congresses and symposiums)

Policy search in a space of simple closed-form formulas: towards interpretability of reinforcement learning

Maes, Francis; Fonteneau, Raphaël; Wehenkel, Louis et al.

2012 • In Discovery Science 15th International Conference, DS 2012, Lyon, France, October 29-31, 2012. Proceedings

Peer reviewed

Permalink
https://hdl.handle.net/2268/135635

DOI
10.1007/978-3-642-33492-4_6

Files (2)Send to Details Statistics Bibliography Similar publications

Files

Full Text

Maes2012Ds.pdf

Publisher postprint (221.45 kB)

Download

Annexes

Maes2012DSSlides.pdf

Publisher postprint (4.88 MB)

This is a fun comic-book like presentation of this paper !

Download

All documents in ORBi are protected by a user license.

Send to

RIS BibTex APA Chicago Permalink X Linkedin

Details

Keywords :

Reinforcement Learning; Formula Discovery; Interpretability

Abstract :

[en] In this paper, we address the problem of computing interpretable solutions to reinforcement learning (RL) problems. To this end, we propose a search algorithm over a space of simple losed-form formulas that are used to rank actions. We formalize the search for a high-performance policy as a multi-armed bandit problem where each arm corresponds to a candidate policy canonically represented by its shortest formula-based representation. Experiments, conducted on standard benchmarks, show that this approach manages to determine both efﬁcient and interpretable solutions.

Disciplines :

Computer science

Author, co-author :

Maes, Francis ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation

Fonteneau, Raphaël ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation

Wehenkel, Louis ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation

Ernst, Damien ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Smart grids

Language :

English

Title :

Policy search in a space of simple closed-form formulas: towards interpretability of reinforcement learning

Publication date :

October 2012

Event name :

The Fifteenth International Conference on Discovery Science (DS 2012)

Event date :

29-31 October, 2012

Audience :

International

Main work title :

Discovery Science 15th International Conference, DS 2012, Lyon, France, October 29-31, 2012. Proceedings

Publisher :

Springer, Berlin, Germany

ISBN/EAN :

978-3-642-33491-7

Pages :

37-51

Peer reviewed :

Peer reviewed

Available on ORBi :

since 30 November 2012

Statistics

Number of views

101 (14 by ULiège)

Number of downloads

471 (7 by ULiège)

More statistics

Scopus citations^®

Scopus citations^®
without self-citations

OpenCitations

Bibliography

Adams, B., Banks, H., Kwon, H.D., Tran, H.: Dynamic multidrug therapies for HIV: Optimal and STI approaches. Mathematical Biosciences and Engineering 1, 223-241 (2004)
Audibert, J.-Y., Munos, R., Szepesvari, C.: Tuning Bandit Algorithms in Stochastic Environments. In: Hutter, M., Servedio, R.A., Takimoto, E. (eds.) ALT 2007. LNCS (LNAI), vol. 4754, pp. 150-165. Springer, Heidelberg (2007)
Auer, P., Cesa-Bianchi, N., Fischer, P.: Finite-time analysis of the multiarmed bandit problem. Machine Learning 47(2), 235-256 (2002)
Bar-Gad, I., Morris, G., Bergman, H.: Information processing, dimensionality reduction and reinforcement learning in the basal ganglia. Progress in Neurobiology 71(6), 439-473 (2003)
Barron, A.R., Cover, T.M.: Minimum complexity density estimation. IEEE Transactions on Information Theory 37(4), 1034-1054 (1991)
Busoniu, L., Babuska, R., De Schutter, B., Ernst, D.: Reinforcement Learning and Dynamic Programming using Function Approximators. Taylor & Francis, CRC Press (2010)
Castelletti, A., Galelli, S., Restelli, M., Soncini-Sessa, R.: Tree-based variable selection for dimensionality reduction of large-scale control systems. In: Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), pp. 62-69. IEEE (2011)
Ernst, D., Geurts, P., Wehenkel, L.: Tree-based batch mode reinforcement learning. Journal of Machine Learning Research 6, 503-556 (2005)
Fonteneau, R., Wehenkel, L., Ernst, D.: Variable selection for dynamic treatment regimes: a reinforcement learning approach. In: European Workshop on Reinforcement Learning, EWRL (2008)
Gearhart, C.: Genetic programming as policy search in markov decision processes. In: Genetic Algorithms and Genetic Programming at Stanford, pp. 61-67 (2003)
Girgin, S., Preux, P.: Feature Discovery in Reinforcement Learning Using Genetic Programming. In: O'Neill, M., Vanneschi, L., Gustafson, S., Esparcia Alcazar, A.I., De Falco, I., Della Cioppa, A., Tarantino, E. (eds.) EuroGP 2008. LNCS, vol. 4971, pp. 218-229. Springer, Heidelberg (2008)
Goldberg, D.: Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-wesley (1989)
Guez, A., Vincent, R., Avoli, M., Pineau, J.: Adaptive treatment of epilepsy via batch-mode reinforcement learning. In: Innovative Applications of Artificial Intelligence (IAAI), pp. 1671-1678 (2008)
Gunter, L., Zhu, J., Murphy, S.: Variable Selection for Optimal Decision Making. In: Bellazzi, R., Abu-Hanna, A., Hunter, J. (eds.) AIME 2007. LNCS (LNAI), vol. 4594, pp. 149-154. Springer, Heidelberg (2007)
Hren, J.-F., Munos, R.: Optimistic Planning of Deterministic Systems. In: Girgin, S., Loth, M., Munos, R., Preux, P., Ryabko, D. (eds.) EWRL 2008. LNCS (LNAI), vol. 5323, pp. 151-164. Springer, Heidelberg (2008)
Hutter, M.: Universal Artificial Intelligence: Sequential Decisions based on Algorithmic Probability. Springer, Berlin (2005)
Ingersoll, J.: Theory of Financial Decision Making. Rowman and Littlefield Publishers, Inc. (1987)
Kolmogorov, A.N.: Three approaches to the quantitative definition of information. Problems of Information Transmission 1(1), 1-7 (1965)
Maes, F., Wehenkel, L., Ernst, D.: Automatic Discovery of Ranking Formulas for Playing with Multi-armed Bandits. In: Sanner, S., Hutter, M. (eds.) EWRL 2011. LNCS, vol. 7188, pp. 5-17. Springer, Heidelberg (2012)
Maes, F., Wehenkel, L., Ernst, D.: Optimized Look-ahead Tree Search Policies. In: Sanner, S., Hutter, M. (eds.) EWRL 2011. LNCS, vol. 7188, pp. 189-200. Springer, Heidelberg (2012)
Moore, A., Atkeson, C.: The parti-game algorithm for variable resolution reinforcement learning in multidimensional state-spaces. Machine Learning 21(3), 199-233 (1995)
Murphy, S.: Optimal dynamic treatment regimes. Journal of the Royal Statistical Society, Series B 65(2), 331-366 (2003)
Randløv, J., Alstrøm, P.: Learning to drive a bicycle using reinforcement learning and shaping. In: Proceedings of the Fifteenth International Conference on Machine Learning (ICML), pp. 463-471. Citeseer (1998)
Riedmiller, M.: Neural Fitted Q Iteration-First Experiences with a Data Efficient Neural Reinforcement Learning Method. In: Gama, J., Camacho, R., Brazdil, P.B., Jorge, A.M., Torgo, L. (eds.) ECML 2005. LNCS (LNAI), vol. 3720, pp. 317-328. Springer, Heidelberg (2005)
Rubinstein, R., Kroese, D.: The Cross-Entropy Method. A Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation, and Machine Learning. Information Science and Statistics. Springer (2004)
Ruping, S.: Learning Interpretable Models. Ph.D. thesis (2006)
Stanley, K., Miikkulainen, R.: Evolving neural networks through augmenting topologies. Evolutionary Computation 10(2), 99-127 (2002)
Wehenkel, L.: Automatic Learning Techniques in Power Systems. Kluwer Academic Publishers, Boston (1998)
Yoshimoto, J., Ishii, S., Sato, M.: Application of reinforcement learning to balancing of acrobot. In: Systems, Man, and Cybernetics Conference Proceedings, vol. 5, pp. 516-521. IEEE (1999)
Zhao, Y., Kosorok, M., Zeng, D.: Reinforcement learning design for cancer clinical trials. Statistics in Medicine 28, 3294-3315 (2009)