Discovery Science. 15th International Conference, DS 2012, Lyon, France, October 29-31, 2012. Proceedings.
[en] Reinforcement Learning ; Formula Discovery ; Interpretability
[en] In this paper, we address the problem of computing interpretable solutions to reinforcement learning (RL) problems. To this end, we propose a search algorithm over a space of simple closed-form formulas that are used to rank actions. We formalize the search for a high-performance policy as a multi-armed bandit problem in which each arm corresponds to a candidate policy, canonically represented by its shortest formula-based representation. Experiments conducted on standard benchmarks show that this approach finds solutions that are both efficient and interpretable.
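
The idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the candidate formulas, the one-dimensional toy task, the probe states used to detect equivalent policies, and the choice of UCB1 as the bandit rule are all assumptions made for illustration. Each formula scores a (state, action) pair, the induced policy plays the top-ranked action, formulas that induce the same policy on the probe states are collapsed to the shortest one, and the bandit then allocates evaluation episodes among the remaining candidates.

```python
import math
import random

ACTIONS = (-1.0, 1.0)
PROBES = (-0.75, -0.25, 0.25, 0.75)  # hypothetical states used to compare policies

# Hypothetical candidate formulas; each one scores a (state, action) pair.
FORMULAS = {
    "a": lambda s, a: a,
    "a+a": lambda s, a: a + a,        # ranks actions exactly like "a"
    "-a": lambda s, a: -a,
    "s*a": lambda s, a: s * a,
    "-|s+a|": lambda s, a: -abs(s + a),
}

def greedy_action(formula, s):
    """Formula-based policy: play the action the formula ranks highest."""
    return max(ACTIONS, key=lambda a: formula(s, a))

def canonicalize(formulas):
    """Keep one formula (the shortest) per distinct behaviour on PROBES."""
    shortest = {}
    for name, f in formulas.items():
        signature = tuple(greedy_action(f, s) for s in PROBES)
        if signature not in shortest or len(name) < len(shortest[signature]):
            shortest[signature] = name
    return {name: formulas[name] for name in shortest.values()}

def episode_return(formula, rng, horizon=20):
    """Toy task (an assumption): keep a noisy 1-D state close to zero."""
    s = rng.uniform(-1.0, 1.0)
    total = 0.0
    for _ in range(horizon):
        s += 0.1 * greedy_action(formula, s) + rng.gauss(0.0, 0.01)
        total -= abs(s)
    return total / horizon

def ucb1_search(policies, budget, rng):
    """Treat each candidate policy as a bandit arm and pull arms with UCB1."""
    names = list(policies)
    counts = {n: 0 for n in names}
    sums = {n: 0.0 for n in names}
    for t in range(1, budget + 1):
        untried = [n for n in names if counts[n] == 0]
        if untried:
            arm = untried[0]
        else:
            arm = max(names, key=lambda n: sums[n] / counts[n]
                      + math.sqrt(2.0 * math.log(t) / counts[n]))
        sums[arm] += episode_return(policies[arm], rng)
        counts[arm] += 1
    # Report the most-played arm as the selected policy.
    return max(names, key=lambda n: counts[n]), counts

policies = canonicalize(FORMULAS)
best, counts = ucb1_search(policies, budget=200, rng=random.Random(0))
print("selected formula:", best)
print("pulls per candidate:", counts)
```

In this sketch the selected policy is readable directly from its formula, which is the kind of interpretability the abstract refers to. Comparing policies on a handful of probe states is only an approximate equivalence test, chosen here for brevity.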