References of "De Schutter, Bart"

Approximate reinforcement learning: an overview
Busoniu, Lucian; Babuska, Robert; De Schutter, Bart et al

in Proceedings of the 2011 IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL-11) (2011, April)

Reinforcement learning (RL) allows agents to learn how to optimally interact with complex environments. Fueled by recent advances in approximation-based algorithms, RL has obtained impressive successes in robotics, artificial intelligence, control, operations research, etc. However, the scarcity of survey papers about approximate RL makes it difficult for newcomers to grasp this intricate field. With the present overview, we take a step toward alleviating this situation. We review methods for approximate RL, starting from their dynamic programming roots and organizing them into three major classes: approximate value iteration, policy iteration, and policy search. Each class is subdivided into representative categories, highlighting among others offline and online algorithms, policy gradient methods, and simulation-based techniques. We also compare the different categories of methods, and outline possible ways to enhance the reviewed algorithms.

Cross-entropy optimization of control policies with adaptive basis functions
Busoniu, Lucian; Ernst, Damien ULg; Babuska, Robert et al

in IEEE Transactions on Systems, Man, and Cybernetics-Part B: Cybernetics (2011), 41(1), 196-209

This paper introduces an algorithm for direct search of control policies in continuous-state, discrete-action Markov decision processes. The algorithm looks for the best closed-loop policy that can be represented using a given number of basis functions (BFs), where a discrete action is assigned to each BF. The type of the BFs and their number are specified in advance and determine the complexity of the representation. Considerable flexibility is achieved by optimizing the locations and shapes of the BFs, together with the action assignments. The optimization is carried out with the cross-entropy method and evaluates the policies by their empirical return from a representative set of initial states. The return for each representative state is estimated using Monte Carlo simulations. The resulting algorithm for cross-entropy policy search with adaptive BFs is extensively evaluated in problems with two to six state variables, for which it reliably obtains good policies with only a small number of BFs. In these experiments, cross-entropy policy search requires vastly fewer BFs than value-function techniques with equidistant BFs, and outperforms policy search with a competing optimization algorithm called DIRECT.
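The policy representation and search loop described above can be sketched in Python as follows. This is a minimal, hypothetical illustration, not the authors' implementation. It assumes a made-up 1-D problem (`step`, `ACTIONS`, `X0S`), Gaussian radial BFs where the most active BF selects the action, Gaussian sampling densities for the BF centers and widths, a categorical density for the action assignments, and a smoothing factor `alpha`. Because the toy dynamics are deterministic, a single rollout per representative initial state stands in for the Monte Carlo estimate.

```python
import numpy as np

# Hypothetical 1-D test problem (not from the paper): x in [-1, 1], two discrete actions,
# deterministic dynamics x' = clip(x + u), reward -|x'| (drive the state to 0).
ACTIONS = np.array([-0.1, 0.1])
X0S = np.linspace(-0.8, 0.8, 5)  # representative initial states


def step(x, u):
    x_next = float(np.clip(x + u, -1.0, 1.0))
    return x_next, -abs(x_next)


def policy_action(x, centers, widths, assign):
    # Each Gaussian BF carries one discrete action; the most active BF decides.
    activation = np.exp(-(((x - centers) / widths) ** 2))
    return ACTIONS[assign[np.argmax(activation)]]


def empirical_return(params, x0s=X0S, horizon=30, gamma=0.95):
    centers, widths, assign = params
    total = 0.0
    for x0 in x0s:  # deterministic dynamics, so one rollout per initial state suffices
        x, ret = float(x0), 0.0
        for k in range(horizon):
            x, r = step(x, policy_action(x, centers, widths, assign))
            ret += gamma**k * r
        total += ret
    return total / len(x0s)


def ce_policy_search(n_bfs=2, n_samples=30, elite_frac=0.2, n_iter=15, alpha=0.7, seed=0):
    rng = np.random.default_rng(seed)
    n_act = len(ACTIONS)
    mu_c, sd_c = np.zeros(n_bfs), np.ones(n_bfs)           # Gaussian over BF centers
    mu_w, sd_w = np.full(n_bfs, 0.5), np.full(n_bfs, 0.2)  # Gaussian over BF widths
    p_a = np.full((n_bfs, n_act), 1.0 / n_act)             # categorical over actions, per BF
    n_elite = max(1, int(elite_frac * n_samples))
    best_params, best_score = None, -np.inf
    for _ in range(n_iter):
        samples = []
        for _ in range(n_samples):
            c = rng.normal(mu_c, sd_c)
            w = np.abs(rng.normal(mu_w, sd_w)) + 1e-3       # keep widths strictly positive
            a = np.array([rng.choice(n_act, p=p) for p in p_a])
            samples.append((c, w, a))
        scores = np.array([empirical_return(s) for s in samples])
        elite_idx = np.argsort(scores)[-n_elite:]          # highest returns come last
        if scores[elite_idx[-1]] > best_score:
            best_score, best_params = scores[elite_idx[-1]], samples[elite_idx[-1]]
        # Refit the sampling distributions to the elite set, smoothed by alpha.
        C = np.array([samples[i][0] for i in elite_idx])
        W = np.array([samples[i][1] for i in elite_idx])
        A = np.array([samples[i][2] for i in elite_idx])
        mu_c, sd_c = alpha * C.mean(0) + (1 - alpha) * mu_c, alpha * C.std(0) + (1 - alpha) * sd_c
        mu_w, sd_w = alpha * W.mean(0) + (1 - alpha) * mu_w, alpha * W.std(0) + (1 - alpha) * sd_w
        freq = np.stack([(A == j).mean(0) for j in range(n_act)], axis=1)
        p_a = alpha * freq + (1 - alpha) * p_a
    return best_params, best_score
```

Calling `params, score = ce_policy_search()` returns the best (centers, widths, action indices) found and its empirical return. Raising `n_bfs` enlarges the class of representable policies, at a higher simulation cost per evaluation.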

Using prior knowledge to accelerate online least-squares policy iteration
Busoniu, Lucian; De Schutter, Bart; Babuska, Robert et al

in Proceedings of the 2010 IEEE International Conference on Automation, Quality and Testing, Robotics (2010, May)

Reinforcement learning (RL) is a promising paradigm for learning optimal control. Although RL is generally envisioned as working without any prior knowledge about the system, such knowledge is often available and can be exploited to great advantage. In this paper, we consider prior knowledge about the monotonicity of the control policy with respect to the system states, and we introduce an approach that exploits this type of prior knowledge to accelerate a state-of-the-art RL algorithm called online least-squares policy iteration (LSPI). Monotonic policies are appropriate for important classes of systems appearing in control applications. LSPI is a data-efficient RL algorithm that we previously extended to online learning, but which until now provided no way to use prior knowledge about the policy. In an empirical evaluation, online LSPI with prior knowledge learns much faster and more reliably than the original online LSPI.
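One simple way to impose monotonicity on a policy over a scalar state is sketched below. It is not the constrained policy improvement used in this paper, whose formulation is not given in the abstract; instead it swaps in isotonic regression via the pool-adjacent-violators algorithm, applied to greedy actions evaluated on a sorted state grid, followed by rounding to the nearest available discrete action. The function names are hypothetical.

```python
import numpy as np


def pav_nondecreasing(y):
    """Pool-adjacent-violators: least-squares nondecreasing fit to the sequence y."""
    blocks = []  # each block is [sum, count]
    for v in y:
        blocks.append([float(v), 1])
        # Merge backwards while adjacent block means violate the ordering.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return np.array(fitted)


def monotone_policy(greedy_actions, actions, increasing=True):
    """Project greedy actions on a sorted 1-D state grid onto monotone policies,
    then snap each value to the nearest available discrete action."""
    y = np.asarray(greedy_actions, dtype=float)
    fit = pav_nondecreasing(y) if increasing else -pav_nondecreasing(-y)
    actions = np.sort(np.asarray(actions, dtype=float))
    # Nearest-action rounding onto a sorted action set is itself monotone,
    # so the snapped policy keeps the monotonicity of the fit.
    idx = np.argmin(np.abs(fit[:, None] - actions[None, :]), axis=1)
    return actions[idx]
```

For example, `monotone_policy([2, -2, 2, -2], [-2, 0, 2], increasing=False)` turns an oscillating greedy policy into the nonincreasing policy `[2, 0, 0, -2]`.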

Approximate dynamic programming with a fuzzy parameterization
Busoniu, Lucian; Ernst, Damien ULg; De Schutter, Bart et al

in Automatica (2010), 46(5), 804-814

Dynamic programming (DP) is a powerful paradigm for general, nonlinear optimal control. Computing exact DP solutions is in general only possible when the process states and the control actions take values in a small discrete set. In practice, it is necessary to approximate the solutions. Therefore, we propose an algorithm for approximate DP that relies on a fuzzy partition of the state space, and on a discretization of the action space. This fuzzy Q-iteration algorithm works for deterministic processes, under the discounted return criterion. We prove that fuzzy Q-iteration asymptotically converges to a solution that lies within a bound of the optimal solution. A bound on the suboptimality of the solution obtained in a finite number of iterations is also derived. Under continuity assumptions on the dynamics and on the reward function, we show that fuzzy Q-iteration is consistent, i.e., that it asymptotically obtains the optimal solution as the approximation accuracy increases. These properties hold both when the parameters of the approximator are updated in a synchronous fashion, and when they are updated asynchronously. The asynchronous algorithm is proven to converge at least as fast as the synchronous one. The performance of fuzzy Q-iteration is illustrated in a two-link manipulator control problem.
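A minimal sketch of synchronous and asynchronous fuzzy Q-iteration for a scalar state follows. It assumes triangular membership functions on a uniform grid (so the memberships form a fuzzy partition), a discretized action set, and a hypothetical deterministic test problem. As we understand the scheme, each parameter is updated as θ[i,j] ← ρ(x_i,u_j) + γ max_j' Σ_i' φ_i'(f(x_i,u_j)) θ[i',j'], where x_i are the cores of the membership functions; the code and all names are illustrative, not the authors'.

```python
import numpy as np


def triangular_mfs(x, centers):
    """Membership degrees of scalar x in triangular fuzzy sets centred on a sorted 1-D grid.
    The degrees are nonnegative and sum to 1 (a fuzzy partition)."""
    x = float(np.clip(x, centers[0], centers[-1]))
    k = min(int(np.searchsorted(centers, x, side="right")) - 1, len(centers) - 2)
    w = (x - centers[k]) / (centers[k + 1] - centers[k])
    phi = np.zeros(len(centers))
    phi[k], phi[k + 1] = 1.0 - w, w
    return phi


def fuzzy_q_iteration(f, rho, centers, actions, gamma=0.9, n_iter=200, asynchronous=False):
    """theta[i, j] approximates Q(x_i, u_j); Q(x, u_j) = sum_i phi_i(x) * theta[i, j]."""
    theta = np.zeros((len(centers), len(actions)))
    # Deterministic process: successor memberships and rewards can be precomputed.
    phi_next = [[triangular_mfs(f(xi, u), centers) for u in actions] for xi in centers]
    rewards = np.array([[rho(xi, u) for u in actions] for xi in centers])
    for _ in range(n_iter):
        # Synchronous: read the previous iterate. Asynchronous: read the latest values in place.
        source = theta if asynchronous else theta.copy()
        for i in range(len(centers)):
            for j in range(len(actions)):
                q_next = phi_next[i][j] @ source  # approx. Q(f(x_i, u_j), u') for every u'
                theta[i, j] = rewards[i, j] + gamma * q_next.max()
    return theta


def greedy_action(x, theta, centers, actions):
    return actions[int(np.argmax(triangular_mfs(x, centers) @ theta))]


# Hypothetical example problem (not from the paper): drive a 1-D state to the origin.
centers = np.linspace(-1.0, 1.0, 21)
actions = [-0.1, 0.1]
f = lambda x, u: float(np.clip(x + u, -1.0, 1.0))
rho = lambda x, u: -abs(f(x, u))

theta_sync = fuzzy_q_iteration(f, rho, centers, actions)
theta_async = fuzzy_q_iteration(f, rho, centers, actions, asynchronous=True)
```

Both variants reach the same parameters on this toy problem, and `greedy_action(0.5, theta_sync, centers, actions)` returns -0.1, i.e., the greedy policy moves the state toward the origin.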

Reinforcement Learning and Dynamic Programming using Function Approximators
Busoniu, Lucian; Babuska, Robert; De Schutter, Bart et al

Book published by CRC Press (2010)

Exploiting policy knowledge in online least-squares policy iteration: An empirical study
Busoniu, Lucian; Ernst, Damien ULg; Babuska, Robert et al

in Automation, Computers, Applied Mathematics (2010), 19(4), 521-529

Reinforcement learning (RL) is a promising paradigm for learning optimal control. Traditional RL works for discrete variables only, so to deal with the continuous variables appearing in control problems, approximate representations of the solution are necessary. The field of approximate RL has expanded tremendously over the last decade, and a wide array of effective algorithms is now available. However, RL is generally envisioned as working without any prior knowledge about the system or the solution, whereas such knowledge is often available and can be exploited to great advantage. Therefore, in this paper we describe a method that exploits prior knowledge to accelerate online least-squares policy iteration (LSPI), a state-of-the-art algorithm for approximate RL. We focus on prior knowledge about the monotonicity of the control policy with respect to the system states. Such monotonic policies are appropriate for important classes of systems appearing in control applications, including for instance nearly linear systems and linear systems with monotonic input nonlinearities. In an empirical evaluation, online LSPI with prior knowledge is shown to learn much faster and more reliably than the original online LSPI.

Online least-squares policy iteration for reinforcement learning control
Busoniu, Lucian; Ernst, Damien ULg; De Schutter, Bart et al

in Proceedings of the 2010 American Control Conference (2010)

Reinforcement learning is a promising paradigm for learning optimal control. We consider policy iteration (PI) algorithms for reinforcement learning, which iteratively evaluate and improve control policies. State-of-the-art least-squares techniques for policy evaluation are sample-efficient and have relaxed convergence requirements. However, they are typically used in offline PI, whereas a central goal of reinforcement learning is to develop online algorithms. Therefore, we propose an online PI algorithm that evaluates policies with the so-called least-squares temporal difference for Q-functions (LSTD-Q). The crucial difference between this online least-squares policy iteration (LSPI) algorithm and its offline counterpart is that, in the online case, policy improvements must be performed once every few state transitions, using only an incomplete evaluation of the current policy. In an extensive experimental evaluation, online LSPI is found to work well for a wide range of its parameters, and to learn successfully in a real-time example. Online LSPI also compares favorably with offline LSPI and with a different flavor of online PI, which instead of LSTD-Q employs another least-squares method for policy evaluation.
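The core loop of online LSPI can be sketched as below, under several simplifying assumptions that are not taken from the paper: hypothetical one-hot (tabular) features, ε-greedy exploration with a fixed ε, a small ridge term `reg` that keeps the linear system solvable, and a hypothetical 5-state chain (`chain_step`). In this sketch the LSTD-Q statistics accumulate at every transition and are never reset, while θ is re-solved every `k_theta` transitions, so each improvement relies on an incomplete evaluation of the current policy. A batch `lstdq` function is included for comparison with the offline use of LSTD-Q.

```python
import numpy as np


def features(s, a, n_states, n_actions):
    """Hypothetical tabular (one-hot) state-action features."""
    phi = np.zeros(n_states * n_actions)
    phi[s * n_actions + a] = 1.0
    return phi


def lstdq(samples, policy, n_states, n_actions, gamma=0.9, reg=1e-6):
    """Batch LSTD-Q: solve sum phi (phi - gamma phi')^T theta = sum phi r for `policy`."""
    d = n_states * n_actions
    A, b = np.zeros((d, d)), np.zeros(d)
    for s, a, r, s_next in samples:
        phi = features(s, a, n_states, n_actions)
        phi_next = features(s_next, policy(s_next), n_states, n_actions)
        A += np.outer(phi, phi - gamma * phi_next)
        b += phi * r
    return np.linalg.solve(A + reg * np.eye(d), b)


def online_lspi(env_step, n_states, n_actions, n_steps=2000, k_theta=10,
                gamma=0.9, eps=0.1, reg=1e-3, seed=0):
    """Online LSPI sketch: epsilon-greedy exploration, LSTD-Q statistics updated at every
    transition, and a policy improvement (re-solve for theta) every k_theta transitions."""
    rng = np.random.default_rng(seed)
    d = n_states * n_actions
    A, b, theta = np.zeros((d, d)), np.zeros(d), np.zeros(d)

    def greedy(s):  # reads the current theta (the closure sees later reassignments)
        return int(np.argmax([features(s, a, n_states, n_actions) @ theta
                              for a in range(n_actions)]))

    s = 0
    for t in range(1, n_steps + 1):
        a = int(rng.integers(n_actions)) if rng.random() < eps else greedy(s)
        s_next, r = env_step(s, a)
        phi = features(s, a, n_states, n_actions)
        phi_next = features(s_next, greedy(s_next), n_states, n_actions)
        A += np.outer(phi, phi - gamma * phi_next)
        b += phi * r
        if t % k_theta == 0:  # improve after an incomplete evaluation of the current policy
            theta = np.linalg.solve(A + reg * np.eye(d), b)
        s = s_next
    return theta, greedy


# Hypothetical 5-state chain: action 0 moves left, action 1 moves right; reward 1 at state 4.
def chain_step(s, a):
    s_next = max(0, s - 1) if a == 0 else min(4, s + 1)
    return s_next, float(s_next == 4)
```

As a sanity check of the evaluation step, a 2-state chain where state 0 leads to state 1 and state 1 loops on itself with reward 1 gives `lstdq(...)` ≈ [9, 10] for γ = 0.9, matching the exact Q-values 0.9 · 10 and 1 / (1 − 0.9).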

Policy search with cross-entropy optimization of basis functions
Busoniu, Lucian; Ernst, Damien ULg; De Schutter, Bart et al

in Proceedings of the IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL-09) (2009)

This paper introduces a novel algorithm for approximate policy search in continuous-state, discrete-action Markov decision processes (MDPs). Previous policy search approaches have typically used ad-hoc parameterizations developed for specific MDPs. In contrast, the novel algorithm employs a flexible policy parameterization, suitable for solving general discrete-action MDPs. The algorithm looks for the best closed-loop policy that can be represented using a given number of basis functions, where a discrete action is assigned to each basis function. The locations and shapes of the basis functions are optimized, together with the action assignments. This allows a large class of policies to be represented. The optimization is carried out with the cross-entropy method and evaluates the policies by their empirical return from a representative set of initial states. We report simulation experiments in which the algorithm reliably obtains good policies with only a small number of basis functions, albeit at sizable computational costs.

Fuzzy partition optimization for approximate fuzzy Q-iteration
Busoniu, Lucian; Ernst, Damien ULg; Babuska, Robert et al

in Proceedings of the 17th IFAC World Congress (IFAC-08) (2008)

Reinforcement learning (RL) is a widely used learning paradigm for adaptive agents. Because exact RL can only be applied to very simple problems, approximate algorithms are usually necessary in practice. Many algorithms for approximate RL rely on basis-function representations of the value function (or of the Q-function). Designing a good set of basis functions without any prior knowledge of the value function (or of the Q-function) can be a difficult task. In this paper, we propose instead a technique to optimize the shape of a constant number of basis functions for the approximate, fuzzy Q-iteration algorithm. In contrast to other approaches to adapt basis functions for RL, our optimization criterion measures the actual performance of the computed policies in the task, using simulation from a representative set of initial states. A complete algorithm, using cross-entropy optimization of triangular fuzzy membership functions, is given and applied to the car-on-the-hill example.

Consistency of fuzzy model-based reinforcement learning
Busoniu, Lucian; Ernst, Damien ULg; Babuska, Robert et al

in Proceedings of the 2008 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE-08) (2008)

Reinforcement learning (RL) is a widely used paradigm for learning control. Computing exact RL solutions is generally only possible when process states and control actions take values in a small discrete set. In practice, approximate algorithms are necessary. In this paper, we propose an approximate, model-based Q-iteration algorithm that relies on a fuzzy partition of the state space, and a discretization of the action space. Using assumptions on the continuity of the dynamics and of the reward function, we show that the resulting algorithm is consistent, i.e., that the optimal solution is obtained asymptotically as the approximation accuracy increases. An experimental study indicates that a continuous reward function is also important for a predictable improvement in performance as the approximation accuracy increases.

Continuous-state reinforcement learning with fuzzy approximation
Busoniu, Lucian; Ernst, Damien ULg; De Schutter, Bart et al

in Tuyls, K.; Nowé, A.; Guessoum, Z. et al. (Eds.), Adaptive Agents and Multi-Agent Systems III, Adaptation and Multi-Agent Learning (2008)

Reinforcement learning (RL) is a widely used learning paradigm for adaptive agents. There exist several convergent and consistent RL algorithms which have been intensively studied. In their original form, these algorithms require that the environment states and agent actions take values in a relatively small discrete set. Fuzzy representations for approximate, model-free RL have been proposed in the literature for the more difficult case where the state-action space is continuous. In this work, we propose a fuzzy approximation architecture similar to those previously used for Q-learning, but we combine it with the model-based Q-value iteration algorithm. We prove that the resulting algorithm converges. We also give a modified, asynchronous variant of the algorithm that converges at least as fast as the original version. An illustrative simulation example is provided.

Fuzzy approximation for convergent model-based reinforcement learning
Busoniu, Lucian; Ernst, Damien ULg; Babuska, Robert et al

in Proceedings of the 2007 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE-07) (2007)

Reinforcement learning (RL) is a learning control paradigm that provides well-understood algorithms with good convergence and consistency properties. Unfortunately, these algorithms require that process states and control actions take only discrete values. Approximate solutions using fuzzy representations have been proposed in the literature for the case when the states and possibly the actions are continuous. However, the link between these mainly heuristic solutions and the larger body of work on approximate RL, including convergence results, has not been made explicit. In this paper, we propose a fuzzy approximation structure for the Q-value iteration algorithm, and show that the resulting algorithm is convergent. The proof is based on an extension of previous results in approximate RL. We then propose a modified, serial version of the algorithm that is guaranteed to converge at least as fast as the original algorithm. An illustrative simulation example is also provided.

Continuous-state reinforcement learning with fuzzy approximation
Busoniu, Lucian; Ernst, Damien ULg; Babuska, Robert et al

in Proceedings of the 7th European Symposium on Adaptive Learning Agents and Multi-Agent Systems (ALAMAS-07) (2007)

Reinforcement learning (RL) is a widely used learning paradigm for adaptive agents. Well-understood RL algorithms with good convergence and consistency properties exist. In their original form, these algorithms require that the environment states and agent actions take values in a relatively small discrete set. Fuzzy representations for approximate, model-free RL have been proposed in the literature for the more difficult case where the state-action space is continuous. In this work, we propose a fuzzy approximation structure similar to those previously used for Q-learning, but we combine it with the model-based Q-value iteration algorithm. We show that the resulting algorithm converges. We also give a modified, serial variant of the algorithm that converges at least as fast as the original version. An illustrative simulation example is provided.
