References of "Ernst, Damien"
Peer Reviewed
Using prior knowledge to accelerate online least-squares policy iteration
Busoniu, Lucian; De Schutter, Bart; Babuska, Robert et al

in Proceedings of the 2010 IEEE International Conference on Automation, Quality and Testing, Robotics (2010, May)

Reinforcement learning (RL) is a promising paradigm for learning optimal control. Although RL is generally envisioned as working without any prior knowledge about the system, such knowledge is often available and can be exploited to great advantage. In this paper, we consider prior knowledge about the monotonicity of the control policy with respect to the system states, and we introduce an approach that exploits this type of prior knowledge to accelerate a state-of-the-art RL algorithm called online least-squares policy iteration (LSPI). Monotonic policies are appropriate for important classes of systems appearing in control applications. LSPI is a data-efficient RL algorithm that we previously extended to online learning, but which until now did not provide a way to use prior knowledge about the policy. In an empirical evaluation, online LSPI with prior knowledge learns much faster and more reliably than the original online LSPI.

Peer Reviewed
Approximate dynamic programming with a fuzzy parameterization
Busoniu, Lucian; Ernst, Damien ULg; De Schutter, Bart et al

in Automatica (2010), 46(5), 804-814

Dynamic programming (DP) is a powerful paradigm for general, nonlinear optimal control. Computing exact DP solutions is in general only possible when the process states and the control actions take values in a small discrete set. In practice, it is necessary to approximate the solutions. Therefore, we propose an algorithm for approximate DP that relies on a fuzzy partition of the state space, and on a discretization of the action space. This fuzzy Q-iteration algorithm works for deterministic processes, under the discounted return criterion. We prove that fuzzy Q-iteration asymptotically converges to a solution that lies within a bound of the optimal solution. A bound on the suboptimality of the solution obtained in a finite number of iterations is also derived. Under continuity assumptions on the dynamics and on the reward function, we show that fuzzy Q-iteration is consistent, i.e., that it asymptotically obtains the optimal solution as the approximation accuracy increases. These properties hold both when the parameters of the approximator are updated in a synchronous fashion, and when they are updated asynchronously. The asynchronous algorithm is proven to converge at least as fast as the synchronous one. The performance of fuzzy Q-iteration is illustrated in a two-link manipulator control problem.
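A minimal sketch of the synchronous variant described in the abstract, assuming a deterministic model f(x, u), a reward function rho(x, u), a set of fuzzy-set cores with a normalized membership function mu(x), and a discretized action set; these inputs and the stopping rule are illustrative assumptions, not the paper's exact setup.

```python
# Hedged sketch of fuzzy Q-iteration: one parameter per (fuzzy set, discrete action)
# pair, updated by a synchronous Q-iteration sweep over the cores of the fuzzy sets.
import numpy as np

def fuzzy_q_iteration(f, rho, cores, mu, actions, gamma=0.98, n_iters=200):
    """f(x, u) -> next state; rho(x, u) -> reward; mu(x) -> normalized membership vector."""
    n, m = len(cores), len(actions)
    theta = np.zeros((n, m))

    def q(x, j):                               # approximate Q(x, actions[j])
        return mu(x) @ theta[:, j]

    for _ in range(n_iters):                   # synchronous sweep over all (core, action) pairs
        new_theta = np.empty_like(theta)
        for i, xi in enumerate(cores):
            for j, uj in enumerate(actions):
                x_next = f(xi, uj)
                new_theta[i, j] = rho(xi, uj) + gamma * max(q(x_next, jp) for jp in range(m))
        theta = new_theta
    return theta                               # greedy policy: argmax_j mu(x) @ theta[:, j]
```

The asynchronous variant mentioned in the abstract would instead overwrite theta[i, j] in place during the sweep.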

Reinforcement Learning and Dynamic Programming using Function Approximators
Busoniu, Lucian; Babuska, Robert; De Schutter, Bart et al

Book published by CRC Press (2010)

Peer Reviewed
A cautious approach to generalization in reinforcement learning
Fonteneau, Raphaël ULg; Murphy, Susan; Wehenkel, Louis ULg et al

in Proceedings of the 2nd International Conference on Agents and Artificial Intelligence (2010, January)

In the context of a deterministic Lipschitz continuous environment over continuous state spaces, finite action spaces, and a finite optimization horizon, we propose an algorithm of polynomial complexity which exploits weak prior knowledge about its environment for computing, from a given sample of trajectories and for a given initial state, a sequence of actions. The proposed Viterbi-like algorithm maximizes a recently proposed lower bound on the return depending on the initial state, and uses to this end prior knowledge about the environment provided in the form of upper bounds on its Lipschitz constants. It thereby avoids, in a way that depends on the initial state and on the prior knowledge, those regions of the state space where the sample is too sparse to make safe generalizations. Our experiments show that it can lead to more cautious policies than algorithms combining dynamic programming with function approximators. We also give a condition on the sample sparsity ensuring that, for a given initial state, the proposed algorithm produces an optimal sequence of actions in open-loop.
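As a rough illustration of the Viterbi-like structure described above, the sketch below selects, by dynamic programming over a sample of one-step transitions (x, u, r, y), a sequence of actions maximizing a sum of observed rewards penalized by state distances. The `penalties` coefficients stand in for the Lipschitz-constant-based terms of the actual lower bound, whose exact form is given in the paper; the data layout is an illustrative assumption.

```python
# Schematic Viterbi-like search over sampled transitions (not the authors' exact
# algorithm or constants): complexity is O(T * n^2) for n transitions and horizon T.
import numpy as np

def cautious_action_sequence(transitions, x0, T, penalties):
    """transitions: list of (x, u, r, y) with x, y as 1-D arrays; penalties: length-T list."""
    X = np.array([t[0] for t in transitions])
    R = np.array([t[2] for t in transitions])
    Y = np.array([t[3] for t in transitions])
    n = len(transitions)

    W = np.zeros((T, n))               # W[t, l]: best tail bound if transition l is used at stage t
    nxt = np.zeros((T, n), dtype=int)
    W[T - 1] = R
    for t in range(T - 2, -1, -1):
        for l in range(n):
            d = np.linalg.norm(X - Y[l], axis=1)      # successor of l vs. candidate start states
            scores = W[t + 1] - penalties[t + 1] * d
            nxt[t, l] = int(np.argmax(scores))
            W[t, l] = R[l] + scores[nxt[t, l]]

    d0 = np.linalg.norm(X - x0, axis=1)               # penalize distance to the initial state
    l = int(np.argmax(W[0] - penalties[0] * d0))
    actions = []
    for t in range(T):
        actions.append(transitions[l][1])
        if t < T - 1:
            l = nxt[t, l]
    return actions
```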

Peer Reviewed
Model-free Monte Carlo-like policy evaluation
Fonteneau, Raphaël ULg; Murphy, Susan A.; Wehenkel, Louis ULg et al

in 29th Benelux Meeting on Systems and Control (2010)

Peer Reviewed
Exploiting policy knowledge in online least-squares policy iteration: An empirical study
Busoniu, Lucian; Ernst, Damien ULg; Babuska, Robert et al

in Automation, Computers, Applied Mathematics (2010), 19(4), 521-529

Reinforcement learning (RL) is a promising paradigm for learning optimal control. Traditional RL works for discrete variables only, so to deal with the continuous variables appearing in control problems, approximate representations of the solution are necessary. The field of approximate RL has tremendously expanded over the last decade, and a wide array of effective algorithms is now available. However, RL is generally envisioned as working without any prior knowledge about the system or the solution, whereas such knowledge is often available and can be exploited to great advantage. Therefore, in this paper we describe a method that exploits prior knowledge to accelerate online least-squares policy iteration (LSPI), a state-of-the-art algorithm for approximate RL. We focus on prior knowledge about the monotonicity of the control policy with respect to the system states. Such monotonic policies are appropriate for important classes of systems appearing in control applications, including for instance nearly linear systems and linear systems with monotonic input nonlinearities. In an empirical evaluation, online LSPI with prior knowledge is shown to learn much faster and more reliably than the original online LSPI.
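To make the monotonicity prior concrete, here is a deliberately simplified sketch; it is not the mechanism the authors embed in online LSPI, only one crude way such a prior could be imposed on a greedy policy computed over a sorted one-dimensional state grid.

```python
# Illustrative only: greedy actions on a sorted 1-D state grid, "monotonized" with a
# running maximum so that the resulting policy is non-decreasing in the state.
# A proper isotonic projection, or a constraint inside the policy improvement step,
# would be other possible realizations of the same prior.
import numpy as np

def greedy_actions(q_values):
    """q_values: array of shape (n_states, n_actions), rows ordered by increasing state."""
    return np.argmax(q_values, axis=1)

def enforce_nondecreasing(action_indices):
    """Replace each action index by the running maximum seen so far."""
    return np.maximum.accumulate(action_indices)

q = np.random.rand(6, 3)                      # toy Q-table on a 6-point state grid
print(enforce_nondecreasing(greedy_actions(q)))
```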

Computing bounds for kernel-based policy evaluation in reinforcement learning
Fonteneau, Raphaël ULg; Murphy, Susan A.; Wehenkel, Louis ULg et al

Report (2010)

This technical report proposes an approach for computing bounds on the finite-time return of a policy using kernel-based approximators from a sample of trajectories in a continuous state space and deterministic framework.

Voronoi model learning for batch mode reinforcement learning
Fonteneau, Raphaël ULg; Ernst, Damien ULg

Report (2010)

We consider deterministic optimal control problems with continuous state spaces where the information on the system dynamics and the reward function is constrained to a set of system transitions. Each system transition gathers a state, the action taken while being in this state, the immediate reward observed and the next state reached. In such a context, we propose a new model-learning-type reinforcement learning (RL) algorithm for the batch-mode, finite-time, deterministic setting. The algorithm, named Voronoi reinforcement learning (VRL), approximates from a sample of system transitions the system dynamics and the reward function of the optimal control problem using piecewise constant functions on a Voronoi-like partition of the state-action space.
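A minimal sketch of that approximation step, assuming Euclidean distance in the joint state-action space and simple array-based transitions; both choices are illustrative, not taken from the report.

```python
# Hedged sketch of the core idea: approximate dynamics and reward with piecewise-constant
# functions on a Voronoi-like partition of the state-action space, i.e. by
# nearest-neighbour lookup among the observed transitions.
import numpy as np

class VoronoiModel:
    def __init__(self, transitions):
        # transitions: list of (x, u, r, y); states and actions given as 1-D numpy arrays
        self.Z = np.array([np.concatenate([x, u]) for x, u, _, _ in transitions])
        self.R = np.array([r for _, _, r, _ in transitions])
        self.Y = np.array([y for _, _, _, y in transitions])

    def predict(self, x, u):
        """Return (reward, next_state) of the nearest sampled state-action pair."""
        z = np.concatenate([x, u])
        i = int(np.argmin(np.linalg.norm(self.Z - z, axis=1)))
        return self.R[i], self.Y[i]
```

The resulting model is constant on each Voronoi cell induced by the sampled state-action pairs and can then be handed to any batch-mode dynamic-programming routine.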

Peer Reviewed
Online least-squares policy iteration for reinforcement learning control
Busoniu, Lucian; Ernst, Damien ULg; De Schutter, Bart et al

in Proceedings of the 2010 American Control Conference (2010)

Reinforcement learning is a promising paradigm for learning optimal control. We consider policy iteration (PI) algorithms for reinforcement learning, which iteratively evaluate and improve control policies. State-of-the-art, least-squares techniques for policy evaluation are sample-efficient and have relaxed convergence requirements. However, they are typically used in offline PI, whereas a central goal of reinforcement learning is to develop online algorithms. Therefore, we propose an online PI algorithm that evaluates policies with the so-called least-squares temporal difference for Q-functions (LSTD-Q). The crucial difference between this online least-squares policy iteration (LSPI) algorithm and its offline counterpart is that, in the online case, policy improvements must be performed once every few state transitions, using only an incomplete evaluation of the current policy. In an extensive experimental evaluation, online LSPI is found to work well for a wide range of its parameters, and to learn successfully in a real-time example. Online LSPI also compares favorably with offline LSPI and with a different flavor of online PI, which instead of LSTD-Q employs another least-squares method for policy evaluation.
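A condensed sketch of the online loop described above, assuming a generic feature map phi(x, u), a finite action set, and an environment object with reset/step methods; the regularization term and the improvement interval K are illustrative choices rather than the paper's settings, and exploration is omitted.

```python
# Sketch of online LSPI with LSTD-Q: accumulate the standard LSTD-Q statistics
#   A += phi(x,u) (phi(x,u) - gamma * phi(x', pi(x')))^T,   b += r * phi(x,u),
# and perform a policy improvement (solve for theta) once every K transitions.
import numpy as np

def online_lspi(env, phi, actions, n_features, gamma=0.95, K=10, n_steps=10_000):
    A = 1e-3 * np.eye(n_features)            # small regularization keeps A invertible
    b = np.zeros(n_features)
    theta = np.zeros(n_features)
    greedy = lambda x: max(actions, key=lambda u: phi(x, u) @ theta)

    x = env.reset()                           # env.reset()/env.step() are assumed interfaces
    for t in range(n_steps):
        u = greedy(x)                         # exploration (e.g. epsilon-greedy) omitted
        x_next, r, done = env.step(u)
        f = phi(x, u)
        A += np.outer(f, f - gamma * phi(x_next, greedy(x_next)))
        b += r * f
        if (t + 1) % K == 0:                  # policy improvement every K transitions
            theta = np.linalg.solve(A, b)
        x = env.reset() if done else x_next
    return theta
```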

Peer Reviewed
Multi-armed bandit based policies for cognitive radio's decision making issues
Jouini, Wassim; Ernst, Damien ULg; Moy, Christophe et al

in Proceedings of the 3rd International Conference on Signals, Circuits and Systems (SCS) (2009, November)

We suggest in this paper that many problems related to Cognitive Radio's (CR) decision making inside CR equipment can be formalized as Multi-Armed Bandit problems and that solving such problems by using Upper Confidence Bound (UCB) algorithms can lead to high-performance CR devices. An application of these algorithms to an academic Cognitive Radio problem is reported.
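For reference, a compact sketch of the classical UCB1 index policy that the UCB family mentioned above is built around; mapping channels or radio configurations to arms is the formalization suggested in the abstract, and the reward model here is a hypothetical placeholder.

```python
# UCB1: play each arm once, then repeatedly play the arm maximizing
#   empirical mean + sqrt(2 ln t / n_arm).
import math

def ucb1(arms, n_rounds):
    """arms: list of callables, each returning a stochastic reward in [0, 1]."""
    counts = [0] * len(arms)
    means = [0.0] * len(arms)
    for t in range(1, n_rounds + 1):
        if t <= len(arms):
            k = t - 1                                   # initialization: play every arm once
        else:
            k = max(range(len(arms)),
                    key=lambda i: means[i] + math.sqrt(2 * math.log(t) / counts[i]))
        r = arms[k]()
        counts[k] += 1
        means[k] += (r - means[k]) / counts[k]          # incremental mean update
    return means, counts
```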

Peer Reviewed
Apoptosis characterizes immunological failure of HIV infected patients
Mhawej, Marie-José; Brunet-Francois, Cécile; Fonteneau, Raphaël ULg et al

in Control Engineering Practice (2009), 17(7), 798-804

This paper studies the influence of apoptosis on the dynamics of HIV infection. A new model of the activation-induced apoptosis of healthy CD4+ T-cells is used. The parameters of this model are identified by using clinical data generated by monitoring patients starting Highly Active Anti-Retroviral Therapy (HAART). The sampling of blood tests is performed to satisfy the constraints of dynamical system parameter identification. The apoptosis parameter, which is inferred from clinical data, is then shown to play a key role in the early diagnosis of immunological failure.

What is the likely future of real-time transient stability?
Ernst, Damien ULg

Speech/Talk (2009)

Peer Reviewed
Inferring bounds on the performance of a control policy from a sample of one-step system transitions
Fonteneau, Raphaël ULg; Murphy, Susan A.; Wehenkel, Louis ULg et al

in 28th Benelux Meeting on Systems and Control (2009)

Peer Reviewed
Evaluation of network equivalents for voltage optimization in multi-area power systems
Phulpin, Yannick; Begovic, Miroslav; Petit, Marc et al

in IEEE Transactions on Power Systems (2009), 24(2), 729-743

The paper addresses the problem of decentralized optimization for a power system partitioned into several areas controlled by different transmission system operators (TSOs). The optimization variables are the settings of the taps, the generators' voltages and the compensators, and the objective function is based either on the minimization of reactive power support, on the minimization of active power losses, or on a combination of both criteria. We suppose that each TSO assumes an external network equivalent for its neighboring areas and optimizes its own objective function without concern for the neighboring systems' objectives. In the context where every TSO adopts the same type of objective function, we study the performance of an iterative scheme in which every TSO refreshes, at each iteration, the parameters of its external network equivalents based on its past internal observations, solves its local optimization problem, and then applies its "optimal actions" to the power system. In the context of voltage optimization, we find that this decentralized control scheme can converge to nearly optimal global performance for relatively simple equivalents and simple procedures for fitting their parameters.

Peer Reviewed
What is the likely future of real-time transient stability?
Ernst, Damien ULg; Wehenkel, Louis ULg; Pavella, Mania ULg

in Proceedings of the 2009 IEEE/PES Power Systems Conference & Exposition (PSCE 2009) (2009)

Despite very intensive research efforts in the field of transient stability during the last five decades, the large majority of the derived techniques have hardly moved from the research laboratories to the industrial world and, as a matter of fact, the very large majority of today's control centers do not make use of any real-time transient stability software. On the other hand, over all these years the techniques developed for real-time transient stability have mainly focused on the definition of stability margins and on speeding-up techniques rather than on preventive or emergency control strategies. In the light of the above observations, this paper attempts to explain the reasons for the lack of industrial interest in real-time transient stability, and also to examine an even more fundamental question, namely: is transient stability, as it was stated many decades ago, still the relevant issue in the context of the new power system morphology, evolving towards more dispersed generation, higher penetration of power electronics, and larger and more complex structures, and, in addition, of economic and environmental constraints? Or is there, maybe, a need for techniques different from those developed so far?

Peer Reviewed
Inferring bounds on the performance of a control policy from a sample of trajectories
Fonteneau, Raphaël ULg; Murphy, Susan; Wehenkel, Louis ULg et al

in Proceedings of the IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL-09) (2009)

We propose an approach for inferring bounds on the finite-horizon return of a control policy from an off-policy sample of trajectories collecting state transitions, rewards, and control actions. In this paper, the dynamics, control policy, and reward function are supposed to be deterministic and Lipschitz continuous. Under these assumptions, an algorithm that is polynomial in the sample size and in the length of the optimization horizon is derived to compute these bounds, and their tightness is characterized in terms of the sample density.
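Schematically, and with notation simplified rather than taken verbatim from the paper, the lower bounds in this line of work have the following flavor: the T-step return of the policy h from an initial state x_0 is bounded from below by the rewards observed along a chain of sampled one-step transitions, minus penalty terms that grow with the distance between the chain and the trajectory it emulates and with the assumed Lipschitz constants.

```latex
% Schematic form only; the exact coefficients L_{T-t} and the precise statement are in the paper.
J^h_T(x_0) \;\ge\; \sum_{t=0}^{T-1} \Big( r^{l_t} - L_{T-t}\, \big\lVert x^{l_t} - \hat{x}_t \big\rVert \Big),
\qquad \hat{x}_0 = x_0, \quad \hat{x}_{t+1} = y^{l_t},
```

where (x^{l_t}, u^{l_t}, r^{l_t}, y^{l_t}) denotes the sampled transition chosen at step t and the coefficients L_{T-t} are built from the assumed Lipschitz constants of the dynamics, the reward function, and the policy.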

Peer Reviewed
Planning under uncertainty, ensembles of disturbance trees and kernelized discrete action spaces
Defourny, Boris ULg; Ernst, Damien ULg; Wehenkel, Louis ULg

in Proceedings of the IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL-09) (2009)

Optimizing decisions on an ensemble of incomplete disturbance trees and aggregating their first-stage decisions has been shown to be a promising approach to (model-based) planning under uncertainty in large continuous action spaces and in small discrete ones. The present paper extends this approach and deals with large but highly structured action spaces, through a kernel-based aggregation scheme. The technique is applied to a test problem with a discrete action space of 6561 elements adapted from the NIPS 2005 SensorNetwork benchmark.
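One plausible reading of such a kernel-based aggregation, sketched below under illustrative assumptions (vector-valued discrete actions, a Hamming-distance kernel, and a free bandwidth parameter, none of which are taken from the paper): each first-stage decision in the ensemble votes for every candidate action with a kernel weight, and the highest-scoring candidate is kept.

```python
# Hedged illustration of kernel-weighted aggregation of first-stage decisions over a
# structured discrete action space (e.g. one component per sensor).
import numpy as np

def aggregate_first_stage(ensemble_decisions, candidate_actions, bandwidth=1.0):
    """Decisions and candidates are integer vectors of equal length."""
    def kernel(a, b):
        return np.exp(-np.sum(a != b) / bandwidth)     # kernel built from Hamming distance
    scores = [sum(kernel(np.asarray(d), np.asarray(c)) for d in ensemble_decisions)
              for c in candidate_actions]
    return candidate_actions[int(np.argmax(scores))]
```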

Peer Reviewed
Policy search with cross-entropy optimization of basis functions
Busoniu, Lucian; Ernst, Damien ULg; De Schutter, Bart et al

in Proceedings of the IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL-09) (2009)

This paper introduces a novel algorithm for approximate policy search in continuous-state, discrete-action Markov decision processes (MDPs). Previous policy search approaches have typically used ad-hoc parameterizations developed for specific MDPs. In contrast, the novel algorithm employs a flexible policy parameterization, suitable for solving general discrete-action MDPs. The algorithm looks for the best closed-loop policy that can be represented using a given number of basis functions, where a discrete action is assigned to each basis function. The locations and shapes of the basis functions are optimized, together with the action assignments. This allows a large class of policies to be represented. The optimization is carried out with the cross-entropy method and evaluates the policies by their empirical return from a representative set of initial states. We report simulation experiments in which the algorithm reliably obtains good policies with only a small number of basis functions, albeit at sizable computational costs.
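As a rough sketch of the cross-entropy optimization loop underlying such a policy search: the encoding of basis-function locations, shapes, and action assignments into a flat parameter vector, and the scoring function evaluate_return, are placeholders here rather than the paper's exact scheme.

```python
# Generic cross-entropy method over policy parameters: sample candidates from a Gaussian,
# keep an elite fraction ranked by empirical return from representative initial states,
# and refit the Gaussian to the elites.
import numpy as np

def cem_policy_search(evaluate_return, dim, n_iters=50, n_samples=100, elite_frac=0.1):
    """evaluate_return(theta) -> empirical return of the policy encoded by theta."""
    mean, std = np.zeros(dim), np.ones(dim)
    n_elite = max(1, int(elite_frac * n_samples))
    for _ in range(n_iters):
        thetas = mean + std * np.random.randn(n_samples, dim)   # sample candidate policies
        scores = np.array([evaluate_return(th) for th in thetas])
        elite = thetas[np.argsort(scores)[-n_elite:]]           # keep the best candidates
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mean
```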
