References of "Ernst, Damien"
Full Text
Reinforcement Learning and Dynamic Programming using Function Approximators
Busoniu, Lucian; Babuska, Robert; De Schutter, Bart et al

Book published by CRC Press (2010)

Full Text
Peer Reviewed
A cautious approach to generalization in reinforcement learning
Fonteneau, Raphaël ULg; Murphy, Susan; Wehenkel, Louis ULg et al

in Proceedings of the 2nd International Conference on Agents and Artificial Intelligence (2010, January)

In the context of a deterministic Lipschitz continuous environment over continuous state spaces, finite action spaces, and a finite optimization horizon, we propose an algorithm of polynomial complexity which exploits weak prior knowledge about its environment for computing, from a given sample of trajectories and for a given initial state, a sequence of actions. The proposed Viterbi-like algorithm maximizes a recently proposed lower bound on the return depending on the initial state, and uses to this end prior knowledge about the environment provided in the form of upper bounds on its Lipschitz constants. It thereby avoids, in a way depending on the initial state and on the prior knowledge, those regions of the state space where the sample is too sparse to make safe generalizations. Our experiments show that it can lead to more cautious policies than algorithms combining dynamic programming with function approximators. We also give a condition on the sample sparsity ensuring that, for a given initial state, the proposed algorithm produces an optimal sequence of actions in open-loop.
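
The lower bound that this Viterbi-like algorithm maximizes is not restated in the abstract. The sketch below is a minimal reading of how such a bound can be computed by dynamic programming over chained sample transitions, given upper bounds L_f and L_rho on the Lipschitz constants of the dynamics and the reward; the function names, penalty bookkeeping, and toy system are ours, not the paper's.

```python
import numpy as np

def lipschitz_lower_bound(transitions, x0, action_seq, L_f, L_rho):
    """Lower bound on the T-step return of applying `action_seq` from `x0`,
    computed from one-step transitions (x, u, r, x_next) of a deterministic
    system.  A Viterbi-like dynamic program chains one sampled transition per
    stage (its action must match the planned action) and charges the gap
    between consecutive transitions against the Lipschitz constants."""
    T = len(action_seq)
    # L_Q[n]: how much a state gap can change an n-step return.
    L_Q = [L_rho * sum(L_f ** i for i in range(n)) for n in range(T + 1)]
    prev = None  # best partial bound ending at each usable transition
    for t, u in enumerate(action_seq):
        cur = {}
        for i, (x, ui, r, _) in enumerate(transitions):
            if ui != u:
                continue
            if t == 0:
                cur[i] = r - L_Q[T] * np.linalg.norm(x - x0)
            else:
                cur[i] = max(v + r - L_Q[T - t]
                             * np.linalg.norm(x - transitions[j][3])
                             for j, v in prev.items())
        if not cur:          # sample too sparse: no transition plays u
            return -np.inf
        prev = cur
    return max(prev.values())

# Toy usage on the 1-D system x' = 0.9 x + u with reward r = -|x|.
rng = np.random.default_rng(0)
sample = [(x, u, float(-abs(x[0])), 0.9 * x + u)
          for x, u in ((rng.uniform(-1.0, 1.0, 1),
                        float(rng.choice([-0.1, 0.0, 0.1])))
                       for _ in range(200))]
print(lipschitz_lower_bound(sample, np.array([0.5]), [-0.1, -0.1, 0.0],
                            L_f=0.9, L_rho=1.0))
```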

Full Text
Peer Reviewed
Model-free Monte Carlo-like policy evaluation
Fonteneau, Raphaël ULg; Murphy, Susan A.; Wehenkel, Louis ULg et al

in 29th Benelux Meeting on Systems and Control (2010)

Full Text
Peer Reviewed
Exploiting policy knowledge in online least-squares policy iteration: An empirical study
Busoniu, Lucian; Ernst, Damien ULg; Babuska, Robert et al

in Automation, Computers, Applied Mathematics (2010), 19(4), 521-529

Reinforcement learning (RL) is a promising paradigm for learning optimal control. Traditional RL works for discrete variables only, so to deal with the continuous variables appearing in control problems, approximate representations of the solution are necessary. The field of approximate RL has tremendously expanded over the last decade, and a wide array of effective algorithms is now available. However, RL is generally envisioned as working without any prior knowledge about the system or the solution, whereas such knowledge is often available and can be exploited to great advantage. Therefore, in this paper we describe a method that exploits prior knowledge to accelerate online least-squares policy iteration (LSPI), a state-of-the-art algorithm for approximate RL. We focus on prior knowledge about the monotonicity of the control policy with respect to the system states. Such monotonic policies are appropriate for important classes of systems appearing in control applications, including for instance nearly linear systems and linear systems with monotonic input nonlinearities. In an empirical evaluation, online LSPI with prior knowledge is shown to learn much faster and more reliably than the original online LSPI.
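
The abstract does not spell out how the monotonicity prior enters the learning algorithm. As a generic illustration only, the sketch below replaces a discretized 1-D policy by a non-decreasing version of itself via a running maximum; this particular enforcement mechanism is ours, not necessarily the paper's.

```python
import numpy as np

def monotone_envelope(u_grid):
    """Replace a policy discretized on an ordered 1-D state grid by its
    running maximum, one simple way to impose 'the control is monotonic
    in the state' as prior knowledge."""
    return np.maximum.accumulate(u_grid)

# A noisy, roughly increasing policy estimate over nine grid states:
u_hat = np.array([-1.0, -0.8, -0.9, -0.2, 0.1, 0.0, 0.4, 0.9, 0.7])
print(monotone_envelope(u_hat))   # its non-decreasing version
```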

Full Text
Computing bounds for kernel-based policy evaluation in reinforcement learning
Fonteneau, Raphaël ULg; Murphy, Susan A.; Wehenkel, Louis ULg et al

Report (2010)

This technical report proposes an approach for computing bounds on the finite-time return of a policy using kernel-based approximators from a sample of trajectories in a continuous state space and deterministic framework.
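
The bounds themselves are not given in this one-sentence abstract; the sketch below shows only the kind of kernel-based approximator they would be built around, here a Nadaraya-Watson average over sampled state-action points. This is a hypothetical stand-in, not the report's estimator.

```python
import numpy as np

def nw_estimate(query, points, values, bandwidth):
    """Nadaraya-Watson kernel regression with a Gaussian kernel: a weighted
    average of the sampled values, with weights decaying with the distance
    from the query state-action point."""
    d2 = np.sum((points - query) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))
    return float(w @ values / w.sum())

rng = np.random.default_rng(1)
pts = rng.uniform(-1.0, 1.0, size=(100, 2))        # (state, action) pairs
vals = np.sin(pts[:, 0]) - 0.5 * pts[:, 1] ** 2    # toy rewards
print(nw_estimate(np.array([0.2, 0.1]), pts, vals, bandwidth=0.3))
```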

Full Text
Voronoi model learning for batch mode reinforcement learning
Fonteneau, Raphaël ULg; Ernst, Damien ULg

Report (2010)

We consider deterministic optimal control problems with continuous state spaces where the information on the system dynamics and the reward function is constrained to a set of system transitions. Each system transition gathers a state, the action taken while being in this state, the immediate reward observed and the next state reached. In such a context, we propose a new model-learning-type reinforcement learning (RL) algorithm for the batch-mode, finite-time and deterministic setting. The algorithm, named Voronoi reinforcement learning (VRL), approximates from a sample of system transitions the system dynamics and the reward function of the optimal control problem using piecewise constant functions on a Voronoi-like partition of the state-action space.
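
Operationally, a piecewise constant approximation on the Voronoi-like partition induced by the sample amounts to a nearest-neighbour lookup in state-action space. The sketch below is a minimal reading of that model class; the class name and the toy system are ours.

```python
import numpy as np

class VoronoiModel:
    """Piecewise-constant model on the Voronoi partition induced by the
    sample: a query (x, u) inherits the reward and next state of its nearest
    sampled transition (1-nearest-neighbour lookup in state-action space)."""

    def __init__(self, transitions):
        # transitions: list of (x, u, r, x_next) with 1-D arrays x and u
        self.z = np.array([np.concatenate([x, u]) for x, u, _, _ in transitions])
        self.r = np.array([r for _, _, r, _ in transitions])
        self.x_next = np.array([xn for _, _, _, xn in transitions])

    def predict(self, x, u):
        i = int(np.argmin(np.sum((self.z - np.concatenate([x, u])) ** 2, axis=1)))
        return self.r[i], self.x_next[i]

# Usage on the toy system x' = x + u with reward r = -x^2:
rng = np.random.default_rng(2)
data = [(x, u, float(-x[0] ** 2), x + u)
        for x, u in ((rng.uniform(-1.0, 1.0, 1), rng.uniform(-0.2, 0.2, 1))
                     for _ in range(300))]
model = VoronoiModel(data)
print(model.predict(np.array([0.3]), np.array([0.05])))
```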

Full Text
Peer Reviewed
Online least-squares policy iteration for reinforcement learning control
Busoniu, Lucian; Ernst, Damien ULg; De Schutter, Bart et al

in Proceedings of the 2010 American Control Conference (2010)

Reinforcement learning is a promising paradigm for learning optimal control. We consider policy iteration (PI) algorithms for reinforcement learning, which iteratively evaluate and improve control policies. State-of-the-art, least-squares techniques for policy evaluation are sample-efficient and have relaxed convergence requirements. However, they are typically used in offline PI, whereas a central goal of reinforcement learning is to develop online algorithms. Therefore, we propose an online PI algorithm that evaluates policies with the so-called least-squares temporal difference for Q-functions (LSTD-Q). The crucial difference between this online least-squares policy iteration (LSPI) algorithm and its offline counterpart is that, in the online case, policy improvements must be performed once every few state transitions, using only an incomplete evaluation of the current policy. In an extensive experimental evaluation, online LSPI is found to work well for a wide range of its parameters, and to learn successfully in a real-time example. Online LSPI also compares favorably with offline LSPI and with a different flavor of online PI, which instead of LSTD-Q employs another least-squares method for policy evaluation.
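
A minimal sketch of the scheme the abstract describes, assuming a user-supplied feature map; the class layout, sizes, and ridge term are ours, not the paper's implementation. The point the abstract makes is visible in observe(): the weights are re-solved long before the accumulated statistics reflect a complete evaluation of the current policy.

```python
import numpy as np

class OnlineLSPI:
    """Schematic online LSPI: accumulate LSTD-Q statistics transition by
    transition and re-solve for the Q-function weights (policy improvement)
    every `update_every` transitions, i.e. from an incomplete evaluation of
    the current policy."""

    def __init__(self, phi, n_features, actions, gamma=0.98, update_every=10):
        self.phi = phi                        # phi(s, a) -> feature vector
        self.A = 1e-3 * np.eye(n_features)    # small ridge keeps A invertible
        self.b = np.zeros(n_features)
        self.w = np.zeros(n_features)
        self.actions, self.gamma = actions, gamma
        self.update_every, self.t = update_every, 0

    def greedy(self, s):
        return max(self.actions, key=lambda a: self.phi(s, a) @ self.w)

    def observe(self, s, a, r, s_next):
        f = self.phi(s, a)
        f_next = self.phi(s_next, self.greedy(s_next))   # successor features
        self.A += np.outer(f, f - self.gamma * f_next)   # LSTD-Q statistics
        self.b += r * f
        self.t += 1
        if self.t % self.update_every == 0:
            self.w = np.linalg.solve(self.A, self.b)     # improvement step
```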

Full Text
Peer Reviewed
Multi-armed bandit based policies for cognitive radio's decision making issues
Jouini, Wassim; Ernst, Damien ULg; Moy, Christophe et al

in Proceedings of the 3rd International Conference on Signals, Circuits and Systems (SCS) (2009, November)

We suggest in this paper that many problems related to Cognitive Radio (CR) decision making inside CR equipment can be formalized as Multi-Armed Bandit problems and that solving such problems by using Upper Confidence Bound (UCB) algorithms can lead to high-performance CR devices. An application of these algorithms to an academic Cognitive Radio problem is reported.
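
As a concrete instance of the UCB algorithms the paper advocates, the sketch below runs UCB1 on a toy opportunistic-spectrum-access problem: each arm is a channel and the reward is 1 when the sensed channel is free. The channel availability probabilities are made up for the demo.

```python
import math, random

def ucb1_channel_selection(channels, horizon):
    """UCB1: play each channel once, then always pick the channel with the
    highest empirical availability plus exploration bonus."""
    n = [0] * len(channels)        # pulls per channel
    mean = [0.0] * len(channels)   # empirical availability
    for t in range(1, horizon + 1):
        if t <= len(channels):
            k = t - 1              # initialization: play each arm once
        else:
            k = max(range(len(channels)),
                    key=lambda i: mean[i] + math.sqrt(2 * math.log(t) / n[i]))
        reward = 1.0 if random.random() < channels[k] else 0.0
        n[k] += 1
        mean[k] += (reward - mean[k]) / n[k]   # incremental mean update
    return n

random.seed(0)
print(ucb1_channel_selection([0.2, 0.5, 0.8], horizon=2000))
# the best channel (p = 0.8) should attract the large majority of the pulls
```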

Full Text
Peer Reviewed
Apoptosis characterizes immunological failure of HIV infected patients
Mhawej, Marie-José; Brunet-Francois, Cécile; Fonteneau, Raphaël ULg et al

in Control Engineering Practice (2009), 17(7), 798-804

This paper studies the influence of apoptosis in the dynamics of the HIV infection. A new model of the activation-induced apoptosis of healthy CD4+ T-cells is used. The parameters of this model are identified by using clinical data generated by monitoring patients starting Highly Active Anti-Retroviral Therapy (HAART). The sampling of blood tests is performed to satisfy the constraints of dynamical system parameter identification. The apoptosis parameter, which is inferred from clinical data, is then shown to play a key role in the early diagnosis of immunological failure.
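
The paper's multi-compartment HIV/apoptosis model is not reproduced in the abstract, so the sketch below only illustrates the identification step generically: fitting ODE parameters to sparse blood-test-like measurements by nonlinear least squares, on a one-state toy model that merely stands in for the real one.

```python
import numpy as np
from scipy.integrate import odeint
from scipy.optimize import least_squares

def simulate(theta, t):
    a, b = theta                       # toy dynamics: dx/dt = a - b*x
    return odeint(lambda x, _t: a - b * x, 1.0, t).ravel()

t_obs = np.array([0.0, 2.0, 5.0, 9.0, 14.0, 21.0])   # sparse sampling times
true_theta = (0.8, 0.3)
y_obs = (simulate(true_theta, t_obs)
         + 0.02 * np.random.default_rng(3).normal(size=t_obs.size))

fit = least_squares(lambda th: simulate(th, t_obs) - y_obs, x0=[0.5, 0.5])
print(fit.x)   # recovered parameters, close to (0.8, 0.3)
```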

Full Text
Peer Reviewed
Inferring bounds on the performance of a control policy from a sample of one-step system transitions
Fonteneau, Raphaël ULg; Murphy, Susan A.; Wehenkel, Louis ULg et al

in 28th Benelux Meeting on Systems and Control (2009)

Full Text
Peer Reviewed
Evaluation of network equivalents for voltage optimization in multi-area power systems
Phulpin, Yannick; Begovic, Miroslav; Petit, Marc et al

in IEEE Transactions on Power Systems (2009), 24(2), 729-743

The paper addresses the problem of decentralized optimization for a power system partitioned into several areas controlled by different transmission system operators (TSOs). The optimization variables are the tap settings and the voltage settings of generators and compensators, and the objective function is either based on the minimization of reactive power support, the minimization of active power losses, or a combination of both criteria. We suppose that each TSO assumes an external network equivalent for its neighboring areas and optimizes its own objective function without concern for the neighboring systems’ objectives. We study, in the context where every TSO adopts the same type of objective function, the performance of an iterative scheme where every TSO refreshes at each iteration the parameters of its external network equivalents depending on its past internal observations, solves its local optimization problem, and then applies its “optimal actions” to the power system. In the context of voltage optimization, we find that this decentralized control scheme can converge to nearly optimal global performance for relatively simple equivalents and simple procedures for fitting their parameters.
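
The power-flow machinery is beyond a short sketch; the toy below keeps only the abstract's iterative structure: each "TSO" freezes a trivial equivalent of its neighbour (simply the neighbour's last setting), minimizes its own local objective, and applies the result. The quadratic objectives are invented for the demo.

```python
def tso1_step(x2):   # TSO 1 minimizes (x1 - 1)^2 + 0.5*(x1 - x2)^2
    return (2.0 + x2) / 3.0          # closed-form local minimizer

def tso2_step(x1):   # TSO 2 minimizes (x2 + 1)^2 + 0.5*(x2 - x1)^2
    return (x1 - 2.0) / 3.0

x1 = x2 = 0.0
for _ in range(20):                  # iterate: refresh equivalent, optimize
    x1 = tso1_step(x2)
    x2 = tso2_step(x1)
print(x1, x2)                        # fixed point of the decentralized scheme
```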

Full Text
Peer Reviewed
What is the likely future of real-time transient stability?
Ernst, Damien ULg; Wehenkel, Louis ULg; Pavella, Mania ULg

in Proceedings of the 2009 IEEE/PES Power Systems Conference & Exposition (PSCE 2009) (2009)

Despite very intensive research efforts in the field of transient stability during the last five decades, the large majority of the derived techniques have hardly moved from the research laboratories to the industrial world and, as a matter of fact, the very large majority of today's control centers do not make use of any real-time transient stability software. On the other hand, over all these years the techniques developed for real-time transient stability have mainly focused on the definition of stability margins and speeding-up techniques rather than on preventive or emergency control strategies. In the light of the above observations, this paper attempts to explain the reasons for the lack of industrial interest in real-time transient stability, and also to examine an even more fundamental question, namely: is transient stability, as it was stated many decades ago, still the relevant issue in the context of the new power system morphology, with more dispersed generation, higher penetration of power electronics, larger and more complex structures and, in addition, economic and environmental constraints? Or is there, maybe, a need for techniques different from those developed so far?

Full Text
Peer Reviewed
Inferring bounds on the performance of a control policy from a sample of trajectories
Fonteneau, Raphaël ULg; Murphy, Susan; Wehenkel, Louis ULg et al

in Proceedings of the IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL-09) (2009)

We propose an approach for inferring bounds on the finite-horizon return of a control policy from an off-policy sample of trajectories collecting state transitions, rewards, and control actions. In this paper, the dynamics, control policy, and reward function are assumed to be deterministic and Lipschitz continuous. Under these assumptions, an algorithm that is polynomial in the sample size and in the length of the optimization horizon is derived to compute these bounds, and their tightness is characterized in terms of the sample density.
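
Schematically, and in our notation rather than the paper's, bounds of this family chain sampled transitions matching the policy's actions and charge each gap between consecutive transitions against the Lipschitz constants:

```latex
% General shape of such a bound (our notation, not verbatim from the paper).
\[
J^{h}(x_0) \;\ge\; \sum_{t=0}^{T-1}\Bigl(r_{l_t} - L_{Q_{T-t}}\,\delta_{t}\Bigr),
\qquad
L_{Q_n} \;=\; L_\rho \sum_{i=0}^{n-1} L_f^{\,i},
\]
% where l_0, ..., l_{T-1} index the chained sample transitions, \delta_t is
% the gap between the end of transition l_{t-1} (or x_0 when t = 0) and the
% start of transition l_t, and L_f, L_rho bound the Lipschitz constants of
% the dynamics and the reward.
```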

Full Text
Peer Reviewed
Planning under uncertainty, ensembles of disturbance trees and kernelized discrete action spaces
Defourny, Boris ULg; Ernst, Damien ULg; Wehenkel, Louis ULg

in Proceedings of the IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL-09) (2009)

Optimizing decisions on an ensemble of incomplete disturbance trees and aggregating their first-stage decisions has been shown to be a promising approach to (model-based) planning under uncertainty in large continuous action spaces and in small discrete ones. The present paper extends this approach and deals with large but highly structured action spaces, through a kernel-based aggregation scheme. The technique is applied to a test problem with a discrete action space of 6561 elements adapted from the NIPS 2005 SensorNetwork benchmark.
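
One way to read "kernel-based aggregation" of discrete structured decisions is sketched below: score every ensemble member by its kernel similarity to all the others and return the most central one. The Hamming-distance kernel is our stand-in; the paper's actual kernel is not given in the abstract.

```python
import numpy as np

def aggregate_first_stage(decisions, temperature=1.0):
    """Kernel-weighted aggregation of first-stage decisions proposed by an
    ensemble of disturbance trees.  Each decision is a structured discrete
    action encoded as an integer vector; an exponential Hamming kernel
    scores each member by its similarity to the whole ensemble."""
    D = np.asarray(decisions)                        # (n_trees, n_dims)
    hamming = (D[:, None, :] != D[None, :, :]).sum(axis=2)
    scores = np.exp(-hamming / temperature).sum(axis=1)
    return D[int(np.argmax(scores))]

votes = [[1, 0, 2], [1, 0, 2], [1, 1, 2], [0, 0, 2], [1, 0, 1]]
print(aggregate_first_stage(votes))   # -> [1 0 2], the most central decision
```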

Full Text
Peer Reviewed
Policy search with cross-entropy optimization of basis functions
Busoniu, Lucian; Ernst, Damien ULg; De Schutter, Bart et al

in Proceedings of the IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL-09) (2009)

This paper introduces a novel algorithm for approximate policy search in continuous-state, discrete-action Markov decision processes (MDPs). Previous policy search approaches have typically used ad-hoc parameterizations developed for specific MDPs. In contrast, the novel algorithm employs a flexible policy parameterization, suitable for solving general discrete-action MDPs. The algorithm looks for the best closed-loop policy that can be represented using a given number of basis functions, where a discrete action is assigned to each basis function. The locations and shapes of the basis functions are optimized, together with the action assignments. This allows a large class of policies to be represented. The optimization is carried out with the cross-entropy method and evaluates the policies by their empirical return from a representative set of initial states. We report simulation experiments in which the algorithm reliably obtains good policies with only a small number of basis functions, albeit at sizable computational costs.
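
The cross-entropy method named in the abstract is easy to state generically: sample candidate parameter vectors from a distribution, keep the highest-scoring elite fraction, and refit the distribution to the elites. In the paper the parameters encode basis-function locations and shapes plus per-basis discrete actions, and the score is the empirical return from representative initial states; in the sketch below the score is an arbitrary black-box toy.

```python
import numpy as np

def cross_entropy_search(score, dim, n_samples=50, n_elite=10,
                         iters=30, seed=4):
    """Generic cross-entropy optimization with a diagonal Gaussian sampling
    distribution: sample, rank by score, refit mean and spread to elites."""
    rng = np.random.default_rng(seed)
    mu, sigma = np.zeros(dim), np.ones(dim)
    for _ in range(iters):
        theta = mu + sigma * rng.standard_normal((n_samples, dim))
        elite = theta[np.argsort([score(th) for th in theta])[-n_elite:]]
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mu

# Maximize a toy score with optimum at (1, -2):
print(cross_entropy_search(lambda th: -np.sum((th - [1.0, -2.0]) ** 2), dim=2))
```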

Full Text
Peer Reviewed
A rare-event approach to build security analysis tools when N-k (k > 1) analyses are needed (as they are in large-scale power systems)
Belmudes, Florence ULg; Ernst, Damien ULg; Wehenkel, Louis ULg

in Proceedings of the 2009 IEEE Bucharest PowerTech (2009)

We consider the problem of performing N − k security analyses in large scale power systems. In such a context, the number of potentially dangerous N − k contingencies may rapidly become very large as k grows, and so running a security analysis for each one of them is often intractable. We assume in this paper that the number of dangerous N − k contingencies is very small with respect to the number of non-dangerous ones. Under this assumption, we suggest using importance sampling techniques for identifying rare events in combinatorial search spaces. With such techniques, it is possible to identify dangerous contingencies by running security analyses for only a small number of events. A procedure relying on these techniques is proposed in this work for steady-state security analyses. This procedure has been evaluated on the IEEE 118 bus test system. The results show that it is indeed able to efficiently identify, among a large set of contingencies, some of the rare ones that are dangerous.
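
A cross-entropy-style variant of the importance sampling idea is sketched below: sample k-line outages from per-line weights, rank them with a severity score, and tilt the weights toward lines appearing in the most severe samples, so that rare dangerous combinations are found with few "security analyses". The graded severity function is a toy proxy, not a power-system security analysis, and the update rule is our stand-in for the paper's procedure.

```python
import numpy as np

rng = np.random.default_rng(5)
n_lines, k = 30, 3
danger = [frozenset({2, 7, 11}), frozenset({4, 7, 19})]   # hidden ground truth

def severity(outage):
    # graded toy score: overlap with an (unknown) dangerous combination
    return max(len(set(outage) & d) for d in danger) / k

p = np.full(n_lines, 1.0 / n_lines)       # per-line inclusion weights
for _ in range(30):
    samples = [rng.choice(n_lines, size=k, replace=False, p=p)
               for _ in range(100)]
    sev = np.array([severity(s) for s in samples])
    elites = [s for s, v in zip(samples, sev) if v >= np.quantile(sev, 0.9)]
    counts = np.bincount(np.concatenate(elites), minlength=n_lines)
    p = 0.7 * p + 0.3 * counts / counts.sum()     # smoothed refit to elites
print(np.argsort(p)[::-1][:6])   # most implicated lines: expect 2, 7, 11, 4, 19
```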

Full Text
Peer Reviewed
A fair method for centralized optimization of multi-TSO power systems
Phulpin, Yannick; Begovic, Miroslav; Petit, Marc et al

in International Journal of Electrical Power & Energy Systems (2009), 31

This paper addresses the problem of centralized optimization of an interconnected power system partitioned into several regions controlled by different transmission system operators (TSOs). It is assumed that those utilities have agreed to transfer some of their competencies to a centralized control center, which is in charge of setting the control variables in the entire system to satisfy every utility’s individual objective. This paper proposes an objective method for centralized optimization of such multi-TSO power systems, which relies on the assumption that each TSO has a real-valued optimization function focusing on its control area only. This method is illustrated on the IEEE 118 bus system partitioned into three TSOs. It is applied to the optimal reactive power dispatch problem, where the control variables are the voltage settings for generators and compensators. After showing that the method has some properties of fairness, namely freedom from envy, efficiency, accountability, and altruism, we emphasize its robustness with respect to certain biased behaviors of the different TSOs.

Full Text
Peer Reviewed
Apprentissage par renforcement appliqué à la commande des systèmes électriques
Dai, Jing; Phulpin, Yannick; Vannier, Jean-Claude et al

in Proceedings of "Les Journées Electrotechnique du Futur 2009" (2009)

This article surveys the literature on applications of reinforcement learning to the control of electric power systems. The main characteristic of reinforcement learning is that it solves optimal control problems from the sole observation of the system's trajectories. Its appeal is that it requires no a priori knowledge of the dynamics of the system to be controlled, which makes it well suited to the control of complex systems. The article first details the characteristics of the problems to which reinforcement learning applies and then describes the technique itself. Finally, two classical examples of its application to power systems are presented.

Full Text
Peer Reviewed
Bounds for Multistage Stochastic Programs using Supervised Learning Strategies
Defourny, Boris ULg; Ernst, Damien ULg; Wehenkel, Louis ULg

in Watanabe, Osamu; Zeugmann, Thomas (Eds.) Stochastic Algorithms: Foundations and Applications (2009)

We propose a generic method for quickly obtaining good upper bounds on the minimal value of a multistage stochastic program. The method is based on the simulation of a feasible decision policy, synthesized by a strategy relying on any scenario tree approximation from stochastic programming and on supervised learning techniques from machine learning.
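
The mechanism behind the bound is that simulating any feasible policy over fresh scenarios yields a statistical upper bound on the minimal expected cost. The two-stage toy below collapses the supervised-learning step to a constant first-stage decision fitted on a coarse "scenario tree"; in the multistage case one would instead regress the tree's node decisions on the observed histories. Everything here is an illustrative stand-in, not the paper's method.

```python
import numpy as np

def stage2_cost(x, xi):            # recourse cost once the demand xi is seen
    return 2.0 * max(xi - x, 0.0) + 0.5 * max(x - xi, 0.0)

# "Scenario tree" = a few demand scenarios; pick the first-stage decision
# that is optimal for this coarse approximation (grid search for simplicity):
tree_scenarios = np.array([1.0, 2.0, 3.0])
grid = np.linspace(0.0, 4.0, 401)
x_tree = grid[np.argmin([np.mean([stage2_cost(x, s) for s in tree_scenarios])
                         for x in grid])]

# Simulate the resulting feasible (here: constant) policy on fresh scenarios;
# the average simulated cost upper-bounds the true optimal expected cost:
fresh = np.random.default_rng(6).uniform(1.0, 3.0, size=20000)
upper_bound = np.mean([stage2_cost(x_tree, s) for s in fresh])
print(x_tree, upper_bound)
```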
