References of "Ernst, Damien"
Full Text
Peer Reviewed
Cross-entropy optimization of control policies with adaptive basis functions
Busoniu, Lucian; Ernst, Damien ULg; Babuska, Robert et al

in IEEE Transactions on Systems, Man, and Cybernetics-Part B: Cybernetics (2011), 41(1), 196-209

This paper introduces an algorithm for direct search of control policies in continuous-state, discrete-action Markov decision processes. The algorithm looks for the best closed-loop policy that can be represented using a given number of basis functions (BFs), where a discrete action is assigned to each BF. The type of the BFs and their number are specified in advance and determine the complexity of the representation. Considerable flexibility is achieved by optimizing the locations and shapes of the BFs, together with the action assignments. The optimization is carried out with the cross-entropy method and evaluates the policies by their empirical return from a representative set of initial states. The return for each representative state is estimated using Monte Carlo simulations. The resulting algorithm for cross-entropy policy search with adaptive BFs is extensively evaluated in problems with two to six state variables, for which it reliably obtains good policies with only a small number of BFs. In these experiments, cross-entropy policy search requires vastly fewer BFs than value-function techniques with equidistant BFs, and outperforms policy search with a competing optimization algorithm called DIRECT.
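
To make the idea concrete, here is a minimal Python sketch of cross-entropy policy search: a policy built from radial basis functions is tuned by repeatedly sampling BF centres and widths from a Gaussian search distribution, scoring each candidate by Monte Carlo rollouts from a few representative initial states, and refitting the distribution to the elite candidates. The toy point-mass problem and all constants are invented for the illustration, and the discrete action assignments are kept fixed here, whereas the paper optimizes them as well.

import numpy as np

# Toy deterministic problem: drive a 1-D point mass to the origin.
# State: (position, velocity); discrete actions: accelerations {-1, 0, +1}.
ACTIONS = np.array([-1.0, 0.0, 1.0])
DT, GAMMA, HORIZON = 0.1, 0.95, 50

def step(state, a):
    pos, vel = state
    vel = vel + a * DT
    pos = pos + vel * DT
    return np.array([pos, vel]), -(pos ** 2 + 0.1 * vel ** 2)

def policy_action(state, centers, widths, action_idx):
    # Each radial basis function is tied to one discrete action;
    # the BF with the highest activation at the current state decides.
    activ = np.exp(-np.sum(((state - centers) / widths) ** 2, axis=1))
    return ACTIONS[action_idx[np.argmax(activ)]]

def mc_return(params, action_idx, init_states):
    centers, widths = params[:, :2], np.abs(params[:, 2:]) + 1e-3
    total = 0.0
    for s0 in init_states:
        s, ret = np.array(s0, dtype=float), 0.0
        for t in range(HORIZON):
            s, r = step(s, policy_action(s, centers, widths, action_idx))
            ret += (GAMMA ** t) * r
        total += ret
    return total / len(init_states)

def cross_entropy_policy_search(n_bfs=4, n_samples=50, n_elite=10, n_iter=30):
    rng = np.random.default_rng(0)
    init_states = [(-1.0, 0.0), (1.0, 0.0), (0.5, -0.5)]
    # Gaussian search distribution over flattened (centre, width) parameters.
    mean, std = np.zeros(n_bfs * 4), np.ones(n_bfs * 4)
    action_idx = np.arange(n_bfs) % len(ACTIONS)   # fixed here; the paper optimizes these too
    best_params, best_score = None, -np.inf
    for _ in range(n_iter):
        samples = rng.normal(mean, std, size=(n_samples, n_bfs * 4))
        scores = np.array([mc_return(s.reshape(n_bfs, 4), action_idx, init_states)
                           for s in samples])
        elite = samples[np.argsort(scores)[-n_elite:]]
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6   # refit the distribution
        if scores.max() > best_score:
            best_score, best_params = scores.max(), samples[scores.argmax()]
    return best_params.reshape(n_bfs, 4), action_idx, best_score

if __name__ == "__main__":
    params, actions, score = cross_entropy_policy_search()
    print("best average return found:", round(score, 3))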

Full Text
Peer Reviewed
Towards min max generalization in reinforcement learning
Fonteneau, Raphaël ULg; Murphy, Susan; Wehenkel, Louis ULg et al

in Filipe, Joaquim; Fred, Ana; Sharp, Bernadette (Eds.) Agents and Artificial Intelligence: International Conference, ICAART 2010, Valencia, Spain, January 2010, Revised Selected Papers (2011)

In this paper, we introduce a min max approach for addressing the generalization problem in Reinforcement Learning. The min max approach works by determining a sequence of actions that maximizes the worst return that could possibly be obtained considering any dynamics and reward function compatible with the sample of trajectories and some prior knowledge on the environment. We consider the particular case of deterministic Lipschitz continuous environments over continuous state spaces, finite action spaces, and a finite optimization horizon. We discuss the non-triviality of computing an exact solution of the min max problem even after reformulating it so as to avoid search in function spaces. For addressing this problem, we propose to replace, inside this min max problem, the search for the worst environment given a sequence of actions by an expression that lower bounds the worst return that can be obtained for a given sequence of actions. This lower bound has a tightness that depends on the sample sparsity. From there, we propose an algorithm of polynomial complexity that returns a sequence of actions leading to the maximization of this lower bound. We give a condition on the sample sparsity ensuring that, for a given initial state, the proposed algorithm produces an optimal sequence of actions in open loop. Our experiments show that this algorithm can lead to more cautious policies than algorithms combining dynamic programming with function approximators.
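
The key ingredient of the approach is a lower bound on the return of an open-loop action sequence, computed from a sample of one-step transitions and from upper bounds on the Lipschitz constants. Below is a hedged sketch of such a bound; the greedy, step-by-step choice of transitions keeps the code short and still yields a valid (though not necessarily the tightest) bound, whereas the proposed algorithm goes further and maximizes the bound over action sequences.

import numpy as np

def lipschitz_lower_bound(x0, actions, transitions, L_f, L_r):
    # Lower bound on the (undiscounted, finite-horizon) return of an open-loop
    # action sequence, computed from sample transitions (x, u, r, y) and from
    # Lipschitz constants L_f (dynamics) and L_r (reward).  The greedy choice
    # of transitions below is a simplification; any compatible choice is valid.
    T = len(actions)
    # Lipschitz constant of the k-steps-to-go return: L_r * (1 + L_f + ... + L_f^(k-1)).
    L_Q = [L_r * sum(L_f ** i for i in range(k)) for k in range(T + 1)]
    bound, x_prev = 0.0, np.asarray(x0, dtype=float)
    for t, u in enumerate(actions):
        # Only sample transitions taken with the same discrete action are usable.
        candidates = [(x, r, y) for (x, a, r, y) in transitions if a == u]
        if not candidates:
            raise ValueError("no sample transition with action %r" % (u,))
        dists = [np.linalg.norm(np.asarray(x, dtype=float) - x_prev) for (x, r, y) in candidates]
        scores = [r - L_Q[T - t] * d for (x, r, y), d in zip(candidates, dists)]
        k = int(np.argmax(scores))
        bound += scores[k]
        x_prev = np.asarray(candidates[k][2], dtype=float)   # continue from the sampled next state
    return bound

# Example with two sampled transitions, scalar states and actions in {0, 1}.
data = [((0.0,), 0, 1.0, (0.5,)), ((0.6,), 1, 0.5, (1.0,))]
print(lipschitz_lower_bound((0.1,), [0, 1], data, L_f=1.0, L_r=1.0))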

Full Text
Peer Reviewed
Automatic discovery of ranking formulas for playing with multi-armed bandits
Maes, Francis ULg; Wehenkel, Louis ULg; Ernst, Damien ULg

in Proceedings of the 9th European Workshop on Reinforcement Learning (EWRL 2011) (2011)

We propose an approach for discovering in an automatic way formulas for ranking arms while playing with multi-armed bandits. The approach works by defining a grammar made of basic elements such as addition, subtraction, the max operator, the average value of the rewards collected by an arm, their standard deviation, etc., and by exploiting this grammar to generate and test a large number of formulas. The systematic search for good candidate formulas is carried out by a built-on-purpose optimization algorithm used to navigate inside this large set of candidate formulas towards those that give high performance on some multi-armed bandit problems. We have applied this approach on a set of bandit problems made of Bernoulli, Gaussian and truncated Gaussian distributions and have identified a few simple ranking formulas that provide interesting results on every problem of this set. In particular, they clearly outperform several reference policies previously introduced in the literature. We argue that these newly found formulas as well as the procedure for generating them may suggest new directions for studying bandit problems.
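
By way of illustration, the sketch below scores candidate ranking formulas on a set of Bernoulli bandit problems, which is the evaluation step of the approach; the grammar and the built-on-purpose search over formulas are not reproduced here, and the two formulas shown (UCB1 and a mean-plus-standard-deviation index) merely stand in for the candidates such a search would generate and test.

import numpy as np

def play_bandit(index_formula, means, horizon=1000, rng=None):
    # One bandit episode: at each step pull the arm maximizing the candidate
    # ranking formula; return the total collected reward (higher is better).
    rng = rng if rng is not None else np.random.default_rng()
    K = len(means)
    counts, sums, sq_sums, total = np.zeros(K), np.zeros(K), np.zeros(K), 0.0
    for t in range(horizon):
        if t < K:
            arm = t                           # pull every arm once to initialize
        else:
            mean = sums / counts
            std = np.sqrt(np.maximum(sq_sums / counts - mean ** 2, 0.0))
            arm = int(np.argmax(index_formula(mean, std, counts, t)))
        r = float(rng.random() < means[arm])  # Bernoulli reward
        counts[arm] += 1; sums[arm] += r; sq_sums[arm] += r * r; total += r
    return total

# Two candidate ranking formulas (arguments: empirical mean, empirical std,
# number of pulls, current time step).
ucb1 = lambda mean, std, n, t: mean + np.sqrt(2.0 * np.log(t) / n)
mean_plus_std = lambda mean, std, n, t: mean + std / np.sqrt(n)

if __name__ == "__main__":
    problems = [np.random.default_rng(p).uniform(0.0, 1.0, size=5) for p in range(20)]
    for name, f in [("UCB1", ucb1), ("mean + std/sqrt(n)", mean_plus_std)]:
        score = np.mean([play_bandit(f, means, rng=np.random.default_rng(i))
                         for i, means in enumerate(problems)])
        print(name, "-> average total reward:", round(float(score), 1))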

Full Text
Peer Reviewed
Optimized look-ahead tree policies
Maes, Francis ULg; Wehenkel, Louis ULg; Ernst, Damien ULg

in Proceedings of the 9th European Workshop on Reinforcement Learning (EWRL 2011) (2011)

We consider in this paper look-ahead tree techniques for the discrete-time control of a deterministic dynamical system so as to maximize a sum of discounted rewards over an infinite time horizon. Given the current system state xt at time t, these techniques explore the look-ahead tree representing possible evolutions of the system states and rewards conditioned on subsequent actions ut, ut+1, ... . When the computing budget is exhausted, they output the action ut that led to the best found sequence of discounted rewards. In this context, we are interested in computing good strategies for exploring the look-ahead tree. We propose a generic approach that looks for such strategies by solving an optimization problem whose objective is to compute a (budget-compliant) tree-exploration strategy yielding a control policy maximizing the average return over a postulated set of initial states. This generic approach is fully specified for the case where the space of candidate tree-exploration strategies is the set of "best-first" strategies parameterized by a linear combination of look-ahead path features - some of them having been advocated in the literature before - and where the optimization problem is solved by using an EDA algorithm based on Gaussian distributions. Numerical experiments carried out on a model of the treatment of the HIV infection show that the optimized tree-exploration strategy is orders of magnitude better than the previously advocated ones.
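
A hedged sketch of a best-first look-ahead tree search is given below: nodes are expanded in the order given by a linear score (theta) over a few path features, and once the expansion budget is exhausted the first action of the best sequence found so far is returned. The feature set, the toy dynamics and the weights are illustrative choices, not those optimized in the paper.

import heapq
import numpy as np

def best_first_lookahead_action(x0, step, actions, theta, budget=200, gamma=0.95):
    # Grow a look-ahead tree from x0; the next node to expand is the one
    # maximizing the linear score theta . features(path).  Features used here:
    # (discounted return so far, depth, last reward) - placeholders only.
    def features(ret, depth, last_r):
        return np.array([ret, depth, last_r])
    counter = 0                                    # tie-breaker for the heap
    frontier = [(0.0, counter, np.asarray(x0, dtype=float), 0, 0.0, None, 0.0)]
    best_ret, best_first = -np.inf, actions[0]
    for _ in range(budget):
        if not frontier:
            break
        _, _, x, depth, ret, first, _ = heapq.heappop(frontier)
        for a in actions:                          # expand the node with every action
            y, r = step(x, a)
            new_ret = ret + (gamma ** depth) * r
            first_action = a if first is None else first
            if new_ret > best_ret:
                best_ret, best_first = new_ret, first_action
            counter += 1
            score = float(theta @ features(new_ret, depth + 1, r))
            heapq.heappush(frontier, (-score, counter, y, depth + 1, new_ret, first_action, r))
    return best_first

if __name__ == "__main__":
    def step(x, a):                                # toy point mass, reward favours the origin
        pos, vel = x
        vel = vel + 0.1 * a
        pos = pos + 0.1 * vel
        return np.array([pos, vel]), -(pos ** 2 + 0.1 * vel ** 2)
    theta = np.array([1.0, 0.0, 0.0])              # score = discounted return so far
    print(best_first_lookahead_action([1.0, 0.0], step, [-1.0, 0.0, 1.0], theta))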

Full Text
Peer Reviewed
Optimal sample selection for batch-mode reinforcement learning
Rachelson, Emmanuel ULg; Schnitzler, François ULg; Wehenkel, Louis ULg et al

in Proceedings of the 3rd International Conference on Agents and Artificial Intelligence (ICAART 2011) (2011)

We introduce the Optimal Sample Selection (OSS) meta-algorithm for solving discrete-time Optimal Control problems. This meta-algorithm maps the problem of finding a near-optimal closed-loop policy to the identification of a small set of one-step system transitions, leading to high-quality policies when used as input of a batch-mode Reinforcement Learning (RL) algorithm. We detail a particular instance of this OSS meta-algorithm that uses tree-based Fitted Q-Iteration as a batch-mode RL algorithm and Cross Entropy search as a method for navigating efficiently in the space of sample sets. The results show that this particular instance of OSS algorithms is able to identify rapidly small sample sets leading to high-quality policies.
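
The sketch below shows one generic way such a meta-algorithm can be organized in Python: a cross-entropy search over small subsets of a pool of one-step transitions, with the batch-mode RL algorithm (train) and the policy scoring routine (evaluate) passed in as black boxes. The Bernoulli-inclusion parameterization of the search distribution is an assumption made for brevity; it is not the paper's exact instantiation, which uses tree-based Fitted Q-Iteration and its own cross-entropy formulation.

import numpy as np

def optimal_sample_selection(pool, train, evaluate, subset_size=20,
                             n_candidates=40, n_elite=8, n_iter=25, seed=0):
    # Cross-entropy-style search for a small subset of transitions that, once
    # fed to a batch-mode RL routine `train(subset) -> policy`, yields a policy
    # scoring well under `evaluate(policy) -> float` (both are user-supplied).
    rng = np.random.default_rng(seed)
    p = np.full(len(pool), subset_size / len(pool))      # inclusion probabilities
    best_score, best_subset = -np.inf, None
    for _ in range(n_iter):
        candidates, scores = [], []
        for _ in range(n_candidates):
            idx = rng.choice(len(pool), size=subset_size, replace=False, p=p / p.sum())
            score = evaluate(train([pool[i] for i in idx]))
            candidates.append(idx)
            scores.append(score)
            if score > best_score:
                best_score, best_subset = score, idx
        elite = [candidates[i] for i in np.argsort(scores)[-n_elite:]]
        counts = np.zeros(len(pool))
        for idx in elite:
            counts[idx] += 1                             # how often each transition is elite
        p = np.clip(0.7 * p + 0.3 * counts / len(elite), 1e-3, 1.0)   # smoothed update
    return best_subset, best_score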

Full Text
Peer Reviewed
Voltage control in an HVDC system to share primary frequency reserves between non-synchronous areas
Dai, Jing; Phulpin, Yannick; Sarlette, Alain ULg et al

in Proceedings of the 17th Power Systems Computation Conference (PSCC-11) (2011)

This paper addresses the problem of frequency control for non-synchronous AC areas connected by a multi-terminal HVDC grid. It proposes a decentralized control scheme for the DC voltages of the HVDC converters aimed at making the AC areas collectively react to power imbalances. A theoretical study shows that, by using local information only, the control scheme significantly reduces the impact of a power imbalance by distributing the associated frequency deviation over all areas. A secondary frequency control strategy that can be combined with this control scheme is also proposed so as to restore the frequencies and the power exchanges to their nominal values in the aftermath of a power imbalance. Simulation results on a benchmark system with five AC areas illustrate the good performance of the control scheme.
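
To make the control principle concrete, here is a deliberately simplified simulation (each AC area reduced to one aggregated swing/droop equation, the DC grid reduced to a single coupling constant, all parameters invented): each converter sets its DC voltage from its own frequency only, and a generation loss in one area ends up being shared by all areas instead of being borne by that area alone.

import numpy as np

N, DT, T_END = 5, 0.01, 30.0
F_NOM, V_NOM, ALPHA, K_DC = 50.0, 1.0, 0.02, 50.0
M, D = 10.0, 1.5                                  # aggregated inertia and load damping
imbalance = np.zeros(N)
imbalance[0] = -1.0                               # area 0 loses 1 p.u. of generation at t = 0

f = np.full(N, F_NOM)
for _ in range(int(T_END / DT)):
    v = V_NOM + ALPHA * (f - F_NOM)               # local DC-voltage control law
    p_dc = K_DC * (v - v.mean())                  # power injected into the DC grid (sums to zero)
    f += DT * (imbalance - p_dc - D * (f - F_NOM)) / M

print("frequency deviations after 30 s (Hz):", np.round(f - F_NOM, 3))
# Without the DC coupling, area 0 alone would deviate (by about -0.67 Hz here);
# with it, the deviation is shared by all five areas.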

Full Text
Multistage stochastic programming: A scenario tree based approach to planning under uncertainty
Defourny, Boris ULg; Ernst, Damien ULg; Wehenkel, Louis ULg

in Sucar, L. Enrique; Morales, Eduardo F.; Hoey, Jesse (Eds.) Decision Theory Models for Applications in Artificial Intelligence: Concepts and Solutions (2011)

In this chapter, we present the multistage stochastic programming framework for sequential decision making under uncertainty. We discuss its differences with Markov Decision Processes, from the point of view of decision models and solution algorithms. We describe the standard technique for approximately solving multistage stochastic problems, which is based on a discretization of the disturbance space called a scenario tree. We stress a critical issue of the approach: the decisions can be very sensitive to the parameters of the scenario tree, whereas no efficient tool for checking the quality of approximate solutions exists. In this chapter, we show how supervised learning techniques can be used to evaluate reliably the quality of an approximation, and thus facilitate the selection of a good scenario tree. The framework and solution techniques presented in the chapter are explained and detailed on several examples. Along the way, we define notions from decision theory that can be used to quantify, for a particular problem, the advantage of switching to a more sophisticated decision model.
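
As a toy illustration of the objects discussed in the chapter, the snippet below builds a tiny scenario tree and solves a two-stage inventory-like problem on it by enumerating a small discrete decision set at each node; the problem, the costs and the probabilities are all invented for the example.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    prob: float                      # conditional probability given the parent node
    demand: float                    # disturbance realised at this node
    children: List["Node"] = field(default_factory=list)

def solve(node, stock, decisions=(0.0, 1.0, 2.0), hold=0.1, shortage=1.0):
    # Expected cost-to-go of a small inventory problem on the scenario tree:
    # after the demand at this node is revealed, pay holding/shortage costs,
    # then (at non-leaf nodes) choose the order quantity minimizing the
    # expected cost over the children.
    stock = stock - node.demand
    stage_cost = hold * max(stock, 0.0) + shortage * max(-stock, 0.0)
    if not node.children:
        return stage_cost
    best = min(
        sum(child.prob * solve(child, max(stock, 0.0) + u, decisions, hold, shortage)
            for child in node.children)
        for u in decisions
    )
    return stage_cost + best

# Two-stage tree: the root demand is already known, then demand is high or low.
tree = Node(1.0, 1.0, [Node(0.5, 2.0), Node(0.5, 0.5)])
print("optimal expected cost:", round(solve(tree, stock=1.0), 3))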

Full Text
Peer Reviewed
Consequence driven decomposition of large-scale power system security analysis
Fonteneau, Florence ULg; Ernst, Damien ULg; Druet, Christophe et al

in Proceedings of the 2010 IREP Symposium - Bulk Power Systems Dynamics and Control - VIII (2010, August)

This paper presents an approach for assessing, in operation planning studies, the security of a large-scale power system by decomposing it into elementary subproblems, each one corresponding to a structural weak-point of the system. We suppose that the structural weak-points are known a priori by the system operators, and are each described by a set of constraints that are localized in some relatively small area of the system. The security analysis with respect to a given weak-point thus reduces to the identification of the combinations of power system configurations and disturbances that could lead to the violation of some of its constraints. We propose an iterative rare-event simulation approach for identifying such combinations among the very large set of possible ones. The procedure is illustrated on a simplified version of this problem applied to the Belgian transmission system.
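
The sketch below only illustrates the general flavour of an iterative rare-event search: a sampling distribution over disturbance/operating parameters is repeatedly refitted on the most severe samples so that it concentrates on combinations that stress the weak-point constraints. The Gaussian parameterization and the severity oracle are placeholders, not the models or the exact procedure used in the study.

import numpy as np

def rare_event_search(severity, dim, n_samples=200, rho=0.1, n_iter=20, seed=0):
    # Iteratively tilt a Gaussian sampling distribution towards high-severity
    # combinations.  `severity(x) -> float` is a black box scoring how close
    # the combination x brings the weak-point constraints to violation.
    rng = np.random.default_rng(seed)
    mean, std = np.zeros(dim), np.ones(dim)
    n_elite = max(1, int(rho * n_samples))
    for _ in range(n_iter):
        x = rng.normal(mean, std, size=(n_samples, dim))
        scores = np.array([severity(v) for v in x])
        elite = x[np.argsort(scores)[-n_elite:]]          # most dangerous combinations
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mean, std                                      # now centred on the dangerous region

if __name__ == "__main__":
    # Toy severity: a constraint is stressed when the two parameters are both high.
    severity = lambda v: min(v[0], v[1])
    mean, _ = rare_event_search(severity, dim=2)
    print("centre of the identified dangerous region:", np.round(mean, 2))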

Full Text
Peer Reviewed
Impact of delays on a consensus-based primary frequency control scheme for AC systems connected by a multi-terminal HVDC grid
Dai, Jing; Phulpin, Yannick; Sarlette, Alain ULg et al

in Proceedings of the 2010 IREP Symposium - Bulk Power Systems Dynamics and Control - VIII (2010, August)

This paper addresses the problem of sharing primary frequency control reserves among nonsynchronous AC systems connected by a multi-terminal HVDC grid. We focus on a control scheme that modifies the power injections from the different areas into the DC grid based on remote measurements of the other areas’ frequencies. This scheme was proposed and applied to a simplified system in previous work by the authors. The current paper investigates the effects of delays on the control scheme’s effectiveness. The study shows that there generally exists a maximum acceptable delay, beyond which the areas’ frequency deviations fail to converge to an equilibrium point. This constraint should be taken into account when commissioning such a control scheme.

Full Text
Peer Reviewed
Coordination of voltage control in a power system operated by multiple transmission utilities
Phulpin, Yannick; Begovic, Miroslav; Ernst, Damien ULg

in Proceedings of the 2010 IREP Symposium - Bulk Power Systems Dynamics and Control - VIII (2010, August)

This paper addresses the problem of coordinating voltage control in a large-scale power system partitioned into control areas operated by independent utilities. Two types of coordination modes are considered to obtain settings for tap changers, generator voltages, and reactive power injections from compensation devices. First, it is supposed that a supervisor entity, with full knowledge and control of the system, makes decisions with respect to long-term settings of individual utilities. Second, the system is operated according to a decentralized coordination scheme that involves no information exchange between utilities. Those methods are compared with current practices on a 4141-bus system with 7 transmission system operators, where the generation dispatch and load demand models vary in discrete steps. Such a discrete-time model is sufficient to capture any event of relevance with respect to long-term system dynamics. Simulations show that centrally coordinated voltage control yields a significant improvement in terms of both operation costs and reserves for emergency control actions. This paper also emphasizes that, although it involves few changes with respect to current practices, the decentralized coordination scheme improves the operation of multi-utility power systems.

Full Text
Peer Reviewed
Upper confidence bound based decision making strategies and dynamic spectrum access
Jouini, Wassim; Ernst, Damien ULg; Moy, Christophe et al

in Proceedings of the 2010 IEEE International Conference on Communications (2010, May)

In this paper, we consider the problem of exploiting spectrum resources for a secondary user (SU) of a wireless communication network. We suggest that Upper Confidence Bound (UCB) algorithms could be useful to design decision making strategies for SUs to exploit the spectrum resources intelligently, based on their past observations. The algorithms use an index that provides an optimistic estimation of the availability of the resources to the SU. The suggestion is supported by some experimental results carried out on a specific dynamic spectrum access (DSA) framework.
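
A minimal sketch of the suggestion, with each frequency channel treated as a bandit arm and the channel with the highest UCB1 index sensed at every slot; the channel availability probabilities are made up for the example.

import numpy as np

def ucb_channel_selection(p_free, horizon=5000, seed=0):
    # UCB1 over channels: the (unknown) probability that a channel is free
    # plays the role of the arm's expected reward.
    rng = np.random.default_rng(seed)
    K = len(p_free)
    counts, successes, total_free = np.zeros(K), np.zeros(K), 0
    for t in range(horizon):
        if t < K:
            ch = t                                              # sense each channel once
        else:
            index = successes / counts + np.sqrt(2.0 * np.log(t) / counts)
            ch = int(np.argmax(index))                          # optimistic channel choice
        free = rng.random() < p_free[ch]                        # sensing outcome
        counts[ch] += 1
        successes[ch] += free
        total_free += free
    return total_free / horizon, counts.astype(int)

if __name__ == "__main__":
    rate, counts = ucb_channel_selection([0.2, 0.5, 0.8, 0.3])
    print("fraction of slots with a free channel found:", round(rate, 3))
    print("times each channel was sensed:", counts)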

Full Text
Peer Reviewed
Model-free Monte Carlo–like policy evaluation
Fonteneau, Raphaël ULg; Murphy, Susan; Wehenkel, Louis ULg et al

in Proceedings of Conférence Francophone sur l'Apprentissage Automatique (CAp) 2010 (2010, May)

We propose an algorithm for estimating the finite-horizon expected return of a closed loop control policy from an a priori given (off-policy) sample of one-step transitions. It averages cumulated rewards along a set of “broken trajectories” made of one-step transitions selected from the sample on the basis of the control policy. Under some Lipschitz continuity assumptions on the system dynamics, reward function and control policy, we provide bounds on the bias and variance of the estimator that depend only on the Lipschitz constants, on the number of broken trajectories used in the estimator, and on the sparsity of the sample of one-step transitions.
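
A short sketch of such an estimator: each broken trajectory is rebuilt by repeatedly picking, without replacement, the sample transition whose state-action pair is closest to the current state and the action the policy prescribes there, and the cumulated rewards of these trajectories are averaged. The Euclidean state distance and the scalar-action distance are illustrative choices.

import numpy as np

def mfmc_estimate(x0, policy, transitions, T, n_traj, gamma=1.0):
    # Model-free Monte-Carlo-like estimate of the T-step return of `policy`
    # from state x0, using only a sample of one-step transitions (x, u, r, y).
    assert len(transitions) >= n_traj * T, "not enough transitions in the sample"
    unused = list(range(len(transitions)))
    returns = []
    for _ in range(n_traj):
        x, ret = np.asarray(x0, dtype=float), 0.0
        for t in range(T):
            u = policy(x)
            # Distance in the joint state-action space (scalar actions assumed here).
            d = [np.linalg.norm(np.asarray(transitions[i][0], dtype=float) - x)
                 + abs(transitions[i][1] - u) for i in unused]
            j = unused.pop(int(np.argmin(d)))
            _, _, r, y = transitions[j]
            ret += (gamma ** t) * r
            x = np.asarray(y, dtype=float)                 # jump to the sampled next state
        returns.append(ret)
    return float(np.mean(returns))

With transitions given as a list of (x, u, r, y) tuples and policy a function mapping a state to an action, a call such as mfmc_estimate(x0, policy, transitions, T=10, n_traj=5) averages the returns of five rebuilt trajectories.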

Full Text
Peer Reviewed
Model-free Monte Carlo-like policy evaluation
Fonteneau, Raphaël ULg; Murphy, Susan; Wehenkel, Louis ULg et al

in Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS 2010) (2010, May)

We propose an algorithm for estimating the finite-horizon expected return of a closed loop control policy from an a priori given (off-policy) sample of one-step transitions. It averages cumulated rewards along a set of “broken trajectories” made of one-step transitions selected from the sample on the basis of the control policy. Under some Lipschitz continuity assumptions on the system dynamics, reward function and control policy, we provide bounds on the bias and variance of the estimator that depend only on the Lipschitz constants, on the number of broken trajectories used in the estimator, and on the sparsity of the sample of one-step transitions.

Full Text
Peer Reviewed
Generating informative trajectories by using bounds on the return of control policies
Fonteneau, Raphaël ULg; Murphy, Susan; Wehenkel, Louis ULg et al

in Proceedings of the Workshop on Active Learning and Experimental Design 2010 (in conjunction with AISTATS 2010) (2010, May)

We propose new methods for guiding the generation of informative trajectories when solving discrete-time optimal control problems. These methods exploit recently published results that provide ways for computing bounds on the return of control policies from a set of trajectories.

Full Text
Peer Reviewed
Using prior knowledge to accelerate online least-squares policy iteration
Busoniu, Lucian; De Schutter, Bart; Babuska, Robert et al

in Proceedings of the 2010 IEEE International Conference on Automation, Quality and Testing, Robotics (2010, May)

Reinforcement learning (RL) is a promising paradigm for learning optimal control. Although RL is generally envisioned as working without any prior knowledge about the system, such knowledge is often available and can be exploited to great advantage. In this paper, we consider prior knowledge about the monotonicity of the control policy with respect to the system states, and we introduce an approach that exploits this type of prior knowledge to accelerate a state-of-the-art RL algorithm called online least-squares policy iteration (LSPI). Monotonic policies are appropriate for important classes of systems appearing in control applications. LSPI is a data-efficient RL algorithm that we previously extended to online learning, but which did not until now provide a way to use prior knowledge about the policy. In an empirical evaluation, online LSPI with prior knowledge learns much faster and more reliably than the original online LSPI.
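
To make the notion of a monotonic policy prior concrete, the snippet below projects greedy actions computed on an ordered grid of states onto the set of non-decreasing sequences (pool-adjacent-violators isotonic regression). This only illustrates what the constraint means; it is not the mechanism used inside online LSPI in the paper.

import numpy as np

def monotone_projection(actions):
    # Project a sequence of action values (indexed by an ordered grid of
    # states) onto the set of non-decreasing sequences, using the
    # pool-adjacent-violators algorithm (least-squares isotonic regression).
    blocks = [[float(v), 1.0] for v in actions]     # (block mean, block size)
    i = 0
    while i < len(blocks) - 1:
        if blocks[i][0] > blocks[i + 1][0]:         # violation: merge the two blocks
            m0, n0 = blocks[i]
            m1, n1 = blocks[i + 1]
            blocks[i] = [(m0 * n0 + m1 * n1) / (n0 + n1), n0 + n1]
            del blocks[i + 1]
            i = max(i - 1, 0)
        else:
            i += 1
    return np.concatenate([np.full(int(n), m) for m, n in blocks])

print(monotone_projection([0, 2, 1, 1, 3, 2, 4]))   # -> a non-decreasing sequence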

Full Text
Peer Reviewed
Approximate dynamic programming with a fuzzy parameterization
Busoniu, Lucian; Ernst, Damien ULg; De Schutter, Bart et al

in Automatica (2010), 46(5), 804-814

Dynamic programming (DP) is a powerful paradigm for general, nonlinear optimal control. Computing exact DP solutions is in general only possible when the process states and the control actions take values in a small discrete set. In practice, it is necessary to approximate the solutions. Therefore, we propose an algorithm for approximate DP that relies on a fuzzy partition of the state space, and on a discretization of the action space. This fuzzy Q-iteration algorithm works for deterministic processes, under the discounted return criterion. We prove that fuzzy Q-iteration asymptotically converges to a solution that lies within a bound of the optimal solution. A bound on the suboptimality of the solution obtained in a finite number of iterations is also derived. Under continuity assumptions on the dynamics and on the reward function, we show that fuzzy Q-iteration is consistent, i.e., that it asymptotically obtains the optimal solution as the approximation accuracy increases. These properties hold both when the parameters of the approximator are updated in a synchronous fashion, and when they are updated asynchronously. The asynchronous algorithm is proven to converge at least as fast as the synchronous one. The performance of fuzzy Q-iteration is illustrated in a two-link manipulator control problem.
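
The toy sketch below shows the synchronous form of fuzzy Q-iteration on a one-dimensional problem: triangular membership functions form a partition of unity over the state space, Q(x, a) is approximated by the membership-weighted sum of parameters, and the parameters are refitted at the membership cores at every iteration. The problem and the discretization are invented for the illustration (the paper's example is a two-link manipulator).

import numpy as np

GAMMA = 0.95
ACTIONS = np.array([-1.0, 0.0, 1.0])
CORES = np.linspace(-2.0, 2.0, 11)                # cores of the triangular fuzzy sets

def step(x, a):
    # Simple deterministic dynamics; reward favours staying close to the origin.
    y = np.clip(x + 0.2 * a, -2.0, 2.0)
    return y, -abs(y)

def memberships(x):
    # Triangular memberships forming a partition of unity over [-2, 2].
    w = np.maximum(1.0 - np.abs(x - CORES) / (CORES[1] - CORES[0]), 0.0)
    return w / w.sum()

def fuzzy_q_iteration(n_iter=200):
    # Q(x, a) ~= sum_i memberships(x)[i] * theta[i, a]; synchronous updates.
    theta = np.zeros((len(CORES), len(ACTIONS)))
    for _ in range(n_iter):
        new_theta = np.empty_like(theta)
        for i, xi in enumerate(CORES):
            for j, a in enumerate(ACTIONS):
                y, r = step(xi, a)
                new_theta[i, j] = r + GAMMA * np.max(memberships(y) @ theta)
        theta = new_theta
    return theta

theta = fuzzy_q_iteration()
greedy_actions = ACTIONS[np.argmax(theta, axis=1)]
print("greedy action at each core state:", list(zip(np.round(CORES, 1), greedy_actions)))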

Full Text
Reinforcement Learning and Dynamic Programming using Function Approximators
Busoniu, Lucian; Babuska, Robert; De Schutter, Bart et al

Book published by CRC Press (2010)

Full Text
Peer Reviewed
A cautious approach to generalization in reinforcement learning
Fonteneau, Raphaël ULg; Murphy, Susan; Wehenkel, Louis ULg et al

in Proceedings of the 2nd International Conference on Agents and Artificial Intelligence (2010, January)

In the context of a deterministic Lipschitz continuous environment over continuous state spaces, finite action spaces, and a finite optimization horizon, we propose an algorithm of polynomial complexity which exploits weak prior knowledge about its environment for computing, from a given sample of trajectories and for a given initial state, a sequence of actions. The proposed Viterbi-like algorithm maximizes a recently proposed lower bound on the return depending on the initial state, and uses to this end prior knowledge about the environment provided in the form of upper bounds on its Lipschitz constants. It thereby avoids, in a way depending on the initial state and on the prior knowledge, those regions of the state space where the sample is too sparse to make safe generalizations. Our experiments show that it can lead to more cautious policies than algorithms combining dynamic programming with function approximators. We also give a condition on the sample sparsity ensuring that, for a given initial state, the proposed algorithm produces an optimal sequence of actions in open loop.

Full Text
Peer Reviewed
Model-free Monte Carlo-like policy evaluation
Fonteneau, Raphaël ULg; Murphy, Susan A.; Wehenkel, Louis ULg et al

in 29th Benelux Meeting on Systems and Control (2010)

Full Text
Peer Reviewed
Exploiting policy knowledge in online least-squares policy iteration: An empirical study
Busoniu, Lucian; Ernst, Damien ULg; Babuska, Robert et al

in Automation, Computers, Applied Mathematics (2010), 19(4), 521-529

Reinforcement learning (RL) is a promising paradigm for learning optimal control. Traditional RL works for discrete variables only, so to deal with the continuous variables appearing in control problems, approximate representations of the solution are necessary. The field of approximate RL has tremendously expanded over the last decade, and a wide array of effective algorithms is now available. However, RL is generally envisioned as working without any prior knowledge about the system or the solution, whereas such knowledge is often available and can be exploited to great advantage. Therefore, in this paper we describe a method that exploits prior knowledge to accelerate online least-squares policy iteration (LSPI), a state-of-the-art algorithm for approximate RL. We focus on prior knowledge about the monotonicity of the control policy with respect to the system states. Such monotonic policies are appropriate for important classes of systems appearing in control applications, including for instance nearly linear systems and linear systems with monotonic input nonlinearities. In an empirical evaluation, online LSPI with prior knowledge is shown to learn much faster and more reliably than the original online LSPI.
