References of "Ernst, Damien"
Peer Reviewed
Apprentissage par renforcement batch fondé sur la reconstruction de trajectoires artificielles [Batch reinforcement learning based on the reconstruction of artificial trajectories]
Fonteneau, Raphaël ULg; Murphy, Susan A.; Wehenkel, Louis ULg et al

in Proceedings of the 9èmes Journées Francophones de Planification, Décision et Apprentissage (JFPDA 2014) (2014)

This paper is set in the framework of batch-mode reinforcement learning, whose central problem is to learn, from a set of trajectories, a decision policy that optimizes a given criterion. We focus on problems with a continuous state space, for which classical solution schemes rely on function approximators. The paper proposes an alternative based on the reconstruction of “artificial trajectories”, which offers a new angle on the classical problems of batch reinforcement learning.

Peer Reviewed
On periodic reference tracking using batch-mode reinforcement learning with application to gene regulatory network control
Sootla, Aivar; Strelkowa, Natalja; Ernst, Damien ULg et al

in Proceedings of the 52nd Annual Conference on Decision and Control (CDC 2013) (2013, December)

In this paper, we consider the periodic reference tracking problem in the framework of batch-mode reinforcement learning, which studies methods for solving optimal control problems from the sole knowledge of a set of trajectories. In particular, we extend an existing batch-mode reinforcement learning algorithm, known as Fitted Q Iteration, to the periodic reference tracking problem. The presented periodic reference tracking algorithm explicitly exploits a priori knowledge of the future values of the reference trajectory and its periodicity. We discuss the properties of our approach and illustrate it on the problem of reference tracking for a synthetic biology gene regulatory network known as the generalised repressilator. This system can produce decaying but long-lived oscillations, which makes it an interesting application for the tracking problem.

Peer Reviewed
An efficient algorithm for the provision of a day-ahead modulation service by a load aggregator
Mathieu, Sébastien ULg; Ernst, Damien ULg; Louveaux, Quentin ULg

in Proceedings of the 4th European Innovative Smart Grid Technologies (ISGT) (2013, October)

This article studies a decision-making problem faced by an aggregator willing to offer a load modulation service to a Transmission System Operator. This service is contracted one day ahead and consists of a load modulation option, which can be called once per day. The option specifies the range of a potential modification of the demand of the loads within a certain time interval. The specific case where the loads can be modeled by a generic tank model is considered. Under this assumption, the problem of maximizing the range of the load modulation service can be formulated as a mixed integer linear programming problem. A novel heuristic method is proposed to solve this problem in a computationally efficient manner. This method is tested on a set of problems. The results show that this approach can be orders of magnitude faster than CPLEX without significantly degrading the solution accuracy.

Peer Reviewed
The global grid
Chatzivasileiadis, Spyros; Ernst, Damien ULg; Andersson, Göran

in Renewable Energy : An International Journal (2013), 57

This paper puts forward the vision that a natural future stage of the electricity network could be a grid spanning the whole planet and connecting most of the large power plants in the world: this is the “Global Grid”. The main driving force behind the Global Grid will be the harvesting of remote renewable sources, and its key infrastructure element will be the high capacity long transmission lines. Wind farms and solar power plants will supply load centers with green power over long distances. This paper focuses on the introduction of the concept, showing that a globally interconnected network can be technologically feasible and economically competitive. We further highlight the multiple opportunities emerging from a global electricity network such as smoothing the renewable energy supply and electricity demand, reducing the need for bulk storage, and reducing the volatility of the energy prices. We also discuss possible investment mechanisms and operating schemes. Among others, we envision in such a system a global power market and the establishment of two new coordinating bodies, the “Global Regulator” and the “Global System Operator”.

Peer Reviewed
Monte Carlo search algorithm discovery for single-player games
Maes, Francis; Lupien St-Pierre, David ULg; Ernst, Damien ULg

in IEEE Transactions on Computational Intelligence and AI in Games (2013), 5(3), 201-213

Much current research in AI and games is being devoted to Monte Carlo search (MCS) algorithms. While the quest for a single unified MCS algorithm that would perform well on all problems is of major interest for AI, practitioners often know in advance the problem they want to solve, and spend plenty of time exploiting this knowledge to customize their MCS algorithm in a problem-driven way. We propose an MCS algorithm discovery scheme to perform this in an automatic and reproducible way. We first introduce a grammar over MCS algorithms that enables inducing a rich space of candidate algorithms. Afterwards, we search in this space for the algorithm that performs best on average for a given distribution of training problems. We rely on multi-armed bandits to approximately solve this optimization problem. The experiments, generated on three different domains, show that our approach enables discovering algorithms that outperform several well-known MCS algorithms such as Upper Confidence bounds applied to Trees and Nested Monte Carlo search. We also show that the discovered algorithms are generally quite robust with respect to changes in the distribution over the training problems.

Peer Reviewed
Batch mode reinforcement learning based on the synthesis of artificial trajectories
Fonteneau, Raphaël ULg; Murphy, Susan A.; Wehenkel, Louis ULg et al

in Annals of Operations Research (2013), 208(1), 383-416

Risque majeur de blackout : que faire ? [Major blackout risk: what can be done?]
Ernst, Damien ULg

Article for general public (2013)

Quelles perspectives pour les énergies renouvelables en Wallonie ? [What prospects for renewable energy in Wallonia?]
Ernst, Damien ULg

in LiègeU (2013), Eté 2013

Peer Reviewed
Outbound SPIT Filter with Optimal Performance Guarantees
Jung, Tobias ULg; Martin, Sylvain ULg; Nassar, Mohamed et al

in Computer Networks (2013), 57(7), 1630-1643

This paper presents a formal framework for identifying and filtering SPIT calls (SPam in Internet Telephony) in an outbound scenario with provable optimal performance. In so doing, our work is largely different from related previous work: our goal is to rigorously formalize the problem in terms of mathematical decision theory, find the optimal solution to the problem, and derive concrete bounds for its expected loss (number of mistakes the SPIT filter will make in the worst case). This goal is achieved by considering an abstracted scenario amenable to theoretical analysis, namely SPIT detection in an outbound scenario with pure sources. Our methodology is to first define the cost of making an error (false positive and false negative), apply Wald’s sequential probability ratio test to the individual sources, and then determine analytically error probabilities such that the resulting expected loss is minimized. The benefits of our approach are: (1) the method is optimal (in a sense defined in the paper); (2) the method does not rely on manual tuning and tweaking of parameters but is completely self-contained and mathematically justified; (3) the method is computationally simple and scalable. These are desirable features that would make our method a component of choice in larger, autonomic frameworks.

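Illustrative note (not taken from the paper): the abstract above relies on Wald's sequential probability ratio test. The following is a minimal sketch of that test for a stream of 0/1 observations, using Wald's standard approximate thresholds. The function names, the Bernoulli observation model, and the fixed error rates alpha and beta are assumptions made for illustration; the paper itself determines the error probabilities analytically, which this sketch does not attempt.

```python
import math


def sprt_thresholds(alpha, beta):
    """Wald's approximate thresholds on the log-likelihood ratio.

    alpha: tolerated probability of accepting H1 when H0 is true.
    beta:  tolerated probability of accepting H0 when H1 is true.
    """
    upper = math.log((1.0 - beta) / alpha)  # at or above: accept H1
    lower = math.log(beta / (1.0 - alpha))  # at or below: accept H0
    return lower, upper


def sprt_bernoulli(observations, p0, p1, alpha=0.01, beta=0.01):
    """Sequentially test H0: P(x=1)=p0 against H1: P(x=1)=p1.

    Returns (decision, n) where decision is "H0", "H1" or "undecided"
    and n is the number of observations consumed.
    """
    lower, upper = sprt_thresholds(alpha, beta)
    llr = 0.0  # cumulative log-likelihood ratio log(P1/P0)
    n = 0
    for n, x in enumerate(observations, start=1):
        if x:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1.0 - p1) / (1.0 - p0))
        if llr >= upper:
            return "H1", n
        if llr <= lower:
            return "H0", n
    return "undecided", n
```

The test stops as soon as the accumulated evidence crosses either threshold, so clear-cut sources are classified after only a few observations while ambiguous ones keep being monitored.
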
Peer Reviewed
Optimal discovery with probabilistic expert advice: finite time analysis and macroscopic optimality
Bubeck, Sébastien; Ernst, Damien ULg; Garivier, Aurélien

in Journal of Machine Learning Research (2013), 14

We consider an original problem that arises from the issue of security analysis of a power system and that we name optimal discovery with probabilistic expert advice. We address it with an algorithm based on the optimistic paradigm and on the Good-Turing missing mass estimator. We prove two different regret bounds on the performance of this algorithm under weak assumptions on the probabilistic experts. Under more restrictive hypotheses, we also prove a macroscopic optimality result, comparing the algorithm both with an oracle strategy and with uniform sampling. Finally, we provide numerical experiments illustrating these theoretical findings.

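Illustrative note (not taken from the paper): the Good-Turing missing mass estimator mentioned in the abstract above estimates the total probability of items that have not been observed yet as the fraction of draws whose item has been seen exactly once. The sketch below shows only this basic estimator; the function name and the convention for an empty sample are assumptions, and the optimistic expert-selection rule built on top of it in the paper is not reproduced here.

```python
from collections import Counter


def good_turing_missing_mass(samples):
    """Estimate the probability mass of items never observed so far.

    Good-Turing estimate: (number of distinct items seen exactly once)
    divided by (total number of draws).
    """
    n = len(samples)
    if n == 0:
        # Convention assumed for this sketch: with no draws, all mass is unseen.
        return 1.0
    counts = Counter(samples)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / n
```

For example, in the sample a, b, a, c the items b and c each appear once, so the estimate of the unseen mass is 2/4 = 0.5.
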
Peer Reviewed
Généralisation Min Max pour l'Apprentissage par Renforcement Batch et Déterministe : Relaxations pour le Cas Général T Etapes [Min max generalization for deterministic batch reinforcement learning: relaxations for the general T-stage case]
Fonteneau, Raphaël ULg; Ernst, Damien ULg; Boigelot, Bernard ULg et al

in 8èmes Journées Francophones de Planification, Décision et Apprentissage pour la conduite de systèmes (JFPDA'13) (2013)

This paper addresses the min max generalization problem in the framework of deterministic batch reinforcement learning. The problem was originally introduced by [Fonteneau, 2011], and it has already been shown to be NP-hard. Two relaxation schemes for the two-stage case were presented at JFPDA'12, and this paper generalizes them to the T-stage case. The first scheme works by dropping constraints in order to obtain a problem solvable in polynomial time. The second scheme is a Lagrangian relaxation that also leads to a problem solvable in polynomial time. We show theoretically that both schemes yield better results than those proposed by [Fonteneau, 2011].

Peer Reviewed
Scenario Trees and Policy Selection for Multistage Stochastic Programming Using Machine Learning
Defourny, Boris; Ernst, Damien ULg; Wehenkel, Louis ULg

in INFORMS Journal on Computing (2013), 25(3), 488-501

In the context of multistage stochastic optimization problems, we propose a hybrid strategy for generalizing to nonlinear decision rules, using machine learning, a finite data set of constrained vector-valued recourse decisions optimized using scenario-tree techniques from multistage stochastic programming. The decision rules are based on a statistical model inferred from a given scenario-tree solution and are selected by out-of-sample simulation given the true problem. Because the learned rules depend on the given scenario tree, we repeat the procedure for a large number of randomly generated scenario trees and then select the best solution (policy) found for the true problem. The scheme leads to an ex post selection of the scenario tree itself. Numerical tests evaluate the dependence of the approach on the machine learning aspects and show cases where one can obtain near-optimal solutions, starting with a “weak” scenario-tree generator that randomizes the branching structure of the trees.

Peer Reviewed
Min max generalization for deterministic batch mode reinforcement learning: relaxation schemes
Fonteneau, Raphaël ULg; Ernst, Damien ULg; Boigelot, Bernard ULg et al

in SIAM Journal on Control & Optimization (2013), 51(5), 3355-3385

We study the min max optimization problem introduced in Fonteneau et al. [Towards min max reinforcement learning, ICAART 2010, Springer, Heidelberg, 2011, pp. 61–77] for computing policies for batch mode reinforcement learning in a deterministic setting with fixed, finite time horizon. First, we show that the min part of this problem is NP-hard. We then provide two relaxation schemes. The first relaxation scheme works by dropping some constraints in order to obtain a problem that is solvable in polynomial time. The second relaxation scheme, based on a Lagrangian relaxation where all constraints are dualized, can also be solved in polynomial time. We also theoretically prove and empirically illustrate that both relaxation schemes provide better results than those given in [Fonteneau et al., 2011, as cited above].

Peer Reviewed
Active network management: planning under uncertainty for exploiting load modulation
Gemine, Quentin ULg; Karangelos, Efthymios ULg; Ernst, Damien ULg et al

in Proceedings of the 2013 IREP Symposium - Bulk Power Systems Dynamics and Control - IX (2013)

This paper addresses the problem faced by a distribution system operator (DSO) when planning the operation of a network in the short term. The problem is formulated in the context of high penetration of renewable energy sources (RES) and distributed generation (DG), and when flexible demand is available. The problem is expressed as a sequential decision-making problem under uncertainty, where, in the first stage, the DSO has to decide whether or not to reserve the availability of flexible demand, and, in the subsequent stages, can curtail the generation and modulate the available flexible loads. We analyze the relevance of this formulation on a small test system, discuss the assumptions made, compare our approach to related work, and indicate further research directions.

Peer Reviewed
Stratégies d'échantillonnage pour l'apprentissage par renforcement batch [Sampling strategies for batch reinforcement learning]
Fonteneau, Raphaël ULg; Murphy, Susan A.; Wehenkel, Louis ULg et al

in Revue d'Intelligence Artificielle [=RIA] (2013), 27(2), 171-194

We propose two strategies for experiment selection in the context of batch mode reinforcement learning. The first strategy is based on the idea that the most interesting experiments to carry out at some stage are those that are the most liable to falsify the current hypothesis about the optimal control policy. We cast this idea in a context where a policy learning algorithm and a model identification method are given a priori. The second strategy exploits recently published methods for computing bounds on the return of control policies from a set of trajectories in order to sample the state-action space so as to be able to discriminate between optimal and non-optimal policies. Both strategies are experimentally validated, showing promising results.

Peer Reviewed
Meta-learning of Exploration/Exploitation Strategies: The Multi-Armed Bandit Case
Maes, Francis; Wehenkel, Louis ULg; Ernst, Damien ULg

in Filipe, Joaquim; Fred, Ana (Eds.) Agents and Artificial Intelligence: 4th International Conference, ICAART 2012, Vilamoura, Portugal, February 6-8, 2012. Revised Selected Papers (2013)

The exploration/exploitation (E/E) dilemma arises naturally in many subfields of Science. Multi-armed bandit problems formalize this dilemma in its canonical form. Most current research in this field focuses on generic solutions that can be applied to a wide range of problems. However, in practice, it is often the case that a form of prior information is available about the specific class of target problems. Prior knowledge is rarely used in current solutions due to the lack of a systematic approach to incorporate it into the E/E strategy. To address a specific class of E/E problems, we propose to proceed in three steps: (i) model prior knowledge in the form of a probability distribution over the target class of E/E problems; (ii) choose a large hypothesis space of candidate E/E strategies; and (iii) solve an optimization problem to find a candidate E/E strategy of maximal average performance over a sample of problems drawn from the prior distribution. We illustrate this meta-learning approach with two different hypothesis spaces: one where E/E strategies are numerically parameterized and another where E/E strategies are represented as small symbolic formulas. We propose appropriate optimization algorithms for both cases. Our experiments, with two-armed “Bernoulli” bandit problems and various playing budgets, show that the meta-learnt E/E strategies outperform generic strategies of the literature (UCB1, UCB1-TUNED, UCB-V, KL-UCB and epsilon-GREEDY); they also evaluate the robustness of the learnt E/E strategies, by tests carried out on arms whose rewards follow a truncated Gaussian distribution.

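Illustrative note (not taken from the paper): UCB1 is one of the generic baseline strategies named in the abstract above. The sketch below shows the standard UCB1 index (empirical mean plus an exploration bonus of sqrt(2 ln t / n)) played on a Bernoulli bandit. The function names, the use of the current round number t inside the logarithm, and the simulation loop are assumptions made for illustration; this is the generic baseline, not the meta-learnt strategies the paper proposes.

```python
import math
import random


def ucb1_index(mean_reward, pulls, t):
    """UCB1 index of one arm: empirical mean plus exploration bonus."""
    return mean_reward + math.sqrt(2.0 * math.log(t) / pulls)


def run_ucb1(arm_probs, budget, rng):
    """Play a Bernoulli bandit with UCB1 for `budget` rounds.

    arm_probs: success probability of each arm (hypothetical test problem).
    rng: a random.Random instance, passed in for reproducibility.
    Returns (pulls_per_arm, total_reward).
    """
    k = len(arm_probs)
    pulls = [0] * k
    sums = [0.0] * k
    for t in range(1, budget + 1):
        if t <= k:
            arm = t - 1  # initialization: play every arm once
        else:
            arm = max(
                range(k),
                key=lambda a: ucb1_index(sums[a] / pulls[a], pulls[a], t),
            )
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        pulls[arm] += 1
        sums[arm] += reward
    return pulls, sum(sums)
```

A call such as run_ucb1([0.4, 0.6], 1000, random.Random(0)) returns the number of times each arm was played and the total reward collected, which is the kind of quantity a playing-budget comparison between strategies would be based on.
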
Peer Reviewed
Optimized Look-Ahead Trees: Extensions to Large and Continuous Action Spaces
Jung, Tobias ULg; Ernst, Damien ULg; Maes, Francis

in Proc. of IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL'13) (2013)

This paper studies look-ahead tree based control policies from the viewpoint of online decision making with constraints on the computational budget allowed per decision (expressed as number of calls to the generative model). We consider optimized look-ahead tree (OLT) policies, a recently introduced family of hybrid techniques, which combine the advantages of look-ahead trees (high precision) with the advantages of direct policy search (low online cost) and which are specifically designed for limited online budgets. We present two extensions of the basic OLT algorithm that, on the one hand, allow tackling deterministic optimal control problems with large and continuous action spaces and, on the other hand, can help to further reduce the online complexity.

Peer Reviewed
Biorthogonalization Techniques for Least Squares Temporal Difference Learning
Jung, Tobias ULg; Ernst, Damien ULg

Poster (2012, December 07)

We consider Markov reward processes and study OLS-LSTD, a framework for selecting basis functions from a set of candidates to obtain a sparse representation of the value function in the context of least squares temporal difference learning. To support both efficient updating and downdating operations, OLS-LSTD uses a biorthogonal representation for the selected basis vectors. Empirical comparisons with the recently proposed MP and LARS frameworks for LSTD are made.

Peer Reviewed
Optimal discovery with probabilistic expert advice
Bubeck, Sébastien; Ernst, Damien ULg; Garivier, Aurélien

in Proceedings of the 51st IEEE Conference on Decision and Control (CDC 2012) (2012, December)

Motivated by issues of security analysis for power systems, we analyze a new problem, called optimal discovery with probabilistic expert advice. We address it with an algorithm based on the optimistic paradigm and the Good-Turing missing mass estimator. We show that this strategy attains the optimal discovery rate in a macroscopic limit sense, under some assumptions on the probabilistic experts. We also provide numerical experiments suggesting that this optimal behavior may still hold under weaker assumptions.