References of "Ernst, Damien"
     in
Bookmark and Share    
Full Text
Peer Reviewed
See detailOutbound SPIT Filter with Optimal Performance Guarantees
Jung, Tobias ULg; Martin, Sylvain ULg; Nassar, Mohamed et al

in Computer Networks (2013), 57(7), 16301643

This paper presents a formal framework for identifying and filtering SPIT calls (SPam in Internet Telephony) in an outbound scenario with provable optimal performance. In so doing, our work is largely ... [more ▼]

This paper presents a formal framework for identifying and filtering SPIT calls (SPam in Internet Telephony) in an outbound scenario with provable optimal performance. In so doing, our work is largely different from related previous work: our goal is to rigorously formalize the problem in terms of mathematical decision theory, find the optimal solution to the problem, and derive concrete bounds for its expected loss (number of mistakes the SPIT filter will make in the worst case). This goal is achieved by considering an abstracted scenario amenable to theoretical analysis, namely SPIT detection in an outbound scenario with pure sources. Our methodology is to first define the cost of making an error (false positive and false negative), apply Wald’s sequential probability ratio test to the individual sources, and then determine analytically error probabilities such that the resulting expected loss is minimized. The benefits of our approach are: (1) the method is optimal (in a sense defined in the paper); (2) the method does not rely on manual tuning and tweaking of parameters but is completely self-contained and mathematically justified; (3) the method is computationally simple and scalable. These are desirable features that would make our method a component of choice in larger, autonomic frameworks. [less ▲]

Detailed reference viewed: 86 (38 ULg)
Full Text
Peer Reviewed
See detailOptimal discovery with probabilistic expert advice: finite time analysis and macroscopic optimality
Bubeck, Sébastien; Ernst, Damien ULg; Garivier, Aurélien

in Journal of Machine Learning Research (2013), 14

We consider an original problem that arises from the issue of security analysis of a power system and that we name optimal discovery with probabilistic expert advice. We address it with an algorithm based ... [more ▼]

We consider an original problem that arises from the issue of security analysis of a power system and that we name optimal discovery with probabilistic expert advice. We address it with an algorithm based on the optimistic paradigm and on the Good-Turing missing mass estimator. We prove two different regret bounds on the performance of this algorithm under weak assumptions on the probabilistic experts. Under more restrictive hypotheses, we also prove a macroscopic optimality result, comparing the algorithm both with an oracle strategy and with uniform sampling. Finally, we provide numerical experiments illustrating these theoretical findings. [less ▲]

Detailed reference viewed: 8 (5 ULg)
Full Text
Peer Reviewed
See detailGénéralisation Min Max pour l'Apprentissage par Renforcement Batch et Déterministe : Relaxations pour le Cas Général T Etapes
Fonteneau, Raphaël ULg; Ernst, Damien ULg; Boigelot, Bernard ULg et al

in 8èmes Journées Francophones de Planification, Décision et Apprentissage pour la conduite de systèmes (JFPDA'13) (2013)

Cet article aborde le problème de généralisation minmax dans le cadre de l'apprentissage par renforcement batch et déterministe. Le problème a été originellement introduit par [Fonteneau, 2011], et il a ... [more ▼]

Cet article aborde le problème de généralisation minmax dans le cadre de l'apprentissage par renforcement batch et déterministe. Le problème a été originellement introduit par [Fonteneau, 2011], et il a déjà été montré qu'il est NP-dur. Deux schémas de relaxation pour le cas deux étapes ont été présentés aux JFPDA'12, et ce papier présente une généralisation de ces schémas au cas T étapes. Le premier schéma fonctionne en éliminant des contraintes afin d'obtenir un problème soluble en temps polynomial. Le deuxième schéma est une relaxation lagrangienne conduisant également à un problème soluble en temps polynomial. On montre théoriquement que ces deux schémas permettent d'obtenir de meilleurs résultats que ceux proposés par [Fonteneau, 2011]. [less ▲]

Detailed reference viewed: 35 (6 ULg)
Full Text
Peer Reviewed
See detailScenario Trees and Policy Selection for Multistage Stochastic Programming Using Machine Learning
Defourny, Boris; Ernst, Damien ULg; Wehenkel, Louis ULg

in INFORMS Journal on Computing (2013), 25(3), 488-501

In the context of multistage stochastic optimization problems, we propose a hybrid strategy for generalizing to nonlinear decision rules, using machine learning, a finite data set of constrained vector ... [more ▼]

In the context of multistage stochastic optimization problems, we propose a hybrid strategy for generalizing to nonlinear decision rules, using machine learning, a finite data set of constrained vector-valued recourse decisions optimized using scenario-tree techniques from multistage stochastic programming. The decision rules are based on a statistical model inferred from a given scenario-tree solution and are selected by out-of-sample simulation given the true problem. Because the learned rules depend on the given scenario tree, we repeat the procedure for a large number of randomly generated scenario trees and then select the best solution (policy) found for the true problem. The scheme leads to an ex post selection of the scenario tree itself. Numerical tests evaluate the dependence of the approach on the machine learning aspects and show cases where one can obtain near-optimal solutions, starting with a “weak” scenario-tree generator that randomizes the branching structure of the trees. [less ▲]

Detailed reference viewed: 80 (18 ULg)
Full Text
Peer Reviewed
See detailMin max generalization for deterministic batch mode reinforcement learning: relaxation schemes
Fonteneau, Raphaël ULg; Ernst, Damien ULg; Boigelot, Bernard ULg et al

in SIAM Journal on Control & Optimization (2013), 51(5), 33553385

We study the min max optimization problem introduced in Fonteneau et al. [Towards min max reinforcement learning, ICAART 2010, Springer, Heidelberg, 2011, pp. 61–77] for computing policies for batch mode ... [more ▼]

We study the min max optimization problem introduced in Fonteneau et al. [Towards min max reinforcement learning, ICAART 2010, Springer, Heidelberg, 2011, pp. 61–77] for computing policies for batch mode reinforcement learning in a deterministic setting with fixed, finite time horizon. First, we show that the min part of this problem is NP-hard. We then provide two relaxation schemes. The first relaxation scheme works by dropping some constraints in order to obtain a problem that is solvable in polynomial time. The second relaxation scheme, based on a Lagrangian relaxation where all constraints are dualized, can also be solved in polynomial time. We also theoretically prove and empirically illustrate that both relaxation schemes provide better results than those given in [Fonteneau et al., 2011, as cited above]. [less ▲]

Detailed reference viewed: 44 (13 ULg)
Full Text
Peer Reviewed
See detailActive network management: planning under uncertainty for exploiting load modulation
Gemine, Quentin ULg; Karangelos, Efthymios ULg; Ernst, Damien ULg et al

in Proceedings of the 2013 IREP Symposium - Bulk Power Systems Dynamics and Control - IX (2013)

This paper addresses the problem faced by a distribution system operator (DSO) when planning the operation of a network in the short-term. The problem is formulated in the context of high penetration of ... [more ▼]

This paper addresses the problem faced by a distribution system operator (DSO) when planning the operation of a network in the short-term. The problem is formulated in the context of high penetration of renewable energy sources (RES) and distributed generation (DG), and when flexible demand is available. The problem is expressed as a sequential decision-making problem under uncertainty, where, in the first stage, the DSO has to decide whether or not to reserve the availability of flexible demand, and, in the subsequent stages, can curtail the generation and modulate the available flexible loads. We analyze the relevance of this formulation on a small test system, discuss the assumptions made, compare our approach to related work, and indicate further research directions. [less ▲]

Detailed reference viewed: 129 (43 ULg)
Full Text
Peer Reviewed
See detailStratégies d'échantillonnage pour l'apprentissage par renforcement batch
Fonteneau, Raphaël ULg; Murphy, Susan A.; Wehenkel, Louis ULg et al

in Revue d'Intelligence Artificielle [=RIA] (2013), 27(2), 171-194

We propose two strategies for experiment selection in the context of batch mode reinforcement learning. The first strategy is based on the idea that the most interesting experiments to carry out at some ... [more ▼]

We propose two strategies for experiment selection in the context of batch mode reinforcement learning. The first strategy is based on the idea that the most interesting experiments to carry out at some stage are those that are the most liable to falsify the current hypothesis about the optimal control policy. We cast this idea in a context where a policy learning algorithm and a model identification method are given a priori. The second strategy exploits recently published methods for computing bounds on the return of control policies from a set of trajectories in order to sample the state-action space so as to be able to discriminate between optimal and non-optimal policies. Both strategies are experimentally validated, showing promising results. [less ▲]

Detailed reference viewed: 35 (7 ULg)
Full Text
Peer Reviewed
See detailMeta-learning of Exploration/Exploitation Strategies: The Multi-Armed Bandit Case
Maes, Francis; Wehenkel, Louis ULg; Ernst, Damien ULg

in Filipe, Joaquim; Fred, Ana (Eds.) Agents and Artificial Intelligence: 4th International Conference, ICAART 2012, Vilamoura, Portugal, February 6-8, 2012. Revised Selected Papers (2013)

The exploration/exploitation (E/E) dilemma arises naturally in many subfields of Science. Multi-armed bandit problems formalize this dilemma in its canonical form. Most current research in this field ... [more ▼]

The exploration/exploitation (E/E) dilemma arises naturally in many subfields of Science. Multi-armed bandit problems formalize this dilemma in its canonical form. Most current research in this field focuses on generic solutions that can be applied to a wide range of problems. However, in practice, it is often the case that a form of prior information is available about the specific class of target problems. Prior knowledge is rarely used in current solutions due to the lack of a systematic approach to incorporate it into the E/E strategy. To address a specific class of E/E problems, we propose to proceed in three steps: (i) model prior knowledge in the form of a probability distribution over the target class of E/E problems; (ii) choose a large hypothesis space of candidate E/E strategies; and (iii), solve an optimization problem to find a candidate E/E strategy of maximal average performance over a sample of problems drawn from the prior distribution. We illustrate this meta-learning approach with two different hypothesis spaces: one where E/E strategies are numerically parameterized and another where E/E strategies are represented as small symbolic formulas. We propose appropriate optimization algorithms for both cases. Our experiments, with two-armed “Bernoulli” bandit problems and various playing budgets, show that the metalearnt E/E strategies outperform generic strategies of the literature (UCB1, UCB1-T UNED, UCB-V, KL-UCB and epsilon-GREEDY); they also evaluate the robustness of the learnt E/E strategies, by tests carried out on arms whose rewards follow a truncated Gaussian distribution. [less ▲]

Detailed reference viewed: 20 (6 ULg)
Full Text
Peer Reviewed
See detailOptimized Look-Ahead Trees: Extensions to Large and Continuous Action Spaces
Jung, Tobias ULg; Ernst, Damien ULg; Maes, Francis

in Proc. of IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL'13) (2013)

This paper studies look-ahead tree based control policies from the viewpoint of online decision making with constraints on the computational budget allowed per decision (expressed as number of calls to ... [more ▼]

This paper studies look-ahead tree based control policies from the viewpoint of online decision making with constraints on the computational budget allowed per decision (expressed as number of calls to the generative model). We consider optimized look-ahead tree (OLT) policies, a recently introduced family of hybrid techniques, which combine the advantages of look-ahead trees (high precision) with the advantages of direct policy search (low online cost) and which are specifically designed for limited online budgets. We present two extensions of the basic OLT algorithm that on the one side allow tackling deterministic optimal control problems with large and continuous action spaces and that on the other side can also help to further reduce the online complexity. [less ▲]

Detailed reference viewed: 15 (7 ULg)
Full Text
Peer Reviewed
See detailBiorthogonalization Techniques for Least Squares Temporal Difference Learning
Jung, Tobias ULg; Ernst, Damien ULg

Poster (2012, December 07)

We consider Markov reward processes and study OLS-LSTD, a framework for selecting basis functions from a set of candidates to obtain a sparse representation of the value function in the context of least ... [more ▼]

We consider Markov reward processes and study OLS-LSTD, a framework for selecting basis functions from a set of candidates to obtain a sparse representation of the value function in the context of least squares temporal difference learning. To support efficient both updating and downdating operations, OLS-LSTD uses a biorthogonal representation for the selected basis vectors. Empirical comparisons with the recently proposed MP and LARS frameworks for LSTD are made. [less ▲]

Detailed reference viewed: 55 (17 ULg)
Full Text
Peer Reviewed
See detailOptimal discovery with probabilistic expert advice
Bubeck, Sébastien; Ernst, Damien ULg; Garivier, Aurélien

in Proceedings of the 51st IEEE Conference on Decision and Control (CDC 2012) (2012, December)

Motivated by issues of security analysis for power systems, we analyze a new problem, called optimal discovery with probabilistic expert advice. We address it with an algorithm based on the optimistic ... [more ▼]

Motivated by issues of security analysis for power systems, we analyze a new problem, called optimal discovery with probabilistic expert advice. We address it with an algorithm based on the optimistic paradigm and the Good-Turingmissing mass estimator. We show that this strategy attains the optimal discovery rate in a macroscopic limit sense, under some assumptions on the probabilistic experts. We also provide numerical experiments suggesting that this optimal behavior may still hold under weaker assumptions. [less ▲]

Detailed reference viewed: 9 (2 ULg)
Full Text
Peer Reviewed
See detailCooperative frequency control with a multi-terminal high-voltage DC network
Sarlette, Alain; Dai, Jing; Phulpin, Yannick et al

in Automatica (2012), 48(12), 31283134

We consider frequency control in power systems made of several non-synchronous AC areas connected by a multi-terminal high-voltage direct current (HVDC) grid. We propose two HVDC control schemes to make ... [more ▼]

We consider frequency control in power systems made of several non-synchronous AC areas connected by a multi-terminal high-voltage direct current (HVDC) grid. We propose two HVDC control schemes to make the areas collectively react to power imbalances, so that individual areas can schedule smaller power reserves. The first scheme modifies the power injected by each area into the DC grid as a function of frequency deviations of neighboring AC areas. The second scheme changes the DC voltage of each converter as a function of its own area's frequency only, relying on the physical network to obtain a collective reaction. For both schemes, we prove convergence of the closed-loop system with heterogeneous AC areas. [less ▲]

Detailed reference viewed: 110 (9 ULg)
Full Text
See detailA computationally efficient algorithm for the provision of a day-ahead modulation service by a load aggregator
Mathieu, Sébastien ULg; Karangelos, Efthymios ULg; Louveaux, Quentin ULg et al

Poster (2012, October 08)

We study a decision making problem faced by an aggregator willing to offer a load modulation service to a Transmission System Operator (TSO). In particular, we concentrate on a day-ahead service ... [more ▼]

We study a decision making problem faced by an aggregator willing to offer a load modulation service to a Transmission System Operator (TSO). In particular, we concentrate on a day-ahead service consisting of a load modulation option, which can be called by the TSO once per day. The option specifies the maximum amplitude of a potential modification on the demand of the loads within a certain time interval. We consider the specific case where the loads can be modeled by a generic tank model whose inflow depends on the power consumed by the load and outflow is assumed to be known the day before for every market period. The level of the reservoir at the beginning of the market day is also assumed to be known. We show that, under these assumptions, the problem of maximizing the amplitude of the load modulation service can be formulated as a mixed integer linear programming problem (MILP). In order to solve this problem in a computationally efficient manner we introduce a novel heuristic-method. We test this method on a set of problems and demonstrate that our approach is orders of magnitude faster than CPLEX - a state-of-the-art software for solving MILP problems - without considerably compromising the solution accuracy. [less ▲]

Detailed reference viewed: 60 (15 ULg)
Full Text
Peer Reviewed
See detailPolicy search in a space of simple closed-form formulas: towards interpretability of reinforcement learning
Maes, Francis ULg; Fonteneau, Raphaël ULg; Wehenkel, Louis ULg et al

in Discovery Science 15th International Conference, DS 2012, Lyon, France, October 29-31, 2012. Proceedings (2012, October)

In this paper, we address the problem of computing interpretable solutions to reinforcement learning (RL) problems. To this end, we propose a search algorithm over a space of simple losed-form formulas ... [more ▼]

In this paper, we address the problem of computing interpretable solutions to reinforcement learning (RL) problems. To this end, we propose a search algorithm over a space of simple losed-form formulas that are used to rank actions. We formalize the search for a high-performance policy as a multi-armed bandit problem where each arm corresponds to a candidate policy canonically represented by its shortest formula-based representation. Experiments, conducted on standard benchmarks, show that this approach manages to determine both efficient and interpretable solutions. [less ▲]

Detailed reference viewed: 29 (11 ULg)
Full Text
See detailContextual Multi-armed Bandits for the Prevention of Spam in VoIP Networks
Jung, Tobias ULg; Martin, Sylvain ULg; Ernst, Damien ULg et al

E-print/Working paper (2012)

In this paper we argue that contextual multi-armed bandit algorithms could open avenues for designing self-learning security modules for computer networks and related tasks. The paper has two ... [more ▼]

In this paper we argue that contextual multi-armed bandit algorithms could open avenues for designing self-learning security modules for computer networks and related tasks. The paper has two contributions: a conceptual one and an algorithmical one. The conceptual contribution is to formulate -- as an example -- the real-world problem of preventing SPIT (Spam in VoIP networks), which is currently not satisfyingly addressed by standard techniques, as a sequential learning problem, namely as a contextual multi-armed bandit. Our second contribution is to present CMABFAS, a new algorithm for general contextual multi-armed bandit learning that specifically targets domains with finite actions. We illustrate how CMABFAS could be used to design a fully self-learning SPIT filter that does not rely on feedback from the end-user (i.e., does not require labeled data) and report first simulation results. [less ▲]

Detailed reference viewed: 81 (28 ULg)
Full Text
Peer Reviewed
See detailGénéralisation min max pour l'apprentissage par renforcement batch et déterministe : schémas de relaxation
Fonteneau, Raphaël ULg; Ernst, Damien ULg; Boigelot, Bernard ULg et al

in Septièmes Journées Francophones de Planification, Décision et Apprentissage pour la conduite de systèmes (JFPDA 2012) (2012, May)

On s’intéresse au problème de généralisation min max dans le cadre de l’apprentissage par renforcement batch et déterministe. Le problème a été originellement introduit par Fonteneau et al. (2011). Dans ... [more ▼]

On s’intéresse au problème de généralisation min max dans le cadre de l’apprentissage par renforcement batch et déterministe. Le problème a été originellement introduit par Fonteneau et al. (2011). Dans un premier temps, on montre que le problème est NP-dur. Dans le cas où l’horizon d’optimisation vaut 2, on développe deux schémas de relaxation. Le premier schéma fonctionne en éliminant des contraintes de telle sorte qu’on obtienne un problème soluble en temps polynomial. Le deuxième schéma est une relaxation Lagrangienne conduisant à un problème conique-quadratique. On montre théoriquement et empiriquement que ces deux schémas permettent d’obtenir de meilleurs résultats que ceux proposés par Fonteneau et al. (2011). [less ▲]

Detailed reference viewed: 64 (8 ULg)
Full Text
Peer Reviewed
See detailCoordinated primary frequency control among non-synchronous systems connected by a multi-terminal high-voltage direct current grid
Dai, Jing; Phulpin, Yannick; Sarlette, Alain et al

in IET Generation, Transmission & Distribution (2012), 6(2), 99-108

The authors consider a power system composed of several non-synchronous AC areas connected by a multiterminal high-voltage direct current (HVDC) grid. In this context, the authors propose a distributed ... [more ▼]

The authors consider a power system composed of several non-synchronous AC areas connected by a multiterminal high-voltage direct current (HVDC) grid. In this context, the authors propose a distributed control scheme that modifies the power injections from the different AC areas into the DC grid so as to make the system collectively react to load imbalances. This collective reaction allows each individual AC area to downscale its primary reserves. The scheme is inspired by algorithms for the consensus problem extensively studied by the control theory community. It modifies the power injections based on frequency deviations of the AC areas so as to make them stay close to each other. A stability analysis of the closed-loop system is reported as well as simulation results on a benchmark power system with five AC areas. These results show that with proper tuning, the control scheme makes the frequency deviations converge rapidly to a common value following a load imbalance in an area. [less ▲]

Detailed reference viewed: 56 (3 ULg)