References of "Ernst, Damien"
Full Text
Peer Reviewed
Policy search in a space of simple closed-form formulas: towards interpretability of reinforcement learning
Maes, Francis ULg; Fonteneau, Raphaël ULg; Wehenkel, Louis ULg et al

in Discovery Science: 15th International Conference, DS 2012, Lyon, France, October 29-31, 2012. Proceedings (2012, October)

In this paper, we address the problem of computing interpretable solutions to reinforcement learning (RL) problems. To this end, we propose a search algorithm over a space of simple closed-form formulas that are used to rank actions. We formalize the search for a high-performance policy as a multi-armed bandit problem where each arm corresponds to a candidate policy canonically represented by its shortest formula-based representation. Experiments, conducted on standard benchmarks, show that this approach manages to determine both efficient and interpretable solutions.
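To make the ranking idea concrete, here is a minimal sketch under assumptions of our own (the toy one-dimensional control task and the formula are illustrative, not the paper's benchmarks): the policy scores each discrete action with a small closed-form formula and plays the argmax, so the formula itself is the interpretable object.

```python
# Minimal sketch of a formula-based policy (toy task and formula are
# illustrative assumptions, not the paper's benchmarks or results).

def formula_score(x, v, u):
    # Hypothetical closed-form formula: favour actions that drive the
    # predicted next position x + v + u toward zero.
    return -abs(x + v + u)

def formula_policy(state, actions):
    # Rank the discrete actions with the formula and play the best one.
    x, v = state
    return max(actions, key=lambda u: formula_score(x, v, u))

print(formula_policy(state=(0.8, -0.2), actions=[-1.0, 0.0, 1.0]))  # -> -1.0
```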

Detailed reference viewed: 29 (11 ULg)
Full Text
Contextual Multi-armed Bandits for the Prevention of Spam in VoIP Networks
Jung, Tobias ULg; Martin, Sylvain ULg; Ernst, Damien ULg et al

E-print/Working paper (2012)

In this paper we argue that contextual multi-armed bandit algorithms could open avenues for designing self-learning security modules for computer networks and related tasks. The paper has two contributions: a conceptual one and an algorithmic one. The conceptual contribution is to formulate -- as an example -- the real-world problem of preventing SPIT (Spam in VoIP networks), which is currently not satisfactorily addressed by standard techniques, as a sequential learning problem, namely as a contextual multi-armed bandit. Our second contribution is to present CMABFAS, a new algorithm for general contextual multi-armed bandit learning that specifically targets domains with finite actions. We illustrate how CMABFAS could be used to design a fully self-learning SPIT filter that does not rely on feedback from the end-user (i.e., does not require labeled data) and report first simulation results.
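The abstract does not spell out CMABFAS, so the sketch below is only a generic contextual-bandit skeleton with finite actions (the epsilon-greedy selection rule and the SPIT-flavoured contexts and actions are our assumptions): observe a context, pick an action, collect a reward, update per-context statistics.

```python
import random
from collections import defaultdict

# Generic contextual multi-armed bandit skeleton with finite actions.
# This is NOT the paper's CMABFAS algorithm; epsilon-greedy selection
# and the toy SPIT-style contexts below are illustrative assumptions.

class ContextualBandit:
    def __init__(self, actions, epsilon=0.1):
        self.actions = actions
        self.epsilon = epsilon
        self.counts = defaultdict(int)    # (context, action) -> pulls
        self.means = defaultdict(float)   # (context, action) -> mean reward

    def act(self, context):
        if random.random() < self.epsilon:
            return random.choice(self.actions)               # explore
        return max(self.actions,
                   key=lambda a: self.means[(context, a)])   # exploit

    def update(self, context, action, reward):
        key = (context, action)
        self.counts[key] += 1
        self.means[key] += (reward - self.means[key]) / self.counts[key]

bandit = ContextualBandit(actions=["allow", "challenge", "block"])
ctx = "caller_profile_A"           # hypothetical context descriptor
a = bandit.act(ctx)
bandit.update(ctx, a, reward=1.0)  # reward: was the decision correct?
```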

Detailed reference viewed: 78 (28 ULg)
Full Text
Peer Reviewed
Généralisation min max pour l'apprentissage par renforcement batch et déterministe : schémas de relaxation [Min max generalization for deterministic batch reinforcement learning: relaxation schemes]
Fonteneau, Raphaël ULg; Ernst, Damien ULg; Boigelot, Bernard ULg et al

in Septièmes Journées Francophones de Planification, Décision et Apprentissage pour la conduite de systèmes (JFPDA 2012) (2012, May)

We study the min max generalization problem in the setting of batch mode, deterministic reinforcement learning. The problem was originally introduced by Fonteneau et al. (2011). We first show that the problem is NP-hard. For the case where the optimization horizon equals 2, we develop two relaxation schemes. The first scheme works by dropping constraints so as to obtain a problem solvable in polynomial time. The second scheme is a Lagrangian relaxation leading to a conic quadratic problem. We show both theoretically and empirically that these two schemes yield better results than those proposed by Fonteneau et al. (2011).

Detailed reference viewed: 63 (8 ULg)
Full Text
Peer Reviewed
Coordinated primary frequency control among non-synchronous systems connected by a multi-terminal high-voltage direct current grid
Dai, Jing; Phulpin, Yannick; Sarlette, Alain et al

in IET Generation, Transmission & Distribution (2012), 6(2), 99-108

The authors consider a power system composed of several non-synchronous AC areas connected by a multiterminal high-voltage direct current (HVDC) grid. In this context, the authors propose a distributed control scheme that modifies the power injections from the different AC areas into the DC grid so as to make the system collectively react to load imbalances. This collective reaction allows each individual AC area to downscale its primary reserves. The scheme is inspired by algorithms for the consensus problem extensively studied by the control theory community. It modifies the power injections based on frequency deviations of the AC areas so as to make them stay close to each other. A stability analysis of the closed-loop system is reported as well as simulation results on a benchmark power system with five AC areas. These results show that with proper tuning, the control scheme makes the frequency deviations converge rapidly to a common value following a load imbalance in an area.
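The consensus mechanism can be illustrated with a deliberately crude simulation (first-order toy model, all-to-all communication, arbitrary gain; none of this reproduces the paper's benchmark or tuning): each area's injection update pulls its frequency deviation toward the others', so all deviations converge to a common value after an imbalance in one area.

```python
import numpy as np

# Toy consensus illustration (NOT the paper's power-system model):
# five AC areas, all-to-all exchange of frequency deviations, and an
# injection update proportional to the deviation gaps with the others.

n, k, dt = 5, 0.5, 0.01
f = np.array([0.0, 0.0, -0.4, 0.0, 0.0])  # area 2 suffers a load imbalance
A = np.ones((n, n)) - np.eye(n)           # communication graph (complete)

for _ in range(2000):
    gaps = A.sum(axis=1) * f - A @ f      # sum_j (f_i - f_j) for each area i
    f = f + dt * (-k * gaps)              # consensus-style injection effect

print(np.round(f, 3))  # all deviations settle at the common value -0.08
```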

Detailed reference viewed: 54 (3 ULg)
Full Text
Peer Reviewed
Learning to play K-armed bandit problems
Maes, Francis ULg; Wehenkel, Louis ULg; Ernst, Damien ULg

in Proceedings of the 4th International Conference on Agents and Artificial Intelligence (ICAART 2012) (2012, February)

We propose a learning approach to pre-compute K-armed bandit playing policies by exploiting prior information describing the class of problems targeted by the player. Our algorithm first samples a set of K-armed bandit problems from the given prior, and then chooses in a space of candidate policies one that gives the best average performance over these problems. The candidate policies use an index for ranking the arms and pick at each play the arm with the highest index; the index for each arm is computed in the form of a linear combination of features describing the history of plays (e.g., number of draws, average reward, variance of rewards and higher order moments), and an estimation of distribution algorithm is used to determine its optimal parameters in the form of feature weights. We carry out simulations in the case where the prior assumes a fixed number of Bernoulli arms, a fixed horizon, and uniformly distributed parameters of the Bernoulli arms. These simulations show that learned strategies perform very well with respect to several other strategies previously proposed in the literature (UCB1, UCB2, UCB-V, KL-UCB and ε_n-GREEDY); they also highlight the robustness of these strategies with respect to wrong prior information.
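The index structure is easy to write down. In the sketch below, the feature set and the weights are placeholders (the paper learns the weights offline with an estimation of distribution algorithm; here they are hard-coded): each arm's index is a linear combination of its history features, and the policy pulls the arm with the largest index.

```python
import numpy as np

# Sketch of a learned index policy for K-armed bandits. The feature
# vector and the weights below are illustrative placeholders; in the
# paper the weights are optimized offline over problems sampled from
# the prior.

def arm_index(w, n_draws, mean_r, var_r, t):
    feats = np.array([n_draws,
                      mean_r,
                      var_r,
                      np.sqrt(np.log(t + 1) / (n_draws + 1))])  # exploration-like term
    return float(w @ feats)

w = np.array([0.0, 1.0, -0.1, 2.0])          # NOT learned weights
history = [(10, 0.4, 0.24), (3, 0.6, 0.24)]  # (draws, mean, variance) per arm
t = 13                                       # total plays so far
best = max(range(len(history)), key=lambda a: arm_index(w, *history[a], t))
print("play arm", best)                      # arm 1: fewer draws, higher mean
```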

Detailed reference viewed: 105 (18 ULg)
Full Text
Peer Reviewed
Imitative Learning for Real-Time Strategy Games
Gemine, Quentin ULg; Safadi, Firas ULg; Fonteneau, Raphaël ULg et al

in Proceedings of the 2012 IEEE Conference on Computational Intelligence and Games (2012)

Over the past decades, video games have become increasingly popular and complex. Virtual worlds have come a long way since the first arcades, and so have the artificial intelligence (AI) techniques used to control agents in these growing environments. Tasks such as world exploration, constrained pathfinding, or team tactics and coordination, just to name a few, are now default requirements for contemporary video games. However, despite its recent advances, video game AI still lacks the ability to learn. In this paper, we attempt to break the barrier between video game AI and machine learning and propose a generic method allowing real-time strategy (RTS) agents to learn production strategies from a set of recorded games using supervised learning. We test this imitative learning approach on the popular RTS title StarCraft II® and successfully teach a Terran agent facing a Protoss opponent new production strategies.
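As a toy illustration of the imitation setup (the feature encoding, the replay data and the nearest-neighbour learner below are our assumptions, not the paper's method), recorded games can be flattened into (game-state features, next production decision) pairs that any supervised learner can fit:

```python
# Toy imitation sketch: replays become (state features -> production
# decision) examples; a 1-nearest-neighbour rule stands in for the
# actual supervised learner. All data and names are illustrative.

def imitate(dataset, state):
    sq_dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    features, decision = min(dataset, key=lambda ex: sq_dist(ex[0], state))
    return decision

# Hypothetical encoding: (minerals, gas, supply_used) -> next decision.
dataset = [((400, 0, 10), "build_marine"),
           ((150, 100, 18), "build_supply_depot")]
print(imitate(dataset, (380, 20, 11)))  # -> build_marine
```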

Detailed reference viewed: 81 (37 ULg)
Full Text
Peer Reviewed
Learning exploration/exploitation strategies for single trajectory reinforcement learning
Castronovo, Michaël ULg; Maes, Francis ULg; Fonteneau, Raphaël ULg et al

in Proceedings of the 10th European Workshop on Reinforcement Learning (EWRL 2012) (2012)

We consider the problem of learning high-performance Exploration/Exploitation (E/E) strategies for finite Markov Decision Processes (MDPs) when the MDP to be controlled is supposed to be drawn from a known probability distribution pM(·). The performance criterion is the sum of discounted rewards collected by the E/E strategy over an infinite length trajectory. We propose an approach for solving this problem that works by considering a rich set of candidate E/E strategies and by looking for the one that gives the best average performance on MDPs drawn according to pM(·). As candidate E/E strategies, we consider index-based strategies parametrized by small formulas combining variables that include the estimated reward function, the number of times each transition has occurred and the optimal value functions V and Q of the estimated MDP (obtained through value iteration). The search for the best formula is formalized as a multi-armed bandit problem, each arm being associated with a formula. We experimentally compare the performance of the approach with R-max as well as with ε-Greedy strategies, and the results are promising.
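A minimal stand-in for such an index-based E/E strategy follows (the formula, combining the estimated Q-function with a visit-count bonus, is a hand-written example, not one discovered by the paper's bandit search over formulas):

```python
import math

# Illustrative index-based exploration/exploitation strategy: rank the
# actions by the Q-value of the estimated MDP plus a count-based bonus.
# The formula is a hand-written example, not the paper's result.

def ee_index(q_hat, n_sa, c=1.0):
    return q_hat + c / math.sqrt(n_sa + 1)

def choose(state, actions, Q_hat, counts):
    return max(actions, key=lambda a: ee_index(Q_hat[state, a], counts[state, a]))

Q_hat = {("s0", "left"): 0.5, ("s0", "right"): 0.4}    # from value iteration
counts = {("s0", "left"): 20, ("s0", "right"): 1}      # transition counts
print(choose("s0", ["left", "right"], Q_hat, counts))  # bonus favours "right"
```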

Detailed reference viewed: 116 (14 ULg)
Full Text
Peer Reviewed
Comparison of Different Selection Strategies in Monte-Carlo Tree Search for the Game of Tron
Perrick, Pierre; Lupien St-Pierre, David ULg; Maes, Francis ULg et al

in IEEE Conference on Computational Intelligence and Games 2012 (2012)

Monte-Carlo Tree Search (MCTS) techniques are essentially known for their performance on turn-based games, such as Go, for which players have considerable time for choosing their moves. In this paper, we apply MCTS to the game of Tron, a simultaneous real-time two-player game. The fact that players have to react fast and that moves occur simultaneously creates an unusual setting for MCTS, in which classical selection policies such as UCB1 may be suboptimal. We perform an empirical comparison of a wide range of selection policies for MCTS applied to Tron, with both deterministic policies (UCB1, UCB1-Tuned, UCB-V, UCB-Minimal, OMC-Deterministic, MOSS) and stochastic policies (Epsilon-greedy, EXP3, Thompson Sampling, OMC-Stochastic, PBBM). From the experiments, we observe that UCB1-Tuned behaves best, closely followed by UCB1. Even if UCB-Minimal is ranked fourth, this is a remarkable result for this recently introduced selection policy, found through automatic discovery of good policies on generic multi-armed bandit problems. We also show that deterministic policies perform better than stochastic ones for this problem.
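For reference, UCB1, the baseline that the tuned variant narrowly beats here, fits in a few lines (minimal sketch; the surrounding MCTS machinery, simultaneous-move handling and the other ten policies are omitted):

```python
import math

# UCB1 selection (minimal sketch): pick the child with the best average
# return plus an exploration term; unvisited children are tried first.

def ucb1_select(children, c=math.sqrt(2)):
    total = sum(ch["visits"] for ch in children)
    def score(ch):
        if ch["visits"] == 0:
            return float("inf")
        mean = ch["value"] / ch["visits"]
        return mean + c * math.sqrt(math.log(total) / ch["visits"])
    return max(children, key=score)

children = [{"move": "up", "visits": 12, "value": 7.0},
            {"move": "left", "visits": 3, "value": 2.0}]
print(ucb1_select(children)["move"])  # "left": under-explored branch wins
```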

Detailed reference viewed: 28 (5 ULg)
Full Text
Peer Reviewed
SPRT for SPIT: Using the Sequential Probability Ratio Test for Spam in VoIP Prevention
Jung, Tobias ULg; Martin, Sylvain ULg; Ernst, Damien ULg et al

in Proceedings of the 6th International Conference on Autonomous Infrastructure, Management and Security (2012)

This paper presents the first formal framework for identifying and filtering SPIT calls (SPam in Internet Telephony) in an outbound scenario with provable optimal performance. In so doing, our work deviates from related earlier work, where this problem is only addressed by ad hoc solutions. Our goal is to rigorously formalize the problem in terms of mathematical decision theory, find the optimal solution to the problem, and derive concrete bounds for its expected loss (number of mistakes the SPIT filter will make in the worst case). This goal is achieved by considering a scenario amenable to theoretical analysis, namely SPIT detection in an outbound scenario with pure sources. Our methodology is to first define the cost of making an error, apply Wald's sequential probability ratio test, and then determine analytically error probabilities such that the resulting expected loss is minimized. The benefits of our approach are: (1) the method is optimal (in a sense defined in the paper); (2) the method does not rely on manual tuning and tweaking of parameters but is completely self-contained and mathematically justified; (3) the method is computationally simple and scalable. These are desirable features that would make our method a component of choice in larger, autonomic frameworks.
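The decision core, Wald's sequential probability ratio test, is compact enough to sketch (the Bernoulli source models and parameter values below are illustrative, not the paper's calibration): accumulate the log-likelihood ratio call by call and stop as soon as it crosses a threshold set by the target error probabilities.

```python
import math

# Wald's SPRT sketch for a pure source (Bernoulli models and parameter
# values are illustrative assumptions, not the paper's calibration).

def sprt(observations, p_spit=0.9, p_legit=0.2, alpha=0.01, beta=0.01):
    lo = math.log(beta / (1 - alpha))        # accept "legitimate" below this
    hi = math.log((1 - beta) / alpha)        # accept "SPIT" above this
    llr = 0.0
    for x in observations:                   # x = 1: call flagged as spammy
        llr += math.log((p_spit if x else 1 - p_spit) /
                        (p_legit if x else 1 - p_legit))
        if llr >= hi:
            return "SPIT source"
        if llr <= lo:
            return "legitimate source"
    return "undecided: keep observing"

print(sprt([1, 1, 0, 1, 1, 1]))  # -> SPIT source
```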

Detailed reference viewed: 196 (27 ULg)
Full Text
Peer Reviewed
Contextual Multi-armed Bandits for Web Server Defense
Jung, Tobias ULg; Martin, Sylvain ULg; Ernst, Damien ULg et al

in Abbass, Hussein (Ed.) Proceedings of the 2012 International Joint Conference on Neural Networks (IJCNN) (2012)

In this paper we argue that contextual multi-armed bandit algorithms could open avenues for designing self-learning security modules for computer networks and related tasks. The paper has two contributions: a conceptual one and an algorithmic one. The conceptual contribution is to formulate the real-world problem of preventing HTTP-based attacks on web servers as a sequential learning problem, namely as a contextual multi-armed bandit. Our second contribution is to present CMABFAS, a new algorithm for general contextual multi-armed bandit learning that specifically targets domains with finite actions. We illustrate how CMABFAS could be used to design a fully self-learning meta filter for web servers that does not rely on feedback from the end-user (i.e., does not require labeled data) and report first convincing simulation results.

Detailed reference viewed: 185 (69 ULg)
Full Text
Peer Reviewed
Relaxation schemes for min max generalization in deterministic batch mode reinforcement learning
Fonteneau, Raphaël ULg; Ernst, Damien ULg; Boigelot, Bernard ULg et al

in 4th International NIPS Workshop on Optimization for Machine Learning (OPT 2011) (2011, December)

We study the min max optimization problem introduced in [Fonteneau, 2011] for computing policies for batch mode reinforcement learning in a deterministic setting. This problem is NP-hard. We focus on the two-stage case for which we provide two relaxation schemes. The first relaxation scheme works by dropping some constraints in order to obtain a problem that is solvable in polynomial time. The second relaxation scheme, based on a Lagrangian relaxation where all constraints are dualized, leads to a conic quadratic programming problem. Both relaxation schemes are shown to provide better results than those given in [Fonteneau, 2011].
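In schematic form (notation assumed here, not copied from the paper), the two-stage problem maximizes over open-loop action sequences the worst-case return over all environments compatible with the batch data:

```latex
% Schematic two-stage min max generalization problem (notation assumed):
% C_F is the set of (dynamics f, reward rho) pairs that are Lipschitz
% continuous and consistent with the batch of observed transitions.
\max_{(u_0,\, u_1)} \;\; \min_{(f,\rho) \,\in\, \mathcal{C}_{\mathcal{F}}}
  \;\; \rho(x_0, u_0) \;+\; \rho\bigl(f(x_0, u_0),\, u_1\bigr)
```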

Detailed reference viewed: 82 (10 ULg)
Full Text
Peer Reviewed
Artificial intelligence design for real-time strategy games
Safadi, Firas ULg; Fonteneau, Raphaël ULg; Ernst, Damien ULg

in NIPS Workshop on Decision Making with Multiple Imperfect Decision Makers (2011, December)

For over a decade now, real-time strategy (RTS) games have been challenging human and artificial intelligence (AI) alike, as one of the top genres in terms of overall complexity. RTS is a prime example problem featuring multiple interacting imperfect decision makers. Elaborate dynamics, partial observability, as well as a rapidly diverging action space render rational decision making somewhat elusive. Humans deal with the complexity using several abstraction layers, taking decisions on different abstract levels. Current agents, on the other hand, remain largely scripted and exhibit static behavior, leaving them extremely vulnerable to flaw abuse and no match for human players. In this paper, we propose to mimic the abstraction mechanisms used by human players for designing AI for RTS games. A non-learning agent for StarCraft showing promising performance is proposed, and several research directions towards the integration of learning mechanisms are discussed at the end of the paper.

Detailed reference viewed: 154 (9 ULg)
Full Text
Peer Reviewed
Ancillary services and operation of multi-terminal HVDC grids
Phulpin, Yannick; Ernst, Damien ULg

in Proceedings of the International Workshop on Transmission Networks for Offshore Wind Power Plants (2011, October)

This paper addresses the problem of ancillary services in ac systems interconnected by a multi-terminal HVdc system. It presents opportunities for new control schemes and discusses operation strategies for three types of HVdc grid operators, namely a coordination center, an independent operator, and the transmission system operator in charge of one of the areas interconnected by the multi-terminal HVdc grid. In these contexts, the paper envisions the challenges of using the HVdc infrastructure to provide frequency, voltage, and rotor angle stability-related ancillary services. It also analyzes the technical and economic impacts of the operation strategies on the ac areas’ dynamics.

Detailed reference viewed: 66 (3 ULg)
Full Text
Peer Reviewed
Model predictive control of HVDC power flow to improve transient stability in power systems
Phulpin, Yannick; Hazra, Jagabondhu; Ernst, Damien ULg

in Proceedings of the Second IEEE International Conference on Smart Grid Communications (IEEE SmartGridComm) (2011, October)

This paper addresses the problem of HVDC control using real-time information to avoid loss of synchronism phenomena in power systems. It proposes a discrete-time control strategy based on model predictive control, which solves at every time step an open-loop optimal-control problem using an A* event-tree search. Different optimisation criteria based on transient stability indices are compared. The paper presents simulation results for two benchmark systems with 9 and 24 buses, respectively, and an embedded HVDC link. The results show that the control strategy leads to a modulation of the HVDC power flow that significantly improves the system’s ability to maintain synchronism in the aftermath of a large disturbance.
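The receding-horizon skeleton of such a strategy fits in a few lines (the scalar surrogate model, candidate schedules and cost below are toy assumptions; the paper's A* event-tree search and transient-stability indices are abstracted away):

```python
# Receding-horizon (MPC) skeleton: at each step, score candidate HVDC
# power schedules on a model rollout and apply only the first control.
# The scalar model and the cost are toy stand-ins for the paper's
# power-system simulation and transient-stability indices.

def mpc_step(state, candidate_schedules, simulate, cost):
    best = min(candidate_schedules, key=lambda s: cost(simulate(state, s)))
    return best[0]                      # apply the first control only

def simulate(s, schedule):
    for u in schedule:                  # toy first-order surrogate dynamics
        s = 0.9 * s + u
    return s

cost = abs                              # toy index: distance from synchronism
u0 = mpc_step(1.0, [(0.0, 0.0), (-0.5, 0.0), (-0.5, -0.2)], simulate, cost)
print(u0)                               # -> -0.5
```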

Detailed reference viewed: 45 (4 ULg)
Full Text
Peer Reviewed
Estimation Monte Carlo sans modèle de politiques de décision [Model-free Monte Carlo estimation of decision policies]
Fonteneau, Raphaël ULg; Murphy, Susan A.; Wehenkel, Louis ULg et al

in Revue d'Intelligence Artificielle (2011), 25

Detailed reference viewed: 20 (4 ULg)
Full Text
Peer Reviewed
Apprentissage actif par modification de la politique de décision courante [Active learning by modification of the current decision policy]
Fonteneau, Raphaël ULg; Murphy, Susan A.; Wehenkel, Louis ULg et al

in Sixièmes Journées Francophones de Planification, Décision et Apprentissage pour la conduite de systèmes (JFPDA 2011) (2011, June)

Detailed reference viewed: 12 (5 ULg)
Full Text
Peer Reviewed
Active exploration by searching for experiments that falsify the computed control policy
Fonteneau, Raphaël ULg; Murphy, Susan; Wehenkel, Louis ULg et al

in Proceedings of the 2011 IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL-11) (2011, April)

We propose a strategy for experiment selection - in the context of reinforcement learning - based on the idea that the most interesting experiments to carry out at some stage are those that are the most liable to falsify the current hypothesis about the optimal control policy. We cast this idea in a context where a policy learning algorithm and a model identification method are given a priori. Experiments are selected if, using the learnt environment model, they are predicted to yield a revision of the learnt control policy. Algorithms and simulation results are provided for a deterministic system with discrete action space. They show that the proposed approach is promising.
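The selection rule can be sketched directly (the bandit-like toy instantiation and every helper name below are hypothetical): an experiment is chosen if, with its model-predicted outcome appended to the data, the policy learning algorithm would output a different policy.

```python
# Sketch of falsification-driven experiment selection. All helper names
# and the bandit-like toy instantiation are hypothetical.

def select_experiment(candidates, data, learn_policy, predict):
    current = learn_policy(data)
    for exp in candidates:
        imagined = data + [predict(exp)]       # model-predicted outcome
        if learn_policy(imagined) != current:  # would revise the policy
            return exp
    return None                                # nothing predicted informative

def learn_policy(data):
    # Toy "policy": the action with the best average observed reward.
    rewards = {}
    for action, r in data:
        rewards.setdefault(action, []).append(r)
    return max(rewards, key=lambda a: sum(rewards[a]) / len(rewards[a]))

predict = lambda action: (action, 1.0)         # learnt model's prediction
data = [("a", 0.3), ("b", 0.5)]
print(select_experiment(["a", "b"], data, learn_policy, predict))  # -> "a"
```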

Detailed reference viewed: 26 (8 ULg)
Full Text
Peer Reviewed
Approximate reinforcement learning: an overview
Busoniu, Lucian; Babuska, Robert; De Schutter, Bart et al

in Proceedings of the 2011 IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL-11) (2011, April)

Reinforcement learning (RL) allows agents to learn how to optimally interact with complex environments. Fueled by recent advances in approximation-based algorithms, RL has obtained impressive successes in robotics, artificial intelligence, control, operations research, etc. However, the scarcity of survey papers about approximate RL makes it difficult for newcomers to grasp this intricate field. With the present overview, we take a step toward alleviating this situation. We review methods for approximate RL, starting from their dynamic programming roots and organizing them into three major classes: approximate value iteration, policy iteration, and policy search. Each class is subdivided into representative categories, highlighting among others offline and online algorithms, policy gradient methods, and simulation-based techniques. We also compare the different categories of methods, and outline possible ways to enhance the reviewed algorithms.

Detailed reference viewed: 88 (3 ULg)
Full Text
Peer Reviewed
Cross-entropy optimization of control policies with adaptive basis functions
Busoniu, Lucian; Ernst, Damien ULg; Babuska, Robert et al

in IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics (2011), 41(1), 196-209

This paper introduces an algorithm for direct search of control policies in continuous-state, discrete-action Markov decision processes. The algorithm looks for the best closed-loop policy that can be represented using a given number of basis functions (BFs), where a discrete action is assigned to each BF. The type of the BFs and their number are specified in advance and determine the complexity of the representation. Considerable flexibility is achieved by optimizing the locations and shapes of the BFs, together with the action assignments. The optimization is carried out with the cross-entropy method and evaluates the policies by their empirical return from a representative set of initial states. The return for each representative state is estimated using Monte Carlo simulations. The resulting algorithm for cross-entropy policy search with adaptive BFs is extensively evaluated in problems with two to six state variables, for which it reliably obtains good policies with only a small number of BFs. In these experiments, cross-entropy policy search requires vastly fewer BFs than value-function techniques with equidistant BFs, and outperforms policy search with a competing optimization algorithm called DIRECT.
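The optimizer at the heart of the method is easy to sketch in one dimension (toy objective only; the paper's basis-function parametrization and Monte Carlo return estimation are omitted): sample candidates from a Gaussian, keep the elite fraction, refit the distribution, repeat.

```python
import numpy as np

# Minimal cross-entropy method on a toy 1-D objective (stand-in for the
# paper's policy-parameter search; names and numbers are illustrative).

rng = np.random.default_rng(0)

def score(theta):
    return -(theta - 2.0) ** 2          # toy objective, optimum at 2.0

mu, sigma = 0.0, 2.0                    # initial sampling distribution
for _ in range(20):
    thetas = rng.normal(mu, sigma, size=100)
    elite = thetas[np.argsort(score(thetas))[-10:]]   # top 10% of samples
    mu, sigma = elite.mean(), elite.std() + 1e-3      # refit to the elite
print(round(mu, 2))                     # converges near 2.0
```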

Detailed reference viewed: 24 (2 ULg)