References of "Ernst, Damien"      in Complete repository Arts & humanities   Archaeology   Art & art history   Classical & oriental studies   History   Languages & linguistics   Literature   Performing arts   Philosophy & ethics   Religion & theology   Multidisciplinary, general & others Business & economic sciences   Accounting & auditing   Production, distribution & supply chain management   Finance   General management & organizational theory   Human resources management   Management information systems   Marketing   Strategy & innovation   Quantitative methods in economics & management   General economics & history of economic thought   International economics   Macroeconomics & monetary economics   Microeconomics   Economic systems & public economics   Social economics   Special economic topics (health, labor, transportation…)   Multidisciplinary, general & others Engineering, computing & technology   Aerospace & aeronautics engineering   Architecture   Chemical engineering   Civil engineering   Computer science   Electrical & electronics engineering   Energy   Geological, petroleum & mining engineering   Materials science & engineering   Mechanical engineering   Multidisciplinary, general & others Human health sciences   Alternative medicine   Anesthesia & intensive care   Cardiovascular & respiratory systems   Dentistry & oral medicine   Dermatology   Endocrinology, metabolism & nutrition   Forensic medicine   Gastroenterology & hepatology   General & internal medicine   Geriatrics   Hematology   Immunology & infectious disease   Laboratory medicine & medical technology   Neurology   Oncology   Ophthalmology   Orthopedics, rehabilitation & sports medicine   Otolaryngology   Pediatrics   Pharmacy, pharmacology & toxicology   Psychiatry   Public health, health care sciences & services   Radiology, nuclear medicine & imaging   Reproductive medicine (gynecology, andrology, obstetrics)   Rheumatology   Surgery   Urology & nephrology   Multidisciplinary, general & others Law, criminology & political science   Civil law   Criminal law & procedure   Criminology   Economic & commercial law   European & international law   Judicial law   Metalaw, Roman law, history of law & comparative law   Political science, public administration & international relations   Public law   Social law   Tax law   Multidisciplinary, general & others Life sciences   Agriculture & agronomy   Anatomy (cytology, histology, embryology...) & physiology   Animal production & animal husbandry   Aquatic sciences & oceanology   Biochemistry, biophysics & molecular biology   Biotechnology   Entomology & pest control   Environmental sciences & ecology   Food science   Genetics & genetic processes   Microbiology   Phytobiology (plant sciences, forestry, mycology...)   
Veterinary medicine & animal health   Zoology   Multidisciplinary, general & others Physical, chemical, mathematical & earth Sciences   Chemistry   Earth sciences & physical geography   Mathematics   Physics   Space science, astronomy & astrophysics   Multidisciplinary, general & others Social & behavioral sciences, psychology   Animal psychology, ethology & psychobiology   Anthropology   Communication & mass media   Education & instruction   Human geography & demography   Library & information sciences   Neurosciences & behavior   Regional & inter-regional studies   Social work & social policy   Sociology & social sciences   Social, industrial & organizational psychology   Theoretical & cognitive psychology   Treatment & clinical psychology   Multidisciplinary, general & others     Showing results 61 to 80 of 203     1 2 3 4 5 6 7 8 9     Learning to play K-armed bandit problemsMaes, Francis ; Wehenkel, Louis ; Ernst, Damien in Proceedings of the 4th International Conference on Agents and Artificial Intelligence (ICAART 2012) (2012, February)We propose a learning approach to pre-compute K-armed bandit playing policies by exploiting prior information describing the class of problems targeted by the player. Our algorithm ﬁrst samples a set of K ... [more ▼]We propose a learning approach to pre-compute K-armed bandit playing policies by exploiting prior information describing the class of problems targeted by the player. Our algorithm ﬁrst samples a set of K-armed bandit problems from the given prior, and then chooses in a space of candidate policies one that gives the best average performances over these problems. The candidate policies use an index for ranking the arms and pick at each play the arm with the highest index; the index for each arm is computed in the form of a linear combination of features describing the history of plays (e.g., number of draws, average reward, variance of rewards and higher order moments), and an estimation of distribution algorithm is used to determine its optimal parameters in the form of feature weights. We carry out simulations in the case where the prior assumes a ﬁxed number of Bernoulli arms, a ﬁxed horizon, and uniformly distributed parameters of the Bernoulli arms. These simulations show that learned strategies perform very well with respect to several other strategies previously proposed in the literature (UCB1, UCB2, UCB-V, KL-UCB and $\epsilon_n$-GREEDY); they also highlight the robustness of these strategies with respect to wrong prior information. [less ▲]Detailed reference viewed: 108 (19 ULg) Multi-terminal HVDC systems and ancillary servicesErnst, Damien Speech (2012)Detailed reference viewed: 8 (1 ULg) Imitative Learning for Real-Time Strategy GamesGemine, Quentin ; Safadi, Firas ; Fonteneau, Raphaël et alin Proceedings of the 2012 IEEE Conference on Computational Intelligence and Games (2012)Over the past decades, video games have become increasingly popular and complex. Virtual worlds have gone a long way since the first arcades and so have the artificial intelligence (AI) techniques used to ... [more ▼]Over the past decades, video games have become increasingly popular and complex. Virtual worlds have gone a long way since the first arcades and so have the artificial intelligence (AI) techniques used to control agents in these growing environments. Tasks such as world exploration, con- strained pathfinding or team tactics and coordination just to name a few are now default requirements for contemporary video games. 
Multi-terminal HVDC systems and ancillary services
Ernst, Damien
Speech (2012)

Imitative Learning for Real-Time Strategy Games
Gemine, Quentin; Safadi, Firas; Fonteneau, Raphaël et al.
In: Proceedings of the 2012 IEEE Conference on Computational Intelligence and Games (2012)
Over the past decades, video games have become increasingly popular and complex. Virtual worlds have come a long way since the first arcades, and so have the artificial intelligence (AI) techniques used to control agents in these growing environments. Tasks such as world exploration, constrained pathfinding, or team tactics and coordination, to name a few, are now default requirements for contemporary video games. However, despite its recent advances, video game AI still lacks the ability to learn. In this paper, we attempt to break the barrier between video game AI and machine learning and propose a generic method allowing real-time strategy (RTS) agents to learn production strategies from a set of recorded games using supervised learning. We test this imitative learning approach on the popular RTS title StarCraft II® and successfully teach a Terran agent facing a Protoss opponent new production strategies.

Learning exploration/exploitation strategies for single trajectory reinforcement learning
Castronovo, Michaël; Maes, Francis; Fonteneau, Raphaël et al.
In: Proceedings of the 10th European Workshop on Reinforcement Learning (EWRL 2012) (2012)
We consider the problem of learning high-performance Exploration/Exploitation (E/E) strategies for finite Markov Decision Processes (MDPs) when the MDP to be controlled is supposed to be drawn from a known probability distribution pM(·). The performance criterion is the sum of discounted rewards collected by the E/E strategy over an infinite-length trajectory. We propose an approach for solving this problem that works by considering a rich set of candidate E/E strategies and by looking for the one that gives the best average performance on MDPs drawn according to pM(·). As candidate E/E strategies, we consider index-based strategies parametrized by small formulas combining variables that include the estimated reward function, the number of times each transition has occurred, and the optimal value functions V and Q of the estimated MDP (obtained through value iteration). The search for the best formula is formalized as a multi-armed bandit problem, each arm being associated with a formula. We experimentally compare the performance of the approach with R-max as well as with ε-greedy strategies, and the results are promising.

Min max generalization for two-stage deterministic batch mode reinforcement learning: relaxation schemes
Fonteneau, Raphaël; Ernst, Damien; Boigelot, Bernard et al.
Report (2012)

Comparison of Different Selection Strategies in Monte-Carlo Tree Search for the Game of Tron
Perrick, Pierre; Lupien St-Pierre, David; Maes, Francis et al.
In: IEEE Conference on Computational Intelligence and Games 2012 (2012)
Monte-Carlo Tree Search (MCTS) techniques are essentially known for their performance on turn-based games, such as Go, for which players have considerable time for choosing their moves. In this paper, we apply MCTS to the game of Tron, a simultaneous real-time two-player game. The fact that players have to react fast and that moves occur simultaneously creates an unusual setting for MCTS, in which classical selection policies such as UCB1 may be suboptimal. In this paper, we perform an empirical comparison of a wide range of selection policies for MCTS applied to Tron, with both deterministic policies (UCB1, UCB1-Tuned, UCB-V, UCB-Minimal, OMC-Deterministic, MOSS) and stochastic policies (ε-greedy, EXP3, Thompson Sampling, OMC-Stochastic, PBBM). From the experiments, we observe that UCB1-Tuned has the best behavior, closely followed by UCB1. Even though UCB-Minimal is ranked fourth, this is a remarkable result for this recently introduced selection policy, found through automatic discovery of good policies on generic multi-armed bandit problems. We also show that deterministic policies perform better than stochastic ones for this problem.
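For reference, a minimal sketch of the UCB1 selection rule that the abstract above takes as its baseline, applied to choosing a child node during the MCTS selection phase. The node representation and the exploration constant are assumptions made for the example.

```python
import math

def ucb1_select(children, exploration_c=math.sqrt(2)):
    """Pick the child maximizing the UCB1 index: mean value plus exploration bonus.

    children: list of (visit_count, total_value) pairs for the current node's children.
    Unvisited children are returned first so that every move is tried at least once.
    """
    parent_visits = sum(n for n, _ in children)
    best_index, best_score = 0, float("-inf")
    for i, (n, total) in enumerate(children):
        if n == 0:
            return i
        score = total / n + exploration_c * math.sqrt(math.log(parent_visits) / n)
        if score > best_score:
            best_index, best_score = i, score
    return best_index

# Example: three children given as (visits, cumulated reward).
print(ucb1_select([(10, 6.0), (5, 3.5), (2, 1.8)]))
```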
SPRT for SPIT: Using the Sequential Probability Ratio Test for Spam in VoIP Prevention
Jung, Tobias; Martin, Sylvain; Ernst, Damien et al.
In: Proceedings of the 6th International Conference on Autonomous Infrastructure, Management and Security (2012)
This paper presents the first formal framework for identifying and filtering SPIT calls (SPam in Internet Telephony) in an outbound scenario with provable optimal performance. In so doing, our work deviates from related earlier work, where this problem is only addressed by ad-hoc solutions. Our goal is to rigorously formalize the problem in terms of mathematical decision theory, find the optimal solution to the problem, and derive concrete bounds for its expected loss (the number of mistakes the SPIT filter will make in the worst case). This goal is achieved by considering a scenario amenable to theoretical analysis, namely SPIT detection in an outbound scenario with pure sources. Our methodology is to first define the cost of making an error, apply Wald's sequential probability ratio test, and then determine analytically the error probabilities such that the resulting expected loss is minimized. The benefits of our approach are: (1) the method is optimal (in a sense defined in the paper); (2) the method does not rely on manual tuning and tweaking of parameters but is completely self-contained and mathematically justified; (3) the method is computationally simple and scalable. These are desirable features that would make our method a component of choice in larger, autonomic frameworks.
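A minimal sketch of Wald's sequential probability ratio test on which this framework builds: a log-likelihood ratio is accumulated over observations until it crosses one of two thresholds derived from the target error probabilities. The Bernoulli observation model and the numbers used are illustrative assumptions, not the paper's setting.

```python
import math

def sprt(observations, p_spit, p_legit, alpha=0.01, beta=0.01):
    """Wald's SPRT for deciding between 'SPIT source' (H1) and 'legitimate source' (H0).

    observations: iterable of 0/1 outcomes (e.g., 1 = call flagged by some heuristic).
    p_spit, p_legit: probability of a '1' observation under H1 and H0, respectively.
    alpha, beta: target false-positive and false-negative probabilities.
    """
    upper = math.log((1 - beta) / alpha)   # accept H1 once the ratio exceeds this
    lower = math.log(beta / (1 - alpha))   # accept H0 once the ratio falls below this
    llr = 0.0
    for x in observations:
        if x:
            llr += math.log(p_spit / p_legit)
        else:
            llr += math.log((1 - p_spit) / (1 - p_legit))
        if llr >= upper:
            return "SPIT"
        if llr <= lower:
            return "legitimate"
    return "undecided"  # not enough evidence yet

print(sprt([1, 1, 0, 1, 1, 1], p_spit=0.8, p_legit=0.2))
```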
Contextual Multi-armed Bandits for Web Server Defense
Jung, Tobias; Martin, Sylvain; Ernst, Damien et al.
In: Hussein, Abbas (Ed.), Proceedings of the 2012 International Joint Conference on Neural Networks (IJCNN) (2012)
In this paper we argue that contextual multi-armed bandit algorithms could open avenues for designing self-learning security modules for computer networks and related tasks. The paper has two contributions: a conceptual one and an algorithmic one. The conceptual contribution is to formulate the real-world problem of preventing HTTP-based attacks on web servers as a one-shot sequential learning problem, namely as a contextual multi-armed bandit. Our second contribution is to present CMABFAS, a new algorithm for general contextual multi-armed bandit learning that specifically targets domains with finite actions. We illustrate how CMABFAS could be used to design a fully self-learning meta-filter for web servers that does not rely on feedback from the end-user (i.e., does not require labeled data) and report first convincing simulation results.

Relaxation schemes for min max generalization in deterministic batch mode reinforcement learning
Fonteneau, Raphaël; Ernst, Damien; Boigelot, Bernard et al.
In: 4th International NIPS Workshop on Optimization for Machine Learning (OPT 2011) (2011, December)
We study the min max optimization problem introduced in [Fonteneau, 2011] for computing policies for batch mode reinforcement learning in a deterministic setting. This problem is NP-hard. We focus on the two-stage case, for which we provide two relaxation schemes. The first relaxation scheme works by dropping some constraints in order to obtain a problem that is solvable in polynomial time. The second relaxation scheme, based on a Lagrangian relaxation where all constraints are dualized, leads to a conic quadratic programming problem. Both relaxation schemes are shown to provide better results than those given in [Fonteneau, 2011].

Artificial intelligence design for real-time strategy games
Safadi, Firas; Fonteneau, Raphaël; Ernst, Damien
In: NIPS Workshop on Decision Making with Multiple Imperfect Decision Makers (2011, December)
For over a decade now, real-time strategy (RTS) games have been challenging intelligence, human and artificial (AI) alike, as one of the top genres in terms of overall complexity. RTS is a prime example of a problem featuring multiple interacting imperfect decision makers. Elaborate dynamics, partial observability, as well as a rapidly diverging action space render rational decision making somewhat elusive. Humans deal with the complexity using several abstraction layers, taking decisions on different abstract levels. Current agents, on the other hand, remain largely scripted and exhibit static behavior, leaving them extremely vulnerable to flaw abuse and no match against human players. In this paper, we propose to mimic the abstraction mechanisms used by human players for designing AI for RTS games. A non-learning agent for StarCraft showing promising performance is proposed, and several research directions towards the integration of learning mechanisms are discussed at the end of the paper.
Learning for exploration-exploitation in RL. The dusk of the small formulas' reign
Ernst, Damien
Speech (2011)

Ancillary services and operation of multi-terminal HVDC grids
Phulpin, Yannick; Ernst, Damien
In: Proceedings of the International Workshop on Transmission Networks for Offshore Wind Power Plants (2011, October)
This paper addresses the problem of ancillary services in ac systems interconnected by a multi-terminal HVdc system. It presents opportunities for new control schemes and discusses operation strategies for three types of HVdc grid operators, namely a coordination center, an independent operator, and the transmission system operator in charge of one of the areas interconnected by the multi-terminal HVdc grid. In these contexts, the paper envisions the challenges of using the HVdc infrastructure to provide frequency, voltage, and rotor angle stability-related ancillary services. It also analyzes the technical and economic impacts of the operation strategies on the ac areas' dynamics.

Model predictive control of HVDC power flow to improve transient stability in power systems
Phulpin, Yannick; Hazra, Jagabondhu; Ernst, Damien
In: Proceedings of the Second IEEE International Conference on Smart Grid Communications (IEEE SmartGridComm) (2011, October)
This paper addresses the problem of HVDC control using real-time information to avoid loss of synchronism phenomena in power systems. It proposes a discrete-time control strategy based on model predictive control, which solves at every time step an open-loop optimal-control problem using an A* event-tree search. Different optimisation criteria based on transient stability indices are compared. The paper presents simulation results for two benchmark systems with 9 and 24 buses, respectively, and an embedded HVDC link. The results show that the control strategy leads to a modulation of the HVDC power flow that significantly improves the system's ability to maintain synchronism in the aftermath of a large disturbance.
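A minimal sketch of the receding-horizon principle behind the control strategy described above: at every time step an open-loop sequence of set-points is optimized over a short horizon using a simulation model, only the first element is applied, and the optimization is repeated at the next step. The exhaustive enumeration over a small discrete set of power set-points is a simplifying stand-in for the A* event-tree search used in the paper, and the toy model at the end is purely illustrative.

```python
import itertools

def mpc_step(state, candidate_setpoints, simulate, stability_cost, horizon=3):
    """One receding-horizon MPC step: pick the first action of the best open-loop sequence.

    state: current system state (opaque to this sketch).
    candidate_setpoints: finite set of HVDC power set-points to consider.
    simulate(state, action): assumed one-step simulation model returning the next state.
    stability_cost(state): assumed transient-stability index (lower is better).
    """
    best_sequence, best_cost = None, float("inf")
    # Exhaustive enumeration of open-loop sequences (stand-in for the event-tree search).
    for sequence in itertools.product(candidate_setpoints, repeat=horizon):
        x, cost = state, 0.0
        for u in sequence:
            x = simulate(x, u)
            cost += stability_cost(x)
        if cost < best_cost:
            best_sequence, best_cost = sequence, cost
    return best_sequence[0]  # apply only the first set-point, then re-optimize next step

# Toy usage with a scalar "angle deviation" model and three candidate set-points.
action = mpc_step(
    state=1.0,
    candidate_setpoints=[-100.0, 0.0, 100.0],
    simulate=lambda x, u: 0.9 * x + 0.001 * u,
    stability_cost=lambda x: abs(x),
)
print(action)
```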
Estimation Monte Carlo sans modèle de politiques de décision [Model-free Monte Carlo estimation of decision policies]
Fonteneau, Raphaël; Murphy, Susan A.; Wehenkel, Louis et al.
In: Revue d'Intelligence Artificielle (RIA) (2011), 25

Apprentissage actif par modification de la politique de décision courante [Active learning by modification of the current decision policy]
Fonteneau, Raphaël; Murphy, Susan A.; Wehenkel, Louis et al.
In: Sixièmes Journées Francophones de Planification, Décision et Apprentissage pour la conduite de systèmes (JFPDA 2011) (2011, June)

Active exploration by searching for experiments that falsify the computed control policy
Fonteneau, Raphaël; Murphy, Susan; Wehenkel, Louis et al.
In: Proceedings of the 2011 IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL-11) (2011, April)
We propose a strategy for experiment selection, in the context of reinforcement learning, based on the idea that the most interesting experiments to carry out at some stage are those that are the most liable to falsify the current hypothesis about the optimal control policy. We cast this idea in a context where a policy learning algorithm and a model identification method are given a priori. Experiments are selected if, using the learnt environment model, they are predicted to yield a revision of the learnt control policy. Algorithms and simulation results are provided for a deterministic system with discrete action space. They show that the proposed approach is promising.

Approximate reinforcement learning: an overview
Busoniu, Lucian; Babuska, Robert; De Schutter, Bart et al.
In: Proceedings of the 2011 IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL-11) (2011, April)
Reinforcement learning (RL) allows agents to learn how to optimally interact with complex environments. Fueled by recent advances in approximation-based algorithms, RL has obtained impressive successes in robotics, artificial intelligence, control, operations research, etc. However, the scarcity of survey papers about approximate RL makes it difficult for newcomers to grasp this intricate field. With the present overview, we take a step toward alleviating this situation. We review methods for approximate RL, starting from their dynamic programming roots and organizing them into three major classes: approximate value iteration, policy iteration, and policy search. Each class is subdivided into representative categories, highlighting among others offline and online algorithms, policy gradient methods, and simulation-based techniques. We also compare the different categories of methods, and outline possible ways to enhance the reviewed algorithms.
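As a small illustration of the approximate value iteration class surveyed above, the sketch below runs fitted Q-iteration on a batch of transitions, representing the Q-function with a generic regressor. The regressor choice, batch format and hyper-parameters are assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

def fitted_q_iteration(transitions, n_actions, gamma=0.95, n_iterations=50):
    """Approximate value iteration from a batch of (state, action, reward, next_state) tuples.

    States are 1-D feature vectors; actions are integers in [0, n_actions).
    Returns a regressor approximating Q(state, action).
    """
    s = np.array([t[0] for t in transitions], dtype=float)
    a = np.array([t[1] for t in transitions]).reshape(-1, 1)
    r = np.array([t[2] for t in transitions], dtype=float)
    s_next = np.array([t[3] for t in transitions], dtype=float)

    inputs = np.hstack([s, a])
    q = None
    for _ in range(n_iterations):
        if q is None:
            targets = r  # first iteration: Q_1 equals the immediate reward
        else:
            # Q_{k+1}(s, a) = r + gamma * max_a' Q_k(s', a')
            next_q = np.column_stack([
                q.predict(np.hstack([s_next, np.full((len(s_next), 1), act)]))
                for act in range(n_actions)
            ])
            targets = r + gamma * next_q.max(axis=1)
        q = ExtraTreesRegressor(n_estimators=50).fit(inputs, targets)
    return q

# Toy usage: 1-D chain where action 1 moves toward the rewarding state at x = 1.
batch = [([0.0], 1, 0.0, [0.5]), ([0.5], 1, 1.0, [1.0]),
         ([0.5], 0, 0.0, [0.0]), ([1.0], 1, 1.0, [1.0])]
q = fitted_q_iteration(batch, n_actions=2, n_iterations=10)
print(q.predict([[0.5, 1], [0.5, 0]]))  # action 1 should look better than action 0
```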
Cross-entropy optimization of control policies with adaptive basis functions
Busoniu, Lucian; Ernst, Damien; Babuska, Robert et al.
In: IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics (2011), 41(1), 196-209
This paper introduces an algorithm for direct search of control policies in continuous-state, discrete-action Markov decision processes. The algorithm looks for the best closed-loop policy that can be represented using a given number of basis functions (BFs), where a discrete action is assigned to each BF. The type of the BFs and their number are specified in advance and determine the complexity of the representation. Considerable flexibility is achieved by optimizing the locations and shapes of the BFs, together with the action assignments. The optimization is carried out with the cross-entropy method and evaluates the policies by their empirical return from a representative set of initial states. The return for each representative state is estimated using Monte Carlo simulations. The resulting algorithm for cross-entropy policy search with adaptive BFs is extensively evaluated in problems with two to six state variables, for which it reliably obtains good policies with only a small number of BFs. In these experiments, cross-entropy policy search requires vastly fewer BFs than value-function techniques with equidistant BFs, and outperforms policy search with a competing optimization algorithm called DIRECT.
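The cross-entropy optimization loop at the core of the algorithm described above can be sketched as follows: sample candidate parameter vectors from a Gaussian, keep the elite fraction with the highest estimated return, and refit the Gaussian to the elites. The return estimator and the toy objective in the usage example are placeholders, not the paper's basis-function policy representation.

```python
import numpy as np

def cross_entropy_search(evaluate_return, dim, n_samples=100, elite_frac=0.1,
                         n_iterations=30, seed=0):
    """Cross-entropy method: iteratively refit a Gaussian over policy parameters to the elites.

    evaluate_return(theta): assumed estimator of the expected return of the policy
    parametrized by theta (e.g., Monte Carlo rollouts from representative initial states).
    """
    rng = np.random.default_rng(seed)
    mean, std = np.zeros(dim), np.ones(dim)
    n_elite = max(1, int(elite_frac * n_samples))
    for _ in range(n_iterations):
        thetas = rng.normal(mean, std, size=(n_samples, dim))
        returns = np.array([evaluate_return(theta) for theta in thetas])
        elites = thetas[np.argsort(returns)[-n_elite:]]       # highest-return candidates
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return mean

# Toy usage: maximize a concave score over 3 parameters (optimum at [1, -2, 0.5]).
best = cross_entropy_search(lambda th: -np.sum((th - np.array([1.0, -2.0, 0.5])) ** 2), dim=3)
print(np.round(best, 2))
```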
Towards min max generalization in reinforcement learning
Fonteneau, Raphaël; Murphy, Susan; Wehenkel, Louis et al.
In: Filipe, Joaquim; Fred, Ana; Sharp, Bernadette (Eds.), Agents and Artificial Intelligence: International Conference, ICAART 2010, Valencia, Spain, January 2010, Revised Selected Papers (2011)
In this paper, we introduce a min max approach for addressing the generalization problem in Reinforcement Learning. The min max approach works by determining a sequence of actions that maximizes the worst return that could possibly be obtained considering any dynamics and reward function compatible with the sample of trajectories and some prior knowledge on the environment. We consider the particular case of deterministic Lipschitz continuous environments over continuous state spaces, finite action spaces, and a finite optimization horizon. We discuss the non-triviality of computing an exact solution of the min max problem even after reformulating it so as to avoid search in function spaces. For addressing this problem, we propose to replace, inside this min max problem, the search for the worst environment given a sequence of actions by an expression that lower bounds the worst return that can be obtained for a given sequence of actions. This lower bound has a tightness that depends on the sample sparsity. From there, we propose an algorithm of polynomial complexity that returns a sequence of actions leading to the maximization of this lower bound. We give a condition on the sample sparsity ensuring that, for a given initial state, the proposed algorithm produces an optimal sequence of actions in open-loop. Our experiments show that this algorithm can lead to more cautious policies than algorithms combining dynamic programming with function approximators.

Automatic discovery of ranking formulas for playing with multi-armed bandits
Maes, Francis; Wehenkel, Louis; Ernst, Damien
In: Proceedings of the 9th European Workshop on Reinforcement Learning (EWRL 2011) (2011)
We propose an approach for discovering, in an automatic way, formulas for ranking arms while playing with multi-armed bandits. The approach works by defining a grammar made of basic elements such as addition, subtraction, the max operator, the average value of the rewards collected by an arm, their standard deviation, etc., and by exploiting this grammar to generate and test a large number of formulas. The systematic search for good candidate formulas is carried out by a built-on-purpose optimization algorithm used to navigate inside this large set of candidate formulas towards those that give high performance when using them on some multi-armed bandit problems. We have applied this approach on a set of bandit problems made of Bernoulli, Gaussian and truncated Gaussian distributions and have identified a few simple ranking formulas that provide interesting results on every problem of this set. In particular, they clearly outperform several reference policies previously introduced in the literature. We argue that these newly found formulas, as well as the procedure for generating them, may suggest new directions for studying bandit problems.
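A minimal sketch of the grammar-based formula search described in the abstract above: candidate ranking formulas are built from a small set of per-arm variables and binary operators, enumerated up to a bounded depth, and can then be evaluated on sampled bandit problems. The grammar, the depth bound and the brute-force enumeration are simplifying assumptions; the paper navigates the formula space with a purpose-built optimization algorithm.

```python
import itertools

# Terminal variables available to a formula: per-arm statistics and the current time step.
TERMINALS = ["mean", "std", "count", "t"]
BINARY_OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "div": lambda a, b: a / b if b else float("inf"),
    "max": max,
}

def enumerate_formulas(depth):
    """Yield formula trees: either a terminal or (op, left, right), up to the given depth."""
    if depth == 0:
        yield from TERMINALS
        return
    smaller = list(enumerate_formulas(depth - 1))
    yield from smaller
    for op, left, right in itertools.product(BINARY_OPS, smaller, smaller):
        yield (op, left, right)

def evaluate(formula, stats):
    """Evaluate a formula tree on one arm's statistics (a dict keyed by the TERMINALS)."""
    if isinstance(formula, str):
        return stats[formula]
    op, left, right = formula
    return BINARY_OPS[op](evaluate(left, stats), evaluate(right, stats))

# Example: count the candidate formulas of depth 1, then score one of them on an arm.
print(sum(1 for _ in enumerate_formulas(1)))  # -> 68 candidates
arm = {"mean": 0.6, "std": 0.2, "count": 4, "t": 20}
print(evaluate(("add", "mean", ("div", "t", "count")), arm))  # -> 5.6
```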