References of "Ernst, Damien"      in Complete repository Arts & humanities   Archaeology   Art & art history   Classical & oriental studies   History   Languages & linguistics   Literature   Performing arts   Philosophy & ethics   Religion & theology   Multidisciplinary, general & others Business & economic sciences   Accounting & auditing   Production, distribution & supply chain management   Finance   General management & organizational theory   Human resources management   Management information systems   Marketing   Strategy & innovation   Quantitative methods in economics & management   General economics & history of economic thought   International economics   Macroeconomics & monetary economics   Microeconomics   Economic systems & public economics   Social economics   Special economic topics (health, labor, transportation…)   Multidisciplinary, general & others Engineering, computing & technology   Aerospace & aeronautics engineering   Architecture   Chemical engineering   Civil engineering   Computer science   Electrical & electronics engineering   Energy   Geological, petroleum & mining engineering   Materials science & engineering   Mechanical engineering   Multidisciplinary, general & others Human health sciences   Alternative medicine   Anesthesia & intensive care   Cardiovascular & respiratory systems   Dentistry & oral medicine   Dermatology   Endocrinology, metabolism & nutrition   Forensic medicine   Gastroenterology & hepatology   General & internal medicine   Geriatrics   Hematology   Immunology & infectious disease   Laboratory medicine & medical technology   Neurology   Oncology   Ophthalmology   Orthopedics, rehabilitation & sports medicine   Otolaryngology   Pediatrics   Pharmacy, pharmacology & toxicology   Psychiatry   Public health, health care sciences & services   Radiology, nuclear medicine & imaging   Reproductive medicine (gynecology, andrology, obstetrics)   Rheumatology   Surgery   Urology & nephrology   Multidisciplinary, general & others Law, criminology & political science   Civil law   Criminal law & procedure   Criminology   Economic & commercial law   European & international law   Judicial law   Metalaw, Roman law, history of law & comparative law   Political science, public administration & international relations   Public law   Social law   Tax law   Multidisciplinary, general & others Life sciences   Agriculture & agronomy   Anatomy (cytology, histology, embryology...) & physiology   Animal production & animal husbandry   Aquatic sciences & oceanology   Biochemistry, biophysics & molecular biology   Biotechnology   Entomology & pest control   Environmental sciences & ecology   Food science   Genetics & genetic processes   Microbiology   Phytobiology (plant sciences, forestry, mycology...)   Veterinary medicine & animal health   Zoology   Multidisciplinary, general & others Physical, chemical, mathematical & earth Sciences   Chemistry   Earth sciences & physical geography   Mathematics   Physics   Space science, astronomy & astrophysics   Multidisciplinary, general & others Social & behavioral sciences, psychology   Animal psychology, ethology & psychobiology   Anthropology   Communication & mass media   Education & instruction   Human geography & demography   Library & information sciences   Neurosciences & behavior   Regional & inter-regional studies   Social work & social policy   Sociology & social sciences   Social, industrial & organizational psychology   Theoretical & cognitive psychology   Treatment & clinical psychology   Multidisciplinary, general & others     Showing results 81 to 100 of 232     1 2 3 4 5 6 7 8 9 10     Optimized Look-Ahead Trees: Extensions to Large and Continuous Action SpacesJung, Tobias ; Ernst, Damien ; Maes, Francisin Proc. of IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL'13) (2013)This paper studies look-ahead tree based control policies from the viewpoint of online decision making with constraints on the computational budget allowed per decision (expressed as number of calls to ... [more ▼]This paper studies look-ahead tree based control policies from the viewpoint of online decision making with constraints on the computational budget allowed per decision (expressed as number of calls to the generative model). We consider optimized look-ahead tree (OLT) policies, a recently introduced family of hybrid techniques, which combine the advantages of look-ahead trees (high precision) with the advantages of direct policy search (low online cost) and which are specifically designed for limited online budgets. We present two extensions of the basic OLT algorithm that on the one side allow tackling deterministic optimal control problems with large and continuous action spaces and that on the other side can also help to further reduce the online complexity. [less ▲]Detailed reference viewed: 19 (8 ULg) Biorthogonalization Techniques for Least Squares Temporal Difference LearningJung, Tobias ; Ernst, Damien Poster (2012, December 07)We consider Markov reward processes and study OLS-LSTD, a framework for selecting basis functions from a set of candidates to obtain a sparse representation of the value function in the context of least ... [more ▼]We consider Markov reward processes and study OLS-LSTD, a framework for selecting basis functions from a set of candidates to obtain a sparse representation of the value function in the context of least squares temporal difference learning. To support efficient both updating and downdating operations, OLS-LSTD uses a biorthogonal representation for the selected basis vectors. Empirical comparisons with the recently proposed MP and LARS frameworks for LSTD are made. [less ▲]Detailed reference viewed: 59 (17 ULg) Batch mode reinforcement learning based on the synthesis of artificial trajectoriesErnst, Damien Speech/Talk (2012)Detailed reference viewed: 18 (1 ULg) Optimal discovery with probabilistic expert adviceBubeck, Sébastien; Ernst, Damien ; Garivier, Aurélienin Proceedings of the 51st IEEE Conference on Decision and Control (CDC 2012) (2012, December)Motivated by issues of security analysis for power systems, we analyze a new problem, called optimal discovery with probabilistic expert advice. We address it with an algorithm based on the optimistic ... [more ▼]Motivated by issues of security analysis for power systems, we analyze a new problem, called optimal discovery with probabilistic expert advice. We address it with an algorithm based on the optimistic paradigm and the Good-Turingmissing mass estimator. We show that this strategy attains the optimal discovery rate in a macroscopic limit sense, under some assumptions on the probabilistic experts. We also provide numerical experiments suggesting that this optimal behavior may still hold under weaker assumptions. [less ▲]Detailed reference viewed: 15 (2 ULg) Cooperative frequency control with a multi-terminal high-voltage DC networkSarlette, Alain; Dai, Jing; Phulpin, Yannick et alin Automatica (2012), 48(12), 31283134We consider frequency control in power systems made of several non-synchronous AC areas connected by a multi-terminal high-voltage direct current (HVDC) grid. We propose two HVDC control schemes to make ... [more ▼]We consider frequency control in power systems made of several non-synchronous AC areas connected by a multi-terminal high-voltage direct current (HVDC) grid. We propose two HVDC control schemes to make the areas collectively react to power imbalances, so that individual areas can schedule smaller power reserves. The first scheme modifies the power injected by each area into the DC grid as a function of frequency deviations of neighboring AC areas. The second scheme changes the DC voltage of each converter as a function of its own area's frequency only, relying on the physical network to obtain a collective reaction. For both schemes, we prove convergence of the closed-loop system with heterogeneous AC areas. [less ▲]Detailed reference viewed: 136 (9 ULg) A computationally efficient algorithm for the provision of a day-ahead modulation service by a load aggregatorMathieu, Sébastien ; Karangelos, Efthymios ; Louveaux, Quentin et alPoster (2012, October 08)We study a decision making problem faced by an aggregator willing to offer a load modulation service to a Transmission System Operator (TSO). In particular, we concentrate on a day-ahead service ... [more ▼]We study a decision making problem faced by an aggregator willing to offer a load modulation service to a Transmission System Operator (TSO). In particular, we concentrate on a day-ahead service consisting of a load modulation option, which can be called by the TSO once per day. The option specifies the maximum amplitude of a potential modification on the demand of the loads within a certain time interval. We consider the specific case where the loads can be modeled by a generic tank model whose inflow depends on the power consumed by the load and outflow is assumed to be known the day before for every market period. The level of the reservoir at the beginning of the market day is also assumed to be known. We show that, under these assumptions, the problem of maximizing the amplitude of the load modulation service can be formulated as a mixed integer linear programming problem (MILP). In order to solve this problem in a computationally efficient manner we introduce a novel heuristic-method. We test this method on a set of problems and demonstrate that our approach is orders of magnitude faster than CPLEX - a state-of-the-art software for solving MILP problems - without considerably compromising the solution accuracy. [less ▲]Detailed reference viewed: 81 (19 ULg) Policy search in a space of simple closed-form formulas: towards interpretability of reinforcement learningMaes, Francis ; Fonteneau, Raphaël ; Wehenkel, Louis et alin Discovery Science 15th International Conference, DS 2012, Lyon, France, October 29-31, 2012. Proceedings (2012, October)In this paper, we address the problem of computing interpretable solutions to reinforcement learning (RL) problems. To this end, we propose a search algorithm over a space of simple losed-form formulas ... [more ▼]In this paper, we address the problem of computing interpretable solutions to reinforcement learning (RL) problems. To this end, we propose a search algorithm over a space of simple losed-form formulas that are used to rank actions. We formalize the search for a high-performance policy as a multi-armed bandit problem where each arm corresponds to a candidate policy canonically represented by its shortest formula-based representation. Experiments, conducted on standard benchmarks, show that this approach manages to determine both efﬁcient and interpretable solutions. [less ▲]Detailed reference viewed: 35 (12 ULg) Contextual Multi-armed Bandits for the Prevention of Spam in VoIP NetworksJung, Tobias ; Martin, Sylvain ; Ernst, Damien et alE-print/Working paper (2012)In this paper we argue that contextual multi-armed bandit algorithms could open avenues for designing self-learning security modules for computer networks and related tasks. The paper has two ... [more ▼]In this paper we argue that contextual multi-armed bandit algorithms could open avenues for designing self-learning security modules for computer networks and related tasks. The paper has two contributions: a conceptual one and an algorithmical one. The conceptual contribution is to formulate -- as an example -- the real-world problem of preventing SPIT (Spam in VoIP networks), which is currently not satisfyingly addressed by standard techniques, as a sequential learning problem, namely as a contextual multi-armed bandit. Our second contribution is to present CMABFAS, a new algorithm for general contextual multi-armed bandit learning that specifically targets domains with finite actions. We illustrate how CMABFAS could be used to design a fully self-learning SPIT filter that does not rely on feedback from the end-user (i.e., does not require labeled data) and report first simulation results. [less ▲]Detailed reference viewed: 112 (30 ULg) Généralisation min max pour l'apprentissage par renforcement batch et déterministe : schémas de relaxationFonteneau, Raphaël ; Ernst, Damien ; Boigelot, Bernard et alin Septièmes Journées Francophones de Planification, Décision et Apprentissage pour la conduite de systèmes (JFPDA 2012) (2012, May)On s’intéresse au problème de généralisation min max dans le cadre de l’apprentissage par renforcement batch et déterministe. Le problème a été originellement introduit par Fonteneau et al. (2011). Dans ... [more ▼]On s’intéresse au problème de généralisation min max dans le cadre de l’apprentissage par renforcement batch et déterministe. Le problème a été originellement introduit par Fonteneau et al. (2011). Dans un premier temps, on montre que le problème est NP-dur. Dans le cas où l’horizon d’optimisation vaut 2, on développe deux schémas de relaxation. Le premier schéma fonctionne en éliminant des contraintes de telle sorte qu’on obtienne un problème soluble en temps polynomial. Le deuxième schéma est une relaxation Lagrangienne conduisant à un problème conique-quadratique. On montre théoriquement et empiriquement que ces deux schémas permettent d’obtenir de meilleurs résultats que ceux proposés par Fonteneau et al. (2011). [less ▲]Detailed reference viewed: 80 (10 ULg) Coordinated primary frequency control among non-synchronous systems connected by a multi-terminal high-voltage direct current gridDai, Jing; Phulpin, Yannick; Sarlette, Alain et alin IET Generation, Transmission & Distribution (2012), 6(2), 99-108The authors consider a power system composed of several non-synchronous AC areas connected by a multiterminal high-voltage direct current (HVDC) grid. In this context, the authors propose a distributed ... [more ▼]The authors consider a power system composed of several non-synchronous AC areas connected by a multiterminal high-voltage direct current (HVDC) grid. In this context, the authors propose a distributed control scheme that modiﬁes the power injections from the different AC areas into the DC grid so as to make the system collectively react to load imbalances. This collective reaction allows each individual AC area to downscale its primary reserves. The scheme is inspired by algorithms for the consensus problem extensively studied by the control theory community. It modiﬁes the power injections based on frequency deviations of the AC areas so as to make them stay close to each other. A stability analysis of the closed-loop system is reported as well as simulation results on a benchmark power system with ﬁve AC areas. These results show that with proper tuning, the control scheme makes the frequency deviations converge rapidly to a common value following a load imbalance in an area. [less ▲]Detailed reference viewed: 68 (4 ULg) Learning to play K-armed bandit problemsMaes, Francis ; Wehenkel, Louis ; Ernst, Damien in Proceedings of the 4th International Conference on Agents and Artificial Intelligence (ICAART 2012) (2012, February)We propose a learning approach to pre-compute K-armed bandit playing policies by exploiting prior information describing the class of problems targeted by the player. Our algorithm ﬁrst samples a set of K ... [more ▼]We propose a learning approach to pre-compute K-armed bandit playing policies by exploiting prior information describing the class of problems targeted by the player. Our algorithm ﬁrst samples a set of K-armed bandit problems from the given prior, and then chooses in a space of candidate policies one that gives the best average performances over these problems. The candidate policies use an index for ranking the arms and pick at each play the arm with the highest index; the index for each arm is computed in the form of a linear combination of features describing the history of plays (e.g., number of draws, average reward, variance of rewards and higher order moments), and an estimation of distribution algorithm is used to determine its optimal parameters in the form of feature weights. We carry out simulations in the case where the prior assumes a ﬁxed number of Bernoulli arms, a ﬁxed horizon, and uniformly distributed parameters of the Bernoulli arms. These simulations show that learned strategies perform very well with respect to several other strategies previously proposed in the literature (UCB1, UCB2, UCB-V, KL-UCB and $\epsilon_n$-GREEDY); they also highlight the robustness of these strategies with respect to wrong prior information. [less ▲]Detailed reference viewed: 147 (19 ULg) Multi-terminal HVDC systems and ancillary servicesErnst, Damien Speech/Talk (2012)Detailed reference viewed: 14 (3 ULg) Imitative Learning for Real-Time Strategy GamesGemine, Quentin ; Safadi, Firas ; Fonteneau, Raphaël et alin Proceedings of the 2012 IEEE Conference on Computational Intelligence and Games (2012)Over the past decades, video games have become increasingly popular and complex. Virtual worlds have gone a long way since the first arcades and so have the artificial intelligence (AI) techniques used to ... [more ▼]Over the past decades, video games have become increasingly popular and complex. Virtual worlds have gone a long way since the first arcades and so have the artificial intelligence (AI) techniques used to control agents in these growing environments. Tasks such as world exploration, con- strained pathfinding or team tactics and coordination just to name a few are now default requirements for contemporary video games. However, despite its recent advances, video game AI still lacks the ability to learn. In this paper, we attempt to break the barrier between video game AI and machine learning and propose a generic method allowing real-time strategy (RTS) agents to learn production strategies from a set of recorded games using supervised learning. We test this imitative learning approach on the popular RTS title StarCraft II® and successfully teach a Terran agent facing a Protoss opponent new production strategies. [less ▲]Detailed reference viewed: 148 (49 ULg) Learning exploration/exploitation strategies for single trajectory reinforcement learningCastronovo, Michaël ; Maes, Francis ; Fonteneau, Raphaël et alin Proceedings of the 10th European Workshop on Reinforcement Learning (EWRL 2012) (2012)We consider the problem of learning high-performance Exploration/Exploitation (E/E) strategies for finite Markov Decision Processes (MDPs) when the MDP to be controlled is supposed to be drawn from a ... [more ▼]We consider the problem of learning high-performance Exploration/Exploitation (E/E) strategies for finite Markov Decision Processes (MDPs) when the MDP to be controlled is supposed to be drawn from a known probability distribution pM( ). The performance criterion is the sum of discounted rewards collected by the E/E strategy over an in finite length trajectory. We propose an approach for solving this problem that works by considering a rich set of candidate E/E strategies and by looking for the one that gives the best average performances on MDPs drawn according to pM( ). As candidate E/E strategies, we consider index-based strategies parametrized by small formulas combining variables that include the estimated reward function, the number of times each transition has occurred and the optimal value functions V and Q of the estimated MDP (obtained through value iteration). The search for the best formula is formalized as a multi-armed bandit problem, each arm being associated with a formula. We experimentally compare the performances of the approach with R-max as well as with e-Greedy strategies and the results are promising. [less ▲]Detailed reference viewed: 265 (31 ULg) Min max generalization for two-stage deterministic batch mode reinforcement learning: relaxation schemesFonteneau, Raphaël ; Ernst, Damien ; Boigelot, Bernard et alReport (2012)Detailed reference viewed: 38 (4 ULg) Comparison of Different Selection Strategies in Monte-Carlo Tree Search for the Game of TronPerrick, Pierre; Lupien St-Pierre, David ; Maes, Francis et alin IEEE Conference on Computational and Intelligence in Games 2012 (2012)Monte-Carlo Tree Search (MCTS) techniques are essentially known for their performance on turn-based games, such as Go, for which players have considerable time for choosing their moves. In this paper, we ... [more ▼]Monte-Carlo Tree Search (MCTS) techniques are essentially known for their performance on turn-based games, such as Go, for which players have considerable time for choosing their moves. In this paper, we apply MCTS to the game of Tron, a simultaneous real-time two-player game. The fact that players have to react fast and that moves occur simultaneously creates an unusual setting for MCTS, in which classical selection policies such as UCB1 may be suboptimal. In this paper, we perform an empirical comparison of a wide range of selection policies for MCTS applied to Tron, with both deterministic policies (UCB1, UCB1-Tuned, UCB-V, UCBMinimal, OMC-Deterministic, MOSS) and stochastic policies (Epsilon-greedy, EXP3, Thompson Sampling, OMC-Stochastic, PBBM). From the experiments, we observe that UCB1-Tuned has the best behavior shortly followed by UCB1. Even if UCB-Minimal is ranked fourth, this is a remarkable result for this recently introduced selection policy found through automatic discovery of good policies on generic multi-armed bandit problems. We also show that deterministic policies perform better than stochastic ones for this problem. [less ▲]Detailed reference viewed: 47 (5 ULg) SPRT for SPIT: Using the Sequential Probability Ratio Test for Spam in VoIP PreventionJung, Tobias ; Martin, Sylvain ; Ernst, Damien et alin Proc. of 6th International Conference on Autonomous Infrastructure, Management and Security (2012)This paper presents the first formal framework for identifying and filtering SPIT calls (SPam in Internet Telephony) in an outbound scenario with provable optimal performance. In so doing, our work ... [more ▼]This paper presents the first formal framework for identifying and filtering SPIT calls (SPam in Internet Telephony) in an outbound scenario with provable optimal performance. In so doing, our work deviates from related earlier work where this problem is only addressed by ad-hoc solutions. Our goal is to rigorously formalize the problem in terms of mathematical decision theory, find the optimal solution to the problem, and derive concrete bounds for its expected loss (number of mistakes the SPIT filter will make in the worst case). This goal is achieved by considering a scenario amenable to theoretical analysis, namely SPIT detection in an outbound scenario with pure sources. Our methodology is to first define the cost of making an error, apply Wald’s sequential probability ratio test, and then determine analytically error probabilities such that the resulting expected loss is minimized. The benefits of our approach are: (1) the method is optimal (in a sense defined in the paper); (2) the method does not rely on manual tuning and tweaking of parameters but is completely self-contained and mathematically justified; (3) the method is computationally simple and scalable. These are desirable features that would make our method a component of choice in larger, autonomic frameworks. [less ▲]Detailed reference viewed: 233 (27 ULg) Contextual Multi-armed Bandits for Web Server DefenseJung, Tobias ; Martin, Sylvain ; Ernst, Damien et alin Hussein, Abbas (Ed.) Proceedings of 2012 International Joint Conference on Neural Networks (IJCNN) (2012)In this paper we argue that contextual multi-armed bandit algorithms could open avenues for designing self-learning security modules for computer networks and related tasks. The paper has two ... [more ▼]In this paper we argue that contextual multi-armed bandit algorithms could open avenues for designing self-learning security modules for computer networks and related tasks. The paper has two contributions: a conceptual and an algorithmical one. The conceptual contribution is to formulate the real-world problem of preventing HTTP-based attacks on web servers as a one-shot sequential learning problem, namely as a contextual multi-armed bandit. Our second contribution is to present CMABFAS, a new algorithm for general contextual multi-armed bandit learning that specifically targets domains with finite actions. We illustrate how CMABFAS could be used to design a fully self-learning meta filter for web servers that does not rely on feedback from the end-user (i.e., does not require labeled data) and report first convincing simulation results. [less ▲]Detailed reference viewed: 207 (70 ULg) Relaxation schemes for min max generalization in deterministic batch mode reinforcement learningFonteneau, Raphaël ; Ernst, Damien ; Boigelot, Bernard et alin 4th International NIPS Workshop on Optimization for Machine Learning (OPT 2011) (2011, December)We study the min max optimization problem introduced in [Fonteneau, 2011] for computing policies for batch mode reinforcement learning in a deterministic setting. This problem is NP-hard. We focus on the ... [more ▼]We study the min max optimization problem introduced in [Fonteneau, 2011] for computing policies for batch mode reinforcement learning in a deterministic setting. This problem is NP-hard. We focus on the two-stage case for which we provide two relaxation schemes. The first relaxation scheme works by dropping some constraints in order to obtain a problem that is solvable in polynomial time. The second relaxation scheme, based on a Lagrangian relaxation where all constraints are dualized, leads to a conic quadratic programming problem. Both relaxation schemes are shown to provide better results than those given in [Fonteneau, 2011]. [less ▲]Detailed reference viewed: 149 (10 ULg) Artificial intelligence design for real-time strategy gamesSafadi, Firas ; Fonteneau, Raphaël ; Ernst, Damien in NIPS Workshop on Decision Making with Multiple Imperfect Decision Makers (2011, December)For now over a decade, real-time strategy (RTS) games have been challenging intelligence, human and artificial (AI) alike, as one of the top genre in terms of overall complexity. RTS is a prime example ... [more ▼]For now over a decade, real-time strategy (RTS) games have been challenging intelligence, human and artificial (AI) alike, as one of the top genre in terms of overall complexity. RTS is a prime example problem featuring multiple interacting imperfect decision makers. Elaborate dynamics, partial observability, as well as a rapidly diverging action space render rational decision making somehow elusive. Humans deal with the complexity using several abstraction layers, taking decisions on different abstract levels. Current agents, on the other hand, remain largely scripted and exhibit static behavior, leaving them extremely vulnerable to ﬂaw abuse and no match against human players. In this paper, we propose to mimic the abstraction mechanisms used by human players for designing AI for RTS games. A non-learning agent for StarCraft showing promising performance is proposed, and several research directions towards the integration of learning mechanisms are discussed at the end of the paper. [less ▲]Detailed reference viewed: 291 (20 ULg)