Browse ORBi by ORBi project

Cooperative frequency control with a multi-terminal high-voltage DC network
; ; et al
in Automatica (2012), 48(12), 3128-3134
We consider frequency control in power systems made of several non-synchronous AC areas connected by a multi-terminal high-voltage direct current (HVDC) grid. We propose two HVDC control schemes to make the areas collectively react to power imbalances, so that individual areas can schedule smaller power reserves. The first scheme modifies the power injected by each area into the DC grid as a function of frequency deviations of neighboring AC areas. The second scheme changes the DC voltage of each converter as a function of its own area's frequency only, relying on the physical network to obtain a collective reaction. For both schemes, we prove convergence of the closed-loop system with heterogeneous AC areas.

A computationally efficient algorithm for the provision of a day-ahead modulation service by a load aggregator
Mathieu, Sébastien ; Karangelos, Efthymios ; Louveaux, Quentin et al
Poster (2012, October 08)
We study a decision making problem faced by an aggregator willing to offer a load modulation service to a Transmission System Operator (TSO). In particular, we concentrate on a day-ahead service consisting of a load modulation option, which can be called by the TSO once per day. The option specifies the maximum amplitude of a potential modification on the demand of the loads within a certain time interval.
We consider the specific case where the loads can be modeled by a generic tank model whose inflow depends on the power consumed by the load and whose outflow is assumed to be known the day before for every market period. The level of the reservoir at the beginning of the market day is also assumed to be known. We show that, under these assumptions, the problem of maximizing the amplitude of the load modulation service can be formulated as a mixed integer linear programming (MILP) problem. In order to solve this problem in a computationally efficient manner, we introduce a novel heuristic method. We test this method on a set of problems and demonstrate that our approach is orders of magnitude faster than CPLEX, a state-of-the-art software for solving MILP problems, without considerably compromising the solution accuracy.

Policy search in a space of simple closed-form formulas: towards interpretability of reinforcement learning
Maes, Francis ; Fonteneau, Raphaël ; Wehenkel, Louis et al
in Discovery Science 15th International Conference, DS 2012, Lyon, France, October 29-31, 2012. Proceedings (2012, October)
In this paper, we address the problem of computing interpretable solutions to reinforcement learning (RL) problems. To this end, we propose a search algorithm over a space of simple closed-form formulas that are used to rank actions. We formalize the search for a high-performance policy as a multi-armed bandit problem where each arm corresponds to a candidate policy canonically represented by its shortest formula-based representation. Experiments, conducted on standard benchmarks, show that this approach manages to determine both efficient and interpretable solutions.

Contextual Multi-armed Bandits for the Prevention of Spam in VoIP Networks
Jung, Tobias ; Martin, Sylvain ; Ernst, Damien et al
E-print/Working paper (2012)
In this paper we argue that contextual multi-armed bandit algorithms could open avenues for designing self-learning security modules for computer networks and related tasks. The paper has two contributions: a conceptual one and an algorithmic one. The conceptual contribution is to formulate, as an example, the real-world problem of preventing SPIT (Spam in VoIP networks), which is currently not satisfactorily addressed by standard techniques, as a sequential learning problem, namely as a contextual multi-armed bandit. Our second contribution is to present CMABFAS, a new algorithm for general contextual multi-armed bandit learning that specifically targets domains with finite actions. We illustrate how CMABFAS could be used to design a fully self-learning SPIT filter that does not rely on feedback from the end-user (i.e., does not require labeled data) and report first simulation results.

Généralisation min max pour l'apprentissage par renforcement batch et déterministe : schémas de relaxation
Fonteneau, Raphaël ; Ernst, Damien ; Boigelot, Bernard et al
in Septièmes Journées Francophones de Planification, Décision et Apprentissage pour la conduite de systèmes (JFPDA 2012) (2012, May)
We consider the min max generalization problem in the setting of deterministic batch mode reinforcement learning. The problem was originally introduced by Fonteneau et al. (2011). We first show that the problem is NP-hard. For the case where the optimization horizon equals 2, we develop two relaxation schemes. The first scheme works by eliminating constraints so as to obtain a problem that is solvable in polynomial time. The second scheme is a Lagrangian relaxation leading to a conic quadratic problem. We show theoretically and empirically that both schemes yield better results than those proposed by Fonteneau et al. (2011).

Coordinated primary frequency control among non-synchronous systems connected by a multi-terminal high-voltage direct current grid
; ; et al
in IET Generation, Transmission & Distribution (2012), 6(2), 99-108
The authors consider a power system composed of several non-synchronous AC areas connected by a multi-terminal high-voltage direct current (HVDC) grid. In this context, the authors propose a distributed control scheme that modifies the power injections from the different AC areas into the DC grid so as to make the system collectively react to load imbalances. This collective reaction allows each individual AC area to downscale its primary reserves. The scheme is inspired by algorithms for the consensus problem extensively studied by the control theory community. It modifies the power injections based on frequency deviations of the AC areas so as to make them stay close to each other.
A stability analysis of the closed-loop system is reported as well as simulation results on a benchmark power system with five AC areas. These results show that with proper tuning, the control scheme makes the frequency deviations converge rapidly to a common value following a load imbalance in an area.

Learning to play K-armed bandit problems
Maes, Francis ; Wehenkel, Louis ; Ernst, Damien
in Proceedings of the 4th International Conference on Agents and Artificial Intelligence (ICAART 2012) (2012, February)
We propose a learning approach to pre-compute K-armed bandit playing policies by exploiting prior information describing the class of problems targeted by the player. Our algorithm first samples a set of K-armed bandit problems from the given prior, and then chooses in a space of candidate policies one that gives the best average performances over these problems. The candidate policies use an index for ranking the arms and pick at each play the arm with the highest index; the index for each arm is computed in the form of a linear combination of features describing the history of plays (e.g., number of draws, average reward, variance of rewards and higher order moments), and an estimation of distribution algorithm is used to determine its optimal parameters in the form of feature weights. We carry out simulations in the case where the prior assumes a fixed number of Bernoulli arms, a fixed horizon, and uniformly distributed parameters of the Bernoulli arms.
These simulations show that learned strategies perform very well with respect to several other strategies previously proposed in the literature (UCB1, UCB2, UCB-V, KL-UCB and $\epsilon_n$-GREEDY); they also highlight the robustness of these strategies with respect to wrong prior information.

Multi-terminal HVDC systems and ancillary services
Ernst, Damien
Speech/Talk (2012)

Imitative Learning for Real-Time Strategy Games
Gemine, Quentin ; Safadi, Firas ; Fonteneau, Raphaël et al
in Proceedings of the 2012 IEEE Conference on Computational Intelligence and Games (2012)
Over the past decades, video games have become increasingly popular and complex. Virtual worlds have come a long way since the first arcades and so have the artificial intelligence (AI) techniques used to control agents in these growing environments. Tasks such as world exploration, constrained pathfinding or team tactics and coordination, just to name a few, are now default requirements for contemporary video games. However, despite its recent advances, video game AI still lacks the ability to learn. In this paper, we attempt to break the barrier between video game AI and machine learning and propose a generic method allowing real-time strategy (RTS) agents to learn production strategies from a set of recorded games using supervised learning. We test this imitative learning approach on the popular RTS title StarCraft II® and successfully teach a Terran agent facing a Protoss opponent new production strategies.

Learning exploration/exploitation strategies for single trajectory reinforcement learning
Castronovo, Michaël ; Maes, Francis ; Fonteneau, Raphaël et al
in Proceedings of the 10th European Workshop on Reinforcement Learning (EWRL 2012) (2012)
We consider the problem of learning high-performance Exploration/Exploitation (E/E) strategies for finite Markov Decision Processes (MDPs) when the MDP to be controlled is supposed to be drawn from a known probability distribution $p_M(\cdot)$. The performance criterion is the sum of discounted rewards collected by the E/E strategy over an infinite-length trajectory. We propose an approach for solving this problem that works by considering a rich set of candidate E/E strategies and by looking for the one that gives the best average performances on MDPs drawn according to $p_M(\cdot)$. As candidate E/E strategies, we consider index-based strategies parametrized by small formulas combining variables that include the estimated reward function, the number of times each transition has occurred and the optimal value functions V and Q of the estimated MDP (obtained through value iteration). The search for the best formula is formalized as a multi-armed bandit problem, each arm being associated with a formula. We experimentally compare the performances of the approach with R-max as well as with $\epsilon$-Greedy strategies and the results are promising.
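
The abstract above states that the value functions V and Q of the estimated MDP are obtained through value iteration. As a rough illustration of that standard procedure only (this is not the authors' code, and it does not reproduce their formula-based E/E strategies), the Python sketch below runs value iteration on a small finite MDP. The data layout is an assumption made for this example: `P[s][a]` is a list of `(probability, next_state)` pairs and `R[s][a]` is the immediate reward; the names `value_iteration`, `q_values`, `gamma`, `tol` and `max_iter` are hypothetical.

```python
def q_values(P, R, V, gamma):
    """One Bellman backup: Q[s][a] = R[s][a] + gamma * E[V(next state)]."""
    return [
        [R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
         for a in range(len(P[s]))]
        for s in range(len(P))
    ]


def value_iteration(P, R, gamma=0.95, tol=1e-8, max_iter=10_000):
    """Iterate Bellman backups until V changes by less than tol."""
    V = [0.0] * len(P)
    for _ in range(max_iter):
        V_new = [max(q_s) for q_s in q_values(P, R, V, gamma)]
        delta = max(abs(v_new - v) for v_new, v in zip(V_new, V))
        V = V_new
        if delta < tol:
            break
    # Return the value function and the Q-function derived from it.
    return V, q_values(P, R, V, gamma)


# Toy MDP (hypothetical): in state 0 the agent can stay (reward 0) or
# move to state 1 (reward 1); state 1 is absorbing with reward 0.
P = [[[(1.0, 0)], [(1.0, 1)]], [[(1.0, 1)]]]
R = [[0.0, 1.0], [0.0]]
V, Q = value_iteration(P, R, gamma=0.9)
```

In this toy problem the best choice in state 0 is to collect the reward of 1 immediately, so `V[0]` converges to 1 and `V[1]` stays at 0.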

Min max generalization for two-stage deterministic batch mode reinforcement learning: relaxation schemes
Fonteneau, Raphaël ; Ernst, Damien ; Boigelot, Bernard et al
Report (2012)

Comparison of Different Selection Strategies in Monte-Carlo Tree Search for the Game of Tron
; Lupien St-Pierre, David ; Maes, Francis et al
in IEEE Conference on Computational and Intelligence in Games 2012 (2012)
Monte-Carlo Tree Search (MCTS) techniques are mostly known for their performance on turn-based games, such as Go, for which players have considerable time for choosing their moves. In this paper, we apply MCTS to the game of Tron, a simultaneous real-time two-player game. The fact that players have to react fast and that moves occur simultaneously creates an unusual setting for MCTS, in which classical selection policies such as UCB1 may be suboptimal. We perform an empirical comparison of a wide range of selection policies for MCTS applied to Tron, with both deterministic policies (UCB1, UCB1-Tuned, UCB-V, UCB-Minimal, OMC-Deterministic, MOSS) and stochastic policies (Epsilon-greedy, EXP3, Thompson Sampling, OMC-Stochastic, PBBM). From the experiments, we observe that UCB1-Tuned has the best behavior, closely followed by UCB1. Even if UCB-Minimal is ranked fourth, this is a remarkable result for this recently introduced selection policy found through automatic discovery of good policies on generic multi-armed bandit problems. We also show that deterministic policies perform better than stochastic ones for this problem.
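
Several abstracts in this list refer to UCB1 as a baseline selection policy. For readers unfamiliar with it, the Python sketch below shows the textbook UCB1 rule for a plain multi-armed bandit: play each arm once, then pick the arm maximizing its empirical mean plus an exploration bonus. It is a generic illustration under assumed names (`ucb1_select`, `counts`, `sums`, `c` are hypothetical), not the MCTS implementation evaluated in the paper above.

```python
import math


def ucb1_select(counts, sums, c=math.sqrt(2.0)):
    """Return the index of the arm chosen by the UCB1 rule.

    counts[i] is the number of times arm i was played,
    sums[i] is the total reward collected from arm i.
    """
    # Play every arm once before using the index formula.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm

    total_plays = sum(counts)

    def index(arm):
        mean = sums[arm] / counts[arm]
        # Exploration bonus shrinks as the arm is played more often.
        bonus = c * math.sqrt(math.log(total_plays) / counts[arm])
        return mean + bonus

    return max(range(len(counts)), key=index)


# Example: arm 1 has a lower mean but has been played only once,
# so its exploration bonus dominates.
chosen = ucb1_select(counts=[100, 1], sums=[60.0, 0.5])
```

With the default `c = sqrt(2)` the bonus equals `sqrt(2 ln t / n_i)`, the usual form of the UCB1 index; other values of `c` trade exploration against exploitation.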

SPRT for SPIT: Using the Sequential Probability Ratio Test for Spam in VoIP Prevention
Jung, Tobias ; Martin, Sylvain ; Ernst, Damien et al
in Proc. of 6th International Conference on Autonomous Infrastructure, Management and Security (2012)
This paper presents the first formal framework for identifying and filtering SPIT calls (SPam in Internet Telephony) in an outbound scenario with provably optimal performance. In so doing, our work deviates from related earlier work where this problem is only addressed by ad-hoc solutions. Our goal is to rigorously formalize the problem in terms of mathematical decision theory, find the optimal solution to the problem, and derive concrete bounds for its expected loss (number of mistakes the SPIT filter will make in the worst case). This goal is achieved by considering a scenario amenable to theoretical analysis, namely SPIT detection in an outbound scenario with pure sources. Our methodology is to first define the cost of making an error, apply Wald's sequential probability ratio test, and then determine analytically error probabilities such that the resulting expected loss is minimized. The benefits of our approach are: (1) the method is optimal (in a sense defined in the paper); (2) the method does not rely on manual tuning and tweaking of parameters but is completely self-contained and mathematically justified; (3) the method is computationally simple and scalable. These are desirable features that would make our method a component of choice in larger, autonomic frameworks.

Contextual Multi-armed Bandits for Web Server Defense
Jung, Tobias ; Martin, Sylvain ; Ernst, Damien et al
in Hussein, Abbas (Ed.) Proceedings of 2012 International Joint Conference on Neural Networks (IJCNN) (2012)
In this paper we argue that contextual multi-armed bandit algorithms could open avenues for designing self-learning security modules for computer networks and related tasks. The paper has two contributions: a conceptual one and an algorithmic one. The conceptual contribution is to formulate the real-world problem of preventing HTTP-based attacks on web servers as a one-shot sequential learning problem, namely as a contextual multi-armed bandit. Our second contribution is to present CMABFAS, a new algorithm for general contextual multi-armed bandit learning that specifically targets domains with finite actions. We illustrate how CMABFAS could be used to design a fully self-learning meta filter for web servers that does not rely on feedback from the end-user (i.e., does not require labeled data) and report first convincing simulation results.

Relaxation schemes for min max generalization in deterministic batch mode reinforcement learning
Fonteneau, Raphaël ; Ernst, Damien ; Boigelot, Bernard et al
in 4th International NIPS Workshop on Optimization for Machine Learning (OPT 2011) (2011, December)
We study the min max optimization problem introduced in [Fonteneau, 2011] for computing policies for batch mode reinforcement learning in a deterministic setting. This problem is NP-hard. We focus on the two-stage case for which we provide two relaxation schemes.
The first relaxation scheme works by dropping some constraints in order to obtain a problem that is solvable in polynomial time. The second relaxation scheme, based on a Lagrangian relaxation where all constraints are dualized, leads to a conic quadratic programming problem. Both relaxation schemes are shown to provide better results than those given in [Fonteneau, 2011].

Artificial intelligence design for real-time strategy games
Safadi, Firas ; Fonteneau, Raphaël ; Ernst, Damien
in NIPS Workshop on Decision Making with Multiple Imperfect Decision Makers (2011, December)
For over a decade now, real-time strategy (RTS) games have been challenging intelligence, human and artificial (AI) alike, as one of the top genres in terms of overall complexity. RTS is a prime example problem featuring multiple interacting imperfect decision makers. Elaborate dynamics, partial observability, as well as a rapidly diverging action space render rational decision making somewhat elusive. Humans deal with the complexity using several abstraction layers, taking decisions on different abstract levels. Current agents, on the other hand, remain largely scripted and exhibit static behavior, leaving them extremely vulnerable to flaw abuse and no match against human players. In this paper, we propose to mimic the abstraction mechanisms used by human players for designing AI for RTS games. A non-learning agent for StarCraft showing promising performance is proposed, and several research directions towards the integration of learning mechanisms are discussed at the end of the paper.

Learning for exploration-exploitation in RL. The dusk of the small formulas' reign
Ernst, Damien
Speech/Talk (2011)

Ancillary services and operation of multi-terminal HVDC grids
; Ernst, Damien
in Proceedings of the International Workshop on Transmission Networks for Offshore Wind Power as well as on Transmission Networks for Offshore Wind Power Farms Plants (2011, October)
This paper addresses the problem of ancillary services in ac systems interconnected by a multi-terminal HVdc system. It presents opportunities for new control schemes and discusses operation strategies for three types of HVdc grid operators, namely a coordination center, an independent operator, and the transmission system operator in charge of one of the areas interconnected by the multi-terminal HVdc grid. In these contexts, the paper envisions the challenges of using the HVdc infrastructure to provide frequency, voltage, and rotor angle stability-related ancillary services. It also analyzes the technical and economic impacts of the operation strategies on the ac areas' dynamics.

Model predictive control of HVDC power flow to improve transient stability in power systems
; ; Ernst, Damien
in Proceedings of the Second IEEE International Conference on Smart Grid Communications (IEEE SmartGridComm) (2011, October)
This paper addresses the problem of HVDC control using real-time information to avoid loss of synchronism phenomena in power systems.
It proposes a discrete-time control strategy based on model predictive control, which solves at every time step an open-loop optimal-control problem using an A* event-tree search. Different optimisation criteria based on transient stability indices are compared. The paper presents simulation results for two benchmark systems with 9 and 24 buses, respectively, and an embedded HVDC link. The results show that the control strategy leads to a modulation of the HVDC power flow that significantly improves the system's ability to maintain synchronism in the aftermath of a large disturbance.

Estimation Monte Carlo sans modèle de politiques de décision
Fonteneau, Raphaël ; ; Wehenkel, Louis et al
in Revue d'Intelligence Artificielle [=RIA] (2011), 25