Optimized look-ahead tree policies: a bridge between look-ahead tree policies and direct policy search
Jung, Tobias; Wehenkel, Louis; Ernst, Damien et al. In International Journal of Adaptive Control and Signal Processing (2014), 28(3-5), 255-289.
Direct policy search (DPS) and look-ahead tree (LT) policies are two popular techniques for solving difficult sequential decision-making problems. Both are simple to implement, widely applicable without strong assumptions on the structure of the problem, and capable of producing high-performance control policies. However, both are, each in its own way, computationally very expensive. DPS can require huge offline resources (effort required to obtain the policy), first to select an appropriate space of parameterized policies that works well for the targeted problem, and then to determine the best values of the parameters via global optimization. LT policies do not require any offline resources; however, they typically require huge online resources (effort required to compute the best decision at each step) in order to grow trees of sufficient depth. In this paper, we propose optimized look-ahead trees (OLT), a model-based policy learning scheme that lies at the intersection of DPS and LT. In OLT, the control policy is represented indirectly through an algorithm that, at each decision step, develops (as in LT, using a model of the dynamics) a small look-ahead tree until a prespecified online budget is exhausted. Unlike LT, the development of the tree is not driven by a generic heuristic; rather, the heuristic is optimized for the target problem and implemented as a parameterized node-scoring function learned offline via DPS.
We experimentally compare OLT with pure DPS and pure LT variants on optimal control benchmark domains. The results show that the LT-based representation is a versatile way of compactly representing policies in a DPS scheme (which makes OLT easier to tune and gives it lower offline complexity than pure DPS), while at the same time DPS helps to significantly reduce the size of the look-ahead trees required to take high-quality decisions (which gives OLT lower online complexity than pure LT). Moreover, OLT produces overall better-performing policies than pure DPS and pure LT, and also yields policies that are robust with respect to perturbations of the initial conditions.

Policy search in a space of simple closed-form formulas: towards interpretability of reinforcement learning
Maes, Francis; Fonteneau, Raphaël; Wehenkel, Louis et al. In Discovery Science: 15th International Conference, DS 2012, Lyon, France, October 29-31, 2012, Proceedings (2012, October).
In this paper, we address the problem of computing interpretable solutions to reinforcement learning (RL) problems. To this end, we propose a search algorithm over a space of simple closed-form formulas that are used to rank actions. We formalize the search for a high-performance policy as a multi-armed bandit problem in which each arm corresponds to a candidate policy canonically represented by its shortest formula-based representation. Experiments, conducted on standard benchmarks, show that this approach manages to determine both efficient and interpretable solutions.
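The idea of ranking actions (here, bandit arms) with simple closed-form formulas and then selecting the best formula empirically can be sketched in a few lines. The three candidate formulas, the uniform Bernoulli prior, and all names below are illustrative assumptions for a toy setting, not the authors' implementation.

```python
import math, random

# Three illustrative candidate closed-form formulas over per-arm statistics:
# empirical mean reward r, number of draws n, total number of plays t.
FORMULAS = {
    "mean":     lambda r, n, t: r,
    "mean+1/n": lambda r, n, t: r + 1.0 / n,
    "ucb1":     lambda r, n, t: r + math.sqrt(2.0 * math.log(t) / n),
}

def episode(score, probs, horizon, rng):
    """Play one Bernoulli bandit problem, always pulling the top-ranked arm."""
    k = len(probs)
    counts, sums, total = [0] * k, [0.0] * k, 0.0
    for t in range(1, horizon + 1):
        if t <= k:                       # initialisation: pull each arm once
            a = t - 1
        else:                            # then rank arms with the formula
            a = max(range(k),
                    key=lambda i: score(sums[i] / counts[i], counts[i], t))
        r = 1.0 if rng.random() < probs[a] else 0.0
        counts[a] += 1
        sums[a] += r
        total += r
    return total

def best_formula(n_problems=30, horizon=300, seed=0):
    """Treat each formula as an arm of a meta-bandit: average its return over
    sampled problems and keep the best one."""
    rng = random.Random(seed)
    problems = [[rng.random() for _ in range(5)] for _ in range(n_problems)]
    avg = {name: sum(episode(f, p, horizon, random.Random(i))
                     for i, p in enumerate(problems)) / n_problems
           for name, f in FORMULAS.items()}
    return max(avg, key=avg.get), avg
```

The winning formula is itself a short, human-readable expression, which is the interpretability argument made in the abstract.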
Embedding Monte Carlo search of features in tree-based ensemble methods
Maes, Francis; Geurts, Pierre; Wehenkel, Louis. In Flach, Peter; De Bie, Tijl; Cristianini, Nello (Eds.), Machine Learning and Knowledge Discovery in Databases (2012, September).
Feature generation is the problem of automatically constructing good features for a given target learning problem. While most feature generation algorithms belong either to the filter or to the wrapper approach, this paper focuses on embedded feature generation. We propose a general scheme to embed feature generation in a wide range of tree-based learning algorithms, including single decision trees, random forests and tree boosting. It is based on the formalization of feature construction as a sequential decision-making problem addressed by a tractable Monte Carlo search algorithm coupled with node splitting. This leads to fast algorithms that are applicable to large-scale problems. We empirically analyze the performance of these tree-based learners, with and without the feature generation capability, on several standard datasets.

Supervised learning to tune simulated annealing for in silico protein structure prediction
Marcos Alvarez, Alejandro; Maes, Francis; Wehenkel, Louis. In Verleysen, Michel (Ed.), ESANN 2012 proceedings, 20th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (2012, April 25).
Simulated annealing is a widely used stochastic optimization algorithm whose efficiency essentially depends on the proposal distribution used to generate the next search state at each step. We propose to adapt this distribution to a family of parametric optimization problems by using supervised machine learning on a sample of search states derived from a set of typical runs of the algorithm over this family. We apply this idea in the context of in silico protein structure prediction.

Learning to play K-armed bandit problems
Maes, Francis; Wehenkel, Louis; Ernst, Damien. In Proceedings of the 4th International Conference on Agents and Artificial Intelligence (ICAART 2012) (2012, February).
We propose a learning approach to pre-compute K-armed bandit playing policies by exploiting prior information describing the class of problems targeted by the player. Our algorithm first samples a set of K-armed bandit problems from the given prior, and then chooses, in a space of candidate policies, one that gives the best average performance over these problems. The candidate policies use an index for ranking the arms and pick at each play the arm with the highest index; the index of each arm is computed as a linear combination of features describing the history of plays (e.g., number of draws, average reward, variance of rewards and higher-order moments), and an estimation of distribution algorithm is used to determine the optimal feature weights. We carry out simulations in the case where the prior assumes a fixed number of Bernoulli arms, a fixed horizon, and uniformly distributed parameters of the Bernoulli arms.
These simulations show that the learned strategies perform very well with respect to several other strategies previously proposed in the literature (UCB1, UCB2, UCB-V, KL-UCB and $\epsilon_n$-GREEDY); they also highlight the robustness of these strategies with respect to wrong prior information.

Learning exploration/exploitation strategies for single trajectory reinforcement learning
Castronovo, Michaël; Maes, Francis; Fonteneau, Raphaël et al. In Proceedings of the 10th European Workshop on Reinforcement Learning (EWRL 2012) (2012).
We consider the problem of learning high-performance Exploration/Exploitation (E/E) strategies for finite Markov Decision Processes (MDPs) when the MDP to be controlled is supposed to be drawn from a known probability distribution p_M(·). The performance criterion is the sum of discounted rewards collected by the E/E strategy over an infinite-length trajectory. We propose an approach for solving this problem that works by considering a rich set of candidate E/E strategies and by looking for the one that gives the best average performance on MDPs drawn according to p_M(·). As candidate E/E strategies, we consider index-based strategies parametrized by small formulas combining variables that include the estimated reward function, the number of times each transition has occurred, and the optimal value functions V and Q of the estimated MDP (obtained through value iteration). The search for the best formula is formalized as a multi-armed bandit problem, each arm being associated with a formula. We experimentally compare the performance of the approach with R-max as well as with ε-Greedy strategies, and the results are promising.
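The offline step shared by the two bandit papers above — score each arm with a linear combination of history features and fit the weights on problems sampled from the prior — can be sketched as follows. Random search stands in for the estimation-of-distribution algorithm used in the papers, and the feature set, prior, and all names are illustrative assumptions.

```python
import math, random

def features(mean, n, t):
    """Arm history features: average reward, inverse draw count, exploration term."""
    return (mean, 1.0 / n, math.sqrt(math.log(t) / n))

def run(weights, probs, horizon, rng):
    """Play one Bernoulli problem; the index of an arm is weights · features."""
    k = len(probs)
    counts, sums, total = [0] * k, [0.0] * k, 0.0
    for t in range(1, horizon + 1):
        if t <= k:                       # pull each arm once first
            a = t - 1
        else:
            a = max(range(k), key=lambda i: sum(
                w * f for w, f in zip(weights,
                                      features(sums[i] / counts[i], counts[i], t))))
        r = 1.0 if rng.random() < probs[a] else 0.0
        counts[a] += 1
        sums[a] += r
        total += r
    return total

def learn_weights(n_candidates=40, n_problems=20, horizon=200, seed=1):
    """Offline step: sample problems from the prior and keep the candidate
    weight vector with the best average return (random search in place of EDA)."""
    rng = random.Random(seed)
    problems = [[rng.random() for _ in range(4)] for _ in range(n_problems)]
    candidates = [[rng.gauss(0.0, 1.0) for _ in range(3)]
                  for _ in range(n_candidates)]
    def score(w):
        return sum(run(w, p, horizon, random.Random(i))
                   for i, p in enumerate(problems)) / n_problems
    return max(candidates, key=score)
```

Once learned, the weight vector defines a fixed playing policy that needs no further optimization at play time, which is the "pre-compute" part of the approach.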
Comparison of Different Selection Strategies in Monte-Carlo Tree Search for the Game of Tron
Lupien St-Pierre, David; Maes, Francis et al. In IEEE Conference on Computational Intelligence and Games 2012 (2012).
Monte-Carlo Tree Search (MCTS) techniques are essentially known for their performance on turn-based games, such as Go, for which players have considerable time for choosing their moves. In this paper, we apply MCTS to the game of Tron, a simultaneous real-time two-player game. The fact that players have to react fast and that moves occur simultaneously creates an unusual setting for MCTS, in which classical selection policies such as UCB1 may be suboptimal. We perform an empirical comparison of a wide range of selection policies for MCTS applied to Tron, with both deterministic policies (UCB1, UCB1-Tuned, UCB-V, UCB-Minimal, OMC-Deterministic, MOSS) and stochastic policies (ε-greedy, EXP3, Thompson Sampling, OMC-Stochastic, PBBM). From the experiments, we observe that UCB1-Tuned has the best behavior, closely followed by UCB1. Even if UCB-Minimal is ranked fourth, this is a remarkable result for this recently introduced selection policy, found through automatic discovery of good policies on generic multi-armed bandit problems. We also show that deterministic policies perform better than stochastic ones for this problem.
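The two best-performing selection policies from this comparison are standard and easy to state. A minimal sketch of both scores as used to pick a child node during MCTS selection; the tuple-based child representation is an illustrative assumption.

```python
import math

def ucb1(mean, n, parent_n, c=math.sqrt(2.0)):
    """UCB1 score for a child visited n times under a parent visited parent_n times."""
    return mean + c * math.sqrt(math.log(parent_n) / n)

def ucb1_tuned(mean, sq_mean, n, parent_n):
    """UCB1-Tuned: replaces the fixed exploration constant with a bound that
    uses the empirical variance of the child's returns (capped at 1/4)."""
    var = sq_mean - mean * mean
    v = var + math.sqrt(2.0 * math.log(parent_n) / n)
    return mean + math.sqrt((math.log(parent_n) / n) * min(0.25, v))

def select(children, parent_n):
    """Deterministic selection: argmax of UCB1-Tuned over (mean, sq_mean, n) tuples."""
    return max(range(len(children)),
               key=lambda i: ucb1_tuned(*children[i], parent_n))
```

For low-variance children the variance-aware bonus is smaller than UCB1's, which makes UCB1-Tuned exploit reliable moves sooner — relevant in Tron, where decisions must be made in real time.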
Automatic discovery of ranking formulas for playing with multi-armed bandits
Maes, Francis; Wehenkel, Louis; Ernst, Damien. In Proceedings of the 9th European Workshop on Reinforcement Learning (EWRL 2011) (2011).
We propose an approach for discovering, in an automatic way, formulas for ranking arms while playing with multi-armed bandits. The approach works by defining a grammar made of basic elements, such as addition, subtraction, the max operator, the average value of the rewards collected by an arm, their standard deviation, etc., and by exploiting this grammar to generate and test a large number of formulas. The systematic search for good candidate formulas is carried out by a built-on-purpose optimization algorithm used to navigate inside this large set of candidate formulas towards those that give high performance when used on some multi-armed bandit problems. We have applied this approach on a set of bandit problems made of Bernoulli, Gaussian and truncated Gaussian distributions, and have identified a few simple ranking formulas that provide interesting results on every problem of this set. In particular, they clearly outperform several reference policies previously introduced in the literature. We argue that these newly found formulas, as well as the procedure for generating them, may suggest new directions for studying bandit problems.
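The grammar-driven generation step can be sketched by enumerating all shallow expressions over a small set of terminals and operators. The particular terminals, operator set, and depth cutoff below are illustrative assumptions, not the paper's actual grammar.

```python
import itertools

# Illustrative grammar: terminal arm statistics and binary operators, close in
# spirit to the elements listed in the abstract.
TERMINALS = ("mean", "std", "inv_n", "log_t")
OPS = {"add": lambda a, b: a + b,
       "sub": lambda a, b: a - b,
       "max": max}

def enumerate_formulas():
    """Generate every depth-0 formula (a bare terminal) and every depth-1
    formula (one operator applied to two terminals), keyed by a readable name."""
    formulas = {t: (lambda stats, t=t: stats[t]) for t in TERMINALS}
    for name, op in OPS.items():
        for a, b in itertools.product(TERMINALS, repeat=2):
            formulas[f"{name}({a},{b})"] = \
                (lambda stats, op=op, a=a, b=b: op(stats[a], stats[b]))
    return formulas
```

Even this tiny grammar yields 4 + 3·4² = 52 candidates; real grammars grow combinatorially with depth, which is why the paper pairs generation with a purpose-built search algorithm rather than exhaustive testing.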
Optimized look-ahead tree policies
Maes, Francis; Wehenkel, Louis; Ernst, Damien. In Proceedings of the 9th European Workshop on Reinforcement Learning (EWRL 2011) (2011).
We consider in this paper look-ahead tree techniques for the discrete-time control of a deterministic dynamical system, so as to maximize a sum of discounted rewards over an infinite time horizon. Given the current system state x_t at time t, these techniques explore the look-ahead tree representing possible evolutions of the system states and rewards conditioned on subsequent actions u_t, u_{t+1}, ... . When the computing budget is exhausted, they output the action u_t that led to the best found sequence of discounted rewards. In this context, we are interested in computing good strategies for exploring the look-ahead tree. We propose a generic approach that looks for such strategies by solving an optimization problem whose objective is to compute a (budget-compliant) tree-exploration strategy yielding a control policy that maximizes the average return over a postulated set of initial states. This generic approach is fully specified for the case where the space of candidate tree-exploration strategies consists of "best-first" strategies parameterized by a linear combination of look-ahead path features - some of them having been advocated in the literature before - and where the optimization problem is solved by an EDA algorithm based on Gaussian distributions. Numerical experiments carried out on a model of the treatment of HIV infection show that the optimized tree-exploration strategy is orders of magnitude better than the previously advocated ones.
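The online half of this scheme — best-first development of a look-ahead tree under a node budget, scored by a linear combination of path features — can be sketched on a toy system. The dynamics, the two path features, and the weights below are illustrative assumptions; in the paper the weights are the quantities optimized offline.

```python
import heapq, math

# Toy deterministic system: the reward pushes the agent to keep x close to 0.
ACTIONS = (-1.0, 0.0, 1.0)

def step(x, u):
    nx = 0.9 * x + u
    return nx, -abs(nx)              # (next state, reward)

def olt_action(x0, weights=(1.0, 0.1), budget=50, gamma=0.9):
    """Best-first look-ahead tree: repeatedly expand the leaf whose path has
    the highest linear score of path features (discounted return, depth), then
    return the first action of the best reward sequence found."""
    counter = 0                      # tie-breaker so heap entries stay comparable
    # Heap entries: (-score, tie, state, discounted return, depth, first action)
    heap = [(0.0, counter, x0, 0.0, 0, None)]
    best_return, best_action = -math.inf, ACTIONS[0]
    for _ in range(budget):
        if not heap:
            break
        _, _, x, ret, depth, first = heapq.heappop(heap)
        if first is not None and ret > best_return:
            best_return, best_action = ret, first
        for u in ACTIONS:            # expand: one child per action
            nx, r = step(x, u)
            nret = ret + (gamma ** depth) * r
            counter += 1
            score = weights[0] * nret + weights[1] * (depth + 1)
            heapq.heappush(heap, (-score, counter, nx, nret, depth + 1,
                                  u if first is None else first))
    return best_action
```

With a generic heuristic the scoring weights are fixed a priori; the paper's contribution is to treat them as parameters of the tree-exploration strategy and optimize them over a set of initial states.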
Prédiction structurée multitâche itérative de propriétés structurelles de protéines
Maes, Francis; Becker, Julien; Wehenkel, Louis. In 7e Plateforme AFIA: Association Française pour l'Intelligence Artificielle (2011).
Developing computational tools to predict structural information of proteins from their amino acid sequence is one of the major challenges of bioinformatics. Problems such as the prediction of secondary structure, of solvent accessibility, or of disordered regions can be expressed as structured-output prediction problems and are traditionally solved individually by existing machine learning methods. Since these problems are strongly related to one another, we propose to treat them jointly in a multi-task learning approach. To this end, we introduce a new generic learning framework for multi-task structured prediction. We apply this strategy to solve a set of five tasks for predicting structural properties of proteins. Our experimental results on two datasets show that the proposed strategy is significantly better than approaches treating the tasks individually.

Iterative multi-task sequence labeling for predicting structural properties of proteins
Maes, Francis; Becker, Julien; Wehenkel, Louis. In ESANN 2011 (2011).
Developing computational tools for predicting protein structural information given their amino acid sequence is of primary importance in protein science. Problems such as the prediction of secondary structures, of solvent accessibility, or of disordered regions can be expressed as sequence labeling problems and could be solved independently by existing machine-learning-based sequence labeling approaches. But since these problems are closely related, we propose to rather approach them jointly in a multi-task approach. To this end, we introduce a new generic framework for iterative multi-task sequence labeling. We apply this - conceptually simple but quite effective - strategy to jointly solve a set of five protein annotation tasks. Our empirical results with two protein datasets show that the proposed strategy significantly outperforms the single-task approaches.
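The iterative multi-task mechanism described in the last two abstracts — models of pass k receive, as extra features, the pass k-1 predictions of all tasks — can be sketched with a deliberately trivial per-task learner. The memorizing labeler, the window feature, and all names below are illustrative assumptions standing in for the real base learners.

```python
from collections import Counter, defaultdict

def window(seq, i):
    """Local context feature at position i (the symbol and its neighbours)."""
    return seq[max(0, i - 1):i + 2]

class MajorityLabeler:
    """Toy per-task model: remembers the majority label for each feature tuple."""
    def fit(self, examples):                      # examples: [(feature, label)]
        table = defaultdict(Counter)
        for f, y in examples:
            table[f][y] += 1
        self.table = {f: c.most_common(1)[0][0] for f, c in table.items()}
        self.default = Counter(y for _, y in examples).most_common(1)[0][0]
        return self

    def predict(self, feats):
        return [self.table.get(f, self.default) for f in feats]

def train_iterative(seqs, labels_per_task, n_passes=2):
    """Iterative multi-task labeling: labels_per_task[t][s][i] is the gold label
    of task t at position i of sequence s. Each pass trains one model per task
    on features augmented with the previous pass's predictions of ALL tasks."""
    n_tasks = len(labels_per_task)
    models, preds = [], None                      # preds[t][s] = label sequence
    for _ in range(n_passes):
        def feats(s, seq, i):
            extra = tuple(preds[u][s][i] for u in range(n_tasks)) if preds else ()
            return (window(seq, i),) + extra
        pass_models = [
            MajorityLabeler().fit(
                [(feats(s, seq, i), labels_per_task[t][s][i])
                 for s, seq in enumerate(seqs) for i in range(len(seq))])
            for t in range(n_tasks)]
        # Re-predict the training sequences to feed the next pass.
        preds = [[pass_models[t].predict([feats(s, seq, i) for i in range(len(seq))])
                  for s, seq in enumerate(seqs)]
                 for t in range(n_tasks)]
        models.append(pass_models)
    return models
```

The same loop applies unchanged whether the tasks are secondary structure, solvent accessibility, or disorder: coupling happens only through the shared prediction features, not through the base learners.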