References of "Fonteneau, Raphaël"
Full Text
Peer Reviewed
Batch mode reinforcement learning based on the synthesis of artificial trajectories
Fonteneau, Raphaël ULg; Murphy, Susan A.; Wehenkel, Louis ULg et al

in Annals of Operations Research (2013), 208(1), 383-416

Full Text
Peer Reviewed
Aggregating Optimistic Planning Trees for Solving Markov Decision Processes
Kedenburg, Gunnar; Fonteneau, Raphaël ULg; Munos, Rémi

in Advances in Neural Information Processing Systems 26 (2013)

This paper addresses the problem of online planning in Markov decision processes using a randomized simulator, under a budget constraint. We propose a new algorithm which is based on the construction of a forest of planning trees, where each tree corresponds to a random realization of the stochastic environment. The trees are constructed using a “safe” optimistic planning strategy combining the optimistic principle (in order to explore the most promising part of the search space first) with a safety principle (which guarantees a certain amount of uniform exploration). In the decision-making step of the algorithm, the individual trees are aggregated and an immediate action is recommended. We provide a finite-sample analysis and discuss the trade-off between the principles of optimism and safety. We also report numerical results on a benchmark problem. Our algorithm performs as well as state-of-the-art optimistic planning algorithms, and better than a related algorithm which additionally assumes the knowledge of all transition distributions.
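As a rough illustration of the forest-of-trees idea summarized above (not the authors' implementation), the Python sketch below builds one planning pass per random realization of a generic `simulate(state, action)` simulator and aggregates the per-tree recommendations by majority vote; the rollout-based planner and all names (`make_simulator`, `plan_tree`) are hypothetical simplifications of the "safe" optimistic planner described in the paper.

```python
import random
from collections import Counter

def plan_tree(simulator, state, actions, budget=100, depth=4, gamma=0.95):
    """Hypothetical stand-in for the "safe" optimistic planner: a crude
    rollout-based value estimate of each immediate action, computed on ONE
    fixed random realization of the environment (one seeded simulator)."""
    values = {}
    rollouts = max(1, budget // (len(actions) * depth))
    for a in actions:
        total = 0.0
        for _ in range(rollouts):
            s, r = simulator(state, a)
            ret, discount = r, 1.0
            for _ in range(depth - 1):
                s, r = simulator(s, random.choice(actions))
                discount *= gamma
                ret += discount * r
            total += ret
        values[a] = total / rollouts
    return max(values, key=values.get)  # action recommended by this tree

def aggregate_forest(make_simulator, state, actions, n_trees=10, budget=100):
    """Build one planning tree per random realization and aggregate the
    per-tree recommendations (here by majority vote) into one action."""
    votes = Counter(plan_tree(make_simulator(seed=i), state, actions, budget)
                    for i in range(n_trees))
    return votes.most_common(1)[0][0]
```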

Full Text
Peer Reviewed
An Optimistic Posterior Sampling Strategy for Bayesian Reinforcement Learning
Fonteneau, Raphaël ULg; Korda, Nathan; Munos, Rémi

in NIPS 2013 Workshop on Bayesian Optimization (BayesOpt2013) (2013)

We consider the problem of decision making in the context of unknown Markov decision processes with finite state and action spaces. In a Bayesian reinforcement learning framework, we propose an optimistic posterior sampling strategy based on the maximization of state-action value functions of MDPs sampled from the posterior. First experiments are promising.
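A minimal sketch of the optimistic posterior sampling rule described above, under the assumption of a tabular MDP with Dirichlet posteriors over transition rows and a known reward table; the routine below (plain value iteration, greedy action on the maximum of the sampled Q-functions) is an illustration, not the paper's exact algorithm.

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, n_iter=200):
    """Standard value iteration. P has shape (S, A, S), R has shape (S, A)."""
    Q = np.zeros_like(R, dtype=float)
    for _ in range(n_iter):
        Q = R + gamma * P.dot(Q.max(axis=1))   # P.dot(V) has shape (S, A)
    return Q

def optimistic_posterior_sampling_action(state, alpha, R, n_samples=10, gamma=0.95):
    """Sample several transition models from the Dirichlet posterior (pseudo-counts
    `alpha`, shape (S, A, S)), solve each sampled MDP, and act greedily on the
    MAXIMUM of the sampled state-action value functions (the optimistic part)."""
    S, A, _ = alpha.shape
    rng = np.random.default_rng()
    q_max = np.full((S, A), -np.inf)
    for _ in range(n_samples):
        P = np.array([[rng.dirichlet(alpha[s, a]) for a in range(A)] for s in range(S)])
        q_max = np.maximum(q_max, value_iteration(P, R, gamma))
    return int(np.argmax(q_max[state]))
```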

Full Text
Peer Reviewed
Planification Optimiste dans les Processus Décisionnels de Markov avec Croyance
Fonteneau, Raphaël ULg; Busoniu, Lucian; Munos, Rémi

in 8èmes Journées Francophones de Planification, Décision et Apprentissage pour la conduite de systèmes (JFPDA'13) (2013)

This paper describes the BOP algorithm ("Bayesian Optimistic Planning"), a new indirect (i.e., model-based) Bayesian reinforcement learning algorithm. BOP extends the approach of the OP-MDP algorithm ("Optimistic Planning for Markov Decision Processes", see [Busoniu2011,Busoniu2012]) to the case where the transition probabilities of the underlying MDP are initially unknown and must be learned through interactions with the environment. Knowledge about the underlying MDP is represented by a probability distribution over the set of all transition models, using Dirichlet distributions. The BOP algorithm plans in the augmented state-belief space obtained by concatenating the state vector with the posterior distribution over transition models. We show that BOP reaches Bayesian optimality as the budget parameter grows to infinity. Preliminary experiments show encouraging results.

Full Text
Peer Reviewed
Généralisation Min Max pour l'Apprentissage par Renforcement Batch et Déterministe : Relaxations pour le Cas Général T Etapes
Fonteneau, Raphaël ULg; Ernst, Damien ULg; Boigelot, Bernard ULg et al

in 8èmes Journées Francophones de Planification, Décision et Apprentissage pour la conduite de systèmes (JFPDA'13) (2013)

This paper addresses the min max generalization problem in the context of deterministic batch mode reinforcement learning. The problem was originally introduced by [Fonteneau, 2011] and has already been shown to be NP-hard. Two relaxation schemes for the two-stage case were presented at JFPDA'12; this paper generalizes these schemes to the T-stage case. The first scheme works by dropping constraints in order to obtain a problem solvable in polynomial time. The second scheme is a Lagrangian relaxation that also leads to a problem solvable in polynomial time. We prove theoretically that both schemes yield better results than those proposed by [Fonteneau, 2011].

Full Text
Peer Reviewed
Optimistic Planning for Belief-Augmented Markov Decision Processes
Fonteneau, Raphaël ULg; Busoniu, Lucian; Munos, Rémi

in Proceedings 2013 Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL-13), Singapore, 15–19 April 2013 (2013)

This paper presents the Bayesian Optimistic Planning (BOP) algorithm, a novel model-based Bayesian reinforcement learning approach. BOP extends the planning approach of the Optimistic Planning for Markov Decision Processes (OP-MDP) algorithm [Busoniu2011,Busoniu2012] to contexts where the transition model of the MDP is initially unknown and progressively learned through interactions with the environment. The knowledge about the unknown MDP is represented with a probability distribution over all possible transition models using Dirichlet distributions, and the BOP algorithm plans in the belief-augmented state space constructed by concatenating the original state vector with the current posterior distribution over transition models. We show that BOP becomes Bayesian optimal when the budget parameter increases to infinity. Preliminary empirical validations show promising performance.
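A small sketch of the belief-augmented state described above, assuming a tabular MDP whose belief is a tensor of Dirichlet pseudo-counts; the class and method names are hypothetical, and the optimistic planner itself is omitted.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class BeliefState:
    """Belief-augmented state: the original (discrete) state concatenated with
    the posterior over transition models, here Dirichlet pseudo-counts of
    shape (S, A, S)."""
    state: int
    alpha: np.ndarray

    def update(self, action, next_state):
        """Bayesian update of the belief after observing one transition (s, a, s')."""
        alpha = self.alpha.copy()
        alpha[self.state, action, next_state] += 1.0
        return BeliefState(state=next_state, alpha=alpha)

    def posterior_mean(self, s, a):
        """Posterior mean of the transition distribution at (s, a)."""
        return self.alpha[s, a] / self.alpha[s, a].sum()

# Usage: uniform Dirichlet prior over 3 states and 2 actions, one observed transition.
b = BeliefState(state=0, alpha=np.ones((3, 2, 3)))
b = b.update(action=1, next_state=2)
print(b.posterior_mean(0, 1))   # -> [0.25 0.25 0.5]
```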

Full Text
Peer Reviewed
Min max generalization for deterministic batch mode reinforcement learning: relaxation schemes
Fonteneau, Raphaël ULg; Ernst, Damien ULg; Boigelot, Bernard ULg et al

in SIAM Journal on Control & Optimization (2013), 51(5), 3355-3385

We study the min max optimization problem introduced in Fonteneau et al. [Towards min max reinforcement learning, ICAART 2010, Springer, Heidelberg, 2011, pp. 61–77] for computing policies for batch mode reinforcement learning in a deterministic setting with fixed, finite time horizon. First, we show that the min part of this problem is NP-hard. We then provide two relaxation schemes. The first relaxation scheme works by dropping some constraints in order to obtain a problem that is solvable in polynomial time. The second relaxation scheme, based on a Lagrangian relaxation where all constraints are dualized, can also be solved in polynomial time. We also theoretically prove and empirically illustrate that both relaxation schemes provide better results than those given in [Fonteneau et al., 2011, as cited above].
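As context for the second relaxation scheme, the generic pattern of a Lagrangian relaxation in which all constraints are dualized is recalled below in illustrative notation (this is the standard weak-duality bound, not the paper's exact min max formulation):

```latex
% Generic Lagrangian relaxation, all constraints dualized (illustrative notation only).
\max_{\lambda \ge 0} \; \min_{x} \Big( f(x) + \sum_{i=1}^{m} \lambda_i \, g_i(x) \Big)
\;\le\;
\min_{x \,:\, g_i(x) \le 0,\ i = 1, \dots, m} f(x)
```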

Full Text
Peer Reviewed
Stratégies d'échantillonnage pour l'apprentissage par renforcement batch
Fonteneau, Raphaël ULg; Murphy, Susan A.; Wehenkel, Louis ULg et al

in Revue d'Intelligence Artificielle [=RIA] (2013), 27(2), 171-194

We propose two strategies for experiment selection in the context of batch mode reinforcement learning. The first strategy is based on the idea that the most interesting experiments to carry out at some stage are those that are the most liable to falsify the current hypothesis about the optimal control policy. We cast this idea in a context where a policy learning algorithm and a model identification method are given a priori. The second strategy exploits recently published methods for computing bounds on the return of control policies from a set of trajectories in order to sample the state-action space so as to be able to discriminate between optimal and non-optimal policies. Both strategies are experimentally validated, showing promising results.

Full Text
Peer Reviewed
Policy search in a space of simple closed-form formulas: towards interpretability of reinforcement learning
Maes, Francis ULg; Fonteneau, Raphaël ULg; Wehenkel, Louis ULg et al

in Discovery Science 15th International Conference, DS 2012, Lyon, France, October 29-31, 2012. Proceedings (2012, October)

In this paper, we address the problem of computing interpretable solutions to reinforcement learning (RL) problems. To this end, we propose a search algorithm over a space of simple closed-form formulas that are used to rank actions. We formalize the search for a high-performance policy as a multi-armed bandit problem where each arm corresponds to a candidate policy canonically represented by its shortest formula-based representation. Experiments, conducted on standard benchmarks, show that this approach manages to determine both efficient and interpretable solutions.
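To make the formula-as-policy idea above concrete, here is a hedged Python sketch: a handful of hand-written candidate formulas rank actions from a few hypothetical features, and a standard UCB1 bandit searches for the best-performing formula; the actual paper searches a much richer, automatically generated formula space.

```python
import math

# Hypothetical candidate formulas ranking actions from simple per-action features
# (the paper searches a grammar of such closed-form formulas automatically).
FORMULAS = {
    "r":          lambda f: f["r"],
    "r + next_v": lambda f: f["r"] + f["next_v"],
    "r - dist":   lambda f: f["r"] - f["dist"],
}

def formula_policy(formula, features_per_action):
    """Greedy policy induced by one formula: pick the action with the highest score."""
    return max(features_per_action, key=lambda a: formula(features_per_action[a]))

def ucb1_formula_search(evaluate_return, n_pulls=300):
    """Treat each candidate formula as a bandit arm; `evaluate_return(name)` runs one
    (noisy) episode with the policy induced by that formula and returns its return."""
    names = list(FORMULAS)
    counts = {n: 0 for n in names}
    sums = {n: 0.0 for n in names}
    for t in range(1, n_pulls + 1):
        def ucb(n):
            if counts[n] == 0:
                return float("inf")
            return sums[n] / counts[n] + math.sqrt(2.0 * math.log(t) / counts[n])
        arm = max(names, key=ucb)
        sums[arm] += evaluate_return(arm)
        counts[arm] += 1
    return max(names, key=lambda n: sums[n] / max(counts[n], 1))
```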

Full Text
Peer Reviewed
Généralisation min max pour l'apprentissage par renforcement batch et déterministe : schémas de relaxation
Fonteneau, Raphaël ULg; Ernst, Damien ULg; Boigelot, Bernard ULg et al

in Septièmes Journées Francophones de Planification, Décision et Apprentissage pour la conduite de systèmes (JFPDA 2012) (2012, May)

We consider the min max generalization problem in the context of deterministic batch mode reinforcement learning. The problem was originally introduced by Fonteneau et al. (2011). We first show that the problem is NP-hard. For the case where the optimization horizon is 2, we develop two relaxation schemes. The first scheme works by dropping constraints so as to obtain a problem solvable in polynomial time. The second scheme is a Lagrangian relaxation leading to a conic quadratic problem. We show, theoretically and empirically, that both schemes yield better results than those proposed by Fonteneau et al. (2011).

Full Text
Peer Reviewed
Imitative Learning for Real-Time Strategy Games
Gemine, Quentin ULg; Safadi, Firas ULg; Fonteneau, Raphaël ULg et al

in Proceedings of the 2012 IEEE Conference on Computational Intelligence and Games (2012)

Over the past decades, video games have become increasingly popular and complex. Virtual worlds have gone a long way since the first arcades, and so have the artificial intelligence (AI) techniques used to control agents in these growing environments. Tasks such as world exploration, constrained pathfinding or team tactics and coordination, just to name a few, are now default requirements for contemporary video games. However, despite its recent advances, video game AI still lacks the ability to learn. In this paper, we attempt to break the barrier between video game AI and machine learning and propose a generic method allowing real-time strategy (RTS) agents to learn production strategies from a set of recorded games using supervised learning. We test this imitative learning approach on the popular RTS title StarCraft II® and successfully teach a Terran agent facing a Protoss opponent new production strategies.
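A minimal sketch of the imitative (supervised) learning step described above, using a scikit-learn classifier as one possible model; the feature names and helper functions are hypothetical, and the real pipeline (replay parsing, StarCraft II interface) is omitted.

```python
# Minimal sketch of the imitative-learning idea (not the paper's actual pipeline):
# learn, from recorded games, a mapping from the current game state to the next
# production decision, then replay that mapping online. Feature names are hypothetical.
from sklearn.ensemble import RandomForestClassifier

FEATURES = ["minerals", "gas", "supply_used", "n_barracks", "game_time"]

def train_production_model(replays):
    """`replays` is a list of (state_features, produced_unit) pairs extracted from
    recorded games, e.g. ({"minerals": 400, "gas": 100, "supply_used": 30,
    "n_barracks": 2, "game_time": 310}, "marine")."""
    X = [[s[k] for k in FEATURES] for s, _ in replays]
    y = [unit for _, unit in replays]
    model = RandomForestClassifier(n_estimators=100)
    model.fit(X, y)
    return model

def next_production_order(model, state):
    """Imitate the demonstrated strategy: produce whatever the model predicts."""
    return model.predict([[state[k] for k in FEATURES]])[0]
```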

Full Text
Peer Reviewed
Learning exploration/exploitation strategies for single trajectory reinforcement learning
Castronovo, Michaël ULg; Maes, Francis ULg; Fonteneau, Raphaël ULg et al

in Proceedings of the 10th European Workshop on Reinforcement Learning (EWRL 2012) (2012)

We consider the problem of learning high-performance Exploration/Exploitation (E/E) strategies for finite Markov Decision Processes (MDPs) when the MDP to be controlled is supposed to be drawn from a known probability distribution pM(·). The performance criterion is the sum of discounted rewards collected by the E/E strategy over an infinite length trajectory. We propose an approach for solving this problem that works by considering a rich set of candidate E/E strategies and by looking for the one that gives the best average performance on MDPs drawn according to pM(·). As candidate E/E strategies, we consider index-based strategies parametrized by small formulas combining variables that include the estimated reward function, the number of times each transition has occurred and the optimal value functions V and Q of the estimated MDP (obtained through value iteration). The search for the best formula is formalized as a multi-armed bandit problem, each arm being associated with a formula. We experimentally compare the performance of the approach with R-max as well as with ε-Greedy strategies, and the results are promising.
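For illustration, the snippet below shows what one index-based E/E strategy of the kind searched over in the paper could look like: a small formula combining the estimated reward, the visit counts and the value function of the estimated MDP scores each action, and the agent acts greedily on that index. The particular UCB-like formula is only an example, not the one learned in the paper.

```python
import numpy as np

def index_based_action(state, r_hat, n_visits, q_hat, t, formula=None):
    """Score each action a in `state` with a small formula over the estimated reward
    r_hat[s, a], the visit counts n_visits[s, a] and the value q_hat[s, a] of the
    estimated MDP, then act greedily on the index. The default (UCB-like) formula
    is only an illustrative example of a candidate E/E strategy."""
    if formula is None:
        formula = lambda r, n, q: q + np.sqrt(2.0 * np.log(t + 1.0) / (n + 1.0))
    n_actions = q_hat.shape[1]
    scores = [formula(r_hat[state, a], n_visits[state, a], q_hat[state, a])
              for a in range(n_actions)]
    return int(np.argmax(scores))
```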

Full Text
Peer Reviewed
Relaxation schemes for min max generalization in deterministic batch mode reinforcement learning
Fonteneau, Raphaël ULg; Ernst, Damien ULg; Boigelot, Bernard ULg et al

in 4th International NIPS Workshop on Optimization for Machine Learning (OPT 2011) (2011, December)

We study the min max optimization problem introduced in [Fonteneau, 2011] for computing policies for batch mode reinforcement learning in a deterministic setting. This problem is NP-hard. We focus on the two-stage case for which we provide two relaxation schemes. The first relaxation scheme works by dropping some constraints in order to obtain a problem that is solvable in polynomial time. The second relaxation scheme, based on a Lagrangian relaxation where all constraints are dualized, leads to a conic quadratic programming problem. Both relaxation schemes are shown to provide better results than those given in [Fonteneau, 2011].

Full Text
Peer Reviewed
Artificial intelligence design for real-time strategy games
Safadi, Firas ULg; Fonteneau, Raphaël ULg; Ernst, Damien ULg

in NIPS Workshop on Decision Making with Multiple Imperfect Decision Makers (2011, December)

For now over a decade, real-time strategy (RTS) games have been challenging intelligence, human and artificial (AI) alike, as one of the top genres in terms of overall complexity. RTS is a prime example problem featuring multiple interacting imperfect decision makers. Elaborate dynamics, partial observability, as well as a rapidly diverging action space render rational decision making somewhat elusive. Humans deal with the complexity using several abstraction layers, taking decisions on different abstract levels. Current agents, on the other hand, remain largely scripted and exhibit static behavior, leaving them extremely vulnerable to flaw abuse and no match for human players. In this paper, we propose to mimic the abstraction mechanisms used by human players for designing AI for RTS games. A non-learning agent for StarCraft showing promising performance is proposed, and several research directions towards the integration of learning mechanisms are discussed at the end of the paper.

Full Text
Peer Reviewed
Estimation Monte Carlo sans modèle de politiques de décision
Fonteneau, Raphaël ULg; Murphy, Susan A.; Wehenkel, Louis ULg et al

in Revue d'Intelligence Artificielle [=RIA] (2011), 25

Full Text
Peer Reviewed
Apprentissage actif par modification de la politique de décision courante
Fonteneau, Raphaël ULg; Murphy, Susan A.; Wehenkel, Louis ULg et al

in Sixièmes Journées Francophones de Planification, Décision et Apprentissage pour la conduite de systèmes (JFPDA 2011) (2011, June)

Full Text
Peer Reviewed
Active exploration by searching for experiments that falsify the computed control policy
Fonteneau, Raphaël ULg; Murphy, Susan; Wehenkel, Louis ULg et al

in Proceedings of the 2011 IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL-11) (2011, April)

We propose a strategy for experiment selection - in the context of reinforcement learning - based on the idea that the most interesting experiments to carry out at some stage are those that are the most liable to falsify the current hypothesis about the optimal control policy. We cast this idea in a context where a policy learning algorithm and a model identification method are given a priori. Experiments are selected if, using the learnt environment model, they are predicted to yield a revision of the learnt control policy. Algorithms and simulation results are provided for a deterministic system with discrete action space. They show that the proposed approach is promising.
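A simplified sketch of the falsification-based selection rule summarized above, with all learning components passed in as callables (hypothetical names): an experiment is kept only if its model-predicted outcome, once appended to the dataset, changes the learnt policy somewhere.

```python
def falsifying_experiments(dataset, candidate_experiments, learn_model, learn_policy):
    """Select the candidate experiments (state-action pairs) whose PREDICTED outcome,
    if added to the dataset, changes the learnt control policy.
    `learn_model(dataset)` returns a predictor (s, a) -> (next_state, reward);
    `learn_policy(dataset)` returns a mapping state -> action.
    This is a simplified sketch of the selection rule, not the paper's code."""
    model = learn_model(dataset)
    current_policy = learn_policy(dataset)
    selected = []
    for (s, a) in candidate_experiments:
        predicted_next, predicted_reward = model(s, a)
        augmented = dataset + [(s, a, predicted_reward, predicted_next)]
        new_policy = learn_policy(augmented)
        # The experiment is "interesting" if it is predicted to falsify the
        # current hypothesis about the optimal policy, i.e. to revise it.
        if any(new_policy(x) != current_policy(x) for x, *_ in dataset):
            selected.append((s, a))
    return selected
```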

Full Text
Contributions to Batch Mode Reinforcement Learning
Fonteneau, Raphaël ULg

Doctoral thesis (2011)

This dissertation presents various research contributions published during four years of PhD research in the field of batch mode reinforcement learning, which studies optimal control problems for which the only information available on the system dynamics and the reward function is gathered in a set of trajectories. We first focus on deterministic problems in continuous spaces. In such a context, and under some assumptions related to the smoothness of the environment, we propose a new approach for inferring bounds on the performance of control policies. We also derive from these bounds a new inference algorithm for generalizing the information contained in the batch collection of trajectories in a cautious manner. This inference algorithm has itself led us to propose a min max generalization framework. When working on batch mode reinforcement learning problems, one also often has to consider the problem of generating informative trajectories. This dissertation proposes two different approaches for addressing this problem. The first approach uses the bounds mentioned above to generate data tightening these bounds. The second approach proposes to generate data that are predicted to induce a change in the inferred optimal control policy. While the above mentioned contributions consider a deterministic framework, we also report on two research contributions which consider a stochastic setting. The first one addresses the problem of evaluating the expected return of control policies in the presence of disturbances. The second one proposes a technique for selecting relevant variables in a batch mode reinforcement learning context, in order to compute simplified control policies that are based on smaller sets of state variables.

Full Text
Peer Reviewed
Towards min max generalization in reinforcement learning
Fonteneau, Raphaël ULg; Murphy, Susan; Wehenkel, Louis ULg et al

in Filipe, Joaquim; Fred, Ana; Sharp, Bernadette (Eds.) Agents and Artificial Intelligence: International Conference, ICAART 2010, Valencia, Spain, January 2010, Revised Selected Papers (2011)

In this paper, we introduce a min max approach for addressing the generalization problem in Reinforcement Learning. The min max approach works by determining a sequence of actions that maximizes the worst return that could possibly be obtained considering any dynamics and reward function compatible with the sample of trajectories and some prior knowledge on the environment. We consider the particular case of deterministic Lipschitz continuous environments over continuous state spaces, finite action spaces, and a finite optimization horizon. We discuss the non-triviality of computing an exact solution of the min max problem even after reformulating it so as to avoid search in function spaces. For addressing this problem, we propose to replace, inside this min max problem, the search for the worst environment given a sequence of actions by an expression that lower bounds the worst return that can be obtained for a given sequence of actions. This lower bound has a tightness that depends on the sample sparsity. From there, we propose an algorithm of polynomial complexity that returns a sequence of actions leading to the maximization of this lower bound. We give a condition on the sample sparsity ensuring that, for a given initial state, the proposed algorithm produces an optimal sequence of actions in open-loop. Our experiments show that this algorithm can lead to more cautious policies than algorithms combining dynamic programming with function approximators.
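In illustrative notation (not the paper's exact formulation), the min max problem sketched in this abstract reads as follows, with F the sample of one-step transitions and the minimum taken over all Lipschitz-continuous dynamics f and reward functions ρ that agree with F:

```latex
% Illustrative notation only; T is the finite optimization horizon and x_0 the given initial state.
(a_0^*, \dots, a_{T-1}^*) \in
\arg\max_{a_0, \dots, a_{T-1}} \;
\min_{\substack{(f, \rho)\ \text{Lipschitz},\\ \text{compatible with } \mathcal{F}}}
\sum_{t=0}^{T-1} \rho(x_t, a_t),
\qquad x_{t+1} = f(x_t, a_t).
```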
