References of "Fonteneau, Raphaël"
Full Text
Peer Reviewed
Mathematical modeling of HIV dynamics after antiretroviral therapy initiation: A clinical research
Moog, Claude; Rivadeneira, Pablo; Stan, Guy-Bart et al

in AIDS Research and Human Retroviruses (2014)

Immunological failure is identified from the estimation of certain parameters of a mathematical model of HIV infection dynamics. This identification is supported by clinical research results from an original clinical trial. Standard clinical data, collected from infected patients starting Highly Active Anti-Retroviral Therapy (HAART) just after the first month following therapy initiation, were used to carry out the model identification. The early diagnosis is shown to be consistent with the patients' follow-up after six months.
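
As an illustration of the kind of model identification described above, here is a minimal sketch that fits a basic target-cell/infected-cell/virus ODE model to hypothetical viral-load measurements by least squares; the model structure, parameter names, fixed values, and data are assumptions for illustration only, not the model or estimation procedure used in the paper.

```python
# Illustrative only: least-squares fit of a basic HIV ODE model (target cells T,
# infected cells I, free virus V) to hypothetical viral-load data. This is NOT
# the model or the estimation procedure used in the paper.
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import least_squares

def hiv_ode(t, y, beta, delta, p, c, lam, d):
    T, I, V = y
    return [lam - d * T - beta * T * V,      # healthy target cells
            beta * T * V - delta * I,        # infected cells
            p * I - c * V]                   # free virus

def simulated_log_viral_load(params, t_obs, y0):
    beta, delta, p, c = params
    lam, d = 1e4, 0.01                       # fixed (hypothetical) values
    sol = solve_ivp(hiv_ode, (t_obs[0], t_obs[-1]), y0, t_eval=t_obs,
                    args=(beta, delta, p, c, lam, d))
    return np.log10(np.maximum(sol.y[2], 1.0))

def residuals(params, t_obs, log_vl_obs, y0):
    return simulated_log_viral_load(params, t_obs, y0) - log_vl_obs

# Hypothetical monthly viral-load measurements (log10 copies/mL) after HAART start.
t_obs = np.array([0.0, 30.0, 60.0, 90.0, 120.0, 150.0, 180.0])
log_vl_obs = np.array([5.0, 3.2, 2.5, 2.1, 1.9, 1.8, 1.7])
y0 = [1e6, 1e4, 1e5]                          # initial T, I, V (hypothetical)

fit = least_squares(residuals, x0=[1e-7, 0.5, 100.0, 3.0],
                    args=(t_obs, log_vl_obs, y0),
                    bounds=([1e-9, 0.01, 1.0, 0.1], [1e-5, 2.0, 1e4, 30.0]))
print("estimated (beta, delta, p, c):", fit.x)
```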

Full Text
Peer Reviewed
Bayes Adaptive Reinforcement Learning versus Off-line Prior-based Policy Search: an Empirical Comparison
Castronovo, Michaël ULg; Ernst, Damien ULg; Fonteneau, Raphaël ULg

in Proceedings of the 23rd annual machine learning conference of Belgium and the Netherlands (BENELEARN 2014) (2014, June)

This paper addresses the problem of decision making in unknown finite Markov decision processes (MDPs). The uncertainty about the MDPs is modeled using a prior distribution over a set of candidate MDPs. The performance criterion is the expected sum of discounted rewards collected over an infinite-length trajectory. Time constraints are defined as follows: (i) an off-line phase with a given time budget can be used to exploit the prior distribution, and (ii) at every time step of the on-line phase, decisions have to be computed within a given time budget. In this setting, we compare two decision-making strategies: OPPS, a recently proposed meta-learning scheme which mainly exploits the off-line phase to perform policy search, and BAMCP, a state-of-the-art model-based Bayesian reinforcement learning algorithm, which mainly exploits the on-line time budget. We empirically compare these approaches in a real Bayesian setting by computing their performance over a large set of problems. To the best of our knowledge, it is the first time that this is done in the reinforcement learning literature. Several settings are considered by varying the prior distribution and the distribution from which test problems are drawn. The main finding of these experiments is that there may be a significant benefit of having an off-line prior-based optimization phase in the case of informative and accurate priors, especially when on-line time constraints are tight.
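
For readers unfamiliar with evaluation "in a real Bayesian setting", the following sketch shows the general protocol of averaging truncated discounted returns over many test problems drawn from a distribution; the `make_agent` and `sample_mdp` callables, the MDP/agent interfaces, and the budgets are hypothetical placeholders, not the authors' code.

```python
# Sketch of a Bayesian empirical comparison: average the truncated discounted
# return of an agent over many MDPs drawn from a test distribution.
import numpy as np

def discounted_return(agent, mdp, horizon=200, gamma=0.95, rng=None):
    rng = rng or np.random.default_rng()
    s, ret = mdp.reset(), 0.0
    for t in range(horizon):                 # truncation of the infinite-length trajectory
        a = agent.act(s)                     # must fit within the on-line time budget
        s_next, r = mdp.step(s, a, rng)
        agent.observe(s, a, r, s_next)
        ret += (gamma ** t) * r
        s = s_next
    return ret

def bayesian_score(make_agent, sample_mdp, n_test_mdps=500):
    returns = []
    for _ in range(n_test_mdps):
        mdp = sample_mdp()                   # draw a test problem from the test distribution
        agent = make_agent()                 # off-line phase (prior exploitation) happens here
        returns.append(discounted_return(agent, mdp))
    return float(np.mean(returns)), float(np.std(returns) / np.sqrt(n_test_mdps))
```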

Full Text
Peer Reviewed
Apprentissage par renforcement bayésien versus recherche directe de politique hors-ligne en utilisant une distribution a priori: comparaison empirique
Castronovo, Michaël ULg; Ernst, Damien ULg; Fonteneau, Raphaël ULg

in Proceedings des 9èmes Journées Francophones de Planification, Décision et Apprentissage (2014, May)

This paper addresses the problem of sequential decision making in finite, unknown Markov decision processes (MDPs). The lack of knowledge about the MDP is modeled as a probability distribution over a set of candidate MDPs that is known a priori. The performance criterion used is the expected sum of discounted rewards over an infinite trajectory. Alongside the optimality criterion, the computation-time constraints are rigorously formalized. First, an "off-line" phase preceding the interaction with the unknown MDP gives the agent the opportunity to exploit the prior distribution for a limited time. Then, during the interaction phase with the MDP, the agent must take each decision within a fixed time budget at every time step. In this context, we compare two decision-making strategies: OPPS, a recent approach that mainly exploits the off-line phase to select a policy from a set of candidate policies, and BAMCP, a recent Bayesian on-line planning approach. We compare these approaches empirically in a Bayesian sense, in that we evaluate their performance on a large set of problems drawn from a test distribution. To the best of our knowledge, these are the first experimental tests of this kind in reinforcement learning. We study several scenarios by considering various distributions that can serve both as the prior distribution and as the test distribution. The results obtained suggest that exploiting a prior distribution during an off-line optimization phase is a non-negligible advantage for accurate prior distributions and/or small on-line time budgets.

Full Text
Peer Reviewed
Estimating the revenues of a hydrogen-based high-capacity storage device: methodology and results
François-Lavet, Vincent ULg; Fonteneau, Raphaël ULg; Ernst, Damien ULg

in Proceedings des 9èmes Journées Francophones de Planification, Décision et Apprentissage (2014, May)

This paper proposes a methodology to estimate the maximum revenue that can be generated by a company that operates a high-capacity storage device to buy or sell electricity on the day-ahead electricity market. The methodology exploits the Dynamic Programming (DP) principle and is specified for hydrogen-based storage devices that use electrolysis to produce hydrogen and fuel cells to generate electricity from hydrogen. Experimental results are generated using historical data of energy prices on the Belgian market. They show how the storage capacity and other parameters of the storage device influence the optimal revenue. The main conclusion drawn from the experiments is that it may be interesting to invest in large storage tanks to exploit the inter-seasonal price fluctuations of electricity.
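
To make the DP-based methodology concrete, here is a minimal backward dynamic programming sketch that computes the maximum revenue of a price-taking storage device over a sequence of hourly day-ahead prices; the capacities, efficiencies, discretization, and prices below are hypothetical and do not reproduce the paper's exact formulation.

```python
# Minimal sketch (assumptions, not the paper's exact model): maximum revenue of a
# hydrogen storage device on the day-ahead market, computed by backward dynamic
# programming over a discretized storage level with hourly decisions.
import numpy as np

def max_revenue(prices, capacity_mwh=100.0, n_levels=1001,
                p_electrolysis_mw=1.0, p_fuel_cell_mw=1.0,
                eta_in=0.7, eta_out=0.5):
    """prices: hourly day-ahead prices (EUR/MWh); returns the optimal revenue."""
    levels = np.linspace(0.0, capacity_mwh, n_levels)   # stored hydrogen energy (MWh)
    step = levels[1] - levels[0]
    charge_steps = int(round(eta_in * p_electrolysis_mw / step))
    discharge_steps = int(round(p_fuel_cell_mw / eta_out / step))
    V = np.zeros(n_levels)                               # value-to-go at the horizon
    for price in prices[::-1]:                           # backward in time
        V_new = np.empty(n_levels)
        for i in range(n_levels):
            best = V[i]                                  # idle
            j = i + charge_steps                         # buy power, store hydrogen
            if j < n_levels:
                best = max(best, -price * p_electrolysis_mw + V[j])
            k = i - discharge_steps                      # burn hydrogen, sell power
            if k >= 0:
                best = max(best, price * p_fuel_cell_mw + V[k])
            V_new[i] = best
        V = V_new
    return V[0]                                          # start from an empty tank

# Example with two days of hypothetical hourly prices.
rng = np.random.default_rng(0)
prices = 40.0 + 15.0 * np.sin(np.arange(48) * 2 * np.pi / 24) + rng.normal(0, 5, 48)
print("optimal revenue (EUR):", round(max_revenue(prices), 2))
```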

Full Text
Peer Reviewed
Apprentissage par renforcement batch fondé sur la reconstruction de trajectoires artificielles
Fonteneau, Raphaël ULg; Murphy, Susan A.; Wehenkel, Louis ULg et al

in Proceedings of the 9èmes Journées Francophones de Planification, Décision et Apprentissage (JFPDA 2014) (2014)

This paper is set in the framework of batch mode reinforcement learning, whose central problem is to learn, from a set of trajectories, a decision policy that optimizes a given criterion. We consider more specifically problems with a continuous state space, for which classical solution schemes rely on function approximators. This paper proposes an alternative based on the reconstruction of "artificial trajectories", which offers a new angle on the classical problems of batch mode reinforcement learning.

Full Text
Peer Reviewed
Batch mode reinforcement learning based on the synthesis of artificial trajectories
Fonteneau, Raphaël ULg; Murphy, Susan A.; Wehenkel, Louis ULg et al

in Annals of Operations Research (2013), 208(1), 383-416

Full Text
Peer Reviewed
Aggregating Optimistic Planning Trees for Solving Markov Decision Processes
Kedenburg, Gunnar; Fonteneau, Raphaël ULg; Munos, Rémi

in Advances in Neural Information Processing Systems 26 (2013) (2013)

This paper addresses the problem of online planning in Markov decision processes using a randomized simulator, under a budget constraint. We propose a new algorithm which is based on the construction of a forest of planning trees, where each tree corresponds to a random realization of the stochastic environment. The trees are constructed using a “safe” optimistic planning strategy combining the optimistic principle (in order to explore the most promising part of the search space first) with a safety principle (which guarantees a certain amount of uniform exploration). In the decision-making step of the algorithm, the individual trees are aggregated and an immediate action is recommended. We provide a finite-sample analysis and discuss the trade-off between the principles of optimism and safety. We also report numerical results on a benchmark problem. Our algorithm performs as well as state-of-the-art optimistic planning algorithms, and better than a related algorithm which additionally assumes the knowledge of all transition distributions.
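
To convey the overall structure (not the actual safe optimistic planner, whose details are in the paper), the sketch below builds a "forest" of independent planning computations, each tied to one random realization of the simulator via a fixed seed, and aggregates the root action values before recommending an action; the toy simulator and the rollout-based per-realization planner are hypothetical stand-ins.

```python
# Rough illustrative sketch: one planning computation per sampled realization of
# the stochastic environment (fixed by a seed), aggregation of the root action
# values over the forest, and recommendation of an immediate action. The plain
# rollout "planner" below is a stand-in for the safe optimistic planner of the paper.
import numpy as np

def rollout_root_values(state, simulator, actions, seed, depth=10, gamma=0.9, n_rollouts=20):
    values = np.zeros(len(actions))
    for i, a0 in enumerate(actions):
        rng = np.random.default_rng(seed)             # same realization for every root action
        total = 0.0
        for _ in range(n_rollouts):
            s, a, ret, disc = state, a0, 0.0, 1.0
            for _ in range(depth):
                s, r = simulator(s, a, rng)
                ret += disc * r
                disc *= gamma
                a = actions[rng.integers(len(actions))]   # uniform continuation policy
            total += ret
        values[i] = total / n_rollouts
    return values

def recommend_action(state, simulator, actions, n_trees=20, rng=None):
    rng = rng or np.random.default_rng()
    forest = [rollout_root_values(state, simulator, actions, seed=int(rng.integers(1 << 31)))
              for _ in range(n_trees)]
    aggregated = np.mean(forest, axis=0)              # aggregate over the forest
    return actions[int(np.argmax(aggregated))]

# Usage on a hypothetical 1-D toy problem: reward 1 while staying close to 0.
toy = lambda s, a, rng: (s + a + rng.normal(0.0, 0.3), float(abs(s + a) < 1.0))
print(recommend_action(0.0, toy, actions=[-1.0, 0.0, 1.0]))
```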

Full Text
Peer Reviewed
An Optimistic Posterior Sampling Strategy for Bayesian Reinforcement Learning
Fonteneau, Raphaël ULg; Korda, Nathan; Munos, Rémi

in NIPS 2013 Workshop on Bayesian Optimization (BayesOpt2013) (2013)

We consider the problem of decision making in the context of unknown Markov decision processes with finite state and action spaces. In a Bayesian reinforcement learning framework, we propose an optimistic posterior sampling strategy based on the maximization of state-action value functions of MDPs sampled from the posterior. First experiments are promising.
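
A hedged reading of the strategy, in sketch form: draw several MDPs from the posterior, solve each by value iteration, and act greedily with respect to the maximum of the sampled state-action values. The Dirichlet-count representation, known rewards, and all parameter values are assumptions made for illustration, not the authors' implementation.

```python
# Hedged sketch: optimistic posterior sampling with Dirichlet transition posteriors
# and known rewards. At each decision, take the action maximizing the MAXIMUM
# Q-value over several posterior samples.
import numpy as np

def value_iteration(P, R, gamma=0.95, n_iters=200):
    """P: (S, A, S) transition probabilities, R: (S, A) rewards; returns Q (S, A)."""
    Q = np.zeros_like(R)
    for _ in range(n_iters):
        Q = R + gamma * P @ Q.max(axis=1)    # (S, A, S) @ (S,) -> (S, A)
    return Q

def optimistic_posterior_action(state, counts, R, n_samples=5, gamma=0.95, rng=None):
    """counts: Dirichlet pseudo-counts of shape (S, A, S)."""
    rng = rng or np.random.default_rng()
    S, A, _ = counts.shape
    q_max = np.full(A, -np.inf)
    for _ in range(n_samples):
        P = np.stack([[rng.dirichlet(counts[s, a]) for a in range(A)] for s in range(S)])
        Q = value_iteration(P, R, gamma)
        q_max = np.maximum(q_max, Q[state])  # optimism: max over posterior samples
    return int(np.argmax(q_max))

# Example: 3 states, 2 actions, uniform Dirichlet(1) prior, known rewards.
counts = np.ones((3, 2, 3))
R = np.array([[0.0, 1.0], [0.0, 0.0], [1.0, 0.0]])
print(optimistic_posterior_action(state=0, counts=counts, R=R))
```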

Full Text
Peer Reviewed
Planification Optimiste dans les Processus Décisionnels de Markov avec Croyance
Fonteneau, Raphaël ULg; Busoniu, Lucian; Munos, Rémi

in 8èmes Journées Francophones de Planification, Décision et Apprentissage pour la conduite de systèmes (JFPDA'13) (2013)

This paper describes the BOP algorithm (for "Bayesian Optimistic Planning"), a new indirect (i.e., model-based) Bayesian reinforcement learning algorithm. BOP extends the approach of the OP-MDP algorithm ("Optimistic Planning for Markov Decision Processes", see [Busoniu2011,Busoniu2012]) to the case where the transition probabilities of the underlying MDP are initially unknown and must be learned through interactions with the environment. The knowledge about the underlying MDP is represented by a probability distribution over the set of all transition models using Dirichlet distributions. The BOP algorithm plans in the augmented state-belief space obtained by concatenating the state vector with the posterior distribution over transition models. We show that BOP reaches Bayesian optimality as the budget parameter grows to infinity. Preliminary experiments show encouraging results.

Full Text
Peer Reviewed
Généralisation Min Max pour l'Apprentissage par Renforcement Batch et Déterministe : Relaxations pour le Cas Général T Etapes
Fonteneau, Raphaël ULg; Ernst, Damien ULg; Boigelot, Bernard ULg et al

in 8èmes Journées Francophones de Planification, Décision et Apprentissage pour la conduite de systèmes (JFPDA'13) (2013)

This paper addresses the min max generalization problem in the framework of deterministic batch mode reinforcement learning. The problem was originally introduced by [Fonteneau, 2011] and has already been shown to be NP-hard. Two relaxation schemes for the two-stage case were presented at JFPDA'12, and this paper presents a generalization of these schemes to the T-stage case. The first scheme works by dropping constraints in order to obtain a problem solvable in polynomial time. The second scheme is a Lagrangian relaxation that also leads to a problem solvable in polynomial time. We show theoretically that both schemes yield better results than those proposed by [Fonteneau, 2011].

Full Text
Peer Reviewed
Optimistic Planning for Belief-Augmented Markov Decision Processes
Fonteneau, Raphaël ULg; Busoniu, Lucian; Munos, Rémi

in Proceedings 2013 Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL-13), Singapore, 15–19 April 2013 (2013)

This paper presents the Bayesian Optimistic Planning (BOP) algorithm, a novel model-based Bayesian reinforcement learning approach. BOP extends the planning approach of the Optimistic Planning for Markov Decision Processes (OP-MDP) algorithm [Busoniu2011,Busoniu2012] to contexts where the transition model of the MDP is initially unknown and progressively learned through interactions with the environment. The knowledge about the unknown MDP is represented with a probability distribution over all possible transition models using Dirichlet distributions, and the BOP algorithm plans in the belief-augmented state space constructed by concatenating the original state vector with the current posterior distribution over transition models. We show that BOP becomes Bayesian optimal when the budget parameter increases to infinity. Preliminary empirical validations show promising performance.
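
A minimal sketch of the belief representation described above (the planning procedure itself is omitted): Dirichlet pseudo-counts per state-action pair, their posterior update, and the concatenation that forms a belief-augmented state. Class and function names are hypothetical, not taken from the paper.

```python
# Minimal sketch of the belief machinery assumed above (not the paper's code):
# Dirichlet posteriors over transition models and a belief-augmented state.
import numpy as np

class DirichletBelief:
    def __init__(self, n_states, n_actions, prior=1.0):
        # One Dirichlet distribution per (state, action) pair over next states.
        self.counts = np.full((n_states, n_actions, n_states), prior)

    def update(self, s, a, s_next):
        self.counts[s, a, s_next] += 1.0                 # posterior update after one transition

    def mean_model(self):
        return self.counts / self.counts.sum(axis=2, keepdims=True)

    def sample_model(self, rng):
        S, A, _ = self.counts.shape
        return np.stack([[rng.dirichlet(self.counts[s, a]) for a in range(A)]
                         for s in range(S)])

def augmented_state(state, belief):
    """Belief-augmented state: observed state concatenated with the posterior parameters."""
    return np.concatenate(([float(state)], belief.counts.ravel()))

# Usage: observe a transition, then form the augmented state the planner works on.
belief = DirichletBelief(n_states=3, n_actions=2)
belief.update(s=0, a=1, s_next=2)
print(augmented_state(2, belief).shape)
```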

Full Text
Peer Reviewed
Min max generalization for deterministic batch mode reinforcement learning: relaxation schemes
Fonteneau, Raphaël ULg; Ernst, Damien ULg; Boigelot, Bernard ULg et al

in SIAM Journal on Control & Optimization (2013), 51(5), 3355-3385

We study the min max optimization problem introduced in Fonteneau et al. [Towards min max reinforcement learning, ICAART 2010, Springer, Heidelberg, 2011, pp. 61–77] for computing policies for batch mode reinforcement learning in a deterministic setting with fixed, finite time horizon. First, we show that the min part of this problem is NP-hard. We then provide two relaxation schemes. The first relaxation scheme works by dropping some constraints in order to obtain a problem that is solvable in polynomial time. The second relaxation scheme, based on a Lagrangian relaxation where all constraints are dualized, can also be solved in polynomial time. We also theoretically prove and empirically illustrate that both relaxation schemes provide better results than those given in [Fonteneau et al., 2011, as cited above].
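
For orientation, a schematic statement of the optimization problem, reconstructed from the abstract rather than quoted from the paper (the notation, in particular the Lipschitz-compatibility set, is an assumption):

```latex
% Schematic min max generalization problem (notation assumed, not verbatim from the paper):
% given a batch of one-step transitions \mathcal{F} and Lipschitz constants (L_f, L_\rho),
\max_{u_0,\dots,u_{T-1}} \;
\min_{(f,\rho)\,\in\,\mathcal{C}_{L_f,L_\rho}(\mathcal{F})} \;
\sum_{t=0}^{T-1} \rho(x_t, u_t)
\qquad \text{s.t.} \quad x_{t+1} = f(x_t, u_t), \quad x_0 \text{ given},
```

where $\mathcal{C}_{L_f,L_\rho}(\mathcal{F})$ would denote the set of deterministic dynamics/reward pairs that are Lipschitz continuous with the given constants and consistent with every transition in $\mathcal{F}$; under this reading, the NP-hardness result concerns the inner minimization, and relaxing it (by dropping constraints or by dualizing them) yields polynomial-time computable lower bounds on the worst-case return.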

Full Text
Peer Reviewed
Stratégies d'échantillonnage pour l'apprentissage par renforcement batch
Fonteneau, Raphaël ULg; Murphy, Susan A.; Wehenkel, Louis ULg et al

in Revue d'Intelligence Artificielle [=RIA] (2013), 27(2), 171-194

We propose two strategies for experiment selection in the context of batch mode reinforcement learning. The first strategy is based on the idea that the most interesting experiments to carry out at some stage are those that are the most liable to falsify the current hypothesis about the optimal control policy. We cast this idea in a context where a policy learning algorithm and a model identification method are given a priori. The second strategy exploits recently published methods for computing bounds on the return of control policies from a set of trajectories in order to sample the state-action space so as to be able to discriminate between optimal and non-optimal policies. Both strategies are experimentally validated, showing promising results.

Full Text
Peer Reviewed
Policy search in a space of simple closed-form formulas: towards interpretability of reinforcement learning
Maes, Francis ULg; Fonteneau, Raphaël ULg; Wehenkel, Louis ULg et al

in Discovery Science 15th International Conference, DS 2012, Lyon, France, October 29-31, 2012. Proceedings (2012, October)

In this paper, we address the problem of computing interpretable solutions to reinforcement learning (RL) problems. To this end, we propose a search algorithm over a space of simple closed-form formulas that are used to rank actions. We formalize the search for a high-performance policy as a multi-armed bandit problem where each arm corresponds to a candidate policy canonically represented by its shortest formula-based representation. Experiments, conducted on standard benchmarks, show that this approach manages to determine both efficient and interpretable solutions.
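
To illustrate the flavor of the approach (all formulas, feature names, and the evaluation interface below are hypothetical, and the actual system's formula grammar and bandit algorithm may differ), here is a sketch that ranks actions with small closed-form formulas and searches among candidate formulas with a UCB1 bandit.

```python
# Hedged sketch (not the authors' system): policies induced by small closed-form
# formulas that rank actions, with the choice of formula cast as a multi-armed
# bandit problem solved here by UCB1. Formulas and features are hypothetical.
import math

# Candidate "formulas": each maps a (state, action) feature vector to a score.
CANDIDATE_FORMULAS = {
    "phi0":        lambda phi: phi[0],
    "phi0-phi1":   lambda phi: phi[0] - phi[1],
    "phi0*phi1":   lambda phi: phi[0] * phi[1],
    "max(phi0,1)": lambda phi: max(phi[0], 1.0),
}

def formula_policy(formula, featurize, actions):
    """Greedy policy induced by a formula: play the action with the highest score."""
    return lambda state: max(actions, key=lambda a: formula(featurize(state, a)))

def ucb_formula_search(evaluate_return, n_pulls=200, c=2.0):
    """evaluate_return(name) runs one episode with that formula's policy and returns
    its stochastic return; UCB1 decides which candidate formula to evaluate next."""
    names = list(CANDIDATE_FORMULAS)
    counts = {n: 0 for n in names}
    means = {n: 0.0 for n in names}
    for t in range(1, n_pulls + 1):
        def ucb(n):
            if counts[n] == 0:
                return math.inf
            return means[n] + c * math.sqrt(math.log(t) / counts[n])
        pick = max(names, key=ucb)
        r = evaluate_return(pick)
        counts[pick] += 1
        means[pick] += (r - means[pick]) / counts[pick]
    return max(names, key=lambda n: means[n])
```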

Full Text
Peer Reviewed
Généralisation min max pour l'apprentissage par renforcement batch et déterministe : schémas de relaxation
Fonteneau, Raphaël ULg; Ernst, Damien ULg; Boigelot, Bernard ULg et al

in Septièmes Journées Francophones de Planification, Décision et Apprentissage pour la conduite de systèmes (JFPDA 2012) (2012, May)

We consider the min max generalization problem in the framework of deterministic batch mode reinforcement learning. The problem was originally introduced by Fonteneau et al. (2011). We first show that the problem is NP-hard. For the case where the optimization horizon equals 2, we develop two relaxation schemes. The first scheme works by dropping constraints so as to obtain a problem solvable in polynomial time. The second scheme is a Lagrangian relaxation leading to a conic quadratic problem. We show theoretically and empirically that both schemes yield better results than those proposed by Fonteneau et al. (2011).

Full Text
Peer Reviewed
Imitative Learning for Real-Time Strategy Games
Gemine, Quentin ULg; Safadi, Firas ULg; Fonteneau, Raphaël ULg et al

in Proceedings of the 2012 IEEE Conference on Computational Intelligence and Games (2012)

Over the past decades, video games have become increasingly popular and complex. Virtual worlds have gone a long way since the first arcades, and so have the artificial intelligence (AI) techniques used to control agents in these growing environments. Tasks such as world exploration, constrained pathfinding, or team tactics and coordination, just to name a few, are now default requirements for contemporary video games. However, despite its recent advances, video game AI still lacks the ability to learn. In this paper, we attempt to break the barrier between video game AI and machine learning and propose a generic method allowing real-time strategy (RTS) agents to learn production strategies from a set of recorded games using supervised learning. We test this imitative learning approach on the popular RTS title StarCraft II® and successfully teach a Terran agent facing a Protoss opponent new production strategies.
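
As a purely illustrative sketch (the features, unit labels, and classifier choice are hypothetical; the paper's actual pipeline is not reproduced), a production strategy can be imitated by training a standard classifier on (game-state features, next-produced-unit) pairs extracted from recorded games:

```python
# Illustrative only: supervised imitation of production decisions from replays.
# Feature vectors, labels and the classifier are hypothetical placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical training set: each row describes the game state at some instant of
# a recorded game (unit counts, resources, elapsed time, ...); the label is the
# unit type the player produced next.
rng = np.random.default_rng(0)
X = rng.random((1000, 4))
y = rng.choice(["marine", "marauder", "medivac"], size=1000)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

def choose_production(state_features):
    """At play time, the agent imitates the recorded decision for similar states."""
    return model.predict(np.asarray([state_features]))[0]

print(choose_production([0.2, 0.5, 0.1, 0.9]))
```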

Full Text
Peer Reviewed
Learning exploration/exploitation strategies for single trajectory reinforcement learning
Castronovo, Michaël ULg; Maes, Francis ULg; Fonteneau, Raphaël ULg et al

in Proceedings of the 10th European Workshop on Reinforcement Learning (EWRL 2012) (2012)

We consider the problem of learning high-performance Exploration/Exploitation (E/E) strategies for finite Markov Decision Processes (MDPs) when the MDP to be controlled is assumed to be drawn from a known probability distribution pM(·). The performance criterion is the sum of discounted rewards collected by the E/E strategy over an infinite-length trajectory. We propose an approach for solving this problem that works by considering a rich set of candidate E/E strategies and by looking for the one that gives the best average performance on MDPs drawn according to pM(·). As candidate E/E strategies, we consider index-based strategies parametrized by small formulas combining variables that include the estimated reward function, the number of times each transition has occurred, and the optimal value functions V and Q of the estimated MDP (obtained through value iteration). The search for the best formula is formalized as a multi-armed bandit problem, each arm being associated with a formula. We experimentally compare the performance of the approach with R-max as well as with ε-Greedy strategies, and the results are promising.
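
As a hedged illustration of what an "index-based E/E strategy parametrized by a small formula" can look like (the particular index formula and all names below are hypothetical, not one of the formulas found by the paper):

```python
# Hedged sketch: an index-based exploration/exploitation strategy combining the
# Q-function of the estimated MDP (via value iteration) with visit counts. The
# particular index formula is hypothetical.
import numpy as np

def estimated_q(counts, reward_sums, gamma=0.95, n_iters=200):
    """Value iteration on the maximum-likelihood MDP built from transition counts
    (S, A, S) and accumulated rewards (S, A)."""
    S, A, _ = counts.shape
    visits = counts.sum(axis=2)
    P = counts / np.maximum(visits[..., None], 1.0)
    P[visits == 0] = 1.0 / S                      # unvisited pairs: uniform model
    R = reward_sums / np.maximum(visits, 1.0)
    Q = np.zeros((S, A))
    for _ in range(n_iters):
        Q = R + gamma * P @ Q.max(axis=1)
    return Q, visits

def index_based_action(state, counts, reward_sums, c=1.0):
    """Example index formula: Q_hat(s, a) + c / (1 + n(s, a))."""
    Q, visits = estimated_q(counts, reward_sums)
    return int(np.argmax(Q[state] + c / (1.0 + visits[state])))

# Example with 3 states and 2 actions, before any experience has been collected.
counts = np.zeros((3, 2, 3))
reward_sums = np.zeros((3, 2))
print(index_based_action(state=0, counts=counts, reward_sums=reward_sums))
```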

Full Text
Peer Reviewed
Relaxation schemes for min max generalization in deterministic batch mode reinforcement learning
Fonteneau, Raphaël ULg; Ernst, Damien ULg; Boigelot, Bernard ULg et al

in 4th International NIPS Workshop on Optimization for Machine Learning (OPT 2011) (2011, December)

We study the min max optimization problem introduced in [Fonteneau, 2011] for computing policies for batch mode reinforcement learning in a deterministic setting. This problem is NP-hard. We focus on the two-stage case for which we provide two relaxation schemes. The first relaxation scheme works by dropping some constraints in order to obtain a problem that is solvable in polynomial time. The second relaxation scheme, based on a Lagrangian relaxation where all constraints are dualized, leads to a conic quadratic programming problem. Both relaxation schemes are shown to provide better results than those given in [Fonteneau, 2011].

Full Text
Peer Reviewed
Artificial intelligence design for real-time strategy games
Safadi, Firas ULg; Fonteneau, Raphaël ULg; Ernst, Damien ULg

in NIPS Workshop on Decision Making with Multiple Imperfect Decision Makers (2011, December)

For over a decade now, real-time strategy (RTS) games have been challenging intelligence, human and artificial (AI) alike, as one of the top genres in terms of overall complexity. RTS is a prime example problem featuring multiple interacting imperfect decision makers. Elaborate dynamics, partial observability, as well as a rapidly diverging action space render rational decision making somewhat elusive. Humans deal with the complexity using several abstraction layers, taking decisions on different abstract levels. Current agents, on the other hand, remain largely scripted and exhibit static behavior, leaving them extremely vulnerable to flaw abuse and no match for human players. In this paper, we propose to mimic the abstraction mechanisms used by human players for designing AI for RTS games. A non-learning agent for StarCraft showing promising performance is proposed, and several research directions towards the integration of learning mechanisms are discussed at the end of the paper.
