Publications of Michaël Castronovo
Approximate Bayes Optimal Policy Search using Neural Networks
Castronovo, Michaël; François-Lavet, Vincent; Fonteneau, Raphaël et al.

in Proceedings of the 9th International Conference on Agents and Artificial Intelligence (ICAART 2017) (2017, February)

Bayesian Reinforcement Learning (BRL) agents aim to maximise the expected collected rewards obtained when interacting with an unknown Markov Decision Process (MDP) while using some prior knowledge. State-of-the-art BRL agents rely on frequent updates of the belief on the MDP, as new observations of the environment are made. This offers theoretical guarantees to converge to an optimum, but is computationally intractable, even on small-scale problems. In this paper, we present a method that circumvents this issue by training a parametric policy able to recommend an action directly from raw observations. Artificial Neural Networks (ANNs) are used to represent this policy, and are trained on the trajectories sampled from the prior. The trained model is then used online, and is able to act on the real MDP at a very low computational cost. Our new algorithm shows strong empirical performance, on a wide range of test problems, and is robust to inaccuracies of the prior distribution.
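The offline/online split described in this abstract can be illustrated with a small self-contained sketch. This is not the authors' implementation: the Dirichlet prior over small MDPs, the hand-built history features, the imitation-style training of the network on per-MDP greedy actions, and all names and sizes below are assumptions chosen only to show the general shape of the idea, namely train a neural-network policy offline on trajectories sampled from the prior, then act online with a single cheap forward pass.

```python
# Hypothetical sketch (not the paper's code): train an ANN policy offline on
# MDPs sampled from a prior, then deploy it online at low computational cost.
import numpy as np

rng = np.random.default_rng(0)
S, A, GAMMA, HORIZON = 5, 3, 0.95, 40

def sample_mdp():
    """Draw an MDP from a simple Dirichlet/uniform prior (an assumption)."""
    P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a] is a distribution over next states
    R = rng.uniform(0.0, 1.0, size=(S, A))       # expected reward for (s, a)
    return P, R

def q_values(P, R, iters=200):
    """Plain value iteration; gives per-MDP greedy target actions."""
    Q = np.zeros((S, A))
    for _ in range(iters):
        Q = R + GAMMA * P @ Q.max(axis=1)
    return Q

def features(state, counts):
    """Raw-observation summary: one-hot state plus normalised transition counts."""
    onehot = np.eye(S)[state]
    return np.concatenate([onehot, (counts / (1.0 + counts.sum())).ravel()])

DIM = S + S * A * S
W1 = rng.normal(0, 0.1, (DIM, 64)); b1 = np.zeros(64)
W2 = rng.normal(0, 0.1, (64, A));   b2 = np.zeros(A)

def forward(x):
    h = np.maximum(0.0, x @ W1 + b1)             # ReLU hidden layer
    logits = h @ W2 + b2
    p = np.exp(logits - logits.max())
    return h, p / p.sum()

# ---- Offline phase: train the policy on trajectories sampled from the prior ----
LR = 0.05
for episode in range(2000):
    P, R = sample_mdp()
    greedy = q_values(P, R).argmax(axis=1)       # target actions for this sampled MDP
    counts = np.zeros((S, A, S)); s = 0
    for t in range(HORIZON):
        x = features(s, counts)
        h, probs = forward(x)
        a = greedy[s]                            # imitate the per-MDP optimal action
        # Cross-entropy gradient for the softmax output layer, backpropagated by hand.
        d_logits = probs.copy(); d_logits[a] -= 1.0
        dW2 = np.outer(h, d_logits); db2 = d_logits
        dh = W2 @ d_logits; dh[h <= 0] = 0.0
        dW1 = np.outer(x, dh); db1 = dh
        W2 -= LR * dW2; b2 -= LR * db2; W1 -= LR * dW1; b1 -= LR * db1
        s_next = int(rng.choice(S, p=P[s, a]))
        counts[s, a, s_next] += 1; s = s_next

# ---- Online phase: act on a "real" MDP with one forward pass per decision ----
P_real, R_real = sample_mdp()                    # stands in for the unknown true MDP
counts = np.zeros((S, A, S)); s, ret = 0, 0.0
for t in range(HORIZON):
    _, probs = forward(features(s, counts))
    a = int(probs.argmax())
    ret += (GAMMA ** t) * R_real[s, a]
    s_next = int(rng.choice(S, p=P_real[s, a]))
    counts[s, a, s_next] += 1; s = s_next
print("discounted return on the held-out MDP:", round(ret, 3))
```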

Benchmarking for Bayesian Reinforcement Learning
Castronovo, Michaël; Ernst, Damien; Couëtoux, Adrien et al.

in PLoS ONE (2016)

In the Bayesian Reinforcement Learning (BRL) setting, agents try to maximise the collected rewards while interacting with their environment, using some prior knowledge that is accessed beforehand. Many BRL algorithms have already been proposed, but even though a few toy examples exist in the literature, there are still no extensive or rigorous benchmarks to compare them. The paper addresses this problem, and provides a new BRL comparison methodology along with the corresponding open source library. In this methodology, a comparison criterion that measures the performance of algorithms on large sets of Markov Decision Processes (MDPs) drawn from some probability distributions is defined. In order to enable the comparison of non-anytime algorithms, our methodology also includes a detailed analysis of the computation time requirement of each algorithm. Our library is released with all source code and documentation: it includes three test problems, each of which has two different prior distributions, and seven state-of-the-art RL algorithms. Finally, our library is illustrated by comparing all the available algorithms and the results are discussed.
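A rough sketch of the comparison criterion described in this abstract, with offline and online computation time accounted for separately so that non-anytime algorithms can be compared, might look as follows. The agent interface, the flat Dirichlet prior over test MDPs and the random-agent baseline are assumptions for illustration; the released library defines its own problems, priors and interfaces.

```python
# Hypothetical sketch (not the released library's API): score each agent by its
# mean discounted return over many MDPs drawn from a prior, and record offline
# and online computation time separately.
import time
import numpy as np

rng = np.random.default_rng(1)
S, A, GAMMA, HORIZON, N_MDPS = 5, 3, 0.95, 50, 200

def draw_mdp():
    """One MDP from a flat-Dirichlet prior over transition kernels (an assumption)."""
    P = rng.dirichlet(np.ones(S), size=(S, A))
    R = rng.uniform(0.0, 1.0, size=(S, A))
    return P, R

class RandomAgent:
    """Trivial baseline implementing the assumed agent interface."""
    def train_offline(self, prior_sampler):     # anything done before seeing the test MDPs
        pass
    def reset(self):                            # called once per test MDP
        pass
    def act(self, state):
        return int(rng.integers(A))
    def observe(self, s, a, r, s_next):
        pass

def benchmark(agent, prior_sampler, n_mdps=N_MDPS):
    t0 = time.perf_counter()
    agent.train_offline(prior_sampler)
    offline_time = time.perf_counter() - t0

    returns, online_time = [], 0.0
    for _ in range(n_mdps):
        P, R = prior_sampler()
        agent.reset()
        s, ret = 0, 0.0
        for t in range(HORIZON):
            t1 = time.perf_counter()
            a = agent.act(s)                    # only decision-making time counts as online
            online_time += time.perf_counter() - t1
            r = R[s, a]
            s_next = int(rng.choice(S, p=P[s, a]))
            agent.observe(s, a, r, s_next)
            ret += (GAMMA ** t) * r
            s = s_next
        returns.append(ret)
    return np.mean(returns), np.std(returns), offline_time, online_time

mean, std, t_off, t_on = benchmark(RandomAgent(), draw_mdp)
print(f"mean return {mean:.3f} +/- {std:.3f}, offline {t_off:.3f}s, online {t_on:.3f}s")
```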
