References of "Fonteneau, Raphaël"
Estimation Monte Carlo sans modèle de politiques de décision
Fonteneau, Raphaël ULg; Murphy, Susan A.; Wehenkel, Louis ULg et al

in Revue d'Intelligence Artificielle [=RIA] (2011), 25

Apprentissage actif par modification de la politique de décision courante
Fonteneau, Raphaël ULg; Murphy, Susan A.; Wehenkel, Louis ULg et al

in Sixièmes Journées Francophones de Planification, Décision et Apprentissage pour la conduite de systèmes (JFPDA 2011) (2011, June)

Active exploration by searching for experiments that falsify the computed control policy
Fonteneau, Raphaël ULg; Murphy, Susan; Wehenkel, Louis ULg et al

in Proceedings of the 2011 IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL-11) (2011, April)

We propose a strategy for experiment selection - in the context of reinforcement learning - based on the idea that the most interesting experiments to carry out at some stage are those that are the most liable to falsify the current hypothesis about the optimal control policy. We cast this idea in a context where a policy learning algorithm and a model identification method are given a priori. Experiments are selected if, using the learnt environment model, they are predicted to yield a revision of the learnt control policy. Algorithms and simulation results are provided for a deterministic system with discrete action space. They show that the proposed approach is promising.
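
A minimal sketch of the selection rule described in this abstract (illustrative only, not the authors' code; the learner and model interfaces and all names below are assumptions): keep the candidate experiments whose model-predicted outcome, once added to the sample, would change the re-learnt policy.

```python
# Sketch only: select experiments predicted to falsify the current policy.
# The callables' interfaces are illustrative assumptions, not the paper's API.
from typing import Callable, List, Sequence, Tuple

State = Tuple[float, ...]
Transition = Tuple[State, int, float, State]  # (state, action, reward, next state)


def select_falsifying_experiments(
    sample: List[Transition],
    candidates: Sequence[Tuple[State, int]],
    learn_model: Callable[[List[Transition]], Callable[[State, int], Tuple[float, State]]],
    learn_policy: Callable[[List[Transition]], Callable[[State], int]],
    test_states: Sequence[State],
) -> List[Tuple[State, int]]:
    """Keep the candidate (state, action) experiments whose predicted outcome,
    once appended to the sample, changes the learnt policy on some test state."""
    model = learn_model(sample)              # (x, u) -> predicted (reward, next state)
    current_policy = learn_policy(sample)
    selected = []
    for x, u in candidates:
        r_hat, y_hat = model(x, u)           # predicted outcome of the experiment
        revised_policy = learn_policy(sample + [(x, u, r_hat, y_hat)])
        if any(revised_policy(s) != current_policy(s) for s in test_states):
            selected.append((x, u))          # predicted to revise the policy
    return selected
```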

Contributions to Batch Mode Reinforcement Learning
Fonteneau, Raphaël ULg

Doctoral thesis (2011)

This dissertation presents various research contributions published during four years of PhD research in the field of batch mode reinforcement learning, which studies optimal control problems for which the only information available on the system dynamics and the reward function is gathered in a set of trajectories. We first focus on deterministic problems in continuous spaces. In such a context, and under some assumptions related to the smoothness of the environment, we propose a new approach for inferring bounds on the performance of control policies. We also derive from these bounds a new inference algorithm for generalizing the information contained in the batch collection of trajectories in a cautious manner. This inference algorithm has itself led us to propose a min max generalization framework. When working on batch mode reinforcement learning problems, one also often has to consider the problem of generating informative trajectories. This dissertation proposes two different approaches for addressing this problem. The first approach uses the bounds mentioned above to generate data tightening these bounds. The second approach proposes to generate data that are predicted to induce a change in the inferred optimal control policy. While the above-mentioned contributions consider a deterministic framework, we also report on two research contributions which consider a stochastic setting. The first one addresses the problem of evaluating the expected return of control policies in the presence of disturbances. The second one proposes a technique for selecting relevant variables in a batch mode reinforcement learning context, in order to compute simplified control policies that are based on smaller sets of state variables.

Towards min max generalization in reinforcement learning
Fonteneau, Raphaël ULg; Murphy, Susan; Wehenkel, Louis ULg et al

in Filipe, Joaquim; Fred, Ana; Sharp, Bernadette (Eds.) Agents and Artificial Intelligence: International Conference, ICAART 2010, Valencia, Spain, January 2010, Revised Selected Papers (2011)

In this paper, we introduce a min max approach for addressing the generalization problem in Reinforcement Learning. The min max approach works by determining a sequence of actions that maximizes the worst return that could possibly be obtained considering any dynamics and reward function compatible with the sample of trajectories and some prior knowledge on the environment. We consider the particular case of deterministic Lipschitz continuous environments over continuous state spaces, finite action spaces, and a finite optimization horizon. We discuss the non-triviality of computing an exact solution of the min max problem even after reformulating it so as to avoid search in function spaces. For addressing this problem, we propose to replace, inside this min max problem, the search for the worst environment given a sequence of actions by an expression that lower bounds the worst return that can be obtained for a given sequence of actions. This lower bound has a tightness that depends on the sample sparsity. From there, we propose an algorithm of polynomial complexity that returns a sequence of actions leading to the maximization of this lower bound. We give a condition on the sample sparsity ensuring that, for a given initial state, the proposed algorithm produces an optimal sequence of actions in open-loop. Our experiments show that this algorithm can lead to more cautious policies than algorithms combining dynamic programming with function approximators.
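
To give an idea of the kind of lower bound involved, here is a minimal sketch based on a simplified reading of the abstract (not the paper's exact formulas or algorithm): for a fixed action sequence, a chain of sample transitions whose actions match the sequence is scored by its cumulated rewards minus Lipschitz penalties on the "jumps" between consecutive transitions, and the best chain is found by dynamic programming. The penalty coefficients `lq` are an assumption and may differ from the constants used in the paper.

```python
# Sketch only: a Lipschitz-style lower bound on the return of a fixed action
# sequence, maximized over chains of matching sample transitions by dynamic
# programming. Penalty coefficients are a simplifying assumption.
import math
from typing import List, Sequence, Tuple

State = Tuple[float, ...]
Transition = Tuple[State, int, float, State]  # (x, u, r, y)


def dist(a: State, b: State) -> float:
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def lower_bound(x0: State, actions: Sequence[int], sample: List[Transition],
                lf: float, lr: float) -> float:
    T = len(actions)
    # Assumed Lipschitz constant of the n-steps-to-go value along the sequence.
    lq = [lr * sum(lf ** i for i in range(n)) for n in range(T + 1)]
    best = {}                                 # transition index -> best partial bound
    for t, u_t in enumerate(actions):
        new_best = {}
        for idx, (x, u, r, _) in enumerate(sample):
            if u != u_t:
                continue                      # the chain must take action u_t at step t
            penalty = lq[T - t]
            if t == 0:
                score = r - penalty * dist(x, x0)
            else:
                score = r + max(b - penalty * dist(x, sample[prev][3])
                                for prev, b in best.items())
            new_best[idx] = score
        if not new_best:
            return float("-inf")              # no compatible transition: vacuous bound
        best = new_best
    return max(best.values())
```

The paper additionally maximizes such a bound over the action sequence itself; the sketch above only scores a given sequence.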

Model-free Monte Carlo-like policy evaluation
Fonteneau, Raphaël ULg; Murphy, Susan; Wehenkel, Louis ULg et al

in Proceedings of Conférence Francophone sur l'Apprentissage Automatique (CAp) 2010 (2010, May)

We propose an algorithm for estimating the finite-horizon expected return of a closed loop control policy from an a priori given (off-policy) sample of one-step transitions. It averages cumulated rewards along a set of “broken trajectories” made of one-step transitions selected from the sample on the basis of the control policy. Under some Lipschitz continuity assumptions on the system dynamics, reward function and control policy, we provide bounds on the bias and variance of the estimator that depend only on the Lipschitz constants, on the number of broken trajectories used in the estimator, and on the sparsity of the sample of one-step transitions.
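
A minimal sketch of this kind of estimator (illustrative, not the authors' implementation; the distance and the exact transition-selection rule below are assumptions):

```python
# Sketch only: a model-free Monte Carlo-like estimator that rebuilds "broken
# trajectories" from an off-policy sample of one-step transitions and averages
# their cumulated rewards.
import math
from typing import Callable, List, Tuple

State = Tuple[float, ...]
Transition = Tuple[State, int, float, State]  # (x, u, r, y)


def dist(a: State, b: State) -> float:
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def mfmc_estimate(x0: State, policy: Callable[[State], int],
                  sample: List[Transition], horizon: int, n_traj: int) -> float:
    remaining = list(sample)                  # each transition is used at most once
    returns = []
    for _ in range(n_traj):
        x, total = x0, 0.0
        for _ in range(horizon):
            u = policy(x)
            # pick the unused transition taking action u whose start state is closest
            candidates = [i for i, (_, us, _, _) in enumerate(remaining) if us == u]
            if not candidates:
                break                         # sample exhausted for this action
            i = min(candidates, key=lambda k: dist(remaining[k][0], x))
            _, _, r, y = remaining.pop(i)
            total += r
            x = y                             # jump to the end state of that transition
        returns.append(total)
    return sum(returns) / len(returns)
```

The bias and variance analysis mentioned in the abstract is not reflected in this sketch.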

Model-free Monte Carlo-like policy evaluation
Fonteneau, Raphaël ULg; Murphy, Susan; Wehenkel, Louis ULg et al

in Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS 2010) (2010, May)

We propose an algorithm for estimating the finite-horizon expected return of a closed loop control policy from an a priori given (off-policy) sample of one-step transitions. It averages cumulated rewards along a set of “broken trajectories” made of one-step transitions selected from the sample on the basis of the control policy. Under some Lipschitz continuity assumptions on the system dynamics, reward function and control policy, we provide bounds on the bias and variance of the estimator that depend only on the Lipschitz constants, on the number of broken trajectories used in the estimator, and on the sparsity of the sample of one-step transitions.

Generating informative trajectories by using bounds on the return of control policies
Fonteneau, Raphaël ULg; Murphy, Susan; Wehenkel, Louis ULg et al

in Proceedings of the Workshop on Active Learning and Experimental Design 2010 (in conjunction with AISTATS 2010) (2010, May)

We propose new methods for guiding the generation of informative trajectories when solving discrete-time optimal control problems. These methods exploit recently published results that provide ways for computing bounds on the return of control policies from a set of trajectories.

A cautious approach to generalization in reinforcement learning
Fonteneau, Raphaël ULg; Murphy, Susan; Wehenkel, Louis ULg et al

in Proceedings of the 2nd International Conference on Agents and Artificial Intelligence (2010, January)

In the context of a deterministic Lipschitz continuous environment over continuous state spaces, finite action spaces, and a finite optimization horizon, we propose an algorithm of polynomial complexity which exploits weak prior knowledge about its environment for computing, from a given sample of trajectories and for a given initial state, a sequence of actions. The proposed Viterbi-like algorithm maximizes a recently proposed lower bound on the return depending on the initial state, and uses to this end prior knowledge about the environment provided in the form of upper bounds on its Lipschitz constants. It thereby avoids, in a way depending on the initial state and on the prior knowledge, those regions of the state space where the sample is too sparse to make safe generalizations. Our experiments show that it can lead to more cautious policies than algorithms combining dynamic programming with function approximators. We also give a condition on the sample sparsity ensuring that, for a given initial state, the proposed algorithm produces an optimal sequence of actions in open-loop.

Model-free Monte Carlo-like policy evaluation
Fonteneau, Raphaël ULg; Murphy, Susan A.; Wehenkel, Louis ULg et al

in 29th Benelux Meeting on Systems and Control (2010)

Computing bounds for kernel-based policy evaluation in reinforcement learning
Fonteneau, Raphaël ULg; Murphy, Susan A.; Wehenkel, Louis ULg et al

Report (2010)

This technical report proposes an approach for computing bounds on the finite-time return of a policy using kernel-based approximators, from a sample of trajectories, in a continuous state space and deterministic framework.

Voronoi model learning for batch mode reinforcement learning
Fonteneau, Raphaël ULg; Ernst, Damien ULg

Report (2010)

We consider deterministic optimal control problems with continuous state spaces where the information on the system dynamics and the reward function is constrained to a set of system transitions. Each system transition gathers a state, the action taken while being in this state, the immediate reward observed and the next state reached. In such a context, we propose a new model learning-type reinforcement learning (RL) algorithm in a batch mode, finite-time and deterministic setting. The algorithm, named Voronoi reinforcement learning (VRL), approximates from a sample of system transitions the system dynamics and the reward function of the optimal control problem using piecewise constant functions on a Voronoi-like partition of the state-action space.
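
Since a piecewise-constant approximation on the Voronoi-like partition induced by the sample amounts to a nearest-neighbour rule, a minimal sketch of such a learnt model could look as follows (illustrative only; the Euclidean state-action distance is an assumption, not necessarily the report's choice):

```python
# Sketch only: piecewise-constant dynamics/reward model on the Voronoi-like
# partition induced by the sampled (state, action) pairs, i.e. a nearest-
# neighbour predictor.
import math
from typing import List, Tuple

State = Tuple[float, ...]
Action = Tuple[float, ...]
Transition = Tuple[State, Action, float, State]  # (x, u, r, y)


class VoronoiModel:
    def __init__(self, sample: List[Transition]):
        self.sample = sample

    def predict(self, x: State, u: Action) -> Tuple[float, State]:
        """Return the (reward, next state) observed at the nearest sampled
        (state, action) pair; constant over the corresponding Voronoi cell."""
        def d(t: Transition) -> float:
            xs, us, _, _ = t
            return math.sqrt(sum((a - b) ** 2 for a, b in zip(xs + us, x + u)))
        _, _, r, y = min(self.sample, key=d)
        return r, y
```

The Voronoi cells never need to be built explicitly; they are implicit in the nearest-neighbour lookup.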

Apoptosis characterizes immunological failure of HIV infected patients
Mhawej, Marie-José; Brunet-Francois, Cécile; Fonteneau, Raphaël ULg et al

in Control Engineering Practice (2009), 17(7), 798-804

This paper studies the influence of apoptosis on the dynamics of the HIV infection. A new model of the activation-induced apoptosis of healthy CD4+ T-cells is used. The parameters of this model are identified by using clinical data generated by monitoring patients starting Highly Active Anti-Retroviral Therapy (HAART). The sampling of blood tests is performed to satisfy the constraints of dynamical system parameter identification. The apoptosis parameter, which is inferred from clinical data, is then shown to play a key role in the early diagnosis of immunological failure.

Inferring bounds on the performance of a control policy from a sample of one-step system transitions
Fonteneau, Raphaël ULg; Murphy, Susan A.; Wehenkel, Louis ULg et al

in 28th Benelux Meeting on Systems and Control (2009)

Inferring bounds on the performance of a control policy from a sample of trajectories
Fonteneau, Raphaël ULg; Murphy, Susan; Wehenkel, Louis ULg et al

in Proceedings of the IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL-09) (2009)

We propose an approach for inferring bounds on the finite-horizon return of a control policy from an off-policy sample of trajectories collecting state transitions, rewards, and control actions. In this paper, the dynamics, control policy, and reward function are supposed to be deterministic and Lipschitz continuous. Under these assumptions, an algorithm that is polynomial in the sample size and the length of the optimization horizon is derived to compute these bounds, and their tightness is characterized in terms of the sample density.

Variable selection for dynamic treatment regimes: a reinforcement learning approach
Fonteneau, Raphaël ULg; Wehenkel, Louis ULg; Ernst, Damien ULg

in The annual machine learning conference of Belgium and the Netherlands (BeNeLearn 2008) (2008, May)

Modelling the influence of activation-induced apoptosis of CD4+ and CD8+ T-cells on the immune system response of a HIV-infected patient
Stan, Guy-Bart; Belmudes, Florence ULg; Fonteneau, Raphaël ULg et al

in IET Systems Biology (2008), 2(2), 94-102

On the basis of the human immunodeficiency virus (HIV) infection dynamics model proposed by Adams, the authors propose an extended model that aims at incorporating the influence of activation-induced apoptosis of CD4+ and CD8+ T-cells on the immune system response of HIV-infected patients. Through this model, the authors study the influence of this phenomenon on the time evolution of specific cell populations such as plasma concentrations of HIV copies, or blood concentrations of CD4+ and CD8+ T-cells. In particular, this study shows that depending on its intensity, the apoptosis phenomenon can either favour or mitigate the long-term evolution of the HIV infection.

Variable selection for dynamic treatment regimes
Fonteneau, Raphaël ULg; Wehenkel, Louis ULg; Ernst, Damien ULg

in 27th Benelux Meeting on Systems and Control (2008)

Variable selection for dynamic treatment regimes: a reinforcement learning approach
Fonteneau, Raphaël ULg; Wehenkel, Louis ULg; Ernst, Damien ULg

Conference (2008)

Dynamic treatment regimes (DTRs) can be inferred from data collected through some randomized clinical trials by using reinforcement learning algorithms. During these clinical trials, a large set of clinical indicators are usually monitored. However, it is often more convenient for clinicians to have DTRs which are only defined on a small set of indicators rather than on the original full set. To address this problem, we analyse the approximation architecture of the state-action value functions computed by the fitted Q iteration algorithm, an RL algorithm using tree-based regressors, in order to identify a small subset of relevant indicators. The RL algorithm is then rerun with only these most relevant indicators as state variables, so as to obtain DTRs defined on a small set of indicators. The approach is validated on benchmark problems inspired from the classical ‘car on the hill’ problem and the results obtained are positive.
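
A rough sketch of this two-stage procedure (illustrative, not the authors' code; it relies on scikit-learn's ExtraTreesRegressor and its feature_importances_ attribute, and the hyper-parameters and the way importances are aggregated are assumptions):

```python
# Sketch only: fitted Q iteration with a tree-based regressor, followed by
# variable selection based on the regressor's variable importances.
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor


def fitted_q_iteration(X, U, R, Xn, actions, n_iter=50, gamma=0.95):
    """X, Xn: (n, d) arrays of states / next states; U: (n,) actions; R: (n,) rewards."""
    inputs = np.column_stack([X, U])
    q = None
    for _ in range(n_iter):
        if q is None:
            targets = R                      # first iteration regresses the reward
        else:
            q_next = np.max([q.predict(np.column_stack([Xn, np.full(len(Xn), a)]))
                             for a in actions], axis=0)
            targets = R + gamma * q_next     # Bellman targets for the next regressor
        q = ExtraTreesRegressor(n_estimators=100).fit(inputs, targets)
    return q


def most_relevant_state_variables(q, n_state_vars, k):
    # The last input column is the action; rank the state variables by importance.
    importances = q.feature_importances_[:n_state_vars]
    return list(np.argsort(importances)[::-1][:k])
```

The reduced DTR would then be obtained by rerunning the same procedure with only the selected indicators as state variables.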
