References of "Fonteneau, Raphaël"
Full Text
Peer Reviewed
Model-free Monte Carlo-like policy evaluation
Fonteneau, Raphaël ULg; Murphy, Susan; Wehenkel, Louis ULg et al

in Proceedings of Conférence Francophone sur l'Apprentissage Automatique (CAp) 2010 (2010, May)

We propose an algorithm for estimating the finite-horizon expected return of a closed loop control policy from an a priori given (off-policy) sample of one-step transitions. It averages cumulated rewards along a set of “broken trajectories” made of one-step transitions selected from the sample on the basis of the control policy. Under some Lipschitz continuity assumptions on the system dynamics, reward function and control policy, we provide bounds on the bias and variance of the estimator that depend only on the Lipschitz constants, on the number of broken trajectories used in the estimator, and on the sparsity of the sample of one-step transitions.
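The estimator described above lends itself to a compact implementation. Below is a minimal sketch, assuming a Euclidean distance over state-action pairs, a sample stored as (state, action, reward, next state) tuples, a time-dependent deterministic policy, and a sample holding at least horizon × n_trajectories transitions; the function name and the distance metric are choices made for this illustration, not details taken from the paper.

import numpy as np

def mfmc_estimate(sample, policy, x0, horizon, n_trajectories):
    # Model-free Monte Carlo-like estimate of the finite-horizon return of `policy` from x0.
    # sample: list of one-step transitions (x, u, r, y); policy: maps (t, state) to an action.
    # Each trajectory is "broken": at every step we jump to the sample transition whose
    # (state, action) pair is closest to the current state and the action the policy would take.
    used = set()                                   # a transition is used by at most one trajectory
    returns = []
    for _ in range(n_trajectories):
        x, cumulated = np.asarray(x0, dtype=float), 0.0
        for t in range(horizon):
            u = np.atleast_1d(policy(t, x)).astype(float)
            best, best_dist = None, np.inf
            for l, (xl, ul, rl, yl) in enumerate(sample):
                if l in used:
                    continue
                d = np.linalg.norm(x - np.asarray(xl, float)) + np.linalg.norm(u - np.atleast_1d(ul))
                if d < best_dist:
                    best, best_dist = l, d
            xl, ul, rl, yl = sample[best]
            used.add(best)
            cumulated += rl                        # accumulate the observed reward of the transition
            x = np.asarray(yl, dtype=float)        # continue from where that transition ended
        returns.append(cumulated)
    return float(np.mean(returns))                 # average over the broken trajectories

The averaging over broken trajectories is what makes this behave like a Monte Carlo estimator, even though no new trajectory of the real system is ever generated.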

Detailed reference viewed: 22 (10 ULg)
Full Text
Peer Reviewed
Model-free Monte Carlo-like policy evaluation
Fonteneau, Raphaël ULg; Murphy, Susan; Wehenkel, Louis ULg et al

in Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS 2010) (2010, May)

We propose an algorithm for estimating the finite-horizon expected return of a closed loop control policy from an a priori given (off-policy) sample of one-step transitions. It averages cumulated rewards along a set of “broken trajectories” made of one-step transitions selected from the sample on the basis of the control policy. Under some Lipschitz continuity assumptions on the system dynamics, reward function and control policy, we provide bounds on the bias and variance of the estimator that depend only on the Lipschitz constants, on the number of broken trajectories used in the estimator, and on the sparsity of the sample of one-step transitions.

Detailed reference viewed: 66 (16 ULg)
Full Text
Peer Reviewed
Generating informative trajectories by using bounds on the return of control policies
Fonteneau, Raphaël ULg; Murphy, Susan; Wehenkel, Louis ULg et al

in Proceedings of the Workshop on Active Learning and Experimental Design 2010 (in conjunction with AISTATS 2010) (2010, May)

We propose new methods for guiding the generation of informative trajectories when solving discrete-time optimal control problems. These methods exploit recently published results that provide ways for computing bounds on the return of control policies from a set of trajectories.
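The abstract does not spell out the selection criterion, so the sketch below only illustrates one natural way such bounds could guide sampling: start the next trajectory from the candidate initial state whose return is least well pinned down by the current bounds. The gap criterion, the function names and the candidate set are illustrative assumptions, not a description of the methods proposed in the paper.

import numpy as np

def next_initial_state(candidates, lower_bound, upper_bound):
    # candidates: list of candidate initial states; lower_bound / upper_bound: callables
    # returning bounds on the return of the policy of interest, computed from the
    # transitions collected so far (e.g. with Lipschitz-based bounds). The widest gap
    # between the two bounds is taken here as a heuristic proxy for informativeness.
    gaps = [upper_bound(x) - lower_bound(x) for x in candidates]
    return candidates[int(np.argmax(gaps))]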

Detailed reference viewed: 32 (12 ULg)
Full Text
Peer Reviewed
A cautious approach to generalization in reinforcement learning
Fonteneau, Raphaël ULg; Murphy, Susan; Wehenkel, Louis ULg et al

in Proceedings of the 2nd International Conference on Agents and Artificial Intelligence (2010, January)

In the context of a deterministic Lipschitz continuous environment over continuous state spaces, finite action spaces, and a finite optimization horizon, we propose an algorithm of polynomial complexity which exploits weak prior knowledge about its environment for computing, from a given sample of trajectories and for a given initial state, a sequence of actions. The proposed Viterbi-like algorithm maximizes a recently proposed lower bound on the return depending on the initial state, and uses to this end prior knowledge about the environment provided in the form of upper bounds on its Lipschitz constants. It thereby avoids, in a way depending on the initial state and on the prior knowledge, those regions of the state space where the sample is too sparse to make safe generalizations. Our experiments show that it can lead to more cautious policies than algorithms combining dynamic programming with function approximators. We also give a condition on the sample sparsity ensuring that, for a given initial state, the proposed algorithm produces an optimal open-loop sequence of actions.
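A rough sketch of such a Viterbi-like search is given below. It chains sample transitions together, crediting each with its observed reward and charging a Lipschitz penalty proportional to the distance between where the previous transition ended and where the next one starts, so that sequences crossing sparsely sampled regions receive low lower bounds. A single penalty weight stands in for the horizon-dependent constants of the paper, and the data layout is assumed; this is an illustration of the idea rather than the published algorithm.

import numpy as np

def cautious_action_sequence(sample, x0, horizon, lipschitz):
    # sample: list of one-step transitions (x, u, r, y); lipschitz: penalty weight standing
    # in for the upper bounds on the Lipschitz constants. Returns the open-loop action
    # sequence with the largest lower bound found by dynamic programming over the sample,
    # in O(horizon * len(sample)^2) operations.
    n = len(sample)
    x0 = np.asarray(x0, dtype=float)
    # value[l]: best lower bound over sequences of length t+1 whose last transition is l
    value = np.array([r - lipschitz * np.linalg.norm(x0 - np.asarray(x, float))
                      for x, u, r, y in sample])
    parent = np.full((horizon, n), -1, dtype=int)
    for t in range(1, horizon):
        new_value = np.full(n, -np.inf)
        for l, (xl, ul, rl, yl) in enumerate(sample):
            xl = np.asarray(xl, dtype=float)
            for k, (xk, uk, rk, yk) in enumerate(sample):
                # penalise the "jump" from the end of transition k to the start of transition l
                cand = value[k] + rl - lipschitz * np.linalg.norm(np.asarray(yk, float) - xl)
                if cand > new_value[l]:
                    new_value[l], parent[t, l] = cand, k
        value = new_value
    # backtrack the best chain of transitions and read off its actions
    l = int(np.argmax(value))
    actions = []
    for t in range(horizon - 1, -1, -1):
        actions.append(sample[l][1])
        if t > 0:
            l = parent[t, l]
    return list(reversed(actions)), float(np.max(value))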

Detailed reference viewed: 74 (22 ULg)
Full Text
Peer Reviewed
Model-free Monte Carlo-like policy evaluation
Fonteneau, Raphaël ULg; Murphy, Susan A.; Wehenkel, Louis ULg et al

in 29th Benelux Meeting on Systems and Control (2010)

Detailed reference viewed: 7 (0 ULg)
Full Text
Computing bounds for kernel-based policy evaluation in reinforcement learning
Fonteneau, Raphaël ULg; Murphy, Susan A.; Wehenkel, Louis ULg et al

Report (2010)

This technical report proposes an approach for computing bounds on the finite-time return of a policy using kernel-based approximators from a sample of trajectories in a continuous state space and deterministic framework.

Detailed reference viewed: 13 (3 ULg)
Full Text
Voronoi model learning for batch mode reinforcement learning
Fonteneau, Raphaël ULg; Ernst, Damien ULg

Report (2010)

We consider deterministic optimal control problems with continuous state spaces where the information on the system dynamics and the reward function is constrained to a set of system transitions. Each system transition gathers a state, the action taken in that state, the immediate reward observed, and the next state reached. In such a context, we propose a new model-learning type of reinforcement learning (RL) algorithm for the batch-mode, finite-time, deterministic setting. The algorithm, named Voronoi reinforcement learning (VRL), approximates from a sample of system transitions the system dynamics and the reward function of the optimal control problem using piecewise constant functions on a Voronoi-like partition of the state-action space.
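Since each Voronoi cell is, by definition, the set of state-action points closest to one sampled (state, action) pair, the piecewise-constant approximation amounts to a nearest-neighbour model. A minimal sketch, assuming a Euclidean distance in the joint state-action space and a (state, action, reward, next state) tuple format (both choices are this illustration's, not necessarily the paper's):

import numpy as np

class VoronoiModel:
    # Piecewise-constant approximation of the dynamics and the reward function,
    # constant on the Voronoi cell of each sampled (state, action) pair.
    def __init__(self, sample):
        self.points = np.array([np.concatenate([np.atleast_1d(x), np.atleast_1d(u)])
                                for x, u, r, y in sample], dtype=float)
        self.rewards = np.array([r for x, u, r, y in sample], dtype=float)
        self.next_states = [np.asarray(y, dtype=float) for x, u, r, y in sample]

    def _cell(self, x, u):
        # index of the Voronoi cell containing the queried (state, action) pair
        z = np.concatenate([np.atleast_1d(x), np.atleast_1d(u)]).astype(float)
        return int(np.argmin(np.linalg.norm(self.points - z, axis=1)))

    def reward(self, x, u):
        return float(self.rewards[self._cell(x, u)])

    def next_state(self, x, u):
        return self.next_states[self._cell(x, u)]

Once such a model is learned, any planning method, e.g. finite-horizon dynamic programming, can be applied to it as if it were the true system.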

Detailed reference viewed: 20 (3 ULg)
Full Text
Peer Reviewed
Apoptosis characterizes immunological failure of HIV infected patients
Mhawej, Marie-José; Brunet-Francois, Cécile; Fonteneau, Raphaël ULg et al

in Control Engineering Practice (2009), 17(7), 798-804

This paper studies the influence of apoptosis on the dynamics of the HIV infection. A new model of the activation-induced apoptosis of healthy CD4+ T-cells is used. The parameters of this model are identified by using clinical data generated by monitoring patients starting Highly Active Anti-Retroviral Therapy (HAART). The sampling of blood tests is performed so as to satisfy the constraints of dynamical system parameter identification. The apoptosis parameter, which is inferred from clinical data, is then shown to play a key role in the early diagnosis of immunological failure.

Detailed reference viewed: 73 (18 ULg)
Full Text
Peer Reviewed
Inferring bounds on the performance of a control policy from a sample of one-step system transitions
Fonteneau, Raphaël ULg; Murphy, Susan A.; Wehenkel, Louis ULg et al

in 28th Benelux Meeting on Systems and Control (2009)

Detailed reference viewed: 10 (4 ULg)
Full Text
Peer Reviewed
Inferring bounds on the performance of a control policy from a sample of trajectories
Fonteneau, Raphaël ULg; Murphy, Susan; Wehenkel, Louis ULg et al

in Proceedings of the IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL-09) (2009)

We propose an approach for inferring bounds on the finite-horizon return of a control policy from an off-policy sample of trajectories collecting state transitions, rewards, and control actions. In this paper, the dynamics, control policy, and reward function are assumed to be deterministic and Lipschitz continuous. Under these assumptions, an algorithm that is polynomial in the sample size and the length of the optimization horizon is derived to compute these bounds, and their tightness is characterized in terms of the sample density.

Detailed reference viewed: 35 (10 ULg)
Full Text
Peer Reviewed
Variable selection for dynamic treatment regimes: a reinforcement learning approach
Fonteneau, Raphaël ULg; Wehenkel, Louis ULg; Ernst, Damien ULg

in The annual machine learning conference of Belgium and the Netherlands (BeNeLearn 2008) (2008, May)

Detailed reference viewed: 12 (1 ULg)
Full Text
Peer Reviewed
Modelling the influence of activation-induced apoptosis of CD4+ and CD8+ T-cells on the immune system response of an HIV-infected patient
Stan, Guy-Bart; Belmudes, Florence ULg; Fonteneau, Raphaël ULg et al

in IET Systems Biology (2008), 2(2), 94-102

On the basis of the human immunodeficiency virus (HIV) infection dynamics model proposed by Adams, the authors propose an extended model that aims at incorporating the influence of activation-induced apoptosis of CD4+ and CD8+ T-cells on the immune system response of HIV-infected patients. Through this model, the authors study the influence of this phenomenon on the time evolution of specific cell populations such as plasma concentrations of HIV copies, or blood concentrations of CD4+ and CD8+ T-cells. In particular, this study shows that depending on its intensity, the apoptosis phenomenon can either favour or mitigate the long-term evolution of the HIV infection.
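To make the mechanism concrete, a deliberately simplified simulation is sketched below: a basic three-state HIV model (healthy CD4+ T-cells, infected cells, free virus) to which a single activation-induced apoptosis term on healthy cells has been added. This is not the Adams model nor the authors' extension of it, and every rate constant is illustrative; the sketch only shows how one would vary the apoptosis intensity and observe its effect on the simulated cell populations.

from scipy.integrate import solve_ivp

def hiv_with_apoptosis(t, state, apoptosis_rate):
    # Toy model: T = healthy CD4+ T-cells, I = infected cells, V = free virus.
    # The term `apoptosis_rate * T * V` stands in for activation-induced apoptosis of
    # healthy cells; all other constants are generic illustrative values, not the paper's.
    T, I, V = state
    dT = 10.0 - 0.01 * T - 2.4e-5 * T * V - apoptosis_rate * T * V
    dI = 2.4e-5 * T * V - 0.24 * I
    dV = 240.0 * I - 2.4 * V
    return [dT, dI, dV]

# Same initial condition, two apoptosis intensities: compare the long-term CD4+ counts.
for a in (1e-6, 1e-4):
    sol = solve_ivp(hiv_with_apoptosis, (0.0, 500.0), [600.0, 0.1, 50.0],
                    args=(a,), max_step=1.0)
    print(f"apoptosis_rate={a:g}: CD4+ count at day 500 ≈ {sol.y[0, -1]:.1f}")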

Detailed reference viewed: 63 (12 ULg)
Full Text
Peer Reviewed
Variable selection for dynamic treatment regimes
Fonteneau, Raphaël ULg; Wehenkel, Louis ULg; Ernst, Damien ULg

in 27th Benelux Meeting on Systems and Control (2008)

Detailed reference viewed: 5 (0 ULg)
Full Text
Peer Reviewed
Variable selection for dynamic treatment regimes: a reinforcement learning approach
Fonteneau, Raphaël ULg; Wehenkel, Louis ULg; Ernst, Damien ULg

Conference (2008)

Dynamic treatment regimes (DTRs) can be inferred from data collected through randomized clinical trials by using reinforcement learning algorithms. During these clinical trials, a large set of clinical indicators are usually monitored. However, it is often more convenient for clinicians to have DTRs which are defined on only a small set of indicators rather than on the original full set. To address this problem, we analyse the approximation architecture of the state-action value functions computed by the fitted Q iteration algorithm (an RL algorithm) using tree-based regressors, in order to identify a small subset of relevant indicators. The RL algorithm is then rerun with only these most relevant indicators as state variables, so as to obtain DTRs defined on a small set of indicators. The approach is validated on benchmark problems inspired from the classical ‘car on the hill’ problem, and the results obtained are positive.
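A rough sketch of the ranking step is given below, assuming a finite action space, transitions stored as rows [state, action, reward, next state], and scikit-learn's Extra-Trees regressor with its impurity-based feature_importances_ as the relevance measure; the exact regressor, relevance score and number of iterations used in the paper may differ.

import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

def rank_state_variables(transitions, n_iterations=20, gamma=0.95):
    # transitions: array with rows [state (d values), action, reward, next state (d values)].
    # Runs a few fitted Q iterations with a tree-based regressor, then reads the
    # relevance of each state variable off the fitted trees.
    transitions = np.asarray(transitions, dtype=float)
    d = (transitions.shape[1] - 2) // 2
    X = transitions[:, :d + 1]                     # (state, action) inputs
    r = transitions[:, d + 1]
    Xp = transitions[:, d + 2:]                    # next states
    actions = np.unique(transitions[:, d])         # finite action space assumed
    q = None
    for _ in range(n_iterations):
        if q is None:
            target = r                             # first iteration: Q equals the immediate reward
        else:
            q_next = np.column_stack([q.predict(np.column_stack([Xp, np.full(len(Xp), a)]))
                                      for a in actions])
            target = r + gamma * q_next.max(axis=1)
        q = ExtraTreesRegressor(n_estimators=50, random_state=0).fit(X, target)
    return q.feature_importances_[:d]              # importance of each state variable

The state variables with the largest importances would then be kept, and the fitted Q iteration rerun on that reduced state space to obtain the final, more interpretable DTR.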

Detailed reference viewed: 64 (7 ULg)