References of "Murphy, Susan A."
Full Text
Peer Reviewed
Apprentissage par renforcement batch fondé sur la reconstruction de trajectoires artificielles [Batch mode reinforcement learning based on the reconstruction of artificial trajectories]
Fonteneau, Raphaël; Murphy, Susan A.; Wehenkel, Louis et al.

in Proceedings of the 9èmes Journées Francophones de Planification, Décision et Apprentissage (JFPDA 2014) (2014)

This article is set in the framework of batch mode reinforcement learning, whose central problem is to learn, from a set of trajectories, a decision policy optimizing a given criterion. We consider more specifically problems with a continuous state space, for which classical resolution schemes rely on function approximators. This article proposes an alternative based on the reconstruction of "artificial trajectories", which offers a new angle on the classical problems of batch mode reinforcement learning.

Full Text
Peer Reviewed
Batch mode reinforcement learning based on the synthesis of artificial trajectories
Fonteneau, Raphaël; Murphy, Susan A.; Wehenkel, Louis et al.

in Annals of Operations Research (2013), 208(1), 383-416

Full Text
Peer Reviewed
Stratégies d'échantillonnage pour l'apprentissage par renforcement batch [Sampling strategies for batch mode reinforcement learning]
Fonteneau, Raphaël; Murphy, Susan A.; Wehenkel, Louis et al.

in Revue d'Intelligence Artificielle [=RIA] (2013), 27(2), 171-194

We propose two strategies for experiment selection in the context of batch mode reinforcement learning. The first strategy is based on the idea that the most interesting experiments to carry out at some stage are those that are the most liable to falsify the current hypothesis about the optimal control policy. We cast this idea in a context where a policy learning algorithm and a model identification method are given a priori. The second strategy exploits recently published methods for computing bounds on the return of control policies from a set of trajectories in order to sample the state-action space so as to be able to discriminate between optimal and non-optimal policies. Both strategies are experimentally validated, showing promising results.

Full Text
Peer Reviewed
Estimation Monte Carlo sans modèle de politiques de décision [Model-free Monte Carlo estimation of decision policies]
Fonteneau, Raphaël; Murphy, Susan A.; Wehenkel, Louis et al.

in Revue d'Intelligence Artificielle [=RIA] (2011), 25

Full Text
Peer Reviewed
Apprentissage actif par modification de la politique de décision courante [Active learning by modification of the current decision policy]
Fonteneau, Raphaël; Murphy, Susan A.; Wehenkel, Louis et al.

in Sixièmes Journées Francophones de Planification, Décision et Apprentissage pour la conduite de systèmes (JFPDA 2011) (2011, June)

Full Text
Peer Reviewed
Model-free Monte Carlo-like policy evaluation
Fonteneau, Raphaël; Murphy, Susan A.; Wehenkel, Louis et al.

in 29th Benelux Meeting on Systems and Control (2010)

Full Text
Computing bounds for kernel-based policy evaluation in reinforcement learning
Fonteneau, Raphaël; Murphy, Susan A.; Wehenkel, Louis et al.

Report (2010)

This technical report proposes an approach for computing bounds on the finite-time return of a policy using kernel-based approximators from a sample of trajectories in a continuous state space and deterministic framework.
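The flavour of such bounds can be conveyed with a simplified sketch (a hedged illustration of ours, not the report's kernel-based formula; the function name and the constants L_r, L_f are assumptions). If the reward and the dynamics are Lipschitz continuous with known constants, then, while stitching nearest stored transitions into an artificial trajectory, each collected reward can be penalised by the worst-case error accumulated through state-action mismatches, yielding a lower bound on the true return.

```python
import numpy as np

def lipschitz_lower_bound(transitions, policy, x0, horizon, L_r, L_f):
    """Hypothetical lower bound on the finite-time return of `policy`
    in a deterministic system, assuming reward and dynamics are
    Lipschitz with constants L_r and L_f (a simplified sketch)."""
    x = np.asarray(x0, dtype=float)
    bound, drift = 0.0, 0.0  # drift: worst-case state error so far
    for t in range(horizon):
        u = np.atleast_1d(policy(x, t))
        # nearest stored one-step transition (x_i, u_i, r_i, x_i')
        xi, ui, ri, x_next = min(
            transitions,
            key=lambda s: np.linalg.norm(s[0] - x) + np.linalg.norm(s[1] - u))
        gap = np.linalg.norm(xi - x) + np.linalg.norm(ui - u) + drift
        bound += ri - L_r * gap  # true reward is at least ri - L_r * gap
        drift = L_f * gap        # the dynamics may amplify the mismatch
        x = x_next
    return bound
```

When the sample happens to contain the exact transitions visited by the policy, every gap is zero and the bound coincides with the true return; otherwise it degrades gracefully with the sparsity of the sample.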

Full Text
Peer Reviewed
Inferring bounds on the performance of a control policy from a sample of one-step system transitions
Fonteneau, Raphaël; Murphy, Susan A.; Wehenkel, Louis et al.

in 28th Benelux Meeting on Systems and Control (2009)
