Relaxation schemes for min max generalization in deterministic batch mode reinforcement learning
English
Fonteneau, Raphaël [Université de Liège - ULg > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation]
Ernst, Damien [Université de Liège - ULg > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Smart grids]
Boigelot, Bernard [Université de Liège - ULg > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Informatique]
Louveaux, Quentin [Université de Liège - ULg > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Système et modélisation : Optimisation discrète]
Dec-2011
No
International
4th International NIPS Workshop on Optimization for Machine Learning (OPT 2011)
December 16th, 2011
Sierra Nevada
Spain
[en] Batch mode reinforcement learning ; Min max generalization ; Non-convex optimization
[en] We study the min max optimization problem introduced in [Fonteneau, 2011] for computing policies for batch mode reinforcement learning in a deterministic setting. This problem is NP-hard. We focus on the two-stage case for which we provide two relaxation schemes. The first relaxation scheme works by dropping some constraints in order to obtain a problem that is solvable in polynomial time. The second relaxation scheme, based on a Lagrangian relaxation where all constraints are dualized, leads to a conic quadratic programming problem. Both relaxation schemes are shown to provide better results than those given in [Fonteneau, 2011].
Fonds de la Recherche Scientifique (Communauté française de Belgique) - F.R.S.-FNRS