Lazy planning under uncertainty by optimizing decisions on an ensemble of incomplete disturbance trees

Defourny, Boris; Ernst, Damien; Wehenkel, Louis

doi:10.1007/978-3-540-89722-4_1

Paper published in a book (Scientific congresses and symposiums)

2008 • In Defourny, Boris; Ernst, Damien; Wehenkel, Louis (Eds.) Recent Advances in Reinforcement Learning

Peer reviewed

Permalink
https://hdl.handle.net/2268/15470

DOI
10.1007/978-3-540-89722-4_1

Files

Full Text

disturbanceTree.pdf

Publisher postprint (177.63 kB)

Details

Keywords :

stochastic dynamic programming; ensemble methods

Abstract :

This paper addresses discrete-time optimal sequential decision-making problems whose disturbance space W contains a finite number of elements. In this context, finding an optimal decision strategy from an initial state x0 can be stated as an optimization problem: find an optimal combination of decisions attached to the nodes of a disturbance tree that models all possible disturbance sequences (w0, w1, ..., w(T−1)) in W^T over the optimization horizon T. A significant drawback of this approach is that the search space of the resulting optimization problem is the Cartesian product of O(|W|^(T−1)) decision spaces U, which makes the approach computationally impractical as soon as the optimization horizon grows, even if W has just a handful of elements. To circumvent this difficulty, we propose to exploit an ensemble of randomly generated incomplete disturbance trees of controlled complexity, to solve their induced optimization problems in parallel, and to combine their predictions at time t = 0 to obtain a (near-)optimal first-stage decision. Because this approach postpones the determination of the decisions for subsequent stages until additional information about the realization of the uncertain process becomes available, we call it lazy. Simulations carried out on a robot corridor navigation problem show that even for small incomplete trees, this approach can lead to near-optimal decisions.
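The idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The toy line-walking problem, the names `step`, `sample_tree`, `solve` and `lazy_first_decision`, the disturbance probabilities, the Monte Carlo sampling of branches, and the majority vote used to combine first-stage decisions are all assumptions made for illustration. Because U is finite here, each tree is solved exactly by recursion over its nodes; the paper itself may use a different solver and a different combination rule.

```python
import random
from collections import Counter

W = (-1, 0, 1)         # finite disturbance space (toy values, assumed)
P = (0.25, 0.5, 0.25)  # disturbance probabilities (assumed)
U = (-1, 0, 1)         # finite decision space (toy values, assumed)
GOAL = 3               # hypothetical target position on a line


def step(x, u, w):
    """Toy dynamics: move by u, get pushed by w; reward is minus the distance to GOAL."""
    x_next = x + u + w
    return x_next, -abs(x_next - GOAL)


def complete_tree_nodes(n_w, horizon):
    """Decision nodes in the complete disturbance tree: 1 + |W| + ... + |W|^(T-1)."""
    return sum(n_w ** t for t in range(horizon))


def sample_tree(horizon, branching, rng):
    """Incomplete tree: every node keeps only `branching` disturbances drawn from P."""
    if horizon == 0:
        return []  # leaf: no decision left to take
    drawn = rng.choices(W, weights=P, k=branching)
    return [(w, sample_tree(horizon - 1, branching, rng)) for w in drawn]


def solve(node, x):
    """Return (value, decision) maximizing the mean reward-to-go over the node's branches."""
    if not node:
        return 0.0, None
    best_value, best_u = float("-inf"), None
    for u in U:
        total = 0.0
        for w, child in node:
            x_next, reward = step(x, u, w)
            total += reward + solve(child, x_next)[0]
        value = total / len(node)  # sampled branches are weighted equally
        if value > best_value:
            best_value, best_u = value, u
    return best_value, best_u


def lazy_first_decision(x0, horizon, branching, n_trees, seed=0):
    """Solve each sampled tree and keep the most frequent first-stage decision."""
    rng = random.Random(seed)
    votes = Counter(
        solve(sample_tree(horizon, branching, rng), x0)[1] for _ in range(n_trees)
    )
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    print(complete_tree_nodes(len(W), 10))  # 29524 decision nodes for |W| = 3, T = 10
    print(lazy_first_decision(x0=0, horizon=4, branching=2, n_trees=15))
```

Only the first-stage decision is returned. In a receding-horizon use of this sketch, the procedure would be run again from the next observed state, which corresponds to the "lazy" postponement of later-stage decisions described above.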

Disciplines :

Computer science

Author, co-author :

Defourny, Boris ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation

Ernst, Damien ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation

Wehenkel, Louis ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation

Language :

English

Title :

Lazy planning under uncertainty by optimizing decisions on an ensemble of incomplete disturbance trees

Publication date :

2008

Event name :

8th European Workshop on Reinforcement Learning (EWRL'08)

Event place :

Villeneuve d'Ascq, France

Event date :

30 June - 3 July

Audience :

International

Main work title :

Recent Advances in Reinforcement Learning

Author, co-author :

Defourny, Boris ; Université de Liège - ULiège > Département d'électricité, électronique et informatique (Institut Montefiore)

Ernst, Damien ; Université de Liège - ULiège > Montefiore Institute of Electrical Engineering and Computer Science

Wehenkel, Louis ; Université de Liège - ULiège > Montefiore Institute of Electrical Engineering and Computer Science

ISBN/EAN :

978-3-540-89721-7

Collection name :

Lecture Notes in Artificial Intelligence, Vol. 5323

Pages :

1-14

Peer reviewed :

Peer reviewed

Available on ORBi :

since 27 June 2009

Bibliography

Maciejowski, J.: Predictive Control with Constraints. Prentice Hall, Englewood Cliffs (2001)
Morari, M., Lee, J.: Model predictive control: past, present and future. Computers and Chemical Engineering 23, 667-682 (1999)
Birge, J., Louveaux, F.: Introduction to Stochastic Programming. Springer, New York (1997)
Launchbury, J.: A natural semantics for lazy evaluation. In: POPL 1993: Proceedings of the 20th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pp. 144-154. ACM, New York (1993)
Friedman, J., Kohavi, R., Yun, Y.: Lazy decision trees. In: Proc. of 13th National Conference on Artificial Intelligence, AAAI 1996. Part 1 (of 2), pp. 717-724 (1996)
Heitsch, H., Römisch, W., Strugarek, C.: Stability of multistage stochastic programs. SIAM Journal on Optimization 17(2), 511-525 (2006)
Römisch, W.: Stability of stochastic programming problems. In: Ruszczyński, A., Shapiro, A. (eds.) Stochastic Programming. Handbooks in Operations Research and Management Science, vol. 10, pp. 483-554. Elsevier, Amsterdam (2003)
Dempster, M.: Sequential importance sampling algorithms for dynamic stochastic programming. Annals of Operations Research 84, 153-184 (1998)
Shapiro, A.: Monte Carlo sampling methods. In: Ruszczyński, A., Shapiro, A. (eds.) Stochastic Programming. Handbooks in Operations Research and Management Science, vol. 10, pp. 353-425. Elsevier, Amsterdam (2003)
Høyland, K., Wallace, S.: Generating scenario trees for multistage decision problems. Management Science 47(2), 295-307 (2001)
Hochreiter, R., Pflug, G.: Financial scenario generation for stochastic multi-stage decision processes as facility location problems. Annals of Operations Research 152, 257-272 (2007)
Rachev, S., Römisch, W.: Quantitative stability in stochastic programming: The method of probability metrics. Mathematics of Operations Research 27(4), 792-818 (2002)
Ernst, D., Glavic, M., Capitanescu, F., Wehenkel, L.: Reinforcement learning versus model predictive control: a comparison on a power system problem. IEEE Transactions on Systems, Man and Cybernetics - Part B (to appear, 2008)
Kothare, M., Balakrishnan, V., Morari, M.: Robust constrained model predictive control using matrix inequalities. Automatica 32, 1361-1379 (1996)
Nesterov, Y., Vial, J.P.: Confidence level solutions for stochastic programming. Automatica 44(6), 1559-1568 (2008)
Schapire, R.: The strength of weak learnability. Machine Learning 5(2), 197-227 (1990)
Breiman, L.: Bagging predictors. Machine Learning 24(2), 123-140 (1996)
Ernst, D., Geurts, P., Wehenkel, L.: Tree-based batch mode reinforcement learning. Journal of Machine Learning Research 6, 503-556 (2005)
Sutton, R.: Generalization in reinforcement learning: successful examples using sparse coarse coding. Advances in Neural Information Processing Systems 8, 1038-1044 (1996)
Kearns, M., Mansour, Y., Ng, A.: A sparse sampling algorithm for near-optimal planning in large Markov decision processes. Machine Learning 49(2-3), 193-208 (2002)
Rubinstein, R., Kroese, D.: The Cross-Entropy Method. A Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation, and Machine Learning. In: Information Science and Statistics. Springer, Heidelberg (2004)
Cassandra, A., Kaelbling, L., Littman, M.: Acting optimally in partially observable stochastic domains. In: Proceedings of the Twelfth National Conference on Artificial Intelligence (AAAI 1994), Seattle, Washington, USA, vol. 2, pp. 1023-1028. AAAI Press/MIT Press, Menlo Park (1994)
Ng, A., Jordan, M.: PEGASUS: a policy search method for large MDPs and POMDPs. In: Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence, pp. 406-415 (1999)
Defourny, B.: Approximate solution to multistage stochastic programs with ensembles of randomized scenario trees. Master's thesis, University of Liège, Department of Electrical Engineering and Computer Science (2007)
Defourny, B., Wehenkel, L.: Averaging decisions from an ensemble of scenario trees: a validation on newsvendor problems (submitted, 2008)
Bellman, R.: Dynamic Programming. Princeton University Press, Princeton (1957)
Sutton, R., McAllester, D., Singh, S., Mansour, Y.: Policy gradient methods for reinforcement learning with function approximation. Advances in Neural Information Processing Systems 12, 1057-1063 (2000)