Bellman R. (1957) Dynamic Programming. Princeton University Press.
Cassandra A.R., Kaelbling L.P., Littman M.L. (1994) Acting optimally in partially observable stochastic domains. Proceedings of the Twelfth National Conference on Artificial Intelligence.
Ernst D., Wehenkel L. (2002) FACTS devices controlled by means of reinforcement learning algorithms. Power System Computation Conference.
Harp S.A., Brignone S., Wollenberg B.F., Samad T. (2000) SEPIA. A simulator for electric power industry agents. IEEE Control Systems Magazine, August, 20(4):53-59.
Moore A.W., Atkeson C.G. (1993) Prioritized sweeping: Reinforcement learning with less data and less real time. Machine Learning 13:103-130.
Munos R. (2000) A study of reinforcement learning in the continuous case by the means of viscosity solutions. Machine Learning 40:265-299.
Sutton R.S., Barto A.G. (1998) Reinforcement Learning: An Introduction. The MIT Press.