Reference : Comparison of Different Selection Strategies in Monte-Carlo Tree Search for the Game of ...
Scientific congresses and symposiums : Paper published in a book
Engineering, computing & technology : Computer science
http://hdl.handle.net/2268/132793
Comparison of Different Selection Strategies in Monte-Carlo Tree Search for the Game of Tron
English
Perrick, Pierre [> >]
Lupien St-Pierre, David mailto [Université de Liège - ULg > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Dép. d'électric., électron. et informat. (Inst.Montefiore) >]
Maes, Francis mailto [Université de Liège - ULg > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation >]
Ernst, Damien mailto [Université de Liège - ULg > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Smart grids >]
2012
IEEE Conference on Computational and Intelligence in Games 2012
242-249
Yes
No
International
IEEE Conference on Computational and Intelligence in Games ( CIG 2012 )
from 10-09-2012 to 14-09-2012
Granada
Spain
[en] Monte Carlo Tree Search ; Selection Policy ; Games
[en] Monte-Carlo Tree Search (MCTS) techniques are essentially known for their performance on turn-based games, such as Go, for which players have considerable time for choosing their moves. In this paper, we apply MCTS to the game of Tron, a simultaneous real-time two-player game. The fact that players have to react fast and that moves occur simultaneously creates an unusual setting for MCTS, in which classical selection policies such as UCB1 may be suboptimal. In this paper, we perform an empirical comparison of a wide range of selection policies for MCTS applied to Tron, with both deterministic policies (UCB1, UCB1-Tuned, UCB-V, UCBMinimal, OMC-Deterministic, MOSS) and stochastic policies (Epsilon-greedy, EXP3, Thompson Sampling, OMC-Stochastic, PBBM). From the experiments, we observe that UCB1-Tuned has the best behavior shortly followed by UCB1. Even if UCB-Minimal is ranked fourth, this is a remarkable result for this recently introduced selection policy found through automatic discovery of good policies on generic multi-armed bandit problems. We also show that deterministic policies perform better than stochastic ones for this problem.
http://hdl.handle.net/2268/132793

File(s) associated to this reference

Fulltext file(s):

FileCommentaryVersionSizeAccess
Open access
paper88.pdfPublisher postprint392.67 kBView/Open

Bookmark and Share SFX Query

All documents in ORBi are protected by a user license.