Learning Tetris using the noisy cross-entropy method

Cited by: 104
Authors
Szita, Istvan [1]
Lorincz, Andras [1]
Affiliations
[1] Eotvos Lorand Univ, Dept Informat Syst, H-1117 Budapest, Hungary
DOI
10.1162/neco.2006.18.12.2936
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The cross-entropy method is an efficient and general optimization algorithm. However, its applicability in reinforcement learning (RL) seems to be limited because it often converges to suboptimal policies. We apply noise for preventing early convergence of the cross-entropy method, using Tetris, a computer game, for demonstration. The resulting policy outperforms previous RL algorithms by almost two orders of magnitude.
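The idea in the abstract can be illustrated with a minimal sketch, under stated assumptions. The cross-entropy method repeatedly samples candidate parameter vectors from a Gaussian, keeps the best-scoring fraction, and refits the Gaussian to them; adding a noise term to the refitted variance keeps the search from collapsing too early. The function name `noisy_cross_entropy`, the constant noise schedule, the population settings, and the toy quadratic objective in the usage note are all illustrative assumptions; none of them is taken from this record, and the paper's own Tetris setup and noise schedule may differ.

```python
import numpy as np


def noisy_cross_entropy(score, dim, n_iter=50, pop_size=100,
                        elite_frac=0.1, noise=0.0, seed=0):
    """Maximize score(w) over w in R^dim with a Gaussian cross-entropy search.

    `noise` is a constant added to the variance after every update
    (an assumed, illustrative schedule); noise=0.0 gives the plain method.
    """
    rng = np.random.default_rng(seed)
    mean = np.zeros(dim)
    var = np.full(dim, 100.0)  # broad initial search distribution (assumption)
    n_elite = max(1, int(pop_size * elite_frac))

    for _ in range(n_iter):
        # Sample a population from the current diagonal Gaussian.
        samples = mean + np.sqrt(var) * rng.standard_normal((pop_size, dim))
        scores = np.array([score(w) for w in samples])

        # Keep the highest-scoring samples and refit the Gaussian to them.
        elite = samples[np.argsort(scores)[-n_elite:]]
        mean = elite.mean(axis=0)

        # The added noise keeps the variance from shrinking to zero.
        var = elite.var(axis=0) + noise

    return mean, var
```

As a usage sketch, `noisy_cross_entropy(lambda w: -np.sum((w - target) ** 2), dim=3, noise=0.01)` searches for a hypothetical `target` vector. In a game-playing setting, `score` would instead return the performance of a policy parameterized by `w`.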
Pages: 2936-2941
Page count: 6
Related papers
11 items in total
[1]  
[Anonymous], 2004, P ICML 04 WORKSHOP R
[2]  
Bertsekas D. P., 1996, Neuro Dynamic Programming, V1st
[3]   A tutorial on the cross-entropy method [J].
De Boer, PT ;
Kroese, DP ;
Mannor, S ;
Rubinstein, RY .
ANNALS OF OPERATIONS RESEARCH, 2005, 134 (01) :19-67
[4]  
Demaine ED, 2003, LECT NOTES COMPUT SC, V2697, P351
[5]  
FAHEY CP, 2003, TETRIS AI
[6]  
Farias VF, 2006, PROBABILISTIC AND RANDOMIZED METHODS FOR DESIGN UNDER UNCERTAINTY, P189, DOI 10.1007/1-84628-095-8_6
[7]  
Kakade S, 2002, ADV NEUR IN, V14, P1531
[8]  
LAGOUDAKIS MG, 2002, SETN, P249
[9]  
Mandl S., 2004, LWA, P118
[10]  
Mannor S., 2003, Proceedings of the Twentieth International Conference on Machine Learning, ICML'03, P512