Learning tetris using the noisy cross-entropy method

Times cited: 104

Authors:

Szita, Istvan [1]

Lorincz, Andras [1]

Affiliation:

[1] Eotvos Lorand Univ, Dept Informat Syst, H-1117 Budapest, Hungary

Source:

NEURAL COMPUTATION | 2006 / Vol. 18 / No. 12

DOI:

10.1162/neco.2006.18.12.2936

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

The cross-entropy method is an efficient and general optimization algorithm. However, its applicability in reinforcement learning (RL) seems to be limited because it often converges to suboptimal policies. We apply noise to prevent early convergence of the cross-entropy method, using the computer game Tetris for demonstration. The resulting policy outperforms previous RL algorithms by almost two orders of magnitude.
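The abstract's core idea is to add noise to the variance update of the cross-entropy method so that the sampling distribution does not collapse too early. It can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes a diagonal Gaussian over a real weight vector, a toy quadratic objective standing in for the Tetris score, and a linearly decreasing noise schedule. The function name `noisy_cross_entropy` and all parameter values are hypothetical and chosen for illustration.

```python
import numpy as np


def noisy_cross_entropy(score, dim, n_samples=100, elite_frac=0.1, n_iters=80,
                        noise=lambda t: max(5.0 - t / 10.0, 0.0), seed=0):
    """Noisy cross-entropy search that maximizes `score` over R^dim (illustrative sketch).

    score: maps a weight vector to a scalar to maximize. This is a hypothetical
           stand-in; in a Tetris setting it might be the game score obtained
           with those weights.
    noise: assumed schedule giving the extra variance added at iteration t,
           which keeps the distribution from collapsing prematurely.
    """
    rng = np.random.default_rng(seed)
    mean = np.zeros(dim)           # assumed initial mean
    var = np.full(dim, 100.0)      # assumed broad initial variance
    n_elite = max(1, int(round(elite_frac * n_samples)))
    for t in range(n_iters):
        # Draw candidate weight vectors from the current diagonal Gaussian.
        samples = rng.normal(mean, np.sqrt(var), size=(n_samples, dim))
        scores = np.array([score(w) for w in samples])
        # Keep the highest-scoring fraction (argsort is ascending, so take the tail).
        elite = samples[np.argsort(scores)[-n_elite:]]
        # Standard CE refit of mean and variance, plus the extra noise term.
        mean = elite.mean(axis=0)
        var = elite.var(axis=0) + noise(t)
    return mean


if __name__ == "__main__":
    target = np.array([3.0, -2.0, 1.0])
    best = noisy_cross_entropy(lambda w: -np.sum((w - target) ** 2), dim=3)
    print(best)
```

Passing `noise=lambda t: 0.0` recovers the plain cross-entropy update, which is the variant whose early convergence the added noise is meant to prevent.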

Pages: 2936-2941

Number of pages: 6

References (11 in total):

[1] [Anonymous], 2004, P ICML 04 WORKSHOP R

[2] Bertsekas DP, 1996, Neuro-Dynamic Programming, 1st ed.

[3] De Boer PT, Kroese DP, Mannor S, Rubinstein RY, 2005, A tutorial on the cross-entropy method, ANNALS OF OPERATIONS RESEARCH, V134, N1, P19-67

[4] Demaine ED, 2003, LECT NOTES COMPUT SC, V2697, P351

[5] Fahey CP, 2003, TETRIS AI

[6] Farias VF, 2006, PROBABILISTIC AND RANDOMIZED METHODS FOR DESIGN UNDER UNCERTAINTY, P189, DOI 10.1007/1-84628-095-8_6

[7] Kakade S, 2002, ADV NEUR IN, V14, P1531

[8] Lagoudakis MG, 2002, SETN, P249

[9] Mandl S, 2004, LWA, P118

[10] Mannor S, 2003, Proceedings of the Twentieth International Conference on Machine Learning (ICML'03), P512
