Learning to play chess using temporal differences

Cited by: 73
Authors
Baxter, J [1 ]
Tridgell, A
Weaver, L
Affiliations
[1] Australian Natl Univ, Dept Syst Engn, Canberra, ACT 0200, Australia
[2] Australian Natl Univ, Dept Comp Sci, Canberra, ACT 0200, Australia
Keywords
temporal difference learning; neural network; TDLEAF; chess; backgammon;
DOI
10.1023/A:1007634325138
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
In this paper we present TDLEAF(lambda), a variation on the TD(lambda) algorithm that enables it to be used in conjunction with game-tree search. We present some experiments in which our chess program "KnightCap" used TDLEAF(lambda) to learn its evaluation function while playing on Internet chess servers. The main success we report is that KnightCap improved from a 1650 rating to a 2150 rating in just 308 games and 3 days of play. As a reference, a rating of 1650 corresponds to about level B human play (on a scale from E (1000) to A (1800)), while 2150 is human master level. We discuss some of the reasons for this success, principal among them being the use of on-line play rather than self-play. We also investigate whether TDLEAF(lambda) can yield better results in the domain of backgammon, where TD(lambda) has previously yielded striking success.
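The abstract's core idea is to apply the TD(lambda) temporal-difference update not to the positions actually played, but to the leaf positions of the principal variations found by game-tree search. A minimal sketch of that weight update is below, assuming a simple linear evaluation function; `tdleaf_update`, the feature representation, and the parameter values are hypothetical illustrations, not KnightCap's actual implementation.

```python
def tdleaf_update(leaf_features, weights, alpha=0.001, lam=0.7):
    """One TDLEAF(lambda) weight update over a sequence of positions.

    leaf_features[t] is the feature vector of the principal-variation
    leaf found by searching from position t (hypothetical encoding);
    the evaluation J(x, w) is assumed linear in those features.
    """
    def evaluate(features):
        # J(x_t^leaf, w) = dot product of leaf features and weights
        return sum(f * w for f, w in zip(features, weights))

    evals = [evaluate(f) for f in leaf_features]
    # temporal differences d_t = J(x_{t+1}^leaf, w) - J(x_t^leaf, w)
    diffs = [evals[t + 1] - evals[t] for t in range(len(evals) - 1)]

    new_weights = list(weights)
    for t in range(len(diffs)):
        # lambda-discounted sum of future temporal differences
        future = sum(lam ** (j - t) * diffs[j] for j in range(t, len(diffs)))
        # gradient of a linear evaluation is just the feature vector
        for i, f in enumerate(leaf_features[t]):
            new_weights[i] += alpha * f * future
    return new_weights
```

With lam = 1 each weight is moved by the total change in evaluation over the rest of the game; with lam = 0 only the next temporal difference matters.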
Pages: 243-263
Page count: 21
References
16 items
[1]
Beal DF, 1997, Journal of the International Computer Chess Association
[2]
Bertsekas DP, 1996, Neuro-Dynamic Programming, 1st ed.
[3]
Marsland TA, 1990, Computers, Chess, and Cognition
[4]
Plaat A; Schaeffer J; Pijls W; de Bruin A. Best-first fixed-depth minimax algorithms [J].
ARTIFICIAL INTELLIGENCE, 1996, 87 (1-2): 255-293
[5]
Pollack J, 1996, Proceedings of the Fifth Artificial Life Conference, Nara, Japan
[6]
Samuel AL, 1959, IBM Journal of Research and Development, V3, P210
[8]
Schraudolph N, 1994, Advances in Neural Information Processing Systems, V6
[9]
Sutton RS, 1988, Machine Learning, V3, P9, DOI 10.1023/A:1022633531479
[10]
Sutton RS, 1998, Reinforcement Learning: An Introduction