Programming backgammon using self-teaching neural nets

Cited by: 77

Authors:

Tesauro, G [1]

Affiliations:

[1] IBM Corp, Thomas J Watson Res Ctr, Hawthorne, NY 10532 USA

Source:

ARTIFICIAL INTELLIGENCE | 2002 / Vol. 134 / Issue 1-2

Keywords:

reinforcement learning; temporal difference learning; neural networks; backgammon; games; doubling strategy; rollouts

DOI:

10.1016/S0004-3702(01)00110-2

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

TD-Gammon is a neural network that is able to teach itself to play backgammon solely by playing against itself and learning from the results. Starting from random initial play, TD-Gammon's self-teaching methodology results in a surprisingly strong program: without lookahead, its positional judgement rivals that of human experts, and when combined with shallow lookahead, it reaches a level of play that surpasses even the best human players. The success of TD-Gammon has also been replicated by several other programmers; at least two other neural net programs also appear to be capable of superhuman play. Previous papers on TD-Gammon have focused on developing a scientific understanding of its reinforcement learning methodology. This paper views machine learning as a tool in a programmer's toolkit, and considers how it can be combined with other programming techniques to achieve and surpass world-class backgammon play. Particular emphasis is placed on programming shallow-depth search algorithms, and on TD-Gammon's doubling algorithm, which is described in print here for the first time. © 2002 Elsevier Science B.V. All rights reserved.

Pages: 181-199

Number of pages: 19
