Mastering the game of Go with deep neural networks and tree search

Cited by: 10262

Authors:

Silver, David [1]

Huang, Aja [1]

Maddison, Chris J. [1]

Guez, Arthur [1]

Sifre, Laurent [1]

van den Driessche, George [1]

Schrittwieser, Julian [1]

Antonoglou, Ioannis [1]

Panneershelvam, Veda [1]

Lanctot, Marc [1]

Dieleman, Sander [1]

Grewe, Dominik [1]

Nham, John [2]

Kalchbrenner, Nal [1]

Sutskever, Ilya [2]

Lillicrap, Timothy [1]

Leach, Madeleine [1]

Kavukcuoglu, Koray [1]

Graepel, Thore [1]

Hassabis, Demis [1]

Affiliations:

[1] Google DeepMind, 5 New St Sq, London EC4A 3TW, England

[2] Google, 1600 Amphitheatre Pkwy, Mountain View, CA 94043 USA

Source:

NATURE | 2016 / Vol. 529 / Issue 7587

Keywords:

COMPUTER; CHESS;

DOI:

10.1038/nature16961

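Chinese Library Classification (CLC):

O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];

Discipline classification codes: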

07 ; 0710 ; 09 ;

Abstract:

The game of Go has long been viewed as the most challenging of classic games for artificial intelligence owing to its enormous search space and the difficulty of evaluating board positions and moves. Here we introduce a new approach to computer Go that uses 'value networks' to evaluate board positions and 'policy networks' to select moves. These deep neural networks are trained by a novel combination of supervised learning from human expert games, and reinforcement learning from games of self-play. Without any lookahead search, the neural networks play Go at the level of state-of-the-art Monte Carlo tree search programs that simulate thousands of random games of self-play. We also introduce a new search algorithm that combines Monte Carlo simulation with value and policy networks. Using this search algorithm, our program AlphaGo achieved a 99.8% winning rate against other Go programs, and defeated the human European Go champion by 5 games to 0. This is the first time that a computer program has defeated a human professional player in the full-sized game of Go, a feat previously thought to be at least a decade away.

Citation

Pages: 484 / +

Page count: 20

