Reinforcement Learning and Its Application to Computer Go

Cited by: 28
Authors
陈兴国 (CHEN Xing-Guo) [1,2]
俞扬 (YU Yang) [2]
Affiliations
[1] School of Computer Science & School of Software, Nanjing University of Posts and Telecommunications
[2] State Key Laboratory for Novel Software Technology, Nanjing University
Keywords
Reinforcement learning; Function approximation; Kernel methods; Neural networks; Additive models; Deep reinforcement learning
DOI
10.16383/j.aas.2016.y000003
CLC Number
TP181 [Automated reasoning, machine learning]
Abstract
Reinforcement learning is a distinctive branch of machine learning in which an agent learns a decision-making policy through autonomous interaction with its environment, so as to maximize the long-term cumulative reward received by the policy. Recently, reinforcement learning has been used to achieve human-level performance in domains such as Go and video games, attracting wide attention. This paper gives a brief introduction to reinforcement learning, with an emphasis on methods based on function approximation and on their applications to Go and related domains.
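As a point of reference for the objective described above, the standard formulation (the notation below is conventional reinforcement-learning usage, not taken from the paper itself) seeks a policy \pi maximizing the expected discounted return, and function-approximation methods represent the value function with a parameterized model, e.g. a linear one:

\pi^{*} = \arg\max_{\pi} \; \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} r_{t} \right], \qquad 0 \le \gamma < 1

V_{\theta}(s) \approx \theta^{\top} \phi(s), \qquad \theta \leftarrow \theta + \alpha \left( r_{t} + \gamma V_{\theta}(s_{t+1}) - V_{\theta}(s_{t}) \right) \phi(s_{t})

The second line is the TD(0) update with linear features \phi; kernel methods, additive models, and neural networks, as listed in the keywords, replace the linear model with richer function classes.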
Pages: 685-695
Number of pages: 11