On the convergence of reinforcement learning

Cited by: 115
Authors
Beggs, AW [1]
Affiliations
[1] Univ Oxford Wadham Coll, Oxford OX1 3PN, England
Keywords
reinforcement learning; games
DOI
10.1016/j.jet.2004.03.008
Chinese Library Classification number
F [Economics]
Subject classification code
02
Abstract
This paper examines the convergence of payoffs and strategies in Erev and Roth's model of reinforcement learning. When all players use this rule it eliminates iteratively dominated strategies and in two-person constant-sum games average payoffs converge to the value of the game. Strategies converge in constant-sum games with unique equilibria if they are pure or if they are mixed and the game is 2 × 2. The long-run behaviour of the learning rule is governed by equations related to Maynard Smith's version of the replicator dynamic. Properties of the learning rule against general opponents are also studied. © 2004 Elsevier Inc. All rights reserved.
Pages: 1-36
Number of pages: 36