ASYNCHRONOUS STOCHASTIC APPROXIMATION AND Q-LEARNING

Cited by: 391
Author
TSITSIKLIS, JN
Institution
[1] Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, Cambridge, MA
Keywords
REINFORCEMENT LEARNING; Q-LEARNING; DYNAMIC PROGRAMMING; STOCHASTIC APPROXIMATION
DOI
10.1023/A:1022689125041
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
We provide some general results on the convergence of a class of stochastic approximation algorithms and their parallel and asynchronous variants. We then use these results to study the Q-learning algorithm, a reinforcement learning method for solving Markov decision problems, and establish its convergence under conditions more general than previously available.
Pages: 185-202 (18 pages)
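
As an illustration of the algorithm the abstract refers to, below is a minimal sketch of asynchronous Q-learning in Python. It is not from the paper: the two-state MDP, its transition and reward tables, and the uniform random choice of which component to update are illustrative assumptions. Only the update rule Q(s,a) <- Q(s,a) + alpha_t(s,a) * (r + gamma * max_a' Q(s',a') - Q(s,a)) and the usual step-size conditions (sum of alpha diverges, sum of alpha^2 converges) follow the setting analyzed in the paper.

import random

GAMMA = 0.9          # discount factor
N_STATES, N_ACTIONS = 2, 2

# Toy dynamics (assumed for illustration, not from the paper):
# P[s][a] = list of (probability, next_state, reward).
P = {
    0: {0: [(1.0, 0, 0.0)],
        1: [(0.5, 0, 1.0), (0.5, 1, 0.0)]},
    1: {0: [(1.0, 0, 2.0)],
        1: [(1.0, 1, 0.5)]},
}

def sample(state, action):
    """Draw a (next_state, reward) pair from the toy transition model."""
    u, acc = random.random(), 0.0
    for prob, nxt, rew in P[state][action]:
        acc += prob
        if u < acc:
            return nxt, rew
    return P[state][action][-1][1:]  # guard against floating-point rounding

Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
visits = [[0] * N_ACTIONS for _ in range(N_STATES)]

for t in range(100000):
    # Asynchronous update: a single (s, a) component is revised per step,
    # with its own state-action-dependent step size.
    s = random.randrange(N_STATES)
    a = random.randrange(N_ACTIONS)
    visits[s][a] += 1
    alpha = 1.0 / visits[s][a]           # sum(alpha) = inf, sum(alpha^2) < inf
    nxt, rew = sample(s, a)
    target = rew + GAMMA * max(Q[nxt])   # noisy one-step lookahead estimate
    Q[s][a] += alpha * (target - Q[s][a])

print(Q)  # approaches the optimal Q-values of the toy MDP

Because every (s, a) pair is visited infinitely often and the step sizes meet the conditions above, the iterates converge to the optimal Q-function; this is the kind of guarantee the paper establishes, under conditions more general than sketched here.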