SELF-IMPROVING REACTIVE AGENTS BASED ON REINFORCEMENT LEARNING, PLANNING AND TEACHING

Cited by: 921
Author: LIN, LJ
Keywords: REINFORCEMENT LEARNING; PLANNING; TEACHING; CONNECTIONIST NETWORKS
DOI: 10.1007/BF00992699
CLC number: TP18 [Artificial intelligence theory]
Subject classification codes: 081104; 0812; 0835; 1405
Abstract
To date, reinforcement learning has mostly been studied in solving simple learning tasks, and the reinforcement learning methods studied so far typically converge slowly. The purpose of this work is thus twofold: (1) to investigate the utility of reinforcement learning in solving much more complicated learning tasks than previously studied, and (2) to investigate methods that will speed up reinforcement learning. This paper compares eight reinforcement learning frameworks: adaptive heuristic critic (AHC) learning due to Sutton, Q-learning due to Watkins, and three extensions to each of the two basic methods for speeding up learning. The three extensions are experience replay, learning action models for planning, and teaching. The frameworks were investigated using connectionism as an approach to generalization. To evaluate the performance of the different frameworks, a moderately complex, nondeterministic dynamic environment was used as a testbed. This paper describes the frameworks and algorithms in detail and presents an empirical evaluation of them.
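Since the abstract names the core machinery, a brief illustration may help. The following is a minimal sketch of Watkins-style one-step Q-learning combined with experience replay, one of the paper's three speed-up extensions. It is an assumption-laden stand-in, not the paper's implementation: Lin used connectionist networks for generalization, whereas this sketch uses a lookup table, and the five-state chain environment, the step function, and all hyperparameter values here are hypothetical.

```python
import random
from collections import defaultdict, deque

# Hedged sketch: tabular one-step Q-learning (Watkins) plus experience
# replay. Lin's agents used connectionist Q-functions and a far richer
# dynamic testbed; the chain world below is a hypothetical stand-in.

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # learning rate, discount, exploration
ACTIONS = (0, 1)                       # 0 = move left, 1 = move right
N_STATES = 5                           # reward only at the rightmost state

def step(state, action):
    """Hypothetical chain world: reaching state N_STATES-1 yields reward 1."""
    nxt = max(0, min(N_STATES - 1, state + (1 if action == 1 else -1)))
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

Q = defaultdict(float)        # Q[(state, action)] -> estimated return
replay = deque(maxlen=1000)   # remembered (s, a, r, s', done) experiences

def q_update(s, a, r, s2, done):
    """One-step Q-learning backup toward r + gamma * max_b Q(s', b)."""
    target = r if done else r + GAMMA * max(Q[(s2, b)] for b in ACTIONS)
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])

for episode in range(200):
    s, done = 0, False
    while not done:
        # epsilon-greedy action selection
        if random.random() < EPSILON:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda b: Q[(s, b)])
        s2, r, done = step(s, a)
        q_update(s, a, r, s2, done)         # learn from the fresh experience
        replay.append((s, a, r, s2, done))  # remember it for later replay
        s = s2
    # Experience replay: re-present stored experiences after each episode,
    # so each interaction with the environment is used more than once.
    for exp in random.sample(list(replay), min(32, len(replay))):
        q_update(*exp)

print(sorted(Q.items()))
```

The replay loop addresses the slow convergence noted in the abstract: plain Q-learning uses each experience for a single update, while replaying remembered experiences gives credit assignment several passes over the same data. In Lin's framing, the other two extensions fit the same loop: planning replays hypothetical experiences generated by a learned action model, and teaching replays experiences supplied by a teacher.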
Pages: 293-321
Page count: 29