Improving Generalization for Temporal Difference Learning: The Successor Representation

Cited by: 387
Author
Dayan, P.
DOI
10.1162/neco.1993.5.4.613
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Estimation of returns over time, the focus of temporal difference (TD) algorithms, imposes particular constraints on good function approximators or representations. Appropriate generalization between states is determined by how similar their successors are, and representations should follow suit. This paper shows how TD machinery can be used to learn such representations, and illustrates, using a navigation task, the appropriately distributed nature of the result.
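A minimal sketch of the idea in the abstract, assuming a toy setting that is not the paper's own navigation task: a five-cell corridor, a fixed random-walk policy, discount γ = 0.8, and one common definition of the successor representation (SR), M(s, j) = E[Σ_t γ^t 1(s_t = j) | s_0 = s]. TD(0) learns each row M(s, ·) from the bootstrapped target 1_s + γ M(s', ·), the same kind of update TD applies to values. The result is checked against the fixed point M = I + γPM, i.e. M = (I − γP)^{-1}. The function names, the step-size schedule α = n(s)^{-0.6}, and the environment are hypothetical choices made for illustration; the paper's exact notation and setup may differ.

```python
import random


def corridor_transitions(n_states):
    """Random-walk transition matrix for a 1-D corridor (hypothetical toy task).

    From each cell the agent steps left or right with probability 0.5;
    a step into a wall leaves it in place.
    """
    P = [[0.0] * n_states for _ in range(n_states)]
    for s in range(n_states):
        for step in (-1, 1):
            s_next = min(max(s + step, 0), n_states - 1)
            P[s][s_next] += 0.5
    return P


def td_successor_representation(P, gamma, n_steps, seed=0):
    """Learn M[s][j] ~ E[sum_t gamma**t * 1(s_t == j) | s_0 == s] with TD(0).

    Follows a single trajectory from state 0 and applies, after each
    transition s -> s_next, the update
        M[s] += alpha * (onehot(s) + gamma * M[s_next] - M[s]).
    """
    rng = random.Random(seed)
    n = len(P)
    M = [[0.0] * n for _ in range(n)]
    visits = [0] * n
    s = 0
    for _ in range(n_steps):
        # Sample the successor state from row s of the transition matrix.
        s_next = rng.choices(range(n), weights=P[s])[0]
        visits[s] += 1
        alpha = visits[s] ** -0.6  # decaying step size (assumed schedule)
        for j in range(n):
            target = (1.0 if j == s else 0.0) + gamma * M[s_next][j]
            M[s][j] += alpha * (target - M[s][j])
        s = s_next
    return M


def exact_successor_representation(P, gamma, n_iters=500):
    """Solve M = I + gamma * P M by fixed-point iteration (M = (I - gamma P)^-1)."""
    n = len(P)
    M = [[0.0] * n for _ in range(n)]
    for _ in range(n_iters):
        M = [[(1.0 if i == j else 0.0)
              + gamma * sum(P[i][k] * M[k][j] for k in range(n))
              for j in range(n)]
             for i in range(n)]
    return M


def value_from_sr(M, reward):
    """V(s) = sum_j M[s][j] * reward[j]: values are linear in the SR."""
    return [sum(m * r for m, r in zip(row, reward)) for row in M]


if __name__ == "__main__":
    P = corridor_transitions(5)
    M_td = td_successor_representation(P, gamma=0.8, n_steps=200_000)
    M_exact = exact_successor_representation(P, gamma=0.8)
    for learned, exact in zip(M_td, M_exact):
        print([round(x, 2) for x in learned], [round(x, 2) for x in exact])
    # Reward only in the rightmost cell: values rise toward it.
    print([round(v, 2) for v in value_from_sr(M_exact, [0, 0, 0, 0, 1])])
```

In this sketch every row of M sums to 1/(1 − γ) = 5, since it counts discounted future visits to all states. Because V(s) = Σ_j M(s, j) r(j), a value function for any reward vector follows from M by a dot product, so states whose successors overlap end up with similar rows and similar values; this is one way to read the abstract's claim that generalization should follow successor similarity.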
Pages: 613-624
Page count: 12