IMPROVING GENERALIZATION FOR TEMPORAL DIFFERENCE LEARNING - THE SUCCESSOR REPRESENTATION

Cited by: 387

Author:

DAYAN, P

Source:

NEURAL COMPUTATION | 1993 / Volume 5 / Issue 04

DOI:

10.1162/neco.1993.5.4.613

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Estimation of returns over time, the focus of temporal difference (TD) algorithms, imposes particular constraints on good function approximators or representations. Appropriate generalization between states is determined by how similar their successors are, and representations should follow suit. This paper shows how TD machinery can be used to learn such representations, and illustrates, using a navigation task, the appropriately distributed nature of the result.
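
The idea in the abstract can be illustrated with a small tabular example. The following is a minimal sketch, not the paper's navigation task: it assumes one-hot (tabular) states, a fixed deterministic policy on a hypothetical four-state ring, and illustrative names (`learn_successor_representation`, `ring_step`, `state_values`) that do not come from the source. Each row of `M` is learned with an ordinary TD(0) update in which the "reward" is an indicator of the current state, so `M[s][j]` estimates the expected discounted number of future visits to state `j` starting from `s`.

```python
def learn_successor_representation(next_state, n_states, gamma=0.5,
                                   alpha=0.5, n_steps=2000):
    """Tabular TD(0) estimate of the successor representation M.

    M[s][j] approximates the expected discounted count of visits to
    state j when starting in state s (the visit at time 0 counts as 1).
    All parameter names and defaults are assumptions for this sketch.
    """
    M = [[0.0] * n_states for _ in range(n_states)]
    s = 0
    for _ in range(n_steps):
        s_next = next_state(s)
        for j in range(n_states):
            # Indicator of the current state plays the role of the reward.
            indicator = 1.0 if j == s else 0.0
            td_error = indicator + gamma * M[s_next][j] - M[s][j]
            M[s][j] += alpha * td_error
        s = s_next
    return M


def ring_step(s, n_states=4):
    """Hypothetical deterministic policy: always move to the next state."""
    return (s + 1) % n_states


def state_values(M, rewards):
    """Values are linear in the representation: V[s] = sum_j M[s][j] * r[j]."""
    return [sum(m * r for m, r in zip(row, rewards)) for row in M]


M = learn_successor_representation(ring_step, n_states=4)
V = state_values(M, [0.0, 0.0, 0.0, 1.0])  # reward only in state 3
```

Because values are a linear function of the rows of `M`, states whose successors are similar receive similar representations and therefore similar value estimates, which is the generalization property the abstract describes.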

Citation

Pages: 613-624

Page count: 12

