TEMPORAL-DIFFERENCE METHODS AND MARKOV MODELS

Cited by: 25
Author
BARNARD, E
Affiliation
[1] Department of Electronics and Computer Engineering, University of Pretoria, Pretoria 0002
Source
IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS | 1993 / Vol. 23 / No. 2
DOI
10.1109/21.229449
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
The relation between temporal-difference training methods and Markov models, which was first noticed by Sutton, is explored. This relation is derived from a new perspective, and in this way the particular association between conventional temporal-difference methods and first-order Markov models is explained. We then derive a generalization of temporal-difference methods that is suitable for Markov models of higher order. Finally, several issues related to the performance of mismatched temporal-difference methods (i.e., the performance when the temporal-difference method is not specifically designed to match the order of the Markov model) are investigated numerically.
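As a rough illustration of the "conventional" case mentioned in the abstract, the sketch below applies a one-step temporal-difference update to a first-order Markov chain. It is not taken from the paper: the five-state random walk, the function name `td0_random_walk`, and the parameters `alpha`, `episodes`, and `seed` are all assumptions chosen for this example. Each non-terminal state's prediction of the final outcome (0 at the left end, 1 at the right end) is nudged toward the prediction of its successor.

```python
import random


def td0_random_walk(n_states=5, episodes=20000, alpha=0.02, seed=0):
    """One-step TD prediction on a symmetric random walk (illustrative only).

    States 0..n_states-1 are non-terminal. Stepping left of state 0 ends the
    episode with outcome 0; stepping right of the last state ends it with
    outcome 1. values[s] estimates the probability of ending on the right.
    """
    rng = random.Random(seed)          # fixed seed keeps the run repeatable
    values = [0.5] * n_states          # initial predictions
    for _ in range(episodes):
        s = n_states // 2              # every episode starts in the middle
        while True:
            s_next = s + rng.choice((-1, 1))
            if s_next < 0:
                target = 0.0           # terminated on the left
            elif s_next >= n_states:
                target = 1.0           # terminated on the right
            else:
                target = values[s_next]  # bootstrap from the successor state
            # TD update: move the prediction toward the next prediction.
            values[s] += alpha * (target - values[s])
            if s_next < 0 or s_next >= n_states:
                break
            s = s_next
    return values


estimates = td0_random_walk()
# For this chain the exact answer is (i + 1) / (n_states + 1).
exact = [(i + 1) / 6 for i in range(5)]
for i, (est, ref) in enumerate(zip(estimates, exact)):
    print(f"state {i}: estimate {est:.3f}, exact {ref:.3f}")
```

Because the update for a state uses only its immediate successor, this sketch implicitly treats the process as first-order Markov, which is the association the abstract refers to. The higher-order generalization derived in the paper is not reproduced here.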
Pages: 357-365
Number of pages: 9