Fast online Q(λ)

Cited: 48
Authors
Wiering, M [1]
Schmidhuber, J [1]
Affiliation
[1] IDSIA, CH-6900 Lugano, Switzerland
Keywords
reinforcement learning; Q-learning; TD(lambda); online Q(lambda); lazy learning
DOI
10.1023/A:1007562800292
CLC number
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Q(lambda)-learning uses TD(lambda) methods to accelerate Q-learning. The update complexity of previous online Q(lambda) implementations based on lookup tables is bounded by the size of the state/action space. Our faster algorithm's update complexity is bounded by the number of actions. The method is based on the observation that Q-value updates may be postponed until they are needed.
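The core idea in the abstract can be sketched in code: rather than touching every state/action pair at each step, keep a single global discounted sum of TD errors and fold the pending corrections into a Q-value only when that entry is actually read, so per-step work is bounded by the number of actions. The class below is a minimal illustrative sketch, not the paper's exact algorithm — it assumes a simplified variant with replacing eligibility traces and a Watkins-style TD error, and the names (`FastQLambda`, `_sync`, `update`) are hypothetical. The paper also handles details omitted here, such as resetting variables to avoid numerical underflow of `(gamma*lam)**t` over long runs.

```python
import random
from collections import defaultdict

class FastQLambda:
    """Tabular Q(lambda) with lazy (postponed) Q-value updates.

    Global bookkeeping: D accumulates delta_k * (gamma*lam)**k over steps k.
    A pair last visited at step t* with trace 1 then owes the pending update
    alpha * (D_now - D_at_visit) / (gamma*lam)**t*, applied only on access.
    Hypothetical sketch with replacing traces; not the paper's exact method.
    """

    def __init__(self, n_actions, alpha=0.1, gamma=0.9, lam=0.8):
        self.n_actions = n_actions
        self.alpha, self.gamma, self.lam = alpha, gamma, lam
        self.q = defaultdict(float)  # possibly stale Q-values
        self.last_d = {}             # value of D at the pair's last sync/visit
        self.last_pow = {}           # (gamma*lam)**t at the pair's last visit
        self.D = 0.0                 # sum_k delta_k * (gamma*lam)**k so far
        self.gl_pow = 1.0            # (gamma*lam)**t for the current step

    def _sync(self, s, a):
        """Fold all postponed TD corrections into Q(s, a)."""
        if (s, a) in self.last_d:
            self.q[(s, a)] += (self.alpha * (self.D - self.last_d[(s, a)])
                               / self.last_pow[(s, a)])
            self.last_d[(s, a)] = self.D  # nothing pending any more

    def value(self, s, a):
        self._sync(s, a)
        return self.q[(s, a)]

    def update(self, s, a, r, s_next):
        # Only the |A| entries needed for the TD error are brought current.
        for b in range(self.n_actions):
            self._sync(s_next, b)
        self._sync(s, a)
        delta = (r + self.gamma * max(self.q[(s_next, b)]
                                      for b in range(self.n_actions))
                 - self.q[(s, a)])
        d_before = self.D
        self.D += delta * self.gl_pow
        # Replacing trace: (s, a) restarts with trace 1 this step, so its
        # baseline is D *before* this step's TD error was folded in.
        self.last_d[(s, a)] = d_before
        self.last_pow[(s, a)] = self.gl_pow
        self.gl_pow *= self.gamma * self.lam
```

Under these assumptions the lazily synced Q-values match what a naive online Q(lambda) loop (which decays every trace and updates every Q-value per step) would compute, while each `update` call does O(|actions|) work.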
Pages: 105-115
Page count: 11