Fast online Q(λ)

Cited: 48
Authors
Wiering, M [1]
Schmidhuber, J [1]
Affiliation
[1] IDSIA, CH-6900 Lugano, Switzerland
Keywords
reinforcement learning; Q-learning; TD(lambda); online Q(lambda); lazy learning
DOI
10.1023/A:1007562800292
CLC number
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Q(lambda)-learning uses TD(lambda) methods to accelerate Q-learning. The update complexity of previous online Q(lambda) implementations based on lookup tables is bounded by the size of the state/action space. Our faster algorithm's update complexity is bounded by the number of actions. The method is based on the observation that Q-value updates may be postponed until they are needed.
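The core idea in the abstract can be sketched in code: rather than touching every state/action pair at each step, keep a single global discounted sum of TD errors and fold the pending corrections into a Q-value only when that entry is actually read, so per-step work is bounded by the number of actions. The class below is a minimal illustrative sketch, not the paper's exact algorithm — it assumes a simplified variant with replacing eligibility traces and a Watkins-style TD error, and the names (`FastQLambda`, `_sync`, `update`) are hypothetical. The paper also handles details omitted here, such as resetting variables to avoid numerical underflow of `(gamma*lam)**t` over long runs.

```python
import random
from collections import defaultdict

class FastQLambda:
    """Tabular Q(lambda) with lazy (postponed) Q-value updates.

    Global bookkeeping: D accumulates delta_k * (gamma*lam)**k over steps k.
    A pair last visited at step t* with trace 1 then owes the pending update
    alpha * (D_now - D_at_visit) / (gamma*lam)**t*, applied only on access.
    Hypothetical sketch with replacing traces; not the paper's exact method.
    """

    def __init__(self, n_actions, alpha=0.1, gamma=0.9, lam=0.8):
        self.n_actions = n_actions
        self.alpha, self.gamma, self.lam = alpha, gamma, lam
        self.q = defaultdict(float)  # possibly stale Q-values
        self.last_d = {}             # value of D at the pair's last sync/visit
        self.last_pow = {}           # (gamma*lam)**t at the pair's last visit
        self.D = 0.0                 # sum_k delta_k * (gamma*lam)**k so far
        self.gl_pow = 1.0            # (gamma*lam)**t for the current step

    def _sync(self, s, a):
        """Fold all postponed TD corrections into Q(s, a)."""
        if (s, a) in self.last_d:
            self.q[(s, a)] += (self.alpha * (self.D - self.last_d[(s, a)])
                               / self.last_pow[(s, a)])
            self.last_d[(s, a)] = self.D  # nothing pending any more

    def value(self, s, a):
        self._sync(s, a)
        return self.q[(s, a)]

    def update(self, s, a, r, s_next):
        # Only the |A| entries needed for the TD error are brought current.
        for b in range(self.n_actions):
            self._sync(s_next, b)
        self._sync(s, a)
        delta = (r + self.gamma * max(self.q[(s_next, b)]
                                      for b in range(self.n_actions))
                 - self.q[(s, a)])
        d_before = self.D
        self.D += delta * self.gl_pow
        # Replacing trace: (s, a) restarts with trace 1 this step, so its
        # baseline is D *before* this step's TD error was folded in.
        self.last_d[(s, a)] = d_before
        self.last_pow[(s, a)] = self.gl_pow
        self.gl_pow *= self.gamma * self.lam
```

Under these assumptions the lazily synced Q-values match what a naive online Q(lambda) loop (which decays every trace and updates every Q-value per step) would compute, while each `update` call does O(|actions|) work.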
Pages: 105-115
Page count: 11