Guiding exploration by pre-existing knowledge without modifying reward

Cited by: 10
Authors
Framling, Kary [1]
Affiliations
[1] Aalto Univ, FIN-02150 Espoo, Finland
Keywords
reinforcement learning; exploration; pre-existing knowledge; short-term memory; action ranking
DOI
10.1016/j.neunet.2007.02.001
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Reinforcement learning is based on exploration of the environment and on receiving reward that indicates which actions taken by the agent are good and which ones are bad. In many applications, receiving even the first reward may require long exploration, during which the agent has no information about its progress. This paper presents an approach that makes it possible to use pre-existing knowledge about the task for guiding exploration through the state space. Concepts of short- and long-term memory combine guidance by pre-existing knowledge with reinforcement learning methods for value function estimation, making learning faster while allowing the agent to converge towards a good policy. (c) 2007 Elsevier Ltd. All rights reserved.
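As a rough illustration of the idea summarized in the abstract, the following minimal Python sketch shows pre-existing knowledge biasing which action is selected during exploration while the reward used for value updates is left untouched. This is not the paper's exact algorithm; the chain world, the heuristic_rank function, and the GUIDANCE_WEIGHT schedule are illustrative assumptions.

```python
import random
from collections import defaultdict

# Minimal sketch (assumptions, not the paper's algorithm): tabular Q-learning
# in which a hand-written heuristic ranks actions during selection only.
# The reward signal itself is never modified.

N_STATES = 10          # chain of states 0..9, goal at the right end
ACTIONS = (-1, +1)     # move left or right
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1
GUIDANCE_WEIGHT = 1.0  # how strongly pre-existing knowledge biases selection

def heuristic_rank(state, action):
    """Pre-existing knowledge: prefer moving towards the goal (higher states)."""
    return 1.0 if action > 0 else 0.0

Q = defaultdict(float)  # long-term memory: learned action values

def select_action(state, guidance_weight):
    """Greedy over learned value plus a guidance bonus used only for selection."""
    if random.random() < EPS:
        return random.choice(ACTIONS)
    return max(ACTIONS,
               key=lambda a: Q[(state, a)] + guidance_weight * heuristic_rank(state, a))

def episode(guidance_weight):
    state, steps = 0, 0
    while state != N_STATES - 1 and steps < 200:
        action = select_action(state, guidance_weight)
        next_state = min(max(state + action, 0), N_STATES - 1)
        reward = 1.0 if next_state == N_STATES - 1 else 0.0  # reward unmodified
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        state, steps = next_state, steps + 1
    return steps

if __name__ == "__main__":
    for ep in range(50):
        # Fading the guidance lets the learned values take over as they improve.
        steps = episode(GUIDANCE_WEIGHT / (1 + ep))
        print(f"episode {ep:2d}: reached goal in {steps} steps")
```

Because the guidance term enters only the action-selection rule and fades over episodes, the learned value function remains an estimate of the unmodified reward, which is what allows convergence towards a good policy as claimed in the abstract.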
Pages: 736-747
Page count: 12