Reinforcement learning with via-point representation

Cited by: 24
Authors
Miyamoto, H
Morimoto, J
Doya, K
Kawato, M
Affiliations
[1] Kyushu Inst Technol, Grad Sch Life Sci & Syst Engn, Wakamatsu Ku, Kitakyushu, Fukuoka 8080196, Japan
[2] ATR Computat Neurosci Labs, Kyoto, Japan
[3] Japan Sci & Technol Corp, Kawato Dynam Brain Project, Kyoto, Japan
Keywords
hierarchical reinforcement learning; via-point; motor control; cart-pole; swing up; robotics;
DOI
10.1016/j.neunet.2003.11.004
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we propose a new learning framework for motor control. This framework consists of two components: reinforcement learning and via-point representation. In the field of motor control, conventional reinforcement learning has been used to acquire control sequences for tasks such as cart-pole or stand-up robot control. Recently, researchers have become interested in hierarchical architectures, such as those with multiple levels and multiple temporal and spatial scales. Our new framework contains two levels of hierarchical architecture. The higher level is implemented using via-point representation, which corresponds to macro-actions or multiple time scales. The lower level is implemented using a trajectory generator that produces primitive actions. Our framework can modify the ongoing movement by means of temporally localized via-points and trajectory generation. Successful results are obtained in computer simulation of the cart-pole swing-up task. © 2003 Elsevier Ltd. All rights reserved.
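The two-level structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the trajectory generator, so the rest-to-rest minimum-jerk segments, the function names `min_jerk` and `generate_trajectory`, and the hard-coded via-points below are all assumptions made for illustration. In the proposed framework, the higher level would select the via-points through reinforcement learning.

```python
def min_jerk(x0, x1, s):
    """Minimum-jerk blend from x0 to x1 for normalized time s in [0, 1]."""
    return x0 + (x1 - x0) * (10 * s**3 - 15 * s**4 + 6 * s**5)


def generate_trajectory(via_points, dt):
    """Lower level (assumed form): expand via-points into a dense trajectory.

    via_points is a list of (time, position) pairs sorted by time.
    Each segment is a rest-to-rest minimum-jerk profile, which is an
    illustrative choice and not taken from the paper.
    """
    samples = []
    for (t0, x0), (t1, x1) in zip(via_points, via_points[1:]):
        steps = round((t1 - t0) / dt)
        for k in range(steps):
            samples.append(min_jerk(x0, x1, k / steps))
    samples.append(via_points[-1][1])  # end exactly on the last via-point
    return samples


# Higher level (hypothetical): a learned policy would choose these via-points.
via_points = [(0.0, 0.0), (0.5, 1.0), (1.0, -0.5)]
trajectory = generate_trajectory(via_points, dt=0.1)
```

Because each via-point only shapes the segments adjacent to it, changing one via-point in this sketch leaves the rest of the trajectory untouched, which is one way to read the abstract's "temporally localized" modification of an ongoing movement.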
Pages: 299-305
Page count: 7