Acquisition of stand-up behavior by a real robot using hierarchical reinforcement learning

Cited by: 132
Authors
Morimoto, J
Doya, K
Affiliations
[1] JST, ERATO, Kawato Dynam Brain Project, Kyoto 6190288, Japan
[2] Nara Inst Sci & Technol, Grad Sch Informat Sci, Nara 6300101, Japan
[3] JST, CREST, ATR Int, Kyoto 6190288, Japan
Keywords
reinforcement learning; hierarchical; real robot; stand-up; motor control
DOI
10.1016/S0921-8890(01)00113-0
Chinese Library Classification (CLC)
TP [Automation Technology; Computer Technology]
Subject Classification Code
0812
Abstract
In this paper, we propose a hierarchical reinforcement learning architecture that achieves practical learning speed in real hardware control tasks. To enable learning in a practical number of trials, we introduce a low-dimensional representation of the robot's state for higher-level planning. The upper level learns a discrete sequence of sub-goals in this low-dimensional state space for achieving the main goal of the task. The lower-level modules learn local trajectories in the original high-dimensional state space to achieve each sub-goal specified by the upper level. We applied the hierarchical architecture to a three-link, two-joint robot for the task of learning to stand up by trial and error. The upper-level learning was implemented by Q-learning, while the lower-level learning was implemented by a continuous actor-critic method. The robot successfully learned to stand up within 750 trials in simulation and then in an additional 170 trials using real hardware. The effects of the upper-level search-step setting and of a supplementary reward for achieving sub-goals were also tested in simulation. (C) 2001 Elsevier Science B.V. All rights reserved.
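As a rough illustration of the two-level scheme the abstract describes, the following is a minimal Python sketch: a tabular Q-learner selects discrete sub-goal postures in a coarse low-dimensional space, and one actor-critic module per sub-goal learns continuous control in the full state space. Everything concrete here is a hypothetical placeholder, including the toy dynamics (`env_step`), the posture discretization (`posture_index`), and all dimensions and learning rates; the lower level is a plain discrete-time TD actor-critic standing in for the continuous-time actor-critic of Doya (2000) used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Upper level: tabular Q-learning over a coarse posture space.
# An upper-level "action" is the index of the next sub-goal posture.
N_POSTURES = 6                              # coarse posture bins (hypothetical)
ALPHA_HI, GAMMA_HI, EPS = 0.1, 0.9, 0.1
Q = np.zeros((N_POSTURES, N_POSTURES))      # Q[posture, sub-goal]

def choose_subgoal(s):
    """Epsilon-greedy sub-goal selection in the low-dimensional space."""
    return int(rng.integers(N_POSTURES)) if rng.random() < EPS else int(Q[s].argmax())

# Lower level: one actor-critic per sub-goal, acting in the full continuous
# state space via fixed random features (a sketch, not the paper's method).
STATE_DIM, FEAT_DIM = 6, 32                 # hypothetical sizes
PROJ = rng.normal(size=(FEAT_DIM, STATE_DIM))

def phi(x):
    return np.tanh(PROJ @ x)                # fixed nonlinear features

class ActorCritic:
    def __init__(self, lr=0.05, gamma=0.95, sigma=0.2):
        self.v = np.zeros(FEAT_DIM)         # critic weights (value function)
        self.w = np.zeros(FEAT_DIM)         # actor weights (scalar torque)
        self.lr, self.gamma, self.sigma = lr, gamma, sigma

    def act(self, x):
        """Greedy action plus Gaussian exploration noise."""
        return float(self.w @ phi(x)) + self.sigma * rng.normal()

    def update(self, x, u, r, x_next):
        """TD(0) critic update; the actor is reinforced by the TD error,
        weighted by the exploration noise, as in classic actor-critic."""
        f, f_next = phi(x), phi(x_next)
        td = r + self.gamma * (self.v @ f_next) - self.v @ f
        self.v += self.lr * td * f
        self.w += self.lr * td * (u - self.w @ f) * f

modules = [ActorCritic() for _ in range(N_POSTURES)]

# Toy stand-ins for the robot: stable linear dynamics and a posture map.
def env_step(x, u):
    return 0.9 * x + 0.1 * u + 0.01 * rng.normal(size=x.shape)

def posture_index(x):
    return int(np.clip((x[0] + 1.0) * N_POSTURES / 2.0, 0, N_POSTURES - 1))

for trial in range(100):
    x = 0.1 * rng.normal(size=STATE_DIM)    # "lying down" start (toy)
    s = posture_index(x)
    for _ in range(4):                      # a short sub-goal sequence
        g = choose_subgoal(s)
        mod = modules[g]
        for _ in range(25):                 # low-level control steps
            u = mod.act(x)
            x_next = env_step(x, u)
            r_lo = -abs(posture_index(x_next) - g)   # distance to sub-goal
            mod.update(x, u, r_lo, x_next)
            x = x_next
        s_next = posture_index(x)
        r_hi = 1.0 if s_next == N_POSTURES - 1 else 0.0   # "standing" bin
        Q[s, g] += ALPHA_HI * (r_hi + GAMMA_HI * Q[s_next].max() - Q[s, g])
        s = s_next
```

The point of the decomposition, as the abstract argues, is sample efficiency: the upper-level table stays tiny (postures x sub-goals), and each lower-level module only solves a local tracking problem, which is what makes learning plausible in hundreds rather than millions of trials.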
Pages: 37-51
Page count: 15