Cooperative behavior acquisition for mobile robots in dynamically changing real worlds via vision-based reinforcement learning and development

Cited by: 80
Authors
Asada, M [1]
Uchibe, E [1]
Hosoda, K [1]
Affiliations
[1] Osaka Univ, Dept Adapt Machine Syst, Grad Sch Engn, Suita, Osaka 5650871, Japan
Keywords
multi-agent learning; vision-based learning; reinforcement learning; cooperative behavior; physical embodiment
DOI
10.1016/S0004-3702(99)00026-0
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we first discuss the meaning of physical embodiment and the complexity of the environment in the context of multi-agent learning. We then propose a vision-based reinforcement learning method that acquires cooperative behaviors in a dynamic environment. We use the robot soccer game initiated by RoboCup (Kitano et al., 1997) to illustrate the effectiveness of our method. Each agent works with other team members to achieve a common goal against opponents. Our method estimates the relationships between a learner's behaviors and those of other agents in the environment through interactions (observations and actions) using a technique from system identification. In order to identify the model of each agent, Akaike's Information Criterion is applied to the results of Canonical Variate Analysis to clarify the relationship between the observed data in terms of actions and future observations. Next, reinforcement learning based on the estimated state vectors is performed to obtain the optimal behavior policy. The proposed method is applied to a soccer playing situation. The method successfully models a rolling ball and other moving agents and acquires the learner's behaviors. Computer simulations and real experiments are shown and a discussion is given. © 1999 Elsevier Science B.V. All rights reserved.
Pages: 275-292
Page count: 18