AUTOMATIC PROGRAMMING OF BEHAVIOR-BASED ROBOTS USING REINFORCEMENT LEARNING

Cited by: 215
Authors
MAHADEVAN, S
CONNELL, J
Affiliation
[1] IBM T.J. Watson Research Center, Yorktown Heights, NY 10598
DOI
10.1016/0004-3702(92)90058-6
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper describes a general approach for automatically programming a behavior-based robot. New behaviors are learned by trial and error using a performance feedback function as reinforcement. Two algorithms for behavior learning are described that combine Q learning, a well-known scheme for propagating reinforcement values temporally across actions, with statistical clustering and Hamming distance, two ways of propagating reinforcement values spatially across states. A real behavior-based robot called OBELIX is described that learns several component behaviors in an example task involving pushing boxes. A simulator for the box-pushing task is also used to gather data on the learning techniques. A detailed experimental study using the real robot and the simulator suggests two conclusions. (1) The learning techniques are able to learn the individual behaviors, sometimes outperforming a hand-coded program. (2) Using a behavior-based architecture speeds up reinforcement learning by converting the problem of learning a complex task into that of learning a simpler set of special-purpose reactive subtasks.
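The combination named in the abstract, temporal propagation through Q learning plus spatial propagation through Hamming distance, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the integer bit-vector state encoding, the learning rate `alpha`, the discount `gamma`, and the `radius` threshold are all assumptions chosen for illustration.

```python
from collections import defaultdict


def hamming(a: int, b: int) -> int:
    """Count the bits that differ between two bit-vector states."""
    return bin(a ^ b).count("1")


def q_update(q, state, action, reward, next_state, actions,
             alpha=0.5, gamma=0.9, radius=1):
    """Apply a one-step Q-learning update to `state` and to every
    already-stored state within `radius` bits of it (hypothetical rule)."""
    # Temporal propagation: standard one-step Q-learning target.
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next

    # Spatial propagation: find stored states close to `state` in Hamming
    # distance for the same action. Snapshot the keys because the
    # defaultdict may grow while we read from it.
    neighbours = {s for (s, a) in list(q)
                  if a == action and hamming(s, state) <= radius}
    neighbours.add(state)

    for s in neighbours:
        q[(s, action)] += alpha * (target - q[(s, action)])


if __name__ == "__main__":
    actions = ["forward", "turn"]      # illustrative action set
    q = defaultdict(float)             # Q-values default to 0.0
    q[(0b0001, "forward")] = 0.0       # a previously seen nearby state
    q_update(q, 0b0000, "forward", 1.0, 0b0000, actions)
    print(q[(0b0000, "forward")], q[(0b0001, "forward")])
```

In this sketch a single rewarded step raises the value of the visited state and of the stored state that differs from it by one bit, which is the sense in which reinforcement spreads across similar states. A larger `radius` generalizes more aggressively at the risk of crediting dissimilar states.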
Pages: 311-365
Page count: 55