AUTOMATIC PROGRAMMING OF BEHAVIOR-BASED ROBOTS USING REINFORCEMENT LEARNING

Cited by: 215

Authors:

MAHADEVAN, S

CONNELL, J

Affiliation:

[1] IBM T.J. Watson Research Center, Yorktown Heights, NY 10598

Source:

ARTIFICIAL INTELLIGENCE | 1992 / Vol. 55 / Issue 2-3

DOI:

10.1016/0004-3702(92)90058-6

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper describes a general approach for automatically programming a behavior-based robot. New behaviors are learned by trial and error using a performance feedback function as reinforcement. Two algorithms for behavior learning are described that combine Q learning, a well-known scheme for propagating reinforcement values temporally across actions, with statistical clustering and Hamming distance, two ways of propagating reinforcement values spatially across states. A real behavior-based robot called OBELIX is described that learns several component behaviors in an example task involving pushing boxes. A simulator for the box-pushing task is also used to gather data on the learning techniques. A detailed experimental study using the real robot and the simulator suggests two conclusions. (1) The learning techniques are able to learn the individual behaviors, sometimes outperforming a hand-coded program. (2) Using a behavior-based architecture speeds up reinforcement learning by converting the problem of learning a complex task into that of learning a simpler set of special-purpose reactive subtasks.
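The combination the abstract describes, Q learning for temporal propagation plus Hamming distance for spatial propagation, can be illustrated with a minimal sketch. This is not the paper's algorithm: the class name `HammingQLearner`, the parameters `alpha`, `gamma`, and `radius`, and the rule of applying one update to every previously seen state within a fixed Hamming radius are all assumptions made for illustration. States are assumed to be equal-length tuples of bits.

```python
from collections import defaultdict


def hamming(a, b):
    """Number of positions at which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


class HammingQLearner:
    """Tabular Q learning that also updates nearby states (hypothetical sketch)."""

    def __init__(self, actions, alpha=0.5, gamma=0.9, radius=1):
        self.actions = list(actions)
        self.alpha = alpha      # learning rate (assumed value)
        self.gamma = gamma      # discount factor (assumed value)
        self.radius = radius    # Hamming radius for spatial propagation (assumed)
        self.q = defaultdict(float)  # (state, action) -> estimated value
        self.seen = set()            # states visited so far

    def best_value(self, state):
        """Largest Q value over all actions in the given state."""
        return max(self.q[(state, a)] for a in self.actions)

    def update(self, state, action, reward, next_state):
        """One-step Q-learning update, shared with similar seen states."""
        self.seen.add(state)
        # Temporal propagation: bootstrap from the best next-state value.
        target = reward + self.gamma * self.best_value(next_state)
        # Spatial propagation: move every seen state within the Hamming
        # radius toward the same target for this action.
        for s in self.seen:
            if hamming(s, state) <= self.radius:
                key = (s, action)
                self.q[key] += self.alpha * (target - self.q[key])
```

In this sketch a reward observed in one sensor state also raises the value of the same action in visited states that differ by at most `radius` bits, which is one simple way to share experience across similar states.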

Pages: 311-365

Page count: 55
