A STOCHASTIC REINFORCEMENT LEARNING ALGORITHM FOR LEARNING REAL-VALUED FUNCTIONS

Times cited: 169

Author:

GULLAPALLI, V

Affiliation:

[1] Department of Computer and Information Science, University of Massachusetts

Source:

NEURAL NETWORKS | 1990 / Vol. 3 / No. 6

Keywords:

Associative reinforcement learning; Learning algorithm; Neural networks; Neurocontrol; Real-valued functions; Robotics; Shaping; Stochastic automata

DOI:

10.1016/0893-6080(90)90056-Q

Chinese Library Classification number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Most of the research in reinforcement learning has been on problems with discrete action spaces. However, many control problems require the application of continuous control signals. In this paper, we present a stochastic reinforcement learning algorithm for learning functions with continuous outputs using a connectionist network. We define stochastic units that compute their real-valued outputs as a function of random activations generated using the normal distribution. Learning takes place by using our algorithm to adjust the two parameters of the normal distribution so as to increase the probability of producing the optimal real value for each input pattern. The performance of the algorithm is studied by using it to learn tasks of varying levels of difficulty. Further, as an example of a potential application, we present a network incorporating these stochastic real-valued units that learns to perform an underconstrained positioning task using a simulated 3 degree-of-freedom robot arm. © 1990.
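The abstract describes units whose real-valued output is drawn from a normal distribution whose parameters are adjusted from a scalar reinforcement signal. The Python sketch below illustrates that general idea under loudly stated assumptions that are not taken from the record above: the mean is linear in the input (w . x), a second linear predictor estimates the expected reinforcement (r_hat = v . x), the standard deviation is derived from it as sigma = 1 - r_hat clamped to [sigma_min, 1], and the learning rates, the toy target 0.5 + 0.3 * x1, and the reinforcement exp(-(a - target)^2) are illustrative choices. All names (StochasticRealValuedUnit, params, act, learn, target, train) are hypothetical. It is a minimal sketch of a Gaussian stochastic unit, not the paper's algorithm.

```python
import math
import random


class StochasticRealValuedUnit:
    """Hypothetical sketch of a stochastic unit with a Gaussian (normal) output.

    Assumed parameterization (illustrative, not from the paper):
      mean                      mu    = w . x
      predicted reinforcement   r_hat = v . x
      standard deviation        sigma = clamp(1 - r_hat, sigma_min, 1)
    so the unit explores widely while it expects low reinforcement.
    """

    def __init__(self, n_inputs, lr_mean=0.1, lr_pred=0.1, sigma_min=0.05, seed=0):
        self.w = [0.0] * n_inputs  # weights producing the mean
        self.v = [0.0] * n_inputs  # weights predicting reinforcement
        self.lr_mean = lr_mean
        self.lr_pred = lr_pred
        self.sigma_min = sigma_min
        self.rng = random.Random(seed)

    @staticmethod
    def _dot(a, b):
        return sum(ai * bi for ai, bi in zip(a, b))

    def params(self, x):
        """Return (mu, r_hat, sigma) for input pattern x."""
        mu = self._dot(self.w, x)
        r_hat = self._dot(self.v, x)
        sigma = min(1.0, max(self.sigma_min, 1.0 - r_hat))
        return mu, r_hat, sigma

    def act(self, x):
        """Sample a real-valued output a ~ N(mu, sigma^2)."""
        mu, r_hat, sigma = self.params(x)
        return self.rng.gauss(mu, sigma), mu, r_hat, sigma

    def learn(self, x, a, mu, r_hat, sigma, r):
        """Update from scalar reinforcement r in [0, 1]."""
        delta = r - r_hat  # positive if the outcome beat the prediction
        # Shift the mean toward a when delta > 0 and away from it when delta < 0.
        step = self.lr_mean * delta * (a - mu) / sigma
        self.w = [wi + step * xi for wi, xi in zip(self.w, x)]
        # Move the reinforcement prediction toward r; because sigma is derived
        # from r_hat, exploration narrows as performance improves.
        self.v = [vi + self.lr_pred * delta * xi for vi, xi in zip(self.v, x)]


def target(x1):
    # Hypothetical optimal output the unit should discover.
    return 0.5 + 0.3 * x1


def train(steps=20000, seed=0):
    unit = StochasticRealValuedUnit(n_inputs=2, seed=seed)
    data_rng = random.Random(seed + 1)
    for _ in range(steps):
        x1 = data_rng.uniform(-1.0, 1.0)
        x = [1.0, x1]  # constant bias input plus one feature
        a, mu, r_hat, sigma = unit.act(x)
        r = math.exp(-(a - target(x1)) ** 2)  # hypothetical reinforcement in (0, 1]
        unit.learn(x, a, mu, r_hat, sigma, r)
    return unit


if __name__ == "__main__":
    unit = train()
    for x1 in (-1.0, 0.0, 1.0):
        mu, _, sigma = unit.params([1.0, x1])
        print(f"x1={x1:+.1f}  mean={mu:.3f}  target={target(x1):.3f}  sigma={sigma:.3f}")
```

Running the script prints, for three inputs, the learned mean next to the target and the remaining standard deviation; under these assumptions the mean should settle near the target while sigma falls toward sigma_min. Tying sigma to predicted reinforcement is one way to let exploration fade as the unit improves instead of annealing it on a fixed schedule.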

Pages: 671-692

Page count: 22

Number of references: 31
