Learning of sequential movements by neural network model with dopamine-like reinforcement signal

Cited by: 139
Authors
Suri, RE
Schultz, W [1]
Affiliations
[1] Univ Fribourg, Inst Physiol, CH-1700 Fribourg, Switzerland
[2] Univ So Calif, Brain Project, Los Angeles, CA 90089 USA
Keywords
basal ganglia; teaching signal; temporal difference; synaptic plasticity; eligibility
DOI
10.1007/s002210050467
Chinese Library Classification number
Q189 [Neuroscience]
Subject classification code
071006
Abstract
Dopamine neurons appear to code an error in the prediction of reward. They are activated by unpredicted rewards, are not influenced by predicted rewards, and are depressed when a predicted reward is omitted. After conditioning, they respond to reward-predicting stimuli in a similar manner. With these characteristics, the dopamine response strongly resembles the predictive reinforcement teaching signal of neural network models implementing the temporal difference learning algorithm. This study explored a neural network model that used a reward-prediction error signal strongly resembling dopamine responses for learning movement sequences. A different stimulus was presented in each step of the sequence and required a different movement reaction, and reward occurred at the end of the correctly performed sequence. The dopamine-like predictive reinforcement signal efficiently allowed the model to learn long sequences. By contrast, learning with an unconditional reinforcement signal required synaptic eligibility traces of longer and biologically less-plausible durations for obtaining satisfactory performance. Thus, dopamine-like neuronal signals constitute excellent teaching signals for learning sequential behavior.
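The prediction-error behavior described in the abstract can be illustrated with a minimal temporal difference sketch. This is not the authors' network model. It is a plain TD(0) learner with one value weight per time step after stimulus onset, and all names and parameters (`run_trial`, `STIM_T`, `REWARD_T`, `alpha`, `gamma`) are assumptions chosen for illustration.

```python
# Minimal TD(0) sketch (illustrative assumption, not the model from the paper).
# weights[k] is the predicted future reward k steps after stimulus onset.
# Before the stimulus and after the reward the prediction is taken to be 0.

STIM_T = 5      # hypothetical time step of stimulus onset
REWARD_T = 15   # hypothetical time step of reward delivery
N_STEPS = 20    # hypothetical trial length


def run_trial(weights, reward=1.0, alpha=0.2, gamma=1.0, learn=True):
    """Run one trial and return the TD error for each transition t -> t+1."""

    def value(t):
        k = t - STIM_T
        return weights[k] if 0 <= k < len(weights) else 0.0

    deltas = []
    for t in range(N_STEPS - 1):
        r = reward if t + 1 == REWARD_T else 0.0
        # Reward-prediction error: received plus newly predicted reward,
        # minus what was predicted one step earlier.
        delta = r + gamma * value(t + 1) - value(t)
        deltas.append(delta)
        k = t - STIM_T
        if learn and 0 <= k < len(weights):
            weights[k] += alpha * delta
    return deltas


weights = [0.0] * (REWARD_T - STIM_T)

# Untrained: the error appears when the unpredicted reward arrives.
naive = run_trial(weights, learn=False)

# Conditioning over repeated rewarded trials.
for _ in range(500):
    run_trial(weights)

# Trained: the error moves to stimulus onset and vanishes at reward time.
trained = run_trial(weights, learn=False)

# Trained, reward omitted: the error is negative at the expected reward time.
omitted = run_trial(weights, reward=0.0, learn=False)

for name, d in (("naive", naive), ("trained", trained), ("omitted", omitted)):
    print(name, "stimulus:", round(d[STIM_T - 1], 3),
          "reward:", round(d[REWARD_T - 2 + 1], 3))
```

In this sketch the error at index `STIM_T - 1` corresponds to the transition into stimulus onset and the error at index `REWARD_T - 1` to the transition into the reward time, which mirrors the three response patterns listed in the abstract: activation by unpredicted reward, no response to predicted reward, and depression when a predicted reward is omitted.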
Pages: 350-354
Number of pages: 5