Learning of sequential movements by neural network model with dopamine-like reinforcement signal

Cited by: 139

Authors:

Suri, RE

Schultz, W [1]

Affiliations:

[1] Univ Fribourg, Inst Physiol, CH-1700 Fribourg, Switzerland

[2] Univ So Calif, Brain Project, Los Angeles, CA 90089 USA

Source:

EXPERIMENTAL BRAIN RESEARCH | 1998 / Vol. 121 / Issue 03

Keywords:

basal ganglia; teaching signal; temporal difference; synaptic plasticity; eligibility;

DOI:

10.1007/s002210050467

Chinese Library Classification (CLC) number:

Q189 [Neuroscience]

Discipline classification code:

071006

Abstract:

Dopamine neurons appear to code an error in the prediction of reward. They are activated by unpredicted rewards, are not influenced by predicted rewards, and are depressed when a predicted reward is omitted. After conditioning, they respond to reward-predicting stimuli in a similar manner. With these characteristics, the dopamine response strongly resembles the predictive reinforcement teaching signal of neural network models implementing the temporal difference learning algorithm. This study explored a neural network model that used a reward-prediction error signal strongly resembling dopamine responses for learning movement sequences. A different stimulus was presented in each step of the sequence and required a different movement reaction, and reward occurred at the end of the correctly performed sequence. The dopamine-like predictive reinforcement signal efficiently allowed the model to learn long sequences. By contrast, learning with an unconditional reinforcement signal required synaptic eligibility traces of longer and biologically less-plausible durations for obtaining satisfactory performance. Thus, dopamine-like neuronal signals constitute excellent teaching signals for learning sequential behavior.
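The prediction-error behavior described in the abstract can be illustrated with a minimal sketch of tabular temporal difference learning on a chain of stimuli. This is not the authors' neural network model: the function name `run_trial`, the learning rate `alpha`, the discount factor `gamma`, the five-step chain, and the assumption that the first stimulus arrives from an unpredictive baseline with value 0 are all illustrative choices.

```python
def run_trial(V, reward, alpha=0.1, gamma=0.9, learn=True):
    """Run one pass through a chain of len(V) stimuli; reward follows the last one.

    V[s] is the learned reward prediction for stimulus s (hypothetical tabular
    stand-in for a network's prediction). Returns the prediction errors:
    index 0 is the error at onset of the first stimulus, and the last index
    is the error at the time the reward is due.
    """
    n_steps = len(V)
    # Assumption: the baseline before the trial predicts nothing (value 0),
    # so the onset of the first stimulus is itself unpredicted.
    deltas = [gamma * V[0] - 0.0]
    for s in range(n_steps):
        is_last = s == n_steps - 1
        r = reward if is_last else 0.0
        v_next = 0.0 if is_last else V[s + 1]
        # Temporal difference error: received reward plus discounted next
        # prediction, minus the current prediction.
        delta = r + gamma * v_next - V[s]
        if learn:
            V[s] += alpha * delta
        deltas.append(delta)
    return deltas


V = [0.0] * 5                                  # untrained predictions
first = run_trial(V, reward=1.0)               # reward is unpredicted
for _ in range(2000):                          # training trials
    run_trial(V, reward=1.0)
trained = run_trial(V, reward=1.0, learn=False)  # reward is now predicted
omitted = run_trial(V, reward=0.0, learn=False)  # predicted reward withheld

print("error at reward, first trial:  ", first[-1])
print("error at reward, after training:", trained[-1])
print("error at first stimulus onset:  ", trained[0])
print("error when reward is omitted:   ", omitted[-1])
```

In this sketch the error at reward time is positive on the first trial, shrinks toward zero with training, and turns negative when the reward is withheld, while a positive error appears at the onset of the first stimulus. The comparison with an unconditional reinforcement signal and the synaptic eligibility traces mentioned in the abstract are not modeled here.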


Pages: 350-354

Page count: 5
