TD models of reward predictive responses in dopamine neurons

Cited by: 96
Authors
Suri, RE [1]
Affiliations
[1] Salk Inst, Computat Neurobiol Lab, San Diego, CA 92186 USA
Keywords
temporal difference; reinforcement; neuromodulation; sensorimotor; prediction; planning
DOI
10.1016/S0893-6080(02)00046-1
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This article focuses on recent modeling studies of dopamine neuron activity and their influence on behavior. Activity of midbrain dopamine neurons is phasically increased by stimuli that increase the animal's reward expectation and is decreased below baseline levels when the reward fails to occur. These characteristics resemble the reward prediction error signal of the temporal difference (TD) model, which is a model of reinforcement learning. Computational modeling studies show that such a dopamine-like reward prediction error can serve as a powerful teaching signal for learning with delayed reinforcement, in particular for learning of motor sequences. Several lines of evidence suggest that dopamine is also involved in 'cognitive' processes that are not addressed by standard TD models. I propose the hypothesis that dopamine neuron activity is crucial for planning processes, also referred to as 'goal-directed behavior', which select actions by evaluating predictions about their motivational outcomes. (C) 2002 Elsevier Science Ltd. All rights reserved.
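The reward prediction error described in the abstract can be illustrated with the standard TD error, delta = r + gamma * V(next) - V(current). The sketch below is not taken from the article. It assumes a single cue followed by a reward a fixed number of steps later, one value estimate per time step after cue onset, a pre-cue value fixed at zero, and hypothetical parameters (`alpha`, `gamma`, `n_steps`, `n_training_trials`) chosen only for illustration.

```python
def run_trial(values, rewarded, alpha=0.1, gamma=1.0):
    """Run one trial, update `values` in place, and return the TD errors.

    values[k] is the predicted future reward k steps after cue onset.
    Assumption: the pre-cue state has a fixed value of 0, because the cue
    arrives unpredictably and cannot itself be predicted.
    Returns [delta_at_cue, delta_step_0, ..., delta_step_{n-1}]; the last
    entry is the error at the moment the reward is due.
    """
    n = len(values)
    # Cue onset: no reward yet, and the pre-cue value is 0.
    deltas = [gamma * values[0] - 0.0]
    for k in range(n):
        next_value = values[k + 1] if k + 1 < n else 0.0  # trial ends after step n-1
        reward = 1.0 if (rewarded and k == n - 1) else 0.0
        delta = reward + gamma * next_value - values[k]   # TD prediction error
        values[k] += alpha * delta                        # TD(0) value update
        deltas.append(delta)
    return deltas


def simulate(n_steps=5, n_training_trials=500):
    """Return TD errors for the first, a well-trained, and an omission trial."""
    values = [0.0] * n_steps
    first = run_trial(values, rewarded=True)
    for _ in range(n_training_trials):
        run_trial(values, rewarded=True)
    trained = run_trial(values, rewarded=True)
    omitted = run_trial(values, rewarded=False)
    return first, trained, omitted


if __name__ == "__main__":
    first, trained, omitted = simulate()
    print("first trial:   ", [round(d, 3) for d in first])
    print("trained trial: ", [round(d, 3) for d in trained])
    print("omission trial:", [round(d, 3) for d in omitted])
```

Under these assumptions, the error is positive at the reward on the first trial, moves to the cue after training, and turns negative at the expected reward time when the reward is omitted, which is the qualitative pattern the abstract attributes to dopamine neurons.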
Pages: 523-533
Page count: 11
Related papers
63 entries in total
[1]   Goal-directed instrumental action: contingency and incentive learning and their cortical substrates [J].
Balleine, BW ;
Dickinson, A .
NEUROPHARMACOLOGY, 1998, 37 (4-5) :407-419
[2]   NEURONLIKE ADAPTIVE ELEMENTS THAT CAN SOLVE DIFFICULT LEARNING CONTROL-PROBLEMS [J].
BARTO, AG ;
SUTTON, RS ;
ANDERSON, CW .
IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS, 1983, 13 (05) :834-846
[3]  
Bassareo V, 1997, J NEUROSCI, V17, P851
[4]  
Brown J, 1999, J NEUROSCI, V19, P10502
[5]  
DAYAN P, 1994, MACH LEARN, V14, P295
[6]  
DAYAN P, 2000, EXPLAINING AWAY WEIG, P451
[7]  
Dickinson A., 1980, CONT ANIMAL LEARNING
[8]  
DICKINSON A, 1994, INSTRUMENTAL CONDITI, P45
[9]   What are the computations of the cerebellum, the basal ganglia and the cerebral cortex? [J].
Doya, K .
NEURAL NETWORKS, 1999, 12 (7-8) :961-974
[10]  
DUHAMEL JR, 1992, SCIENCE, V255, P90, DOI 10.1126/science.1553535