Dopamine cells respond to predicted events during classical conditioning: Evidence for eligibility traces in the reward-learning network

Cited by: 316
Authors
Pan, WX
Schmidt, R
Wickens, JR
Hyland, BI
Affiliations
[1] Univ Otago, Sch Med Sci, Dept Physiol, Dunedin 9001, New Zealand
[2] Univ Otago, Sch Med Sci, Dept Anat & Struct Biol, Dunedin 9001, New Zealand
Keywords
ventral tegmental area; temporal difference algorithm; dopaminergic; extracellular recordings; reward; associative learning;
DOI
10.1523/JNEUROSCI.1478-05.2005
Chinese Library Classification (CLC) number
Q189 [Neuroscience];
Discipline classification code
071006 ;
Abstract
Behavioral conditioning of cue-reward pairing results in a shift of midbrain dopamine (DA) cell activity from responding to the reward to responding to the predictive cue. However, the precise time course and mechanism underlying this shift remain unclear. Here, we report a combined single-unit recording and temporal difference (TD) modeling approach to this question. The data from recordings in conscious rats showed that DA cells retain responses to predicted reward after responses to conditioned cues have developed, at least early in training. This contrasts with previous TD models that predict a gradual stepwise shift in latency with responses to rewards lost before responses develop to the conditioned cue. By exploring the TD parameter space, we demonstrate that the persistent reward responses of DA cells during conditioning are only accurately replicated by a TD model with long-lasting eligibility traces (nonzero values for the parameter lambda) and low learning rate (alpha). These physiological constraints for TD parameters suggest that eligibility traces and low per-trial rates of plastic modification may be essential features of neural circuits for reward learning in the brain. Such properties enable rapid but stable initiation of learning when the number of stimulus-reward pairings is limited, conferring significant adaptive advantages in real-world environments.
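The role of the two parameters named in the abstract can be illustrated with a minimal TD(lambda) sketch. This is not the authors' model or their parameter values: the tapped-delay-line stimulus representation, the trial timing, the function name `simulate_td_lambda`, and all default values are assumptions chosen only for illustration. The TD error at cue onset and at reward delivery stands in for the DA cell response to each event.

```python
import numpy as np


def simulate_td_lambda(n_trials=50, n_steps=20, cue_t=5, reward_t=15,
                       alpha=0.1, gamma=0.98, lam=0.9):
    """Illustrative TD(lambda) model of cue-reward conditioning.

    Returns two arrays holding, for each trial, the TD error at cue
    onset and the TD error at reward delivery.
    All names and default values are hypothetical.
    """
    # One feature per time step from cue onset up to the step before reward
    # (a "complete serial compound" / tapped-delay-line representation).
    n_feat = reward_t - cue_t
    w = np.zeros(n_feat)              # value weights, kept across trials
    cue_resp = np.zeros(n_trials)
    rew_resp = np.zeros(n_trials)

    def features(t):
        x = np.zeros(n_feat)
        if cue_t <= t < reward_t:
            x[t - cue_t] = 1.0
        return x

    for trial in range(n_trials):
        e = np.zeros(n_feat)          # eligibility trace, reset every trial
        for t in range(n_steps - 1):
            x_t, x_next = features(t), features(t + 1)
            # Reward of size 1 arrives on the transition into reward_t.
            r = 1.0 if t + 1 == reward_t else 0.0
            # TD error: delta = r + gamma * V(t + 1) - V(t)
            delta = r + gamma * (w @ x_next) - (w @ x_t)
            # Accumulating trace; lam = 0 reduces this to one-step TD(0).
            e = gamma * lam * e + x_t
            w += alpha * delta * e
            if t + 1 == cue_t:
                cue_resp[trial] = delta
            if t + 1 == reward_t:
                rew_resp[trial] = delta
    return cue_resp, rew_resp


if __name__ == "__main__":
    for lam in (0.0, 0.9):
        cue, rew = simulate_td_lambda(lam=lam)
        print(f"lambda={lam}: cue response, trials 1-5: {np.round(cue[:5], 3)}")
        print(f"lambda={lam}: reward response, trials 1-5: {np.round(rew[:5], 3)}")
```

In this sketch, setting `lam=0` lets value propagate back only one time step per trial, so the cue response stays at zero for the first several trials. A nonzero `lam` credits the cue-onset feature on the very first rewarded trial, and a small `alpha` keeps the reward response from vanishing quickly, so the two responses coexist early in training.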
Pages: 6235-6242
Number of pages: 8
Related papers
47 in total
[21]   IMPORTANCE OF UNPREDICTABILITY FOR REWARD RESPONSES IN PRIMATE DOPAMINE NEURONS [J].
MIRENOWICZ, J ;
SCHULTZ, W .
JOURNAL OF NEUROPHYSIOLOGY, 1994, 72 (02) :1024-1027
[22]   A framework for mesencephalic dopamine systems based on predictive Hebbian learning [J].
Montague, PR ;
Dayan, P ;
Sejnowski, TJ .
JOURNAL OF NEUROSCIENCE, 1996, 16 (05) :1936-1947
[23]   Coincident but distinct messages of midbrain dopamine and striatal tonically active neurons [J].
Morris, G ;
Arkadir, D ;
Nevet, A ;
Vaadia, E ;
Bergman, H .
NEURON, 2004, 43 (01) :133-143
[24]   Dopamine neurons can represent context-dependent prediction error [J].
Nakahara, H ;
Itoh, H ;
Kawagoe, R ;
Takikawa, Y ;
Hikosaka, O .
NEURON, 2004, 41 (02) :269-280
[25]   Dissociable roles of ventral and dorsal striatum in instrumental conditioning [J].
O'Doherty, J ;
Dayan, P ;
Schultz, J ;
Deichmann, R ;
Friston, KJ ;
Dolan, RJ .
SCIENCE, 2004, 304 (5669) :452-454
[26]   Temporal difference models and reward-related learning in the human brain [J].
O'Doherty, JP ;
Dayan, P ;
Friston, KJ ;
Critchley, H ;
Dolan, RJ .
NEURON, 2003, 38 (02) :329-337
[27]  
Pavlov IP, 1927, CONDITIONED REFLEXES
[28]  
Paxinos G., 1998, RAT BRAIN STEROTAXIC, VFourth
[29]   Spike-timing-dependent Hebbian plasticity as temporal difference learning [J].
Rao, RPN ;
Sejnowski, TJ .
NEURAL COMPUTATION, 2001, 13 (10) :2221-2237
[30]   DOPAMINE NEURONS OF THE MONKEY MIDBRAIN - CONTINGENCIES OF RESPONSES TO ACTIVE TOUCH DURING SELF-INITIATED ARM MOVEMENTS [J].
ROMO, R ;
SCHULTZ, W .
JOURNAL OF NEUROPHYSIOLOGY, 1990, 63 (03) :592-606