Dopamine, uncertainty and TD learning

Cited by: 95
Authors
Niv, Yael [1, 2]
Duff, Michael O. [2]
Dayan, Peter [2]
Affiliations
[1] Hebrew Univ Jerusalem, Interdisciplinary Ctr Neural Computat, Jerusalem, Israel
[2] UCL, Gatsby Computat Neurosci Unit, London, England
Keywords
Prediction Error; Learning Rate; Negative Error; Trace Conditioning; Future Reward
DOI
10.1186/1744-9081-1-6
Chinese Library Classification (CLC) number
B84 [Psychology]; C [Social Sciences, General]; Q98 [Anthropology]
Discipline classification code
03; 0303; 030303; 04; 0402
Abstract
Substantial evidence suggests that the phasic activities of dopaminergic neurons in the primate midbrain represent a temporal difference (TD) error in predictions of future reward, with increases above and decreases below baseline consequent on positive and negative prediction errors, respectively. However, dopamine cells have very low baseline activity, which implies that the representation of these two sorts of error is asymmetric. We explore the implications of this seemingly innocuous asymmetry for the interpretation of dopaminergic firing patterns in experiments with probabilistic rewards which bring about persistent prediction errors. In particular, we show that when averaging the non-stationary prediction errors across trials, a ramping in the activity of the dopamine neurons should be apparent, whose magnitude is dependent on the learning rate. This exact phenomenon was observed in a recent experiment, though being interpreted there in antipodal terms as a within-trial encoding of uncertainty.
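The averaging argument in the abstract can be illustrated with a small simulation. The sketch below is not the authors' code; it assumes a tabular TD(0) learner with one state per time step in a trial, a unit reward delivered at the final step with a fixed probability, and negative prediction errors scaled down by a factor before averaging to mimic the low baseline firing rate. The function name and every parameter value (`p_reward`, `n_steps`, `n_trials`, `learning_rate`, `neg_scale`, `seed`) are illustrative assumptions.

```python
import random


def mean_scaled_td_error(p_reward=0.5, n_steps=10, n_trials=5000,
                         learning_rate=0.1, neg_scale=1 / 6, seed=0):
    """Return the across-trial mean of asymmetrically scaled TD errors.

    Hypothetical setup: a reward of 1 arrives at the last time step of a
    trial with probability p_reward. Negative errors are multiplied by
    neg_scale before averaging (neg_scale=1 gives symmetric coding).
    """
    rng = random.Random(seed)
    # values[t] is the predicted future reward at time step t.
    # values[n_steps] is the terminal state and stays at 0.
    values = [0.0] * (n_steps + 1)
    sums = [0.0] * n_steps

    for _ in range(n_trials):
        rewarded = rng.random() < p_reward
        for t in range(n_steps):
            reward = 1.0 if (rewarded and t == n_steps - 1) else 0.0
            # TD(0) prediction error, no discounting within the trial.
            delta = reward + values[t + 1] - values[t]
            values[t] += learning_rate * delta
            # Asymmetric representation: shrink negative errors only.
            sums[t] += delta if delta >= 0 else neg_scale * delta

    return [s / n_trials for s in sums]


if __name__ == "__main__":
    for t, err in enumerate(mean_scaled_td_error()):
        print(f"step {t}: mean scaled error = {err:.4f}")
```

Under these assumptions, the averaged signal is positive at every step and largest at the time of the reward, while setting `neg_scale=1` makes the across-trial averages nearly vanish. Rerunning with a different `learning_rate` changes the size of the effect at the steps before the reward, in line with the abstract's statement that the magnitude depends on the learning rate.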
Pages: 9
Related papers
26 entries in total
  • [1] [Anonymous], 1995, MODELS INFORM PROCES
  • [2] Barto A. G., 1990, LEARNING COMPUTATION, P539
  • [3] Bayer H, 2004, THESIS
  • [4] Daw N, RECENT BREA IN PRESS
  • [5] Opponent interactions between serotonin and dopamine
    Daw, ND
    Kakade, S
    Dayan, P
    [J]. NEURAL NETWORKS, 2002, 15 (4-6) : 603 - 616
  • [6] Dayan P, 2002, ADV NEUR IN, V14, P189
  • [7] Learning and selective attention
    Dayan, Peter
    Kakade, Sham
    Montague, P. Read
    [J]. NATURE NEUROSCIENCE, 2000, 3 (11) : 1218 - 1223
  • [8] Discrete coding of reward probability and uncertainty by dopamine neurons
    Fiorillo, CD
    Tobler, PN
    Schultz, W
    [J]. SCIENCE, 2003, 299 (5614) : 1898 - 1902
  • [9] Time, rate, and conditioning
    Gallistel, CR
    Gibbon, J
    [J]. PSYCHOLOGICAL REVIEW, 2000, 107 (02) : 289 - 344
  • [10] Dopamine neurons report an error in the temporal prediction of reward during learning
    Hollerman, JR
    Schultz, W
    [J]. NATURE NEUROSCIENCE, 1998, 1 (04) : 304 - 309