Brain mechanism of reward prediction under predictable and unpredictable environmental dynamics

Cited by: 58
Authors
Tanaka, Saori C.
Samejima, Kazuyuki
Okada, Go
Ueda, Kazutaka
Okamoto, Yasumasa
Yamawaki, Shigeto
Doya, Kenji
Affiliations
[1] ATR Computat Neurosci Labs, Dept Computat Neurobiol, Kyoto 6190288, Japan
[2] Nara Inst Sci & Technol, Dept Bioinformat & Genom, Nara 63001, Japan
[3] Hiroshima Univ, Dept Psychiat & Neurosci, Hiroshima 730, Japan
[4] Okinawa Inst Sci & Technol, Initial Res Project, Okinawa, Japan
Funding
Japan Science and Technology Agency (JST)
Keywords
reinforcement learning model; Markov decision problem; fMRI; human; Bayesian estimation; PRIMATE ORBITOFRONTAL CORTEX; TEMPORAL DIFFERENCE MODELS; NEURONAL-ACTIVITY; PREFRONTAL CORTEX; PARIETAL CORTEX; BASAL GANGLIA; DECISION-MAKING; PREMOTOR CORTEX; HUMAN STRIATUM; HUMANS;
DOI
10.1016/j.neunet.2006.05.039
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In learning goal-directed behaviors, an agent has to consider not only the reward given at each state but also the consequences of the dynamic state transitions associated with action selection. To understand the brain mechanisms of action learning under predictable and unpredictable environmental dynamics, we measured brain activity with functional magnetic resonance imaging (fMRI) during a Markov decision task with predictable and unpredictable state transitions. Whereas the striatum and orbitofrontal cortex (OFC) were significantly activated under both predictable and unpredictable state transition rules, the dorsolateral prefrontal cortex (DLPFC) was more strongly activated under predictable than under unpredictable state transition rules. We then modeled subjects' choice behaviors using a reinforcement learning model and a Bayesian estimation framework and found that subjects adopted larger temporal discount factors under predictable state transition rules, that is, they weighted future rewards more heavily. Model-based analysis of the fMRI data revealed different engagement of the striatum in reward prediction under different state transition dynamics. The ventral striatum was involved in reward prediction under both unpredictable and predictable state transition rules, whereas the dorsal striatum was predominantly involved in reward prediction under predictable rules. These results suggest different learning systems in the cortico-striatal loops depending on the dynamics of the environment: the OFC-ventral striatum loop is involved in action learning based on the present state, while the DLPFC-dorsal striatum loop is involved in action learning based on predictable future states. (c) 2006 Elsevier Ltd. All rights reserved.
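The abstract's central modeling claim is that a larger temporal discount factor γ makes an agent weight delayed reward more heavily, and that γ can be inferred from observed choices by Bayesian estimation. The Python sketch below illustrates both ideas on a hypothetical two-option choice, not the authors' Markov decision task or their reinforcement learning model: option A pays 1 immediately, option B pays 4 after two empty steps, choices follow a softmax with an assumed inverse temperature β = 2, and the posterior over γ is computed on a grid under an assumed uniform prior. The payoffs, β, the grid, and all function names are illustrative assumptions.

```python
import math
import random

# Hypothetical stand-in task (NOT the paper's Markov decision task):
# option A pays 1 immediately; option B pays nothing for two steps, then 4.
OPTION_A = [1.0]
OPTION_B = [0.0, 0.0, 4.0]


def discounted_value(rewards, gamma):
    """Discounted return sum_t gamma**t * r_t of a deterministic reward sequence."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


def p_choose_b(gamma, beta=2.0):
    """Softmax probability of choosing B; beta is an assumed inverse temperature."""
    va = discounted_value(OPTION_A, gamma)
    vb = discounted_value(OPTION_B, gamma)
    return 1.0 / (1.0 + math.exp(-beta * (vb - va)))


def posterior_over_gamma(choices, grid, beta=2.0):
    """Grid approximation of p(gamma | choices) under an assumed uniform prior.

    choices: list of 1 (chose B) or 0 (chose A).
    """
    log_post = []
    for g in grid:
        p = p_choose_b(g, beta)
        # Bernoulli log-likelihood of all observed choices at this gamma.
        ll = sum(math.log(p) if c == 1 else math.log(1.0 - p) for c in choices)
        log_post.append(ll)  # uniform prior adds only a constant
    m = max(log_post)  # subtract the max for numerical stability
    weights = [math.exp(lp - m) for lp in log_post]
    z = sum(weights)
    return [w / z for w in weights]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the simulated run is reproducible
    true_gamma = 0.9  # hypothetical "far-sighted" simulated subject
    choices = [1 if rng.random() < p_choose_b(true_gamma) else 0 for _ in range(200)]
    grid = [i / 100 for i in range(1, 100)]
    post = posterior_over_gamma(choices, grid)
    mean = sum(g * w for g, w in zip(grid, post))
    print(f"posterior mean of gamma: {mean:.2f}")
```

With these payoffs the two options are valued equally at γ = 0.5 (since 4γ² = 1), so a simulated agent with γ above 0.5 tends to choose the delayed option, and a run of delayed-option choices shifts the posterior toward larger γ.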
Pages: 1233-1241
Number of pages: 9