Integration of Reinforcement Learning and Optimal Decision-Making Theories of the Basal Ganglia

Cited by: 58
Authors
Bogacz, Rafal [1 ]
Larsen, Tobias [1 ,2 ]
Affiliations
[1] Univ Bristol, Dept Comp Sci, Bristol BS8 1UB, Avon, England
[2] Trinity Coll Dublin, Inst Neurosci, Dublin 2, Ireland
Funding
Engineering and Physical Sciences Research Council (UK);
Keywords
SUBTHALAMIC NUCLEUS NEURONS; PARIETAL CORTEX; REACTION-TIME; NEURAL BASIS; STRIATONIGRAL INFLUENCE; PERCEPTUAL DECISION; STRIATAL FUNCTIONS; PREFRONTAL CORTEX; DOPAMINE NEURONS; ACTION SELECTION;
DOI
10.1162/NECO_a_00103
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This article seeks to integrate two sets of theories describing action selection in the basal ganglia: reinforcement learning theories describing learning which actions to select to maximize reward and decision-making theories proposing that the basal ganglia selects actions on the basis of sensory evidence accumulated in the cortex. In particular, we present a model that integrates the actor-critic model of reinforcement learning and a model assuming that the cortico-basal-ganglia circuit implements a statistically optimal decision-making procedure. The values of corticostriatal weights required for optimal decision making in our model differ from those provided by standard reinforcement learning models. Nevertheless, we show that an actor-critic model converges to the weights required for optimal decision making when biologically realistic limits on synaptic weights are introduced. We also describe the model's predictions concerning reaction times and neural responses during learning, and we discuss directions required for further integration of reinforcement learning and optimal decision-making theories.
Pages: 817-851
Page count: 35