Exploration bonuses and dual control

Cited by: 46
Authors
Dayan, P
Sejnowski, TJ
Affiliations
[1] Salk Institute, Howard Hughes Medical Institute, San Diego, CA 92186, USA
[2] University of California San Diego, Department of Biology, La Jolla, CA 92093, USA
Keywords
reinforcement learning; dynamic programming; exploration bonuses; certainty equivalence; nonstationary environment
DOI
10.1023/A:1018357105171
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Finding the Bayesian balance between exploration and exploitation in adaptive optimal control is in general intractable. This paper shows how to compute suboptimal estimates based on a certainty equivalence approximation (Cozzolino, Gonzalez-Zubieta & Miller, 1965) arising from a form of dual control. This systematizes and extends existing uses of exploration bonuses in reinforcement learning (Sutton, 1990). The approach has two components: a statistical model of uncertainty in the world and a way of turning this into exploratory behavior. This general approach is applied to two-dimensional mazes with moveable barriers and its performance is compared with Sutton's DYNA system.
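The abstract only outlines the approach, so the sketch below illustrates the kind of exploration bonus (Sutton, 1990) that the paper systematizes and extends: during model-based planning, a bonus proportional to the square root of the time since a state-action pair was last tried is added to the modelled reward, encouraging revisits to neglected parts of a nonstationary environment. This is a minimal, hypothetical illustration, not the authors' certainty-equivalence computation; the class name, hyperparameters (alpha, gamma, kappa), and the toy chain environment are assumptions introduced here.

```python
import numpy as np

class DynaQPlus:
    """Tabular Dyna-style agent with a time-since-last-visit exploration bonus."""

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.95,
                 kappa=1e-3, planning_steps=10, seed=0):
        self.rng = np.random.default_rng(seed)
        self.Q = np.zeros((n_states, n_actions))
        self.model = {}                                  # (s, a) -> (r, s')
        self.last_tried = np.zeros((n_states, n_actions))
        self.t = 0
        self.alpha, self.gamma, self.kappa = alpha, gamma, kappa
        self.planning_steps = planning_steps

    def act(self, s, eps=0.1):
        # Epsilon-greedy action selection on the current value estimates.
        if self.rng.random() < eps:
            return int(self.rng.integers(self.Q.shape[1]))
        return int(np.argmax(self.Q[s]))

    def update(self, s, a, r, s2):
        self.t += 1
        self.last_tried[s, a] = self.t
        # Direct reinforcement-learning update from the real transition.
        self.Q[s, a] += self.alpha * (r + self.gamma * self.Q[s2].max() - self.Q[s, a])
        self.model[(s, a)] = (r, s2)
        # Planning: replay stored transitions, adding an exploration bonus
        # that grows with the time since the pair was last tried.
        keys = list(self.model.keys())
        for _ in range(self.planning_steps):
            ps, pa = keys[self.rng.integers(len(keys))]
            pr, ps2 = self.model[(ps, pa)]
            tau = self.t - self.last_tried[ps, pa]
            bonus = self.kappa * np.sqrt(tau)
            self.Q[ps, pa] += self.alpha * (pr + bonus +
                                            self.gamma * self.Q[ps2].max() - self.Q[ps, pa])

# Toy usage on a 5-state chain (hypothetical environment, for illustration only).
agent = DynaQPlus(n_states=5, n_actions=2)
s = 0
for step in range(200):
    a = agent.act(s)
    s2 = min(s + 1, 4) if a == 1 else max(s - 1, 0)   # action 1 moves right, 0 moves left
    r = 1.0 if s2 == 4 else 0.0                        # reward only at the goal state
    agent.update(s, a, r, s2)
    s = 0 if s2 == 4 else s2                           # reset after reaching the goal
```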
Pages: 5-22
Number of pages: 18
Cited References
28 items in total
[1] [Anonymous]. Technical Report FKI-149-91, TU München.
[2] [Anonymous]. Dynamic Programming. 1965.
[3] Barto AG, Bradtke SJ, Singh SP. Learning to act using real-time dynamic programming. Artificial Intelligence, 1995, 72(1-2): 81-138.
[4] Bertsekas DP. Stochastic Optimal Control. 1978.
[5] Cohn DA. Advances in Neural Information Processing Systems, 1994, p. 679.
[6] Cozzolino JM. Technical Report 11, MIT Operations Research Center, 1965.
[7] Dersin PL, Athans M, Kendrick DA. Some properties of the dual adaptive stochastic-control algorithm. IEEE Transactions on Automatic Control, 1981, 26(5): 1001-1008.
[8] Fedorov V. Theory of Optimal Experiments. 1972.
[9] Feldbaum A. Optimal Control Systems. 1965.
[10] Howard RA. Dynamic Programming and Markov Processes. 1960.