A NOTE ON THE 2-ARMED BANDIT PROBLEM WITH FINITE MEMORY

被引:25
作者
COVER, TM
机构
[1] Stanford University, Electronic Research Laboratories, Stanford
来源
INFORMATION AND CONTROL | 1968年 / 12卷 / 5-6期
关键词
D O I
10.1016/S0019-9958(68)90382-3
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
Robbins has proposed a finite memory constraint on the two-armed bandit problem in which the coin to be tossed at each stage may depend on the history of the previous tosses only through the outcomes of the last r tosses. Letting the choice of coin depend on the time at which the toss is made, we exhibit a deterministic rule with memory r = 2, the description of which is independent of the coin biases p1 and p2, which achieves, with probability one, a limiting proportion of heads equal to max {{p1, p2}}. Thus this rule is asymptotically uniformly best among the class of time-varying finite memory rules. © 1968 Academic Press Inc.
引用
收藏
页码:371 / &
相关论文
共 7 条
[1]  
COVER TM, 1967, SEP IEEE INT S INF T
[2]   ON A PROBLEM OF ROBBINS [J].
ISBELL, JR .
ANNALS OF MATHEMATICAL STATISTICS, 1959, 30 (02) :606-610
[3]   SOME ASPECTS OF THE SEQUENTIAL DESIGN OF EXPERIMENTS [J].
ROBBINS, H .
BULLETIN OF THE AMERICAN MATHEMATICAL SOCIETY, 1952, 58 (05) :527-535
[5]  
SAMUELS SM, 1966, 71 PURD U TECH REPT
[6]   THE ROBBINS-ISBELL 2-ARMED-BANDIT PROBLEM WITH FINITE MEMORY [J].
SMITH, CV ;
PYKE, R .
ANNALS OF MATHEMATICAL STATISTICS, 1965, 36 (05) :1375-1386
[7]  
Varshavskii V. I., 1963, AUTOMAT REM CONTR+, V24, P327