A CLASS OF BANDIT PROBLEMS YIELDING MYOPIC OPTIMAL STRATEGIES

被引:10
作者
BANKS, JS
SUNDARAM, RK
机构
关键词
MULTIARMED BANDIT PROBLEMS; DYNAMIC ALLOCATION INDEX; BERNOULLI DISTRIBUTIONS; RANDOM WALK;
D O I
10.2307/3214899
中图分类号
O21 [概率论与数理统计]; C8 [统计学];
学科分类号
020208 ; 070103 ; 0714 ;
摘要
We consider the class of bandit problems in which each of the n greater-than-or-equal-to 2 independent arms generates rewards according to one of the same two reward distributions, and discounting is geometric over an infinite horizon. We show that the dynamic allocation index of Gittins and Jones (1974) in this context is strictly increasing in the probability that an arm is the better of the two distributions. It follows as an immediate consequence that myopic strategies are the uniquely optimal strategies in this class of bandit problems, regardless of the value of the discount parameter or the shape of the reward distributions. Some implications of this result for bandits with Bernoulli reward distributions are given.
引用
收藏
页码:625 / 632
页数:8
相关论文
共 9 条
[1]  
[Anonymous], 1968, INTRO PROBABILITY TH
[2]  
Berry DA., 1985, BANDIT PROBLEMS SEQU, DOI DOI 10.1007/978-94-015-3711-7
[3]   CONTRIBUTIONS TO 2-ARMED BANDIT PROBLEM [J].
FELDMAN, D .
ANNALS OF MATHEMATICAL STATISTICS, 1962, 33 (03) :847-&
[4]   OPTIMALITY OF MYOPIC STOPPING-TIMES FOR GEOMETRIC DISCOUNTING [J].
FRISTEDT, B ;
BERRY, DA .
JOURNAL OF APPLIED PROBABILITY, 1988, 25 (02) :437-443
[5]  
Gittins J. C., 1974, PROGR STAT, P241
[6]  
MCLENNAN A, 1988, UNPUB INCOMPLETE LEA
[7]   SOME RESULTS ON 2-ARMED BANDITS WHEN BOTH PROJECTS VARY [J].
OFLAHERTY, B .
JOURNAL OF APPLIED PROBABILITY, 1989, 26 (03) :655-658
[8]   MANY-ARMED BANDIT PROBLEM [J].
RODMAN, L .
ANNALS OF PROBABILITY, 1978, 6 (03) :491-498
[9]  
ROSS S, 1983, INTRO DYNAMIC PROGRA