Optimal learning and experimentation in bandit problems

被引:62
作者
Brezzi, M
Lai, TL
机构
[1] Stanford Univ, Dept Stat, Stanford, CA 94305 USA
[2] Minist Tesoro, Dipartimento Polit Sviluppo & Coes, I-00187 Rome, Italy
关键词
optimal stopping; corrected binomial algorithm; multi-armed bandits; switching costs; incomplete learning;
D O I
10.1016/S0165-1889(01)00028-8
中图分类号
F [经济];
学科分类号
02 ;
摘要
This paper studies how and how much active experimentation is used in discounted or finite-horizon optimization problems with an agent who chooses actions sequentially from a finite set of actions, with rewards depending on unknown parameters associated with the actions. Closed-form approximations are developed for the optimal rules in these 'multi-armed bandit' problems. Some refinements and modifications of the basic structure of these approximations also provide a nearly optimal solution to the long-standing problem of incorporating switching costs into multi-armed bandits. (C) 2002 Elsevier Science B.V. All rights reserved.
引用
收藏
页码:87 / 108
页数:22
相关论文
共 20 条
  • [1] ASYMPTOTICALLY EFFICIENT ADAPTIVE ALLOCATION RULES FOR THE MULTIARMED BANDIT PROBLEM WITH SWITCHING COST
    AGRAWAL, R
    HEGDE, MV
    TENEKETZIS, D
    [J]. IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 1988, 33 (10) : 899 - 905
  • [2] DENUMERABLE-ARMED BANDITS
    BANKS, JS
    SUNDARAM, RK
    [J]. ECONOMETRICA, 1992, 60 (05) : 1071 - 1096
  • [3] BERNOULLI 2-ARMED BANDIT
    BERRY, DA
    [J]. ANNALS OF MATHEMATICAL STATISTICS, 1972, 43 (03): : 871 - &
  • [4] Incomplete learning from endogenous data in dynamic allocation
    Brezzi, M
    Lai, TL
    [J]. ECONOMETRICA, 2000, 68 (06) : 1511 - 1516
  • [5] CHANG F., 1987, ADV APPL PROBAB, V19, P829
  • [6] NUMERICAL-SOLUTIONS FOR BAYES SEQUENTIAL DECISION-PROBLEMS
    CHERNOFF, H
    PETKAU, AJ
    [J]. SIAM JOURNAL ON SCIENTIFIC AND STATISTICAL COMPUTING, 1986, 7 (01): : 46 - 59
  • [7] FABIUS J, 1970, ANN MATH STAT, V41, P1906, DOI 10.1214/aoms/1177696692
  • [8] CONTRIBUTIONS TO 2-ARMED BANDIT PROBLEM
    FELDMAN, D
    [J]. ANNALS OF MATHEMATICAL STATISTICS, 1962, 33 (03): : 847 - &
  • [9] Gittins J., 1974, PROGR STAT, P241
  • [10] GITTINS JC, 1979, J ROY STAT SOC B MET, V41, P148