MACHINE LEARNING AND NONPARAMETRIC BANDIT THEORY

被引:22
作者
LAI, TL [1 ]
YAKOWITZ, S [1 ]
机构
[1] UNIV ARIZONA,DEPT SYST & IND ENGN,TUCSON,AZ 85721
基金
美国国家卫生研究院; 美国国家科学基金会;
关键词
D O I
10.1109/9.400491
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
In its most basic form, bandit theory is concerned with the design problem of sequentially choosing members from a given collection of random variables so that the regret, i,e,, R(n) = Sigma(j)(mu* - mu(j))ET(n)(j), grows as slowly as possible with increasing n, Here mu(j) is the expected value of the bandit arm (i.e,, random variable) indexed by j, T-n, (j) is the number of times arm j has been selected in the first n decision stages, and mu* = sup(j) mu(j). The present paper contributes to the theory by considering the situation in which observations are dependent, To begin with, the dependency is presumed to depend only on past observations of the same arm, but later, we allow that it may be with respect to the entire past and that the set of arms is infinite, This brings queues and, more generally, controlled Markov processes into our purview, Thus our ''black-box'' methodology is suitable for the case when the only observables are cost values and, in particular, the probability structure and loss function are unknown to the designer. The conclusion of the analysis is that under lenient conditions, using algorithms prescribed herein, risk growth is commensurate with that in the simplest i,i,d, cases. Our methods represent an alternative to recent stochastic-approximation/perturbation-analysis ideas for tuning queues.
引用
收藏
页码:1199 / 1209
页数:11
相关论文
共 23 条
[11]   ASYMPTOTICALLY EFFICIENT ADAPTIVE ALLOCATION RULES [J].
LAI, TL ;
ROBBINS, H .
ADVANCES IN APPLIED MATHEMATICS, 1985, 6 (01) :4-22
[12]   STOCHASTIC OPTIMIZATION BY SIMULATION - CONVERGENCE PROOFS FOR THE GI/G/1 QUEUE IN STEADY-STATE [J].
LECUYER, P ;
GLYNN, PW .
MANAGEMENT SCIENCE, 1994, 40 (11) :1562-1578
[13]  
Mallows C., 1964, J MATH ANAL APPL, V8, P90
[14]   LARGE DEVIATION PRINCIPLES FOR STATIONARY-PROCESSES [J].
OREY, S ;
PELIKAN, S .
ANNALS OF PROBABILITY, 1988, 16 (04) :1481-1495
[15]  
PFLUG GC, 1990, MATH OPER RES, V12, P381
[16]  
ROBBINS H, 1952, B AM MATH SOC, V55, P527
[17]   EXPONENTIAL CONVERGENCE UNDER MIXING [J].
SCHONMANN, RH .
PROBABILITY THEORY AND RELATED FIELDS, 1989, 81 (02) :235-238
[18]  
Stout W, 1974, ALMOST SURE CONVERGE
[19]   SINGLE RUN OPTIMIZATION OF DISCRETE EVENT SIMULATIONS - AN EMPIRICAL-STUDY USING THE M/M/1 QUEUE [J].
SURI, R ;
LEUNG, YT .
IIE TRANSACTIONS, 1989, 21 (01) :35-49
[20]   ESTIMATION OF THE DERIVATIVE OF A STATIONARY MEASURE WITH RESPECT TO A CONTROL PARAMETER [J].
VAZQUEZABAD, FJ ;
KUSHNER, HJ .
JOURNAL OF APPLIED PROBABILITY, 1992, 29 (02) :343-352