Optimal adaptive policies for sequential allocation problems

被引:96
作者
Burnetas, AN
Katehakis, MN
机构
[1] RUTGERS STATE UNIV,GRAD SCH MANAGEMENT,NEWARK,NJ 07102
[2] RUTGERS STATE UNIV,RUTCOR,NEWARK,NJ 07102
关键词
D O I
10.1006/aama.1996.0007
中图分类号
O29 [应用数学];
学科分类号
070104 ;
摘要
Consider the problem of sequential sampling from m statistical populations to maximize the expected sum of outcomes in the long run. Under suitable assumptions on the unknown parameters <(<theta))double under bar> is an element of Theta, it is shown that there exists a class C-R of adaptive policies with the following properties: (i) The expected n horizon reward V-n(pi 0)(<(theta)double under bar>) under any policy pi(0) in C-R is equal to n mu*(<(theta)double under bar>) - M(<(theta)double under bar>)log n + o(log n), as n --> infinity, where mu*(<(theta)double under bar>) is the largest population mean and M(<(theta)double under bar>) is a constant. (ii) Policies in C-R are asymptotically optimal within a larger class C-UF of ''uniformly fast convergent'' policies in the sense that lim(n-->x)(n mu*(<(theta)double under bar>) - V-n(pi 0)(<(theta)double under bar>)/(n mu*(<(theta)double under bar>) - V-n(pi)(<(theta)double under bar>) less than or equal to 1, for any pi is an element of C-UF and any <(theta)double under bar> is an element of Theta such that M(<(theta)double under bar>) > 0. Policies in C-R are specified via easily computable indices, defined as-unique solutions to dual problems that arise naturally from the functional form of M(<(theta)double under bar>). In addition, the assumptions are verified for populations specified by nonparametric discrete univariate distributions with finite support. In the case of normal populations with unknown means and variances, we leave as an open problem the verification of one assumption. (C) 1996 Academic Press, Inc.
引用
收藏
页码:122 / 142
页数:21
相关论文
共 33 条
[1]  
Acosta-Abreu R. S., 1985, Control and Cybernetics, V14, P313
[2]   ASYMPTOTICALLY EFFICIENT ADAPTIVE ALLOCATION SCHEMES FOR CONTROLLED MARKOV-CHAINS - FINITE PARAMETER SPACE [J].
AGRAWAL, R ;
TENEKETZIS, D ;
ANANTHARAM, V .
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 1989, 34 (12) :1249-1259
[3]  
[Anonymous], 2010, Dynamic programming
[4]  
Berry D. A., 1985, BANDIT PROBLEMS SEQU
[5]  
Beutler F. J., 1989, Stochastics, V26, P81, DOI 10.1080/17442508908833551
[6]  
Burnetas A. N., 1993, PROBAB ENG INFORM SC, V7, P85, DOI DOI 10.1017/S0269964800002801
[7]  
BURNETAS AN, 1994, OPTIMAL ADAPTIVE POL
[8]  
BURNETAS AN, 1989, OPTIMAL SEQUENTIAL A
[9]  
Dembo A., 1993, Large deviations techniques and applications
[10]  
Dynkin E.B., 1979, Grundlehren der Mathematischen Wissenschaften, V235