Partially Observed Markov Decision Process Multiarmed Bandits-Structural Results

被引:24
作者
Krishnamurthy, Vikram [1 ]
Wahlberg, Bo [2 ]
机构
[1] Univ British Columbia, Dept Elect & Comp Engn, Vancouver, BC V6T 1Z4, Canada
[2] KTH, Sch Elect Engn, ACCESS, SE-10044 Stockholm, Sweden
基金
加拿大自然科学与工程研究理事会;
关键词
multiarmed bandits; partially observed Markov decision process; monotone policies; likelihood ratio ordering; opportunistic scheduling; stochastic approximation algorithm;
D O I
10.1287/moor.1080.0371
中图分类号
C93 [管理学]; O22 [运筹学];
学科分类号
070105 ; 12 ; 1201 ; 1202 ; 120202 ;
摘要
This paper considers multiarmed bandit problems involving partially observed Markov decision processes (POMDPs). We show how the Gittins index for the optimal scheduling policy can be computed by a value iteration algorithm on each process, thereby considerably simplifying the computational cost. A suboptimal value iteration algorithm based on Lovejoy's approximation is presented. We then show that for the case of totally positive of order 2 (TP2) transition probability matrices and monotone likelihood ratio (MLR) ordered observation probabilities, the Gittins index is MLR increasing in the information state. Algorithms that exploit this structure are then presented.
引用
收藏
页码:287 / 302
页数:16
相关论文
共 20 条
[1]  
[Anonymous], THESIS BROWN U PROVI
[2]  
[Anonymous], 1995, Computational Complexity
[3]   Conservation laws, extended polymatroids and multiarmed bandit problems; A polyhedral approach to indexable systems [J].
Bertsimas, D ;
Nino-Mora, J .
MATHEMATICS OF OPERATIONS RESEARCH, 1996, 21 (02) :257-306
[4]  
Cassandra A., 1997, P 13 ANN C UNC ART I
[5]  
CASSANDRA AR, TONYS POMDP
[6]  
Gittins JC, 1989, MULTIARMED BANDIT AL
[7]  
KIIJIMA M, 1997, MARKOV PROCESSES STO
[8]  
Krishnamurthy V., 2003, 42 IEEE C DEC CONTR
[9]   Structured threshold policies for dynamic sensor scheduling - A partially observed Markov decision process approach [J].
Krishnamurthy, Vikram ;
Djonin, Dejan V. .
IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2007, 55 (10) :4938-4957
[10]  
Kumar Panqanamala Ramana, 2015, Stochastic systems: Estimation, identification, and adaptive control