Partially Observed Markov Decision Process Multiarmed Bandits-Structural Results

被引：24

作者：

Krishnamurthy, Vikram ^{[1
]}

Wahlberg, Bo ^{[2
]}

机构：

[1] Univ British Columbia, Dept Elect & Comp Engn, Vancouver, BC V6T 1Z4, Canada

[2] KTH, Sch Elect Engn, ACCESS, SE-10044 Stockholm, Sweden

来源：

MATHEMATICS OF OPERATIONS RESEARCH | 2009年 / 34卷 / 02期

基金：

加拿大自然科学与工程研究理事会;

关键词：

multiarmed bandits; partially observed Markov decision process; monotone policies; likelihood ratio ordering; opportunistic scheduling; stochastic approximation algorithm;

D O I：

10.1287/moor.1080.0371

中图分类号：

C93 [管理学]; O22 [运筹学];

学科分类号：

070105 ; 12 ; 1201 ; 1202 ; 120202 ;

摘要：

This paper considers multiarmed bandit problems involving partially observed Markov decision processes (POMDPs). We show how the Gittins index for the optimal scheduling policy can be computed by a value iteration algorithm on each process, thereby considerably simplifying the computational cost. A suboptimal value iteration algorithm based on Lovejoy's approximation is presented. We then show that for the case of totally positive of order 2 (TP2) transition probability matrices and monotone likelihood ratio (MLR) ordered observation probabilities, the Gittins index is MLR increasing in the information state. Algorithms that exploit this structure are then presented.

引用

页码：287 / 302

页数：16

共 20 条

[1]

[Anonymous], THESIS BROWN U PROVI

[2]

[Anonymous], 1995, Computational Complexity

[3] Conservation laws, extended polymatroids and multiarmed bandit problems; A polyhedral approach to indexable systems [J].

Bertsimas, D ;

Nino-Mora, J .

MATHEMATICS OF OPERATIONS RESEARCH, 1996, 21 (02) :257-306

[4]

Cassandra A., 1997, P 13 ANN C UNC ART I

[5]

CASSANDRA AR, TONYS POMDP

[6]

Gittins JC, 1989, MULTIARMED BANDIT AL

[7]

KIIJIMA M, 1997, MARKOV PROCESSES STO

[8]

Krishnamurthy V., 2003, 42 IEEE C DEC CONTR

[9] Structured threshold policies for dynamic sensor scheduling - A partially observed Markov decision process approach [J].

Krishnamurthy, Vikram ;

Djonin, Dejan V. .

IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2007, 55 (10) :4938-4957

[10]

Kumar Panqanamala Ramana, 2015, Stochastic systems: Estimation, identification, and adaptive control

← 1 2 →