COMPUTATIONALLY FEASIBLE BOUNDS FOR PARTIALLY OBSERVED MARKOV DECISION PROCESSES

Cited by: 159
Author
LOVEJOY, WS
Institution
Keywords
DYNAMIC PROGRAMMING; PARTIALLY OBSERVED MARKOV DECISION PROCESSES; MARKOV; BAYESIAN PROGRAMMING AND INFINITE STATE MARKOV MODELS
DOI
10.1287/opre.39.1.162
CLC classification number
C93 [Management]
Discipline classification codes
12; 1201; 1202; 120202
Abstract
A partially observed Markov decision process (POMDP) is a sequential decision problem where information concerning parameters of interest is incomplete, and possible actions include sampling, surveying, or otherwise collecting additional information. Such problems can theoretically be solved as dynamic programs, but the relevant state space is infinite, which inhibits algorithmic solution. This paper explains how to approximate the state space by a finite grid of points, and use that grid to construct upper and lower value function bounds, generate approximate nonstationary and stationary policies, and bound the value loss relative to optimal for using these policies in the decision problem. A numerical example illustrates the methodology.
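To make the grid idea in the abstract concrete, here is a minimal sketch in the spirit of the paper's approach, not the paper's exact algorithm (Lovejoy interpolates over a Freudenthal triangulation of the belief simplex and treats general finite state spaces). The two-state, two-action, two-observation model, the grid size, the iteration counts, and all numerical values below are assumptions made purely for illustration. The sketch builds an upper bound on the optimal value function by fixed-grid value iteration with interpolation, and a lower bound from alpha-vectors backed up at the grid beliefs.

```python
# Toy grid-based bounds for a discounted POMDP (illustrative sketch only).
import numpy as np

gamma = 0.95
# Hypothetical model: P[a][s][s'] transitions, Z[a][s'][o] observations, R[s][a] rewards.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.5, 0.5]]])
Z = np.array([[[0.8, 0.2], [0.3, 0.7]],
              [[0.5, 0.5], [0.5, 0.5]]])
R = np.array([[1.0, 0.0],
              [0.0, 1.5]])

M = 50                                   # grid resolution on the 1-D belief simplex
grid = np.linspace(0.0, 1.0, M + 1)      # belief (p, 1-p) parameterised by p = P(state 0)

def belief_update(p, a, o):
    """Bayes update of belief (p, 1-p) after action a and observation o."""
    b = np.array([p, 1.0 - p])
    joint = Z[a, :, o] * (b @ P[a])      # unnormalised P(s', o | b, a)
    prob_o = joint.sum()                 # P(o | b, a)
    return (joint[0] / prob_o if prob_o > 1e-12 else p), prob_o

def upper_bound(n_iter=100):
    """Fixed-grid value iteration with linear interpolation between grid points.

    The optimal value function is convex in the belief, so interpolating grid values
    that dominate it never underestimates it; the iterates stay above V* at the grid.
    """
    V = np.full(M + 1, R.max() / (1.0 - gamma))           # start above V*
    for _ in range(n_iter):
        V_new = np.empty_like(V)
        for i, p in enumerate(grid):
            b = np.array([p, 1.0 - p])
            q_values = []
            for a in range(2):
                total = b @ R[:, a]                       # expected immediate reward
                for o in range(2):
                    p_next, prob_o = belief_update(p, a, o)
                    total += gamma * prob_o * np.interp(p_next, grid, V)
                q_values.append(total)
            V_new[i] = max(q_values)
        V = V_new
    return V

def lower_bound(n_iter=100):
    """Point-based alpha-vector backups at the grid beliefs.

    Each vector produced is a valid lower-bounding linear function of the belief,
    so the pointwise maximum over this restricted set underestimates V* everywhere.
    """
    alphas = [np.full(2, R.min() / (1.0 - gamma))]        # crude valid lower bound
    for _ in range(n_iter):
        new_alphas = []
        for p in grid:
            b = np.array([p, 1.0 - p])
            best_val, best_alpha = -np.inf, None
            for a in range(2):
                alpha_a = R[:, a].copy()
                for o in range(2):
                    # back-project each alpha through (a, o) and keep the best at b
                    backs = [P[a] @ (Z[a, :, o] * al) for al in alphas]
                    alpha_a = alpha_a + gamma * max(backs, key=lambda g: b @ g)
                val = b @ alpha_a
                if val > best_val:
                    best_val, best_alpha = val, alpha_a
            new_alphas.append(best_alpha)
        alphas = new_alphas
    return alphas

V_ub = upper_bound()
alphas = lower_bound()
p = 0.3
V_lb = max(np.array([p, 1.0 - p]) @ al for al in alphas)
print(f"value bounds at belief ({p:.1f}, {1.0 - p:.1f}): "
      f"[{V_lb:.3f}, {np.interp(p, grid, V_ub):.3f}]")
```

Running the sketch prints an interval that brackets the optimal value at the chosen belief. In the paper, the same grid construction also generates approximate nonstationary and stationary policies and a bound on their value loss relative to optimal.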
Pages: 162-175
Number of pages: 14
Related papers
22 records in total
[1] Bean, J.C., Smith, R.L. Conditions for the existence of planning horizons. Mathematics of Operations Research, 1984, 9(3): 391-401.
[2] Denardo, E.V. Contraction mappings in the theory underlying dynamic programming. SIAM Review, 1967, 9(2): 165-177.
[3] Eaves, B.C. A Course in Triangulations for Solving Equations with Deformations. Springer-Verlag, 1984.
[4] Eckles, J.E. Thesis, Stanford University, 1966.
[5] Freudenthal, H. Simplizialzerlegungen von beschränkter Flachheit. Annals of Mathematics, 1942, 43: 580-582.
[6] Heyman, D.P., Sobel, M.J. Stochastic Models in Operations Research, Vol. II. McGraw-Hill, 1984.
[7] Hopp, W.J., Bean, J.C., Smith, R.L. A new optimality criterion for nonhomogeneous Markov decision processes. Operations Research, 1987, 35(6): 875-883.
[8] Kakalik, J.S. Technical Report 18, Operations Research Center, MIT, 1965.
[9] Lovejoy, W.S. Research Paper 1003, Graduate School of Business, Stanford University, 1988.
[10] MacQueen, J. A test for suboptimal actions in Markovian decision problems. Operations Research, 1967, 15(3): 559-561.