Sample path optimality for a Markov optimization problem

Cited by: 6
Author
Hunt, FY [1]
Affiliation
[1] Natl Inst Stand & Technol, Div Math & Comp Sci, Gaithersburg, MD 20899 USA
Keywords
Markov decision process; stochastic control; Azuma's inequality
DOI
10.1016/j.spa.2004.12.005
Chinese Library Classification (CLC)
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline Classification Codes
020208; 070103; 0714
Abstract
We study a unichain Markov decision process, i.e., a controlled Markov process whose state process under any stationary policy is an ergodic Markov chain. The state and action spaces are assumed to be either finite or countable. When the state process is uniformly ergodic and the immediate cost is bounded, a policy that minimizes the long-term expected average cost also has an nth-stage sample path cost that, with probability one, is asymptotically less than the nth-stage sample path cost under any other non-optimal stationary policy with a larger expected average cost. This strengthens, for the Markov model, the a.s. asymptotic optimality property frequently discussed in the literature. (c) 2005 Elsevier B.V. All rights reserved.
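A rough formalization of the sample path property, in illustrative notation not taken verbatim from the paper: write S_n(\pi) = \sum_{k=0}^{n-1} c(X_k, A_k) for the nth-stage sample path cost under a stationary policy \pi, and \rho(\pi) for its long-term expected average cost. If \pi^* is average-cost optimal and \pi is any stationary policy with \rho(\pi) > \rho(\pi^*), then uniform ergodicity and boundedness of the immediate cost give, roughly,

\liminf_{n \to \infty} \frac{1}{n}\bigl( S_n(\pi) - S_n(\pi^*) \bigr) \;\ge\; \rho(\pi) - \rho(\pi^*) \;>\; 0 \qquad \text{almost surely},

so that with probability one S_n(\pi^*) \le S_n(\pi) for all sufficiently large n. An Azuma/Hoeffding-type concentration bound for uniformly ergodic Markov chains is the kind of tool that controls the fluctuations of S_n(\pi) around n\rho(\pi) in such an argument.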
Pages: 769-779
Page count: 11