A probabilistic analysis of bias optimality in unichain Markov decision processes

Cited by: 24
Authors
Lewis, ME [1]
Puterman, ML [2]
Affiliations
[1] Univ Michigan, Dept Ind & Operat Engn, Ann Arbor, MI 48109 USA
[2] Univ British Columbia, Fac Commerce & Business Adm, Vancouver, BC V6T 1Z2, Canada
Funding
National Science Foundation (USA)
Keywords
dynamic programming; Markov processes; optimal control; queueing analysis
DOI
10.1109/9.898698
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper focuses on bias optimality in unichain Markov decision processes with finite state and action spaces. Using relative value functions, we present new methods for evaluating optimal bias. This leads to a probabilistic analysis which transforms the original reward problem into a minimum average cost problem. The result is an explanation of how and why bias implicitly discounts future rewards.
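To make the terms in the abstract concrete, the following minimal sketch computes the gain, a relative value function, and the bias of one fixed stationary policy in a finite unichain model. It uses the standard textbook evaluation equations g + w(s) - sum_j P(s, j) w(j) = r(s) with the normalization w(ref) = 0, and then shifts w so that its stationary average is zero; it is not the method of the paper. The function name `gain_and_bias`, the argument `ref`, and the two-state example are assumptions introduced here for illustration.

```python
import numpy as np


def gain_and_bias(P, r, ref=0):
    """Evaluate one stationary policy of a finite unichain model.

    P   : (n, n) transition matrix under the policy (hypothetical input)
    r   : (n,) one-step rewards under the policy (hypothetical input)
    ref : index of the reference state where the relative value is set to 0

    Returns (g, w, h): the gain, the relative value function with
    w[ref] == 0, and the bias h, which satisfies pi @ h == 0 for the
    stationary distribution pi.
    """
    P = np.asarray(P, dtype=float)
    r = np.asarray(r, dtype=float)
    n = P.shape[0]

    # Unknown vector x = (g, w_0, ..., w_{n-1}).
    # Rows 0..n-1 encode g + w_i - sum_j P_ij w_j = r_i.
    # The last row encodes the normalization w[ref] = 0.
    A = np.zeros((n + 1, n + 1))
    A[:n, 0] = 1.0
    A[:n, 1:] = np.eye(n) - P
    A[n, 1 + ref] = 1.0
    b = np.append(r, 0.0)
    x = np.linalg.solve(A, b)  # nonsingular when the chain is unichain
    g, w = x[0], x[1:]

    # Stationary distribution: pi P = pi and sum(pi) = 1.
    M = np.vstack([P.T - np.eye(n), np.ones(n)])
    rhs = np.append(np.zeros(n), 1.0)
    pi = np.linalg.lstsq(M, rhs, rcond=None)[0]

    # The bias differs from any relative value function by a constant,
    # chosen so that the stationary average of the bias is zero.
    h = w - pi @ w
    return g, w, h


# Illustrative two-state chain (made-up numbers, not from the paper).
g, w, h = gain_and_bias([[0.5, 0.5], [0.5, 0.5]], [1.0, 0.0])
print("gain:", g)
print("relative values:", w)
print("bias:", h)
```

In this example the gain is 0.5 and the bias is approximately (0.5, -0.5): starting in the first state earns half a unit more total reward than the long-run average predicts, and starting in the second state earns half a unit less. Two policies with the same gain can be compared by this bias vector, which is the sense of bias optimality discussed in the abstract.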
Pages: 96-100
Page count: 5
Related papers
14 in total
[1]   COMPUTING A BIAS-OPTIMAL POLICY IN A DISCRETE-TIME MARKOV DECISION PROBLEM [J].
DENARDO, EV .
OPERATIONS RESEARCH, 1970, 18 (02) :279-&
[2]  
DERMAN C, 1967, ANN MATH STAT, V38
[3]   Bias optimality in controlled queueing systems [J].
Haviv, M ;
Puterman, ML .
JOURNAL OF APPLIED PROBABILITY, 1998, 35 (01) :136-150
[4]  
HERNANDEZLERMA O, 1999, FURTHER TOPICS DISCR
[5]   Bias optimality in a queue with admission control [J].
Lewis, ME ;
Ayhan, H ;
Foley, RD .
PROBABILITY IN THE ENGINEERING AND INFORMATIONAL SCIENCES, 1999, 13 (03) :309-327
[6]   A note on bias optimality in controlled queueing systems [J].
Lewis, ME ;
Puterman, ML .
JOURNAL OF APPLIED PROBABILITY, 2000, 37 (01) :300-305
[7]   APPLYING A NEW DEVICE IN OPTIMIZATION OF EXPONENTIAL QUEUING SYSTEMS [J].
LIPPMAN, SA .
OPERATIONS RESEARCH, 1975, 23 (04) :687-710
[8]  
MAKOWSKI AM, 1994, POISSON EQUATION MAR
[9]  
Mann E., 1985, OPTIM J MATH PROGRAM, V16, P767
[10]   The policy iteration algorithm for average reward Markov decision processes with general state space [J].
Meyn, SP .
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 1997, 42 (12) :1663-1680