A new value iteration method for the average cost dynamic programming problem

Times Cited: 25
Authors
Bertsekas, DP [1]
Affiliation
[1] MIT, Dept Elect Engn & Comp Sci, Cambridge, MA 02139 USA
Keywords
dynamic programming; average cost; value iteration
DOI
10.1137/S0363012995291609
CLC Number
TP [Automation & Computer Technology]
Discipline Code
0812
Abstract
We propose a new value iteration method for the classical average cost Markovian decision problem, under the assumption that all stationary policies are unichain and that, furthermore, there exists a state that is recurrent under all stationary policies. This method is motivated by a relation between the average cost problem and an associated stochastic shortest path problem. In contrast to the standard relative value iteration, our method involves a weighted sup-norm contraction, and for this reason it admits a Gauss-Seidel implementation. Computational tests indicate that the Gauss-Seidel version of the new method substantially outperforms the standard method for difficult problems.
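For readers who want the baseline concrete, below is a minimal NumPy sketch of the standard relative value iteration mentioned in the abstract, with an optional Gauss-Seidel (in-place) sweep to illustrate the update pattern at issue. This is not the paper's new SSP-based method: the function name, array layout, and all parameters are assumptions made for illustration, and for the standard method the Gauss-Seidel sweep is in general not guaranteed to converge, which is precisely the gap the paper's weighted sup-norm contraction closes.

```python
import numpy as np

def relative_value_iteration(P, g, ref=0, tol=1e-8, max_iter=10_000,
                             gauss_seidel=False):
    """Standard relative value iteration for an average-cost unichain MDP.

    P   : (A, S, S) array, P[a, s, t] = Pr(next state t | state s, action a)
    g   : (S, A) array of one-stage costs
    ref : reference state whose differential cost is pinned to zero

    Returns (estimated average cost, differential cost vector h).
    """
    A, S, _ = P.shape
    h = np.zeros(S)
    gain = 0.0
    for _ in range(max_iter):
        if gauss_seidel:
            # Gauss-Seidel sweep: states are updated in place, so each
            # state immediately uses the new values of earlier states.
            # NOTE: for plain relative value iteration this sweep is a
            # heuristic; convergence is not guaranteed in general.
            Th = h.copy()
            for s in range(S):
                Th[s] = min(g[s, a] + P[a, s] @ Th for a in range(A))
        else:
            # Standard (Jacobi) sweep: every state uses the old vector h.
            Th = (g + np.einsum('ast,t->sa', P, h)).min(axis=1)
        gain = Th[ref]            # current estimate of the average cost
        h_new = Th - gain         # renormalize so that h_new[ref] == 0
        if np.max(np.abs(h_new - h)) < tol:
            return gain, h_new
        h = h_new
    return gain, h

# Tiny made-up example: 2 states, 2 actions.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.6, 0.4]]])
g = np.array([[1.0, 2.0], [0.5, 3.0]])
print(relative_value_iteration(P, g, gauss_seidel=True))
```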
Pages: 742-759
Number of Pages: 18