A new value iteration method for the average cost dynamic programming problem

Times Cited: 25
Authors
Bertsekas, DP [1]
Affiliation
[1] MIT, Dept Elect Engn & Comp Sci, Cambridge, MA 02139 USA
Keywords
dynamic programming; average cost; value iteration
DOI
10.1137/S0363012995291609
CLC Number
TP [Automation & Computer Technology]
Discipline Code
0812
Abstract
We propose a new value iteration method for the classical average cost Markovian decision problem, under the assumption that all stationary policies are unichain and that, furthermore, there exists a state that is recurrent under all stationary policies. This method is motivated by a relation between the average cost problem and an associated stochastic shortest path problem. In contrast to the standard relative value iteration, our method involves a weighted sup-norm contraction, and for this reason it admits a Gauss-Seidel implementation. Computational tests indicate that the Gauss-Seidel version of the new method substantially outperforms the standard method for difficult problems.
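For readers who want the baseline concrete, below is a minimal NumPy sketch of the standard relative value iteration mentioned in the abstract, with an optional Gauss-Seidel (in-place) sweep to illustrate the update pattern at issue. This is not the paper's new SSP-based method: the function name, array layout, and all parameters are assumptions made for illustration, and for the standard method the Gauss-Seidel sweep is in general not guaranteed to converge, which is precisely the gap the paper's weighted sup-norm contraction closes.

```python
import numpy as np

def relative_value_iteration(P, g, ref=0, tol=1e-8, max_iter=10_000,
                             gauss_seidel=False):
    """Standard relative value iteration for an average-cost unichain MDP.

    P   : (A, S, S) array, P[a, s, t] = Pr(next state t | state s, action a)
    g   : (S, A) array of one-stage costs
    ref : reference state whose differential cost is pinned to zero

    Returns (estimated average cost, differential cost vector h).
    """
    A, S, _ = P.shape
    h = np.zeros(S)
    gain = 0.0
    for _ in range(max_iter):
        if gauss_seidel:
            # Gauss-Seidel sweep: states are updated in place, so each
            # state immediately uses the new values of earlier states.
            # NOTE: for plain relative value iteration this sweep is a
            # heuristic; convergence is not guaranteed in general.
            Th = h.copy()
            for s in range(S):
                Th[s] = min(g[s, a] + P[a, s] @ Th for a in range(A))
        else:
            # Standard (Jacobi) sweep: every state uses the old vector h.
            Th = (g + np.einsum('ast,t->sa', P, h)).min(axis=1)
        gain = Th[ref]            # current estimate of the average cost
        h_new = Th - gain         # renormalize so that h_new[ref] == 0
        if np.max(np.abs(h_new - h)) < tol:
            return gain, h_new
        h = h_new
    return gain, h

# Tiny made-up example: 2 states, 2 actions.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.6, 0.4]]])
g = np.array([[1.0, 2.0], [0.5, 3.0]])
print(relative_value_iteration(P, g, gauss_seidel=True))
```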
Pages: 742-759
Number of Pages: 18