ON THE CONVERGENCE OF POLICY ITERATION IN FINITE STATE UNDISCOUNTED MARKOV DECISION-PROCESSES - THE UNICHAIN CASE

被引:15
作者
HORDIJK, A [1 ]
PUTERMAN, ML [1 ]
机构
[1] UNIV BRITISH COLUMBIA,FAC COMMERCE & BUSINESS ADM,DIV MANAGEMENT SCI,VANCOUVER V6T 1Y8,BC,CANADA
关键词
MATHEMATICAL PROGRAMMING;
D O I
10.1287/moor.12.1.163
中图分类号
C93 [管理学]; O22 [运筹学];
学科分类号
070105 ; 12 ; 1201 ; 1202 ; 120202 ;
摘要
We study the convergence of policy iteration for the undiscounted, finite state, discrete time Markov decision problem with compact action space and unichain transition structure. Using a 'Newton Method type' representation for policy iteration, we establish the existence of a solution to the optimality equation. We show that to find an average optimal policy, it is sufficient to solve the optimality equation on the recurrent set of the maximizing policy. Under the additional assumption of a unique maximizing policy at each stage of the policy iteration procedure, we show that the iterates are convergent and the resulting policy is Blackwell optimal.
引用
收藏
页码:163 / 176
页数:14
相关论文
共 27 条
[1]  
Bather J., 1973, Advances in Applied Probability, V5, P328, DOI 10.2307/1426039
[2]  
Bather JA, 1973, ADV APPL PROBAB, V5, P541
[3]  
Billingsley P, 1968, CONVERGENCE PROBABIL
[4]  
BLACKWELL D, 1962, ANN MATH STATIST, V35, P719
[5]  
Dekker R., 1985, THESIS U LEIDEN
[6]   COMPUTING A BIAS-OPTIMAL POLICY IN A DISCRETE-TIME MARKOV DECISION PROBLEM [J].
DENARDO, EV .
OPERATIONS RESEARCH, 1970, 18 (02) :279-&
[7]  
DENARDO EV, 1973, MATH PROGRAMMING
[8]   A SOLUTION TO A COUNTABLE SYSTEM OF EQUATIONS ARISING IN MARKOVIAN DECISION PROCESSES [J].
DERMAN, C ;
VEINOTT, AF .
ANNALS OF MATHEMATICAL STATISTICS, 1967, 38 (02) :582-&
[9]   DENUMERABLE STATE MARKOVIAN DECISION PROCESSES - AVERAGE COST CRITERION [J].
DERMAN, C .
ANNALS OF MATHEMATICAL STATISTICS, 1966, 37 (06) :1545-&
[10]  
Derman C., 1970, FINITE STATE MARKOVI