Finite-horizon dynamic optimisation when the terminal reward is a concave functional of the distribution of the final state

Cited by: 12
Authors
Collins, EJ [1]
McNamara, JM [1]
Affiliation
[1] Univ Bristol, Dept Math, Bristol BS8 1TW, Avon, England
Keywords
fluctuating environment; Markov decision processes; dynamic programming
DOI
10.1017/S0001867800008119
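Chinese Library Classification number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract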
We consider a problem similar in many respects to a finite-horizon Markov decision process, except that the reward to the individual is a strictly concave functional of the distribution of the state of the individual at the final time T. Reward structures such as these are of interest to biologists studying the fitness of different strategies in a fluctuating environment. The problem fails to satisfy the usual optimality equation and cannot be solved directly by dynamic programming. We establish equations characterising the optimal final distribution and an optimal policy π*. We show that in general π* will be a Markov randomised policy (or equivalently a mixture of Markov deterministic policies) and we develop an iterative, policy-improvement-based algorithm which converges to π*. We also consider an infinite population version of the problem, and show that the population cannot do better using a coordinated policy than by each individual independently following the individual optimal policy π*.
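The claim that an optimal policy is in general randomised can be illustrated with a toy one-step problem. The sketch below is not the algorithm from the paper. It assumes a hypothetical reward (expected log payoff over two equally likely environments, a standard bet-hedging form), hypothetical payoff values, and two actions that each lead deterministically to one final state. Under these assumptions, a mixture of the two deterministic policies gives a strictly higher reward than either deterministic policy alone.

```python
import math

# Hypothetical payoffs: PAYOFF[e][s] is the payoff of final state s in environment e.
PAYOFF = [[1.0, 0.1],
          [0.1, 1.0]]
# Hypothetical environment probabilities.
ENV_PROB = [0.5, 0.5]


def reward(dist):
    """Expected log payoff, a strictly concave functional of the final-state distribution."""
    return sum(
        q * math.log(sum(p * r for p, r in zip(dist, row)))
        for q, row in zip(ENV_PROB, PAYOFF)
    )


def final_distribution(prob_a):
    """One-step toy problem: action a leads to state 0, action b leads to state 1."""
    return [prob_a, 1.0 - prob_a]


# Rewards of the two deterministic policies (always a, always b).
deterministic = {
    name: reward(final_distribution(prob_a))
    for name, prob_a in [("a", 1.0), ("b", 0.0)]
}

# Grid search over randomised policies that choose action a with probability prob_a.
grid = [i / 100 for i in range(101)]
best_mix = max(grid, key=lambda prob_a: reward(final_distribution(prob_a)))
best_value = reward(final_distribution(best_mix))

print("deterministic rewards:", deterministic)
print("best mixing probability:", best_mix, "reward:", best_value)
```

Because the reward depends on the whole final distribution rather than on an expectation over final states, the value of a mixture is not the average of the values of its components, which is why the usual optimality equation does not apply directly.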
Pages: 122-136
Page count: 15