Parallel rollout for online solution of partially observable Markov decision processes

被引:41
作者
Chang, HS [1 ]
Givan, R
Chong, EKP
机构
[1] Sogang Univ, Dept Comp Sci & Engn, Seoul, South Korea
[2] Purdue Univ, Sch Elect & Comp Engn, W Lafayette, IN 47907 USA
[3] Colorado State Univ, Dept Elect & Comp Engn, Ft Collins, CO 80523 USA
来源
DISCRETE EVENT DYNAMIC SYSTEMS-THEORY AND APPLICATIONS | 2004年 / 14卷 / 03期
基金
美国国家科学基金会;
关键词
partially observable Markov decision process; rollout; simulation; multiclass scheduling; buffer management;
D O I
10.1023/B:DISC.0000028199.78776.c4
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
We propose a novel approach, called parallel rollout, to solving (partially observable) Markov decision processes. Our approach generalizes the rollout algorithm of Bertsekas and Castanon (1999) by rolling out a set of multiple heuristic policies rather than a single policy. In particular, the parallel rollout approach aims at the class of problems where we have multiple heuristic policies available such that each policy performs near-optimal for a different set of system paths. Parallel rollout automatically combines the given multiple policies to create a new policy that adapts to the different system paths and improves the performance of each policy in the set. We formally prove this claim for two criteria: total expected reward and infinite horizon discounted reward. The parallel rollout approach also resolves the key issue of selecting which policy to roll out among multiple heuristic policies whose performances cannot be predicted in advance. We present two example problems to illustrate the effectiveness of the parallel rollout approach: a buffer management problem and a multiclass scheduling problem.
引用
收藏
页码:309 / 341
页数:33
相关论文
共 32 条
[1]  
Andersen AT, 1997, IEEE INFOCOM SER, P196
[2]  
ANDERSEN AT, 1995, P 12 NORD TEL SEM ES, P269
[3]  
[Anonymous], STOCHASTIC MODELS
[4]  
Asmussen S, 1996, SCAND J STAT, V23, P419
[5]  
Bertsekas D. P., 1996, Neuro Dynamic Programming, V1st
[6]  
Bertsekas D.P., 1997, P 35 ALL C COMM CONT
[7]   Rollout algorithms for stochastic scheduling problems [J].
Bertsekas, DP ;
Castañon, DA .
JOURNAL OF HEURISTICS, 1999, 5 (01) :89-108
[8]  
Bertsekas DP, 2012, DYNAMIC PROGRAMMING, V2
[9]  
BLONDIA C, 1993, BELGIAN J OPERATIONS, V32
[10]  
Bonald T., 2000, Proceedings IEEE INFOCOM 2000. Conference on Computer Communications. Nineteenth Annual Joint Conference of the IEEE Computer and Communications Societies (Cat. No.00CH37064), P1415, DOI 10.1109/INFCOM.2000.832539