Rollout Algorithms for Combinatorial Optimization

被引：153

作者：

Bertsekas D.P. ^{[1
]}

Tsitsiklis J.N. ^{[1
]}

Wu C. ^{[1
]}

机构：

[1] Dept. of Elec. Eng. and Comp. Sci., M.I.T., Cambridge

来源：

Journal of Heuristics | 1997年 / 3卷 / 3期

基金：

美国国家科学基金会;

关键词：

Destination Node; Heuristic Algorithm; Spare Part; Typical Step; Policy Iteration;

D O I：

10.1023/A:1009635226865

中图分类号：

学科分类号：

摘要：

We consider the approximate solution of discrete optimization problems using procedures that are capable of magnifying the effectiveness of any given heuristic algorithm through sequential application. In particular, we embed the problem within a dynamic programming framework, and we introduce several types of rollout algorithms, which are related to notions of policy iteration. We provide conditions guaranteeing that the rollout algorithm improves the performance of the original heuristic algorithm. The method is illustrated in the context of a machine maintenance and repair problem.

引用

页码：245 / 262

页数：17

共 6 条

[1] Barto A.G., Bradtke S.J., Singh S.P., Learning to Act Using Real-Time Dynamic Programming, Artificial Intelligence, 72, pp. 81-138, (1995)
[2] Barto A.S., Sutton R., Reinforcement Learning, (1997)
[3] Bertsekas D.P., Tsitsiklis J.N., Neuro-Dynamic Programming, (1996)
[4] Glover F., Taillard E., De Werra D., A User's Guide to Tabu Search, Annals of Operations Research, 41, pp. 3-28, (1993)
[5] Pattipati K.R., Alexandridis M.G., Application of Heuristic Search and Information Theory to Sequential Fault Diagnosis, IEEE Transactions on Systems, Man, and Cybernetics, 20, pp. 872-887, (1990)
[6] Tesauro G., Galperin G.R., On-Line Policy Improvement Using Monte Carlo Search, (1996)

← 1 →