Efficient solution algorithms for factored MDPs

被引:215
作者
Guestrin, C [1 ]
Koller, D
Parr, R
Venkataraman, S
机构
[1] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA
[2] Duke Univ, Dept Comp Sci, Durham, NC 27706 USA
[3] Carnegie Mellon Univ, Dept Comp Sci, Pittsburgh, PA 15213 USA
关键词
D O I
10.1613/jair.1000
中图分类号
TP18 [人工智能理论];
学科分类号
081104 [模式识别与智能系统]; 0812 [计算机科学与技术]; 0835 [软件工程]; 1405 [智能科学与技术];
摘要
This paper addresses the problem of planning under uncertainty in large Markov Decision Processes (MDPs). Factored MDPs represent a complex state space using state variables and the transition model using a dynamic Bayesian network. This representation often allows an exponential reduction in the representation size of structured MDPs, but the complexity of exact solution algorithms for such MDPs can grow exponentially in the representation size. In this paper, we present two approximate solution algorithms that exploit structure in factored MDPs. Both use an approximate value function represented as a linear combination of basis functions, where each basis function involves only a small subset of the domain variables. A key contribution of this paper is that it shows how the basic operations of both algorithms can be performed efficiently in closed form, by exploiting both additive and context-specific structure in a factored MDP. A central element of our algorithms is a novel linear program decomposition technique, analogous to variable elimination in Bayesian networks, which reduces an exponentially large LP to a provably equivalent, polynomial-sized one. One algorithm uses approximate linear programming, and the second approximate dynamic programming. Our dynamic programming algorithm is novel in that it uses an approximation based on max-norm, a technique that more directly minimizes the terms that appear in error bounds for approximate MDP algorithms. We provide experimental results on problems with over 10(40) states, demonstrating a promising indication of the scalability of our approach, and compare our algorithm to an existing state-of-the-art approach, showing, in some problems, exponential gains in computation time.
引用
收藏
页码:399 / 468
页数:70
相关论文
共 50 条
[1]
[Anonymous], 1998, THESIS MIT
[2]
COMPLEXITY OF FINDING EMBEDDINGS IN A K-TREE [J].
ARNBORG, S ;
CORNEIL, DG ;
PROSKUROWSKI, A .
SIAM JOURNAL ON ALGEBRAIC AND DISCRETE METHODS, 1987, 8 (02) :277-284
[3]
A sufficiently fast algorithm for finding close to optimal clique trees [J].
Becker, A ;
Geiger, D .
ARTIFICIAL INTELLIGENCE, 2001, 125 (1-2) :3-17
[4]
Bellman R., 1957, DYNAMIC PROGRAMMING
[5]
Bellman R., 1963, Mathe- matics of Computation, V17, P155
[6]
Bertele U, 1972, NONSERIAL DYNAMIC PR
[7]
Bertsekas D. P., 1996, Neuro Dynamic Programming, V1st
[8]
Stochastic dynamic programming with factored representations [J].
Boutilier, C ;
Dearden, R ;
Goldszmidt, M .
ARTIFICIAL INTELLIGENCE, 2000, 121 (1-2) :49-107
[9]
Boutilier C, 1999, J ARTIF INTELL RES, V11, P1
[10]
BOUTILIER C, 1996, P 13 INT C MACH LEAR, P54