On constraint sampling in the linear programming approach to approximate dynamic programming

被引:186
作者
de Farias, DP
Van Roy, B
机构
[1] MIT, Cambridge, MA 02139 USA
[2] Stanford Univ, Stanford, CA 93405 USA
关键词
Markov decision processes; policy; value function; basis functions;
D O I
10.1287/moor.1040.0094
中图分类号
C93 [管理学]; O22 [运筹学];
学科分类号
070105 [运筹学与控制论]; 12 [管理学]; 1201 [管理科学与工程]; 1202 [工商管理学]; 120202 [企业管理];
摘要
In the linear programming approach to approximate dynamic programming, one tries to solve a certain linear program-the ALP-that has a relatively small number K of variables but an intractable number M of constraints. In this paper, we study a scheme that samples and imposes a subset of m much less than M constraints. A natural question that arises in this context is: How must m scale with respect to K and M in order to ensure that the resulting approximation is almost as good as one given by exact solution of the ALP? We show that, given an idealized sampling distribution and appropriate constraints on the K variables, m can be chosen independently of M and need grow only as a polynomial in K. We interpret this result in a context involving controlled queueing networks.
引用
收藏
页码:462 / 478
页数:17
相关论文
共 14 条
[1]
ANTHONY D, 1992, COMPUTATIONAL LEARNI, P95
[2]
Bertsekas D., 1996, NEURO DYNAMIC PROGRA, V1st
[3]
CALAFIORE GM, 2003, UNCERTAIN CONVEX PRO
[4]
LAS-VEGAS ALGORITHMS FOR LINEAR AND INTEGER PROGRAMMING WHEN THE DIMENSION IS SMALL [J].
CLARKSON, KL .
JOURNAL OF THE ASSOCIATION FOR COMPUTING MACHINERY, 1995, 42 (02) :488-499
[5]
The linear programming approach to approximate dynamic programming [J].
De Farias, DP ;
Van Roy, B .
OPERATIONS RESEARCH, 2003, 51 (06) :850-865
[6]
CENTRAL LIMIT-THEOREMS FOR EMPIRICAL MEASURES [J].
DUDLEY, RM .
ANNALS OF PROBABILITY, 1978, 6 (06) :899-929
[7]
Efficient solution algorithms for factored MDPs [J].
Guestrin, C ;
Koller, D ;
Parr, R ;
Venkataraman, S .
JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2003, 19 :399-468
[8]
EQUIVALENCE OF MODELS FOR POLYNOMIAL LEARNABILITY [J].
HAUSSLER, D ;
KEARNS, M ;
LITTLESTONE, N ;
WARMUTH, MK .
INFORMATION AND COMPUTATION, 1991, 95 (02) :129-161
[9]
The CD34+cell concentration in peripheral blood predicts CD34+cell yield in the leukapheresis product [J].
Hollingsworth, KL ;
Zimmerman, TM ;
Karrison, T ;
Oliver, A ;
Williams, SF .
CYTOTHERAPY, 1999, 1 (02) :141-146
[10]
New linear program performance bounds for queueing networks [J].
Morrison, JR ;
Kumar, PR .
JOURNAL OF OPTIMIZATION THEORY AND APPLICATIONS, 1999, 100 (03) :575-597