Branch-and-bound algorithms for computing the best-subset regression models

Cited: 55
Authors
Gatu, C
Kontoghiorghes, EJ
Affiliations
[1] Univ Neuchatel, Dept Comp Sci, CH-2007 Neuchatel, Switzerland
[2] Univ Cyprus, Dept Publ & Business Adm, Nicosia, Cyprus
[3] Univ London Birkbeck Coll, Sch Comp Sci & Informat Syst, London WC1E 7HX, England
Keywords
least squares; QR decomposition; subset regression;
DOI
10.1198/106186006X100290
CLC classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Subject classification codes
020208; 070103; 0714;
Abstract
An efficient branch-and-bound algorithm for computing the best-subset regression models is proposed. The algorithm avoids computing the whole regression tree that generates all possible subset models. It is formally shown that if the branch-and-bound test holds, then the current subtree together with its right-hand-side subtrees are cut. This significantly reduces the computational burden of the proposed algorithm compared to an existing leaps-and-bounds method, which generates two trees. Specifically, the proposed algorithm, which is based on orthogonal transformations, outperforms the leaps-and-bounds strategy by O(n³). The criteria used to identify the best subsets are based on monotone functions of the residual sum of squares (RSS), such as R², adjusted R², the mean square error of prediction, and C_p. Strategies and heuristics that improve the computational performance of the proposed algorithm are investigated. A computationally efficient heuristic version of the branch-and-bound strategy, which decides to cut subtrees using a tolerance parameter, is also proposed. The heuristic algorithm derives models close to the best ones. Moreover, it is shown analytically that the relative error of the RSS, and consequently of the corresponding statistic, of the computed subsets is smaller than the value of the tolerance parameter, which lies between zero and one. Computational results and experiments on random and real data are presented and analyzed.
Pages: 139-156
Number of pages: 18