Branch-and-bound algorithms for computing the best-subset regression models

Cited by: 55
Authors
Gatu, C
Kontoghiorghes, EJ
Affiliations
[1] Univ Neuchatel, Dept Comp Sci, CH-2007 Neuchatel, Switzerland
[2] Univ Cyprus, Dept Publ & Business Adm, Nicosia, Cyprus
[3] Univ London Birkbeck Coll, Sch Comp Sci & Informat Syst, London WC1E 7HX, England
Keywords
least squares; QR decomposition; subset regression
DOI
10.1198/106186006X100290
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject Classification Codes
020208; 070103; 0714
Abstract
An efficient branch-and-bound algorithm for computing the best-subset regression models is proposed. The algorithm avoids computing the whole regression tree that generates all possible subset models. It is formally shown that if the branch-and-bound test holds, then the current subtree together with its right-hand-side subtrees can be cut. This significantly reduces the computational burden of the proposed algorithm compared to an existing leaps-and-bounds method, which generates two trees. Specifically, the proposed algorithm, which is based on orthogonal transformations, outperforms the leaps-and-bounds strategy by O(n^3). The criteria used to identify the best subsets are monotone functions of the residual sum of squares (RSS), such as R^2, adjusted R^2, the mean square error of prediction, and C_p. Strategies and heuristics that improve the computational performance of the proposed algorithm are investigated. A computationally efficient heuristic version of the branch-and-bound strategy, which decides whether to cut subtrees using a tolerance parameter, is also proposed. The heuristic algorithm derives models close to the best ones: it is shown analytically that the relative error of the RSS, and consequently of the corresponding statistic, of the computed subsets is smaller than the value of the tolerance parameter, which lies between zero and one. Computational results and experiments on random and real data are presented and analyzed.
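To make the pruning idea concrete, here is a minimal Python sketch of a tolerance-based branch-and-bound over subset models. It is an illustration under stated assumptions, not the authors' QR-decomposition-based algorithm: each model is refitted by ordinary least squares, and the names rss, best_subsets, and tol are invented for the example. The cut test uses the fact that the RSS never increases when a variable is added, so the RSS of a node's full variable set lower-bounds every model in its subtree.

    import numpy as np

    def rss(X, y, cols):
        # Residual sum of squares of the least-squares fit of y on X[:, cols].
        beta, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
        r = y - X[:, cols] @ beta
        return float(r @ r)

    def best_subsets(X, y, tol=0.0):
        # Best (lowest-RSS) model of each size, found by branch and bound.
        # tol = 0 gives the exact search; 0 < tol < 1 gives the heuristic.
        p = X.shape[1]
        best = {k: (np.inf, None) for k in range(1, p + 1)}

        def visit(fixed, cand):
            # Record the model consisting of the fixed variables.
            if fixed:
                r = rss(X, y, fixed)
                if r < best[len(fixed)][0]:
                    best[len(fixed)] = (r, tuple(fixed))
            if not cand:
                return
            # Bound: every model in this subtree contains `fixed` and is
            # contained in `fixed + cand`, so its RSS is at least this value.
            bound = rss(X, y, fixed + cand)
            sizes = range(len(fixed) + 1, len(fixed) + len(cand) + 1)
            if all(bound >= (1.0 - tol) * best[k][0] for k in sizes):
                return  # cut the subtree: nothing below can improve the bests
            for i in range(len(cand)):
                visit(fixed + [cand[i]], cand[i + 1:])

        visit([], list(range(p)))
        return best

    # Example: recover a 3-variable signal among 8 candidate regressors.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((60, 8))
    y = X[:, :3] @ np.array([2.0, -1.0, 0.5]) + rng.standard_normal(60)
    for k, (r, cols) in sorted(best_subsets(X, y, tol=0.1).items()):
        print(k, cols, round(r, 2))

One way to see the stated error bound: with the cut test above, any model discarded at size k has RSS at least (1 - tol) times the incumbent best for size k, and the incumbent only decreases afterwards; hence the reported RSS* satisfies (RSS* - RSS_min)/RSS* <= tol, matching the claim that the relative error is smaller than the tolerance parameter.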
Pages: 139–156
Page count: 18
Related Papers
38 references in total
[1] Allen, D. M. (1971). Mean square error of prediction as a criterion for selecting variables. Technometrics, 13(3), 469–475.
[2] [Anonymous], PARALLEL ALGORITHMS
[3] [Anonymous], OPTIMIZATION HEURIST
[4] Breiman, L. (1995). Better subset regression using the nonnegative garrote. Technometrics, 37(4), 373–384.
[5] Clarke, M. R. B. (1981). Applied Statistics, 30, 198. DOI: 10.2307/2346398.
[6] Edwards, D., & Havranek, T. (1987). A fast model selection procedure for large families of models. Journal of the American Statistical Association, 82(397), 205–213.
[7] Fan, J. Q., & Li, R. Z. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96(456), 1348–1360.
[8] Friedman, J. (2001). The Elements of Statistical Learning, Vol. 1. DOI: 10.1007/978-0-387-21606-5.
[9] Furnival, G. M., & Wilson, R. W. (1974). Regressions by leaps and bounds. Technometrics, 16(4), 499–511.
[10] Gatu, C., & Kontoghiorghes, E. J. (2003). Parallel algorithms for computing all possible subset regression models using the QR decomposition. Parallel Computing, 29(4), 505–521.