BEST SUBSET SELECTION VIA A MODERN OPTIMIZATION LENS

Cited by: 433
Authors
Bertsimas, Dimitris [1,2]
King, Angela [2]
Mazumder, Rahul [1,2]
Affiliations
[1] MIT, Sloan School of Management, 77 Massachusetts Ave, Cambridge, MA 02139 USA
[2] MIT, Operations Research Center, 77 Massachusetts Ave, Cambridge, MA 02139 USA
Keywords
Sparse linear regression; best subset selection; ℓ0-constrained minimization; lasso; least absolute deviation; algorithms; mixed integer programming; global optimization; discrete optimization; nonconcave penalized likelihood; variable selection; regression shrinkage; sparsity; recovery; persistence
DOI
10.1214/15-AOS1388
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline Codes
020208; 070103; 0714
Abstract
In the period 1991-2015, algorithmic advances in Mixed Integer Optimization (MIO) coupled with hardware improvements have resulted in an astonishing 450 billion factor speedup in solving MIO problems. We present an MIO approach for solving the classical best subset selection problem of choosing k out of p features in linear regression given n observations. We develop a discrete extension of modern first-order continuous optimization methods to find high-quality feasible solutions that we use as warm starts to an MIO solver that finds provably optimal solutions. The resulting algorithm (a) provides a solution with a guarantee on its suboptimality even if we terminate the algorithm early, (b) can accommodate side constraints on the coefficients of the linear regression and (c) extends to finding best subset solutions for the least absolute deviation loss function. Using a wide variety of synthetic and real datasets, we demonstrate that our approach solves problems with n in the 1000s and p in the 100s in minutes to provable optimality, and finds near-optimal solutions for n in the 100s and p in the 1000s in minutes. We also establish via numerical experiments that the MIO approach performs better than the Lasso and other popular sparse learning procedures in terms of achieving sparse solutions with good predictive power.
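To make the abstract's method concrete: the underlying problem is to minimize ||y - X beta||^2 subject to ||beta||_0 <= k, which the paper attacks with a big-M mixed integer formulation (-M z_i <= beta_i <= M z_i, sum_i z_i <= k, z_i binary), warm-started by a discrete first-order method. The sketch below is a minimal illustration of that warm-start heuristic, a gradient step on the least-squares loss followed by hard thresholding to the k largest-magnitude coefficients. The 0.5-scaled loss, the function names hard_threshold and discrete_first_order, and the final least-squares polish are our assumptions for readability, not the authors' reference implementation; the MIO certification step is omitted.

import numpy as np

def hard_threshold(beta, k):
    # Projection onto {b : ||b||_0 <= k}: keep the k largest-magnitude
    # entries of beta and zero out the rest.
    out = np.zeros_like(beta)
    if k > 0:
        keep = np.argsort(np.abs(beta))[-k:]
        out[keep] = beta[keep]
    return out

def discrete_first_order(X, y, k, max_iter=1000, tol=1e-7):
    # Heuristic for: min 0.5 * ||y - X b||^2  s.t.  ||b||_0 <= k.
    # Step size 1/L, with L = lambda_max(X^T X) the Lipschitz constant
    # of the gradient, keeps the objective nonincreasing.
    L = np.linalg.eigvalsh(X.T @ X)[-1]
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        grad = X.T @ (X @ beta - y)                 # gradient of the loss
        beta_new = hard_threshold(beta - grad / L, k)
        if np.linalg.norm(beta_new - beta) <= tol:  # support and values settled
            beta = beta_new
            break
        beta = beta_new
    # Polish: refit ordinary least squares on the selected support before
    # handing the k-sparse vector to the MIO solver as an incumbent.
    support = np.flatnonzero(beta)
    if support.size > 0:
        coef, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
        beta = np.zeros(X.shape[1])
        beta[support] = coef
    return beta

In the paper, this heuristic only supplies a good incumbent; provable optimality comes from the MIO solver, which also produces the suboptimality bound mentioned in point (a) whenever it is stopped early.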
Pages: 813-852
Number of pages: 40