Boosting for high-dimensional linear models

Cited by: 264
Authors
Bühlmann, Peter [1]
Affiliation
[1] Swiss Fed Inst Technol, CH-8092 Zurich, Switzerland
Keywords
binary classification; gene expression; Lasso; matching pursuit; over-complete dictionary; sparsity; variable selection; weak greedy algorithm
DOI
10.1214/009053606000000092
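Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract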
We prove that boosting with the squared error loss, L2Boosting, is consistent for very high-dimensional linear models, where the number of predictor variables is allowed to grow essentially as fast as O(exp(sample size)), assuming that the true underlying regression function is sparse in terms of the ℓ1-norm of the regression coefficients. In the language of signal processing, this means consistency for de-noising using a strongly overcomplete dictionary if the underlying signal is sparse in terms of the ℓ1-norm. We also propose here an AIC-based method for tuning, namely for choosing the number of boosting iterations. This makes L2Boosting computationally attractive since it is not required to run the algorithm multiple times for cross-validation as commonly used so far. We demonstrate L2Boosting for simulated data, in particular where the predictor dimension is large in comparison to sample size, and for a difficult tumor-classification problem with gene expression microarray data.
Pages: 559-583
Number of pages: 25
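
The procedure summarized in the abstract (boosting with squared error loss, with the number of iterations chosen by an AIC-type criterion instead of cross-validation) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from this record: the componentwise least-squares base learner, the step length `nu`, the cap `max_iter`, the corrected-AIC formula used as the stopping score, and the function name `l2boost_aicc`.

```python
import math


def l2boost_aicc(X, y, nu=0.1, max_iter=200):
    """Componentwise linear L2Boosting; returns (intercept, beta, m_stop).

    X: list of n rows, each a list of p predictor values. y: list of n responses.
    nu and max_iter are illustrative defaults (assumptions, not from the record).
    """
    n, p = len(X), len(X[0])
    cols = [[row[j] for row in X] for j in range(p)]
    sq_norms = [sum(v * v for v in col) for col in cols]

    intercept = sum(y) / n
    resid = [yi - intercept for yi in y]
    beta = [0.0] * p
    # B maps y to the current fit; it starts as the averaging operator.
    B = [[1.0 / n] * n for _ in range(n)]

    best_score, best_m, best_beta = math.inf, 0, beta[:]
    for m in range(1, max_iter + 1):
        # Select the single predictor whose least-squares fit to the
        # current residuals reduces the residual sum of squares the most.
        best_j, best_gain, best_dot = -1, 0.0, 0.0
        for j in range(p):
            if sq_norms[j] == 0.0:
                continue
            dot = sum(a * b for a, b in zip(cols[j], resid))
            gain = dot * dot / sq_norms[j]
            if gain > best_gain:
                best_j, best_gain, best_dot = j, gain, dot
        if best_j < 0:
            break  # residuals are orthogonal to every predictor

        x = cols[best_j]
        step = nu * best_dot / sq_norms[best_j]
        beta[best_j] += step
        resid = [r - step * xi for r, xi in zip(resid, x)]

        # Hat-operator update: B <- B + nu * H_j (I - B), H_j = x x^T / ||x||^2.
        row = [x[k] - sum(x[i] * B[i][k] for i in range(n)) for k in range(n)]
        scale = nu / sq_norms[best_j]
        for i in range(n):
            for k in range(n):
                B[i][k] += scale * x[i] * row[k]

        # Assumed stopping score: a corrected AIC with trace(B) as degrees of freedom.
        df = sum(B[i][i] for i in range(n))
        sigma2 = sum(r * r for r in resid) / n
        if sigma2 <= 0.0 or df + 2.0 >= n:
            break  # the score is undefined beyond this point
        score = math.log(sigma2) + (1.0 + df / n) / (1.0 - (df + 2.0) / n)
        if score < best_score:
            best_score, best_m, best_beta = score, m, beta[:]

    return intercept, best_beta, best_m
```

A single pass over the iterations yields both the fits and the stopping score, which is the computational point the abstract makes about avoiding repeated runs for cross-validation. The sketch keeps the full n-by-n operator `B` for clarity, so it is only practical for small sample sizes.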