Principled sure independence screening for Cox models with ultra-high-dimensional covariates

Cited by: 171
Authors
Zhao, Sihai Dave [1]
Li, Yi [1]
Affiliations
[1] Harvard Univ, Sch Publ Hlth, Dept Biostat, Boston, MA 02115 USA
Keywords
Cox model; Multiple myeloma; Sure independence screening; Ultra-high-dimensional covariates; Variable selection; NONCONCAVE PENALIZED LIKELIHOOD; FALSE DISCOVERY RATE; VARIABLE SELECTION; GENE-EXPRESSION; MULTIPLE-MYELOMA; ADAPTIVE LASSO; REGRESSION; SHRINKAGE;
DOI
10.1016/j.jmva.2011.08.002
Chinese Library Classification
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification codes
020208 ; 070103 ; 0714 ;
Abstract
It is rather challenging for current variable selectors to handle situations where the number of covariates under consideration is ultra-high. Consider a motivating clinical trial of the drug bortezomib for the treatment of multiple myeloma, where overall survival and expression levels of 44760 probesets were measured for each of 80 patients with the goal of identifying genes that predict survival after treatment. This dataset defies analysis even with regularized regression. Some remedies have been proposed for the linear model and for generalized linear models, but there are few solutions in the survival setting and, to our knowledge, no theoretical support. Furthermore, existing strategies often involve tuning parameters that are difficult to interpret. In this paper, we propose and theoretically justify a principled method for reducing dimensionality in the analysis of censored data by selecting only the important covariates. Our procedure involves a tuning parameter that has a simple interpretation as the desired false positive rate of this selection. We present simulation results and apply the proposed procedure to analyze the aforementioned myeloma study. (C) 2011 Elsevier Inc. All rights reserved.
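The abstract does not spell out the screening statistic, so the following is only a minimal sketch of what marginal screening with a false-positive-rate tuning parameter could look like, not the authors' implementation. It assumes that each covariate is standardized and fitted alone in a Cox model, that the screening statistic is the Wald-type quantity sqrt(information) * |beta_hat|, and that the threshold is the standard normal quantile at 1 - q/2, where q is the tolerated number of false positives divided by the number of covariates. The function names `screen_marginal_cox` and `marginal_wald` and the parameter `n_false_pos` are hypothetical, and tied event times are not handled.

```python
import math
from statistics import NormalDist


def _score_and_info(beta, xs, ds):
    """Score and observed information of a one-covariate Cox partial likelihood.

    xs and ds must be sorted by decreasing follow-up time so that the running
    sums cover each risk set. Assumes no tied times.
    """
    s0 = s1 = s2 = 0.0
    score = info = 0.0
    for x, d in zip(xs, ds):
        w = math.exp(beta * x)
        s0 += w
        s1 += w * x
        s2 += w * x * x
        if d:  # only observed events contribute
            mean = s1 / s0
            score += x - mean
            info += s2 / s0 - mean * mean
    return score, info


def marginal_wald(time, event, x, n_iter=50, tol=1e-8):
    """Return sqrt(information) * |beta_hat| for one standardized covariate."""
    n = len(x)
    m = sum(x) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in x) / n)
    order = sorted(range(n), key=lambda i: -time[i])
    xs = [(x[i] - m) / sd for i in order]
    ds = [event[i] for i in order]
    beta = 0.0
    for _ in range(n_iter):
        score, info = _score_and_info(beta, xs, ds)
        if info <= 0.0:
            return 0.0  # no usable events for this covariate
        step = max(-1.0, min(1.0, score / info))  # damped Newton step
        beta += step
        if abs(step) < tol:
            break
    _, info = _score_and_info(beta, xs, ds)
    return math.sqrt(info) * abs(beta) if info > 0.0 else 0.0


def screen_marginal_cox(time, event, X, n_false_pos=1.0):
    """Keep covariates whose marginal Wald statistic exceeds a normal quantile.

    X is a list of rows; n_false_pos / p plays the role of the desired
    false positive rate (an assumption of this sketch).
    """
    p = len(X[0])
    q = n_false_pos / p
    gamma = NormalDist().inv_cdf(1.0 - q / 2.0)
    stats = [marginal_wald(time, event, [row[j] for row in X]) for j in range(p)]
    selected = [j for j, s in enumerate(stats) if s >= gamma]
    return selected, gamma
```

Under these assumptions, calling `screen_marginal_cox(time, event, X, n_false_pos=1.0)` returns the indices of the retained covariates together with the threshold used; raising `n_false_pos` lowers the threshold and keeps more covariates.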
Pages: 397-411
Page count: 15
Related papers
37 in total
  • [1] Gene prioritization through genomic data fusion
    Aerts, S
    Lambrechts, D
    Maity, S
    Van Loo, P
    Coessens, B
    De Smet, F
    Tranchevent, LC
    De Moor, B
    Marynen, P
    Hassan, B
    Carmeliet, P
    Moreau, Y
    [J]. NATURE BIOTECHNOLOGY, 2006, 24 (05) : 537 - 544
  • [2] Benjamini Y, 2001, ANN STAT, V29, P1165
  • [3] CONTROLLING THE FALSE DISCOVERY RATE - A PRACTICAL AND POWERFUL APPROACH TO MULTIPLE TESTING
    BENJAMINI, Y
    HOCHBERG, Y
    [J]. JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 1995, 57 (01) : 289 - 300
  • [4] Consistent variable selection in high dimensional regression via multiple testing
    Bunea, Florentina
    Wegkamp, Marten H.
    Auguste, Anna
    [J]. JOURNAL OF STATISTICAL PLANNING AND INFERENCE, 2006, 136 (12) : 4349 - 4364
  • [5] Candes E, 2007, ANN STAT, V35, P2313, DOI 10.1214/009053606000001523
  • [6] COX DR, 1972, J R STAT SOC B, V34, P187
  • [7] Prediction of survival in multiple myeloma based on gene expression profiles reveals cell cycle and chromosomal instability signatures in high-risk patients and hyperdiploid signatures in low-risk patients: A study of the Intergroupe Francophone du Myelome
    Decaux, Olivier
    Lode, Laurence
    Magrangeas, Florence
    Charbonnel, Catherine
    Gouraud, Wilfried
    Jezequel, Pascal
    Attal, Michel
    Harousseau, Jean-Luc
    Moreau, Philippe
    Bataille, Regis
    Campion, Loic
    Avet-Loiseau, Herve
    Minvielle, Stephane
    [J]. JOURNAL OF CLINICAL ONCOLOGY, 2008, 26 (29) : 4798 - 4805
  • [8] BOUNDS ON MOMENTS OF MARTINGALES
    DHARMADHIKARI, SW
    FABIAN, V
    JOGDEO, K
    [J]. ANNALS OF MATHEMATICAL STATISTICS, 1968, 39 (05) : 1719 - +
  • [9] Fan, 2010, IMS COLLECTIONS, V6, P70, DOI 10.1214/10-IMSCOLL606
  • [10] Sure independence screening for ultrahigh dimensional feature space
    Fan, Jianqing
    Lv, Jinchi
    [J]. JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 2008, 70 : 849 - 883