Prediction by supervised principal components

Cited by: 452
Authors
Bair, E
Hastie, T
Paul, D
Tibshirani, R
Affiliations
[1] Univ Calif San Francisco, Dept Neurol, San Francisco, CA 94143 USA
[2] Stanford Univ, Dept Stat, Stanford, CA 94305 USA
[3] Stanford Univ, Dept Hlth Res & Policy, Stanford, CA 94305 USA
Funding
US National Institutes of Health; US National Science Foundation
Keywords
gene expression; microarray; regression; survival analysis
DOI
10.1198/016214505000000628
Chinese Library Classification (CLC)
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract
In regression problems where the number of predictors greatly exceeds the number of observations, conventional regression techniques may produce unsatisfactory results. We describe a technique called supervised principal components that can be applied to this type of problem. Supervised principal components is similar to conventional principal components analysis except that it uses a subset of the predictors selected based on their association with the outcome. Supervised principal components can be applied to regression and generalized regression problems, such as survival analysis. It compares favorably to other techniques for this type of problem, and it can also account for the effects of other covariates and help identify which predictor variables are most important. We also provide asymptotic consistency results to help support our empirical findings. These methods could become important tools for analyzing DNA microarray data, where they may be used to diagnose and treat cancer more accurately.
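The abstract describes the method only in outline: score each predictor by its association with the outcome, keep the predictors whose score passes a threshold, and use the leading principal component of that subset as the regressor. Below is a minimal pure-Python sketch of that pipeline under stated assumptions. It assumes a continuous outcome fitted by least squares, a single principal component, the univariate score s_j = x_j·y / ||x_j|| computed on centered data, and a hand-picked threshold (in practice the threshold would need tuning, for example by cross-validation). The function names (`fit_supervised_pc`, `predict`, and the helpers) are hypothetical and come from neither the paper nor any library. Survival outcomes and additional covariates are not covered.

```python
import math


def column_means(columns):
    """Mean of each predictor column (columns[j] holds predictor j for all n samples)."""
    return [sum(col) / len(col) for col in columns]


def univariate_scores(columns, y):
    """Assumed score s_j = x_j . y / ||x_j||, with x_j and y already centered."""
    scores = []
    for col in columns:
        norm = math.sqrt(sum(v * v for v in col))
        dot = sum(a * b for a, b in zip(col, y))
        scores.append(dot / norm if norm > 0 else 0.0)
    return scores


def leading_loadings(columns, iters=200):
    """Leading eigenvector of X^T X (the first PC loadings) via power iteration."""
    n, p = len(columns[0]), len(columns)
    v = [1.0] * p  # simple start vector; adequate for a sketch
    for _ in range(iters):
        u = [sum(columns[j][i] * v[j] for j in range(p)) for i in range(n)]  # u = X v
        w = [sum(columns[j][i] * u[i] for i in range(n)) for j in range(p)]  # w = X^T u
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def fit_supervised_pc(columns, y, threshold):
    """Keep predictors with |score| > threshold, then regress y on their first PC."""
    means = column_means(columns)
    xc = [[v - m for v in col] for col, m in zip(columns, means)]
    y_mean = sum(y) / len(y)
    yc = [v - y_mean for v in y]
    scores = univariate_scores(xc, yc)
    keep = [j for j, s in enumerate(scores) if abs(s) > threshold]
    if not keep:
        raise ValueError("no predictor passes the threshold")
    loadings = leading_loadings([xc[j] for j in keep])
    # PC scores are already centered because every retained column is centered.
    pc = [sum(xc[j][i] * w for j, w in zip(keep, loadings)) for i in range(len(y))]
    slope = sum(a * b for a, b in zip(pc, yc)) / sum(a * a for a in pc)
    return {"keep": keep, "means": means, "loadings": loadings,
            "intercept": y_mean, "slope": slope}


def predict(model, row):
    """Predict the outcome for one observation given as a list of all p predictors."""
    pc = sum((row[j] - model["means"][j]) * w
             for j, w in zip(model["keep"], model["loadings"]))
    return model["intercept"] + model["slope"] * pc


# Usage on toy data: x0 and x1 track the outcome, x2 and x3 do not.
columns = [
    [1, 2, 3, 4, 5, 6],
    [2, 4, 6, 8, 10, 12],
    [1, -1, 1, -1, 1, -1],
    [0, 0, 1, 1, 0, 0],
]
y = [1, 2, 3, 4, 5, 6]
model = fit_supervised_pc(columns, y, threshold=2.0)
print(model["keep"])  # [0, 1]
```

Because the score divides by ||x_j||, rescaling a predictor (as with the second column above) does not change whether it is selected. Power iteration is used here only to keep the sketch dependency-free; a linear-algebra library's SVD would be the more usual way to obtain the leading component.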
Pages: 119-137
Number of pages: 19