Effective dimension reduction methods for tumor classification using gene expression data

Cited by: 122
Authors
Antoniadis, A [1]
Lambert-Lacroix, S [1]
Leblanc, F [1]
Affiliation
[1] Univ Grenoble 1, Lab IMAG LMC, F-38041 Grenoble 9, France
DOI
10.1093/bioinformatics/btg062
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010; 081704
Abstract
Motivation: One particular application of microarray data is to uncover the molecular variation among cancers. One feature of microarray studies is that the number n of samples collected is relatively small compared to the number p of genes per sample, which is usually in the thousands. In statistical terms, this very large number of predictors compared to a small number of samples or observations makes the classification problem difficult. An efficient way to solve this problem is to use statistical dimension reduction techniques in conjunction with nonparametric discriminant procedures.
Results: We view the classification problem as a regression problem with few observations and many predictor variables. We use an adaptive dimension reduction method for generalized semi-parametric regression models that allows us to solve the 'curse of dimensionality' problem arising in the context of expression data. The predictive performance of the resulting classification rule is illustrated on two well-known data sets in the microarray literature: the leukemia data, which is known to contain classes that are easily 'separable', and the colon data set.
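To make the general recipe in the abstract concrete (reduce the p gene-expression predictors to a few directions, then apply a nonparametric classifier to the reduced scores), here is a minimal sketch. It is not the authors' adaptive method for generalized semi-parametric models. As a labeled stand-in, it assumes a single reduction direction proportional to the centered cross-product of X and y (the first partial least squares weight vector) followed by a Gaussian-kernel Nadaraya-Watson estimate of P(y = 1 | score). The data are simulated, and all function names, the bandwidth of 1.0, and the simulation settings are hypothetical choices made for illustration.

```python
import math
import random


def simulate(n_per_class, p, n_informative, shift, rng):
    """Simulated expression matrix: p genes, only the first n_informative differ by class."""
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(p)]
            if label == 1:
                for j in range(n_informative):
                    row[j] += shift
            X.append(row)
            y.append(label)
    return X, y


def first_pls_direction(X, y):
    """Unit vector proportional to X_c^T y_c, where _c denotes column centering."""
    n, p = len(X), len(X[0])
    col_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    w = [sum((X[i][j] - col_mean[j]) * (y[i] - y_mean) for i in range(n))
         for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    return [v / norm for v in w], col_mean


def project(x, w, col_mean):
    """One-dimensional score of a sample along the reduction direction w."""
    return sum((xj - mj) * wj for xj, mj, wj in zip(x, col_mean, w))


def kernel_prob(t, train_scores, y, bandwidth):
    """Nadaraya-Watson estimate of P(y = 1 | score = t) with a Gaussian kernel."""
    weights = [math.exp(-0.5 * ((t - s) / bandwidth) ** 2) for s in train_scores]
    total = sum(weights)
    if total == 0.0:  # every kernel weight underflowed: no information at t
        return 0.5
    return sum(wt * yi for wt, yi in zip(weights, y)) / total


rng = random.Random(0)  # fixed seed so the simulation is repeatable
X_train, y_train = simulate(20, 300, 30, 2.0, rng)  # n = 40 samples, p = 300 genes
X_test, y_test = simulate(10, 300, 30, 2.0, rng)

w, col_mean = first_pls_direction(X_train, y_train)
train_scores = [project(x, w, col_mean) for x in X_train]
predictions = [
    int(kernel_prob(project(x, w, col_mean), train_scores, y_train, 1.0) > 0.5)
    for x in X_test
]
accuracy = sum(pred == truth for pred, truth in zip(predictions, y_test)) / len(y_test)
print(f"test accuracy: {accuracy:.2f}")
```

The point of the sketch is the order of operations: the classifier only ever sees a one-dimensional score, so the kernel smoother works with n = 40 observations in one dimension instead of 300. In a real analysis the direction and bandwidth would be chosen inside a cross-validation loop to avoid optimistic error estimates.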
Pages: 563-570
Page count: 8