Multi-class cancer classification by total principal component regression (TPCR) using microarray gene expression data

Cited by: 42
Authors
Tan, YX
Shi, LM
Tong, WD
Wang, C [1]
Affiliations
[1] Univ Calif Los Angeles, Cedars Sinai Med Ctr, Dept Med, David Geffen Sch Med, Los Angeles, CA 90048 USA
[2] US FDA, Natl Ctr Toxicol Res, Ctr Toxicoinformat, Div Syst Toxicol, Jefferson, AR 72079 USA
DOI
10.1093/nar/gki144
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010; 081704;
Abstract
DNA microarray technology provides a promising approach to the diagnosis and prognosis of tumors on a genome-wide scale by monitoring the expression levels of thousands of genes simultaneously. One problem arising from the use of microarray data is the difficulty of analyzing high-dimensional gene expression data, which typically have thousands of variables (genes) but far fewer observations (samples) and often show severe collinearity. This makes it difficult to apply classical statistical methods directly to microarray data. In this paper, total principal component regression (TPCR) is proposed to classify human tumors by extracting the latent variable structure underlying microarray data from the augmented subspace of both independent variables and dependent variables. One of the salient features of our method is that it takes into account not only the latent variable structure but also the errors in the microarray gene expression profiles (independent variables). The prediction performance of TPCR was evaluated by both leave-one-out and leave-half-out cross-validation using four well-known microarray datasets. The stability and reliability of the classification models were further assessed by re-randomization and permutation studies. A fast kernel algorithm was applied to reduce the computation time dramatically. (MATLAB source code is available upon request.)
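The evaluation protocol named in the abstract (leave-one-out cross-validation, leave-half-out cross-validation, and a permutation study) can be sketched roughly as below. This is only an illustration under stated assumptions, not the authors' MATLAB implementation: a simple nearest-centroid classifier stands in for TPCR, and every function name, the `repeats` and `seed` parameters, and the toy data are hypothetical.

```python
import random

def nearest_centroid_fit(X, y):
    """Stand-in classifier (NOT TPCR): store one mean vector per class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def nearest_centroid_predict(model, row):
    """Return the class whose centroid is closest in squared distance."""
    def sqdist(center):
        return sum((a - b) ** 2 for a, b in zip(row, center))
    return min(sorted(model), key=lambda c: sqdist(model[c]))

def accuracy(X, y, train_idx, test_idx):
    """Fit on the training indices, score on the test indices."""
    model = nearest_centroid_fit([X[i] for i in train_idx],
                                 [y[i] for i in train_idx])
    hits = sum(nearest_centroid_predict(model, X[i]) == y[i] for i in test_idx)
    return hits / len(test_idx)

def leave_one_out(X, y):
    """Hold out each sample once; train on all the others."""
    n = len(X)
    total = 0.0
    for i in range(n):
        train = [j for j in range(n) if j != i]
        total += accuracy(X, y, train, [i])
    return total / n

def leave_half_out(X, y, repeats=20, seed=0):
    """Repeatedly train on a random half and test on the other half."""
    rng = random.Random(seed)
    n = len(X)
    scores = []
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        half = n // 2
        scores.append(accuracy(X, y, idx[:half], idx[half:]))
    return sum(scores) / len(scores)

def permutation_scores(X, y, repeats=20, seed=0):
    """Leave-one-out accuracy after shuffling the class labels."""
    rng = random.Random(seed)
    out = []
    for _ in range(repeats):
        shuffled = list(y)
        rng.shuffle(shuffled)
        out.append(leave_one_out(X, shuffled))
    return out

# Toy data (hypothetical): two well-separated classes, two "genes" each.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
y = ["A", "A", "A", "B", "B", "B"]
loo = leave_one_out(X, y)
lho = leave_half_out(X, y)
null_scores = permutation_scores(X, y)
```

Comparing `loo` against the spread of `null_scores` gives a rough sense of whether the observed accuracy could arise from labels unrelated to the expression profiles, which is the general idea behind a permutation study.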
Pages: 56-65
Page count: 10
Related papers
61 in total
  • [1] Selection bias in gene extraction on the basis of microarray gene-expression data
    Ambroise, C
    McLachlan, GJ
    [J]. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2002, 99 (10) : 6562 - 6566
  • [2] ESTIMATING LINEAR STATISTICAL RELATIONSHIPS
    ANDERSON, TW
    [J]. ANNALS OF STATISTICS, 1984, 12 (01) : 1 - 45
  • [3] ANDERSON TW, 1976, J ROY STAT SOC B MET, V38, P1
  • [4] PLS regression methods
    Höskuldsson, Agnar
    [J]. Journal of Chemometrics, 1988, 2 (03) : 211 - 228
  • [5] PCA disjoint models for multiclass cancer analysis using gene expression data
    Bicciato, S
    Luchini, A
    Di Bello, C
    [J]. BIOINFORMATICS, 2003, 19 (05) : 571 - 578
  • [6] Gene expression data analysis
    Brazma, A
    Vilo, J
    [J]. FEBS LETTERS, 2000, 480 (01) : 17 - 24
  • [7] Knowledge-based analysis of microarray gene expression data by using support vector machines
    Brown, MPS
    Grundy, WN
    Lin, D
    Cristianini, N
    Sugnet, CW
    Furey, TS
    Ares, M
    Haussler, D
    [J]. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2000, 97 (01) : 262 - 267
  • [8] Exploring the new world of the genome with DNA microarrays
    Brown, PO
    Botstein, D
    [J]. NATURE GENETICS, 1999, 21 (Suppl 1) : 33 - 37
  • [9] Latent variable multivariate regression modeling
    Burnham, AJ
    MacGregor, JF
    Viveros, R
    [J]. CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS, 1999, 48 (02) : 167 - 180
  • [10] Interpretation of regression coefficients under a latent variable regression model
    Burnham, AJ
    MacGregor, JF
    Viveros, R
    [J]. JOURNAL OF CHEMOMETRICS, 2001, 15 (04) : 265 - 284