Multi-class cancer classification by total principal component regression (TPCR) using microarray gene expression data
Cited by: 42
Authors: Tan, YX; Shi, LM; Tong, WD; Wang, C [1]
Affiliations:
[1] Univ Calif Los Angeles, Cedars Sinai Med Ctr, Dept Med, David Geffen Sch Med, Los Angeles, CA 90048 USA
[2] US FDA, Natl Ctr Toxicol Res, Ctr Toxicoinformat, Div Syst Toxicol, Jefferson, AR 72079 USA
DNA microarray technology provides a promising approach to the diagnosis and prognosis of tumors on a genome-wide scale by monitoring the expression levels of thousands of genes simultaneously. One problem with microarray data is the difficulty of analyzing high-dimensional gene expression profiles, which typically have thousands of variables (genes) but far fewer observations (samples) and often show severe collinearity. This makes it difficult to apply classical statistical methods directly to microarray data. In this paper, total principal component regression (TPCR) is proposed to classify human tumors by extracting the latent variable structure underlying microarray data from the augmented subspace of both the independent and the dependent variables. A salient feature of the method is that it takes into account not only the latent variable structure but also the errors in the microarray gene expression profiles (the independent variables). The prediction performance of TPCR was evaluated by both leave-one-out and leave-half-out cross-validation on four well-known microarray datasets. The stability and reliability of the classification models were further assessed by re-randomization and permutation studies. A fast kernel algorithm was applied to reduce the computation time substantially. (MATLAB source code is available upon request.)
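As a rough illustration of the evaluation setting described in the abstract, the sketch below runs leave-one-out cross-validation for a multi-class classifier built on ordinary principal component regression (PCR) with one-hot class indicators as the response. It is not the authors' TPCR: it extracts components from the expression matrix alone, does not use the augmented subspace of independent and dependent variables, and does not model errors in the expression profiles. The function names (`pcr_fit`, `pcr_predict`, `loocv_accuracy`), the synthetic data, and the choice of 5 components are all assumptions made for the example.

```python
import numpy as np


def pcr_fit(X, Y, k):
    """Fit ordinary PCR: regress Y on the first k principal component scores of X."""
    x_mean = X.mean(axis=0)
    y_mean = Y.mean(axis=0)
    Xc = X - x_mean
    # Thin SVD of the centered data; rows of Vt are principal directions in gene space.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V_k = Vt[:k].T                      # shape (n_genes, k)
    scores = Xc @ V_k                   # shape (n_samples, k)
    # Least-squares regression of the centered class indicators on the scores.
    coef, *_ = np.linalg.lstsq(scores, Y - y_mean, rcond=None)
    return x_mean, y_mean, V_k, coef


def pcr_predict(model, X):
    """Predict the class with the largest fitted indicator value."""
    x_mean, y_mean, V_k, coef = model
    fitted = (X - x_mean) @ V_k @ coef + y_mean
    return fitted.argmax(axis=1)


def loocv_accuracy(X, labels, n_classes, k):
    """Leave-one-out cross-validation: refit with each sample held out in turn."""
    Y = np.eye(n_classes)[labels]       # one-hot class indicator matrix
    n = X.shape[0]
    hits = 0
    for i in range(n):
        train = np.arange(n) != i
        model = pcr_fit(X[train], Y[train], k)
        hits += int(pcr_predict(model, X[i:i + 1])[0] == labels[i])
    return hits / n


# Synthetic stand-in for microarray data (hypothetical): 30 samples, 500 genes, 3 classes.
rng = np.random.default_rng(0)
n_per_class, n_genes, n_classes = 10, 500, 3
labels = np.repeat(np.arange(n_classes), n_per_class)
centers = rng.normal(scale=2.0, size=(n_classes, n_genes))
X = centers[labels] + rng.normal(size=(labels.size, n_genes))

accuracy = loocv_accuracy(X, labels, n_classes, k=5)
print(f"LOOCV accuracy: {accuracy:.2f}")
```

Centering and the SVD are recomputed inside every fold, so the held-out sample never influences the components it is scored against. With real expression data the number of components would itself need to be chosen by cross-validation.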