Eigengene-based linear discriminant model for tumor classification using gene expression microarray data

Cited by: 34
Authors
Shen, Ronglai
Ghosh, Debashis
Chinnaiyan, Arul
Meng, Zhaoling
Affiliations
[1] Sanofi Aventis, Biostat & Programming, Bridgewater, NJ 08807 USA
[2] Univ Michigan, Dept Biostat, Ann Arbor, MI 48109 USA
[3] Univ Michigan, Dept Pathol & Urol, Ann Arbor, MI 48109 USA
DOI
10.1093/bioinformatics/btl442
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: The nearest shrunken centroids classifier has become a popular algorithm for tumor classification using gene expression microarray data. Feature selection is embedded in the method: top-ranking genes are selected based on a univariate distance statistic calculated for each gene individually. These univariate statistics summarize gene expression profiles outside the context of the gene co-regulation network, so redundant information is included in the selection procedure.
Results: We propose an Eigengene-based Linear Discriminant Analysis (ELDA) to address gene selection in a multivariate framework. The algorithm uses a modified rotated Spectral Decomposition (SpD) technique to select 'hub' genes that associate with the most important eigenvectors. Using three benchmark cancer microarray datasets, we show that ELDA selects the most characteristic genes, leading to substantially smaller classifiers than the analogues based on univariate feature selection. The resulting de-correlated expression profiles make the gene-wise independence assumption more realistic and applicable for the shrunken centroids classifier and other diagonal linear discriminant models. Our algorithm further incorporates a misclassification cost matrix, allowing one type of error to be penalized more heavily than another. In the breast cancer data, we show that false negative prognoses can be controlled via a cost-adjusted discriminant function.
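The two ideas in the abstract, selecting 'hub' genes from the leading eigenvectors and adjusting a diagonal linear discriminant rule with a misclassification cost matrix, can be illustrated with a minimal sketch. This is not the authors' ELDA implementation: a plain SVD stands in for the modified rotated SpD, no centroid shrinkage is applied, and all function names (`select_hub_genes`, `fit_dlda`, `posterior`, `predict_cost_adjusted`) and the cost values are hypothetical, chosen only for illustration.

```python
import numpy as np

def select_hub_genes(X, n_components=2, genes_per_component=1):
    """Pick the genes with the largest absolute loadings on the leading eigenvectors.

    X is a samples x genes matrix. ASSUMPTION: plain SVD of the centered data
    replaces the paper's modified rotated SpD.
    """
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    selected = []
    for v in Vt[:n_components]:
        picked = 0
        for g in np.argsort(-np.abs(v)):  # genes ordered by |loading|
            if g not in selected:
                selected.append(int(g))
                picked += 1
            if picked == genes_per_component:
                break
    return selected

def fit_dlda(X, y):
    """Fit a diagonal linear discriminant: class centroids plus pooled per-gene variance."""
    classes = np.unique(y)
    centroids = np.array([X[y == k].mean(axis=0) for k in classes])
    resid = X - centroids[np.searchsorted(classes, y)]
    pooled_var = (resid ** 2).sum(axis=0) / (len(y) - len(classes))
    priors = np.array([(y == k).mean() for k in classes])
    return classes, centroids, pooled_var, priors

def posterior(model, X):
    """Class posteriors under the gene-wise independence (diagonal covariance) assumption."""
    _, centroids, pooled_var, priors = model
    d2 = (((X[:, None, :] - centroids[None, :, :]) ** 2) / pooled_var).sum(axis=2)
    score = -0.5 * d2 + np.log(priors)
    score -= score.max(axis=1, keepdims=True)  # stabilize the exponentials
    p = np.exp(score)
    return p / p.sum(axis=1, keepdims=True)

def predict_cost_adjusted(model, X, cost):
    """Predict the class with the smallest expected cost.

    cost[k, j] is the (hypothetical) cost of predicting class j when the truth is class k.
    """
    expected_cost = posterior(model, X) @ cost
    return model[0][np.argmin(expected_cost, axis=1)]
```

With a 0-1 cost matrix the rule reduces to picking the most probable class. Raising `cost[1, 0]` (a false negative, if class 1 is the poor-prognosis group) can only move predictions from class 0 to class 1, which is the sense in which a cost-adjusted rule trades false negatives for false positives.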
Pages: 2635-2642
Page count: 8