Multi-class cancer classification via partial least squares with gene expression profiles

Cited by: 184
Authors
Nguyen, DV [1]
Rocke, DM
Affiliations
[1] Texas A&M Univ, Dept Stat, College Stn, TX 77843 USA
[2] Univ Calif Davis, Dept Appl Sci, Davis, CA 95616 USA
Funding
U.S. National Science Foundation
Keywords
DOI
10.1093/bioinformatics/18.9.1216
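Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
071010 [Biochemistry and Molecular Biology]; 081704 [Applied Chemistry]
Abstract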
Motivation: Discrimination between two classes, such as normal and cancer samples or two types of cancer, based on gene expression profiles is an important problem with practical implications as well as the potential to further our understanding of gene expression in various cancer cells. Classification or discrimination of more than two groups or classes (multi-class) is also needed. The need for multi-class discrimination methodologies is apparent in many microarray experiments where various cancer types are considered simultaneously.
Results: In this paper we extend the classification methodology proposed earlier by Nguyen and Rocke (2002b; Bioinformatics, 18, 39-50) to classify cancer samples from multiple classes. The methodologies proposed in this paper are applied to four gene expression data sets with multiple classes: (a) a hereditary breast cancer data set with (1) BRCA1-mutation, (2) BRCA2-mutation and (3) sporadic breast cancer samples, (b) an acute leukemia data set with (1) acute myeloid leukemia (AML), (2) T-cell acute lymphoblastic leukemia (T-ALL) and (3) B-cell acute lymphoblastic leukemia (B-ALL) samples, (c) a lymphoma data set with (1) diffuse large B-cell lymphoma (DLBCL), (2) B-cell chronic lymphocytic leukemia (BCLL) and (3) follicular lymphoma (FL) samples, and (d) the NCI60 data set with cell lines derived from cancers of various sites of origin. In addition, we evaluated the classification algorithms and examined the variability of the error rates using simulations based on randomization of the real data sets. We note that other methods for multi-class prediction have been proposed recently; our approach follows the line of Nguyen and Rocke (2002b; Bioinformatics, 18, 39-50).
Pages: 1216-1226
Page count: 11
Related papers
32 in total
[1]
ALBERT A, 1984, BIOMETRIKA, V71, P1
[2]
Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling [J].
Alizadeh, AA ;
Eisen, MB ;
Davis, RE ;
Ma, C ;
Lossos, IS ;
Rosenwald, A ;
Boldrick, JG ;
Sabet, H ;
Tran, T ;
Yu, X ;
Powell, JI ;
Yang, LM ;
Marti, GE ;
Moore, T ;
Hudson, J ;
Lu, LS ;
Lewis, DB ;
Tibshirani, R ;
Sherlock, G ;
Chan, WC ;
Greiner, TC ;
Weisenburger, DD ;
Armitage, JO ;
Warnke, R ;
Levy, R ;
Wilson, W ;
Grever, MR ;
Byrd, JC ;
Botstein, D ;
Brown, PO ;
Staudt, LM .
NATURE, 2000, 403 (6769) :503-511
[3]
Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays [J].
Alon, U ;
Barkai, N ;
Notterman, DA ;
Gish, K ;
Ybarra, S ;
Mack, D ;
Levine, AJ .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1999, 96 (12) :6745-6750
[4]
PLS regression methods [J].
Höskuldsson, Agnar .
Journal of Chemometrics, 1988, 2 (03) :211-228
[5]
Methylation of the BRCA1 promoter region in sporadic breast and ovarian cancer: correlation with disease characteristics [J].
Catteau, A ;
Harris, WH ;
Xu, CF ;
Solomon, E .
ONCOGENE, 1999, 18 (11) :1957-1965
[6]
DUDOIT S, 2000, 576 U CAL DEP STAT
[7]
Promoter hypermethylation and BRCA1 inactivation in sporadic breast and ovarian tumors [J].
Esteller, M ;
Silva, JM ;
Dominguez, G ;
Bonilla, F ;
Matias-Guiu, X ;
Lerma, E ;
Bussaglia, E ;
Prat, J ;
Harkes, IC ;
Repasky, EA ;
Gabrielson, E ;
Schutte, M ;
Baylin, SB ;
Herman, JG .
JNCI-JOURNAL OF THE NATIONAL CANCER INSTITUTE, 2000, 92 (07) :564-569
[8]
Flury B., 1997, 1 COURSE MULTIVARIAT
[9]
Support vector machine classification and validation of cancer tissue samples using microarray expression data [J].
Furey, TS ;
Cristianini, N ;
Duffy, N ;
Bednarski, DW ;
Schummer, M ;
Haussler, D .
BIOINFORMATICS, 2000, 16 (10) :906-914
[10]
AN INTERPRETATION OF PARTIAL LEAST-SQUARES [J].
GARTHWAITE, PH .
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 1994, 89 (425) :122-127