Gene expression data analysis of human lymphoma using support vector machines and output coding ensembles

Times cited: 82
Author
Valentini, G
Affiliations
[1] Univ Genoa, Dipartimento Informat & Sci Informazione, I-16146 Genoa, Italy
[2] INFM, I-16146 Genoa, Italy
Keywords
gene expression data analysis; output coding ensembles of learning machines; support vector machines; DNA microarrays
DOI
10.1016/S0933-3657(02)00077-5
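Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104 [Pattern Recognition and Intelligent Systems]; 0812 [Computer Science and Technology]; 0835 [Software Engineering]; 1405 [Intelligence Science and Technology]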
Abstract
The large amount of data generated by DNA microarrays was originally analysed using unsupervised methods, such as clustering or self-organizing maps. More recently, supervised methods such as decision trees, dot-product support vector machines (SVM) and multi-layer perceptrons (MLP) have been applied to classify normal and tumoural tissues. We propose methods based on non-linear SVM with polynomial and Gaussian kernels, and on output coding (OC) ensembles of learning machines, to separate normal from malignant tissues, to classify different types of lymphoma and to analyse the role of sets of coordinately expressed genes in carcinogenic processes of lymphoid tissues. Using gene expression data from "Lymphochip", a specialised DNA microarray developed at Stanford University School of Medicine, we show that SVM can correctly separate normal from tumoural tissues, and that OC ensembles can be successfully used to classify different types of lymphoma. Moreover, we identify a group of coordinately expressed genes related to the separation of two distinct subgroups within diffuse large B-cell lymphoma (DLBCL), validating an earlier hypothesis by Alizadeh et al. about the existence of two distinct diseases within DLBCL. (C) 2002 Elsevier Science B.V. All rights reserved.
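The abstract names two ingredients: non-linear SVM kernels and output coding (OC) ensembles. The following is a minimal Python sketch, not the author's implementation, of the standard polynomial and Gaussian kernel formulas and of one common OC decoding rule (Hamming distance between the signs of the binary learners' outputs and each class codeword). The function names, default parameters (`degree`, `c`, `sigma`) and the 5-bit codebook for the generic classes "A", "B" and "C" are hypothetical illustrations; training the SVMs and the binary learners is omitted.

```python
import math
from typing import Dict, List, Sequence


def polynomial_kernel(x: Sequence[float], z: Sequence[float],
                      degree: int = 2, c: float = 1.0) -> float:
    """Polynomial kernel K(x, z) = (<x, z> + c) ** degree."""
    dot = sum(a * b for a, b in zip(x, z))
    return (dot + c) ** degree


def gaussian_kernel(x: Sequence[float], z: Sequence[float],
                    sigma: float = 1.0) -> float:
    """Gaussian kernel K(x, z) = exp(-||x - z||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def oc_decode(outputs: Sequence[float],
              codebook: Dict[str, List[int]]) -> str:
    """Hamming decoding for an output-coding ensemble.

    `outputs` holds the real-valued outputs of the L binary learners
    (dichotomizers); each is mapped to +1 (if >= 0) or -1 (if < 0).
    The class whose +1/-1 codeword differs in the fewest positions is
    returned; ties go to the class listed first in `codebook`.
    """
    bits = [1 if v >= 0 else -1 for v in outputs]

    def hamming(code: List[int]) -> int:
        return sum(bit != code_bit for bit, code_bit in zip(bits, code))

    return min(codebook, key=lambda label: hamming(codebook[label]))


# Hypothetical 5-bit code for three generic classes (not the paper's code).
# Every pair of codewords differs in at least 3 positions, so a single
# wrong binary learner can still be corrected.
CODEBOOK = {
    "A": [+1, +1, +1, -1, -1],
    "B": [-1, +1, -1, +1, -1],
    "C": [-1, -1, +1, +1, +1],
}

if __name__ == "__main__":
    # The third learner "errs" on a class-A sample; decoding still picks "A".
    print(oc_decode([0.9, 0.7, -0.2, -0.8, -0.6], CODEBOOK))
```

Redundant codewords are what let an OC ensemble tolerate individual learner errors; with a plain one-per-class code, codewords differ in only two positions and no error can be corrected.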
Pages: 281-304
Number of pages: 24
Related papers (52 in total)
[1]
Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling [J].
Alizadeh, AA; Eisen, MB; Davis, RE; Ma, C; Lossos, IS; Rosenwald, A; Boldrick, JG; Sabet, H; Tran, T; Yu, X; Powell, JI; Yang, LM; Marti, GE; Moore, T; Hudson, J; Lu, LS; Lewis, DB; Tibshirani, R; Sherlock, G; Chan, WC; Greiner, TC; Weisenburger, DD; Armitage, JO; Warnke, R; Levy, R; Wilson, W; Grever, MR; Byrd, JC; Botstein, D; Brown, PO; Staudt, LM.
NATURE, 2000, 403 (6769) :503-511
[2]
Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays [J].
Alon, U; Barkai, N; Notterman, DA; Gish, K; Ybarra, S; Mack, D; Levine, AJ.
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1999, 96 (12) :6745-6750
[3]
Efficient classification for multiclass problems using modular neural networks [J].
Anand, R; Mehrotra, K; Mohan, CK; Ranka, S.
IEEE TRANSACTIONS ON NEURAL NETWORKS, 1995, 6 (01) :117-124
[4]
[Anonymous], 1961, Adaptive Control Processes: a Guided Tour, DOI 10.1515/9781400874668
[5]
[Anonymous], 1990, SUPPORT VECTOR LEARN
[6]
A Bayesian framework for the analysis of microarray expression data: regularized t-test and statistical inferences of gene changes [J].
Baldi, P; Long, AD.
BIOINFORMATICS, 2001, 17 (06) :509-519
[7]
Clustering gene expression patterns [J].
Ben-Dor, A; Shamir, R; Yakhini, Z.
JOURNAL OF COMPUTATIONAL BIOLOGY, 1999, 6 (3-4) :281-297
[8]
BENDOR A, 2000, P 4 INT C COMP MOL B
[9]
Bose R. C., 1960, INFORM CONTR, V3, P68, DOI 10.1016/S0019-9958(60)90287-4
[10]
Submodel selection and evaluation in regression - the X-random case [J].
Breiman, L; Spector, P.
INTERNATIONAL STATISTICAL REVIEW, 1992, 60 (03) :291-319