Data-dependent kernel machines for Microarray data classification

Cited by: 29
Authors
Xiong, Huilin [1]
Zhang, Ya
Chen, Xue-Wen
Affiliations
[1] Univ Kansas, Dept Elect Engn & Comp Sci, Lawrence, KS 66045 USA
[2] Shanghai Jiao Tong Univ, Dept Automat, Shanghai 200240, Peoples R China
Keywords
microarray data analysis; cancer classification; kernel machines; kernel optimization; bootstrapping resampling;
DOI
10.1109/TCBB.2007.1048
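Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Subject classification code
071010 [Biochemistry and Molecular Biology]; 081704 [Applied Chemistry];
Abstract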
One important application of gene expression analysis is to classify tissue samples according to their gene expression levels. Gene expression data are typically characterized by high dimensionality and small sample size, which makes the classification task quite challenging. In this paper, we present a data-dependent kernel for microarray data classification. This kernel function is engineered so that the class separability of the training data is maximized. A bootstrapping-based resampling scheme is introduced to reduce the possible training bias. The effectiveness of this adaptive kernel for microarray data classification is illustrated with a k-Nearest Neighbor (KNN) classifier. Our experimental study shows that the data-dependent kernel leads to a significant improvement in the accuracy of KNN classifiers. Furthermore, this kernel-based KNN scheme has been demonstrated to be competitive with, if not better than, more sophisticated classifiers such as Support Vector Machines (SVMs) and the Uncorrelated Linear Discriminant Analysis (ULDA) for classifying gene expression data.
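The kernel-based KNN idea mentioned in the abstract can be illustrated with a minimal sketch. It relies on the standard identity for squared distance in a kernel-induced feature space, d²(x, y) = k(x, x) − 2k(x, y) + k(y, y). Everything specific here is an assumption made for illustration: the abstract does not give the form of the optimized data-dependent kernel, so a fixed Gaussian (RBF) kernel stands in for it, and the function names, the `gamma` value, and the toy samples are hypothetical. The bootstrapping step is not shown.

```python
import math
from collections import Counter


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel, used as a stand-in base kernel.

    Assumption: the paper's optimized data-dependent kernel is not
    specified in the abstract, so a fixed kernel is used instead.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def kernel_distance_sq(x, y, kernel):
    """Squared distance between x and y in the feature space induced by `kernel`."""
    return kernel(x, x) - 2.0 * kernel(x, y) + kernel(y, y)


def kernel_knn_predict(train_X, train_y, query, k=3, kernel=rbf_kernel):
    """Majority vote among the k training samples nearest to `query` in kernel space."""
    ranked = sorted(
        zip(train_X, train_y),
        key=lambda pair: kernel_distance_sq(pair[0], query, kernel),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy two-feature "expression profiles" (made-up numbers, not real data).
train_X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [2.0, 2.1], [2.2, 1.9], [1.9, 2.0]]
train_y = ["normal", "normal", "normal", "tumor", "tumor", "tumor"]

prediction = kernel_knn_predict(train_X, train_y, [1.8, 2.2], k=3)
```

Because the classifier only touches the data through `kernel`, a different kernel function with the same two-argument signature could be passed in without changing the KNN code.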
Pages: 583-595
Page count: 13