Comprehensive vertical sample-based KNN/LSVM classification for gene expression analysis

Cited by: 46

Authors:

Pan, F [1]

Wang, BY

Hu, X

Perrizo, W

Affiliations:

[1] N Dakota State Univ, Dept Comp Sci, Fargo, ND 58105 USA

[2] Rockefeller Univ, Lab Struct Microbiol, New York, NY 10021 USA

Source:

JOURNAL OF BIOMEDICAL INFORMATICS | 2004 / Vol. 37 / No. 04

Keywords:

data mining; k-nearest neighbor; support vector machine; feature selection; P-tree; gene expression; machine learning;

DOI:

10.1016/j.jbi.2004.07.003

Chinese Library Classification (CLC) number:

TP39 [Computer applications];

Discipline classification code:

081203 ; 0835 ;

Abstract:

Classification analysis of microarray gene expression data has been widely used to uncover biological features and to distinguish closely related cell types that often appear in the diagnosis of cancer. However, the number of dimensions of gene expression data is often very high, e.g., in the hundreds or thousands. Accurate and efficient classification of such high-dimensional data remains a contemporary challenge. In this paper, we propose a comprehensive vertical sample-based KNN/LSVM classification approach with weights optimized by genetic algorithms for high-dimensional data. Experiments on common gene expression datasets demonstrated that our approach can achieve high accuracy and efficiency at the same time. The improvement of speed is mainly related to the vertical data representation, P-tree, and its optimized logical algebra. The high accuracy is due to the combination of a KNN majority voting approach and a local support vector machine approach that makes optimal decisions at the local level. As a result, our approach could be a powerful tool for high-dimensional gene expression data analysis. © 2004 Elsevier Inc. All rights reserved.

Citation

Pages: 240-248

Page count: 9

22 references in total

[1] AHA DW, 1991, MACH LEARN, V6, P37, DOI 10.1007/BF00153759

[2] Knowledge-based analysis of microarray gene expression data by using support vector machines [J].