Support vector machine classification and validation of cancer tissue samples using microarray expression data

Cited by: 1657
Authors
Furey, TS [1]
Cristianini, N
Duffy, N
Bednarski, DW
Schummer, M
Haussler, D
Affiliations
[1] Univ Calif Santa Cruz, Dept Comp Sci, Santa Cruz, CA 95064 USA
[2] Univ Bristol, Dept Engn Math, Bristol BS8 1TH, Avon, England
[3] Univ Washington, Dept Mol Biotechnol, Seattle, WA 98195 USA
DOI
10.1093/bioinformatics/16.10.906
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Motivation: DNA microarray experiments, generating thousands of gene expression measurements, are being used to gather information from tissue and cell samples regarding gene expression differences that will be useful in diagnosing disease. We have developed a new method to analyse this kind of data using support vector machines (SVMs). This analysis consists of both classification of the tissue samples and an exploration of the data for mis-labeled or questionable tissue results.
Results: We demonstrate the method in detail on samples consisting of ovarian cancer tissues, normal ovarian tissues, and other normal tissues. The dataset consists of expression experiment results for 97 802 cDNAs for each tissue. As a result of computational analysis, a tissue sample is discovered and confirmed to be wrongly labeled. Upon correction of this mistake and the removal of an outlier, perfect classification of tissues is achieved, but not with high confidence. We identify and analyse a subset of genes from the ovarian dataset whose expression is highly differentiated between the types of tissues. To show robustness of the SVM method, two previously published datasets from other types of tissues or cells are analysed. The results are comparable to those previously obtained. We show that other machine learning methods also perform comparably to the SVM on many of those datasets.
Pages: 906-914
Number of pages: 9