Knowledge-based analysis of microarray gene expression data by using support vector machines

Times cited: 1495
Authors
Brown, MPS
Grundy, WN
Lin, D
Cristianini, N
Sugnet, CW
Furey, TS
Ares, M
Haussler, D
Affiliations
[1] Univ Calif Santa Cruz, Dept Comp Sci, Santa Cruz, CA 95064 USA
[2] Univ Calif Santa Cruz, Ctr Mol Biol RNA, Dept Biol, Santa Cruz, CA 95064 USA
[3] Columbia Univ, Dept Comp Sci, New York, NY 10025 USA
[4] Univ Bristol, Dept Engn Math, Bristol BS8 1TR, Avon, England
DOI
10.1073/pnas.97.1.262
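Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [Natural sciences, general]
Subject classification codes
07; 0710; 09
Abstract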
We introduce a method of functionally classifying genes by using gene expression data from DNA microarray hybridization experiments. The method is based on the theory of support vector machines (SVMs). SVMs are considered a supervised computer learning method because they exploit prior knowledge of gene function to identify unknown genes of similar function from expression data. SVMs avoid several problems associated with unsupervised clustering methods, such as hierarchical clustering and self-organizing maps. SVMs have many mathematical features that make them attractive for gene expression analysis, including their flexibility in choosing a similarity function, sparseness of solution when dealing with large data sets, the ability to handle large feature spaces, and the ability to identify outliers. We test several SVMs that use different similarity metrics, as well as some other supervised learning methods, and find that the SVMs best identify sets of genes with a common function using expression data. Finally, we use SVMs to predict functional roles for uncharacterized yeast ORFs based on their expression data.
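The abstract's central idea, learning a gene functional class from labeled expression profiles using a chosen similarity function (kernel), can be illustrated with a minimal sketch. It assumes unit-normalized expression vectors, a polynomial kernel (x·y + 1)^d or a Gaussian radial basis kernel, and a simple dual coordinate-ascent solver with no explicit bias term. The function names, solver, parameters, and toy profiles are hypothetical illustrations, not the authors' implementation or data.

```python
import math
import random

# Illustrative sketch only: names, kernels, and solver are assumptions,
# not the implementation used in the paper.

def normalize(v):
    """Scale an expression vector to unit Euclidean length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)

def poly_kernel(a, b, degree=2):
    """Polynomial similarity (a.b + 1)^degree; the +1 plays the role of a bias."""
    return (sum(x * y for x, y in zip(a, b)) + 1.0) ** degree

def rbf_kernel(a, b, sigma=1.0):
    """Gaussian radial basis similarity exp(-||a - b||^2 / (2 sigma^2))."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def train_svm(X, y, kernel, C=1.0, epochs=200, seed=0):
    """Soft-margin SVM trained by coordinate ascent on the dual (no bias term).

    Maximizes sum(alpha) - 1/2 * sum_ij alpha_i alpha_j y_i y_j K_ij
    subject to 0 <= alpha_i <= C, where y_i is +1 (in class) or -1.
    """
    n = len(X)
    K = [[kernel(X[i], X[j]) for j in range(n)] for i in range(n)]
    alpha = [0.0] * n
    rng = random.Random(seed)
    order = list(range(n))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            # Current decision value for training gene i.
            f_i = sum(alpha[j] * y[j] * K[j][i] for j in range(n))
            # Negative partial derivative of the dual objective w.r.t. alpha_i.
            grad = y[i] * f_i - 1.0
            if K[i][i] > 0:
                # Exact 1-D maximization, then clip to the box [0, C].
                alpha[i] = min(max(alpha[i] - grad / K[i][i], 0.0), C)
    return alpha

def decision(x, X, y, alpha, kernel):
    """Positive values predict membership in the functional class."""
    # Only support vectors (alpha_i > 0) contribute to the sum.
    return sum(a * yi * kernel(xi, x) for a, yi, xi in zip(alpha, y, X) if a > 0)
```

Only training genes with nonzero α_i enter `decision`, which is the kind of sparseness the abstract refers to. Swapping `poly_kernel` for `rbf_kernel` (or any other similarity function) changes the classifier without touching the solver.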
Pages: 262-267
Page count: 6