Support vector machines for predicting rRNA-, RNA-, and DNA-binding proteins from amino acid sequence

Times cited: 134
Authors
Cai, YD
Lin, SL
Affiliations
[1] Chinese Acad Sci, Shanghai Res Ctr Biotechnol, Shanghai 200233, Peoples R China
[2] Wyeth Ayerst Res, Pearl River, NY 10965 USA
Source
BIOCHIMICA ET BIOPHYSICA ACTA-PROTEINS AND PROTEOMICS | 2003 / Vol. 1648 / No. 1-2
Keywords
classification; feature vector; function; functional genomic; machine learning; prediction; pseudo-amino acid composition; support vector machine; SVM
DOI
10.1016/S1570-9639(03)00112-2
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Classification of gene function remains one of the most important and demanding tasks in the post-genome era. Most current computational prediction methods rely on comparing features that are essentially linear with respect to the protein sequence. However, features of a protein that are nonlinear with respect to the sequence may also be predictive of its function. Machine learning methods such as Support Vector Machines (SVMs) are particularly suitable for exploiting such features. In this work we introduce SVMs and the pseudo-amino acid composition, a collection of nonlinear features extractable from protein sequence, to the field of protein function prediction. We have developed prototype SVMs for the binary classification of rRNA-, RNA-, and DNA-binding proteins. Using a protein's amino acid composition and the limited-range correlation of hydrophobicity and solvent-accessible surface area as input, each SVM predicts whether or not the protein belongs to its respective class. In self-consistency and cross-validation tests, which measure the success of learning and of prediction, respectively, the rRNA-binding SVM consistently achieved >95% accuracy. The RNA- and DNA-binding SVMs show more variable accuracy, ranging from ~76% to ~97%. Analysis of the test results suggests directions for improving the SVMs. (C) 2003 Elsevier Science B.V. All rights reserved.
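The abstract describes the SVM input as a pseudo-amino acid composition: the 20 amino acid frequencies plus limited-range correlation terms for hydrophobicity and solvent-accessible surface area. The Python sketch below illustrates one way such a feature vector can be computed. It is not the authors' code. It uses one correlation term per property and per sequence separation k, a per-property variant of Chou's correlation factor, defined as the mean squared difference of standardized property values. The Kyte-Doolittle hydrophobicity scale, the lag range `max_lag=3` and the weight `weight=0.05` are assumed, illustrative choices. A solvent-accessible surface area table would be passed as an additional entry in `scales`, but none is included here.

```python
"""Sketch of a pseudo-amino acid composition feature vector.

ASSUMPTIONS (not taken from the paper): per-property correlation terms in
the general style of Chou's pseudo-amino acid composition, the
Kyte-Doolittle hydrophobicity scale, max_lag = 3 and weight = 0.05.
"""
from statistics import mean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydrophobicity; an assumed scale used only for illustration.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def standardize(scale):
    """Rescale a 20-residue property table to zero mean and unit variance."""
    values = list(scale.values())
    mu, sd = mean(values), pstdev(values)
    return {aa: (v - mu) / sd for aa, v in scale.items()}


def correlation_factor(seq, scale, k):
    """Mean squared property difference between residues k positions apart."""
    pairs = [(seq[i], seq[i + k]) for i in range(len(seq) - k)]
    return mean([(scale[a] - scale[b]) ** 2 for a, b in pairs])


def pseudo_aac(seq, scales, max_lag=3, weight=0.05):
    """Return 20 composition terms followed by len(scales) * max_lag
    sequence-order terms, normalized so that the whole vector sums to 1."""
    seq = seq.upper()
    if any(aa not in AMINO_ACIDS for aa in seq):
        raise ValueError("sequence contains non-standard residues")
    if len(seq) <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    # Conventional amino acid composition (20 frequencies).
    composition = [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]
    # Sequence-order terms: one per property scale and per lag 1..max_lag.
    std_scales = [standardize(s) for s in scales]
    thetas = [correlation_factor(seq, s, k)
              for s in std_scales for k in range(1, max_lag + 1)]
    denom = sum(composition) + weight * sum(thetas)
    return [f / denom for f in composition] + [weight * t / denom for t in thetas]


if __name__ == "__main__":
    vec = pseudo_aac("MKTAYIAKQR", [KYTE_DOOLITTLE])
    print(len(vec), round(sum(vec), 6))
```

In a setup like the one the abstract describes, each protein's vector would be labeled (for example, rRNA-binding or not) and used to train one binary SVM per class; the classifier itself is not shown here.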
Pages: 127-133
Page count: 7