Prediction of the bonding states of cysteines using the support vector machines based on multiple feature vectors and cysteine state sequences

Cited by: 45
Authors
Chen, YC
Lin, SC
Lin, CJ
Hwang, JK [1]
Affiliations
[1] Natl Chiao Tung Univ, Dept Biol Sci & Technol, Hsinchu 30050, Taiwan
[2] Natl Chiao Tung Univ, Inst Bioinformat, Hsinchu 30050, Taiwan
[3] Natl Taiwan Univ, Dept Comp Sci, Taipei 10764, Taiwan
Keywords
support vector machines; disulfide bonds; cysteine state sequences; multiple feature vectors
DOI
10.1002/prot.20079
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
071010; 081704
Abstract
The support vector machine (SVM) method is used to predict the bonding states of cysteines. Besides using local descriptors such as the local sequences, we include global information, such as amino acid compositions and the patterns of the states of cysteines (bonded or nonbonded), or cysteine state sequences, of the proteins. We found that SVM based on local sequences or global amino acid compositions yielded similar prediction accuracies for the data set comprising 4136 cysteine-containing segments extracted from 969 nonhomologous proteins. However, the SVM method based on multiple feature vectors (combining local sequences and global amino acid compositions) significantly improves the prediction accuracy, from 80% to 86%. If coupled with cysteine state sequences, SVM based on multiple feature vectors yields 90% in overall prediction accuracy and a 0.77 Matthews correlation coefficient, around 10% and 22% higher than the corresponding values obtained by SVM based on local sequence information. © 2004 Wiley-Liss, Inc.
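As a rough illustration of the "multiple feature vectors" idea in the abstract, the sketch below concatenates a local sequence window around a cysteine with the global amino acid composition of the protein, and computes the Matthews correlation coefficient used as the evaluation metric. The one-hot encoding, the window half-width of 3, the padding slot, and all function names are assumptions made for this example; the abstract does not specify the authors' actual encoding or window size. The resulting vector could be passed to any SVM implementation.

```python
import math

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def local_window_features(seq, pos, half_width=3):
    """One-hot encode the residues in a window centred on position `pos`.

    Assumption: each window position gets 21 slots, 20 for the standard
    amino acids plus one for padding or non-standard residues.
    """
    feats = []
    for i in range(pos - half_width, pos + half_width + 1):
        one_hot = [0.0] * (len(AMINO_ACIDS) + 1)
        if 0 <= i < len(seq) and seq[i] in AMINO_ACIDS:
            one_hot[AMINO_ACIDS.index(seq[i])] = 1.0
        else:
            one_hot[-1] = 1.0  # off the end of the chain, or unknown residue
        feats.extend(one_hot)
    return feats


def composition_features(seq):
    """Global descriptor: fraction of each standard amino acid in the protein."""
    n = len(seq) or 1
    return [seq.count(a) / n for a in AMINO_ACIDS]


def multiple_feature_vector(seq, pos, half_width=3):
    """Concatenate the local window and the global composition descriptors."""
    return local_window_features(seq, pos, half_width) + composition_features(seq)


def matthews_corrcoef(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = bonded, 0 = free)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

For example, `multiple_feature_vector("ACDCK", 1, half_width=1)` yields a vector of length 3 × 21 + 20 = 83: three one-hot window positions followed by the 20 composition fractions.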
Pages: 1036-1042
Page count: 7