Predicting disulfide connectivity from protein sequence using multiple sequence feature vectors and secondary structure

Cited by: 59
Authors
Song, Jiangning
Yuan, Zheng
Tan, Hao
Huber, Thomas
Burrage, Kevin [1]
Affiliations
[1] Univ Queensland, Adv Computat Modelling Ctr, Brisbane, Qld 4072, Australia
[2] Univ Queensland, ARC Ctr Bioinformat, Inst Mol Biosci, Brisbane, Qld 4072, Australia
[3] Monash Univ, Caulfield Sch Informat Technol, Clayton, Vic 3145, Australia
[4] Univ Queensland, Sch Mol & Microbial Sci, Australian Inst Bioengn, Brisbane, Qld 4072, Australia
Funding
Australian Research Council;
Keywords
DOI
10.1093/bioinformatics/btm505
Chinese Library Classification
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: Disulfide bonds are primary covalent crosslinks between two cysteine residues in proteins. They play critical roles in stabilizing protein structures and are commonly found in extracytoplasmic or secreted proteins. In protein folding prediction, locating disulfide bonds can greatly reduce the conformational search space. There is therefore a strong need for computational methods that accurately predict disulfide connectivity patterns in proteins.

Results: We have developed a novel method to predict disulfide connectivity patterns from protein primary sequence, using a support vector regression (SVR) approach based on multiple sequence feature vectors and secondary structure predicted by the PSIPRED program. On a well-defined non-homologous dataset, using 4-fold cross-validation and averaging over proteins with two to five disulfide bridges, the method achieves prediction accuracies of 74.4% and 77.9%, measured at the protein level and the cysteine-pair level, respectively. We also assessed how different sequence encoding schemes affect the prediction of disulfide connectivity. Encoding based on multiple sequence feature vectors coupled with predicted secondary structure significantly improves prediction accuracy, enabling our method to outperform most other currently available predictors. Our work provides a complementary approach to current algorithms that should be useful for computationally assigning disulfide connectivity patterns and for annotating protein sequences generated by large-scale whole-genome projects.
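As a rough illustration of the two accuracy measures mentioned in the abstract (protein level and cysteine-pair level), the following Python sketch enumerates the possible connectivity patterns for a set of bonded cysteines and scores predictions both ways. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the example positions and the pair scores are all hypothetical, and the abstract does not say how a final pattern is selected from the SVR outputs, so exhaustive enumeration is used here purely for illustration.

```python
def connectivity_patterns(cysteines):
    """Yield every way to pair up an even-length list of cysteine positions."""
    if not cysteines:
        yield []
        return
    first, rest = cysteines[0], cysteines[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for pattern in connectivity_patterns(remaining):
            yield [(first, partner)] + pattern


def best_pattern(cysteines, pair_score):
    """Pick the pattern with the highest summed pair score.

    `pair_score` is a hypothetical dict mapping (i, j) with i < j to a
    score, standing in for whatever a trained regressor would output.
    """
    ordered = sorted(cysteines)
    return max(
        connectivity_patterns(ordered),
        key=lambda pattern: sum(pair_score[bond] for bond in pattern),
    )


def _normalize(pattern):
    """Represent a pattern as a set of sorted (i, j) tuples."""
    return {tuple(sorted(bond)) for bond in pattern}


def protein_level_accuracy(true_patterns, predicted_patterns):
    """Fraction of proteins whose whole connectivity pattern is correct."""
    correct = sum(
        _normalize(t) == _normalize(p)
        for t, p in zip(true_patterns, predicted_patterns)
    )
    return correct / len(true_patterns)


def pair_level_accuracy(true_patterns, predicted_patterns):
    """Fraction of true disulfide bridges that were predicted correctly."""
    total = sum(len(t) for t in true_patterns)
    correct = sum(
        len(_normalize(t) & _normalize(p))
        for t, p in zip(true_patterns, predicted_patterns)
    )
    return correct / total


if __name__ == "__main__":
    # Made-up scores for a protein with four bonded cysteines.
    scores = {
        (5, 12): 0.1, (30, 41): 0.2,
        (5, 30): 0.9, (12, 41): 0.8,
        (5, 41): 0.3, (12, 30): 0.4,
    }
    print(best_pattern([5, 12, 30, 41], scores))
```

With B disulfide bridges there are (2B - 1)!! candidate patterns, that is 3, 15, 105 and 945 for two to five bridges, so exhaustive enumeration stays cheap over the range evaluated in the abstract. A whole-pattern measure only credits a protein when every bridge is right, whereas a pair-level measure also credits partially correct patterns, which is why the two reported figures can differ.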
Pages: 3147-3154
Page count: 8