Prediction of protein continuum secondary structure with probabilistic models based on NMR solved structures

Times cited: 21
Authors
Bodén, M
Yuan, Z
Bailey, TL
Affiliations
[1] Univ Queensland, Sch Informat Technol & Elect Engn, St Lucia, Qld 4072, Australia
[2] Univ Queensland, Inst Mol Biosci, St Lucia, Qld 4072, Australia
DOI
10.1186/1471-2105-7-68
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Background: The structure of proteins may change as a result of the inherent flexibility of some protein regions. We develop and explore probabilistic machine learning methods for predicting a continuum secondary structure, i.e. assigning probabilities to the conformational states of a residue. We train our methods using data derived from high-quality NMR models.
Results: Several probabilistic models not only successfully estimate the continuum secondary structure, but also provide a categorical output on par with models directly trained on categorical data. Importantly, models trained on the continuum secondary structure are also better than their categorical counterparts at identifying the conformational state for structurally ambivalent residues.
Conclusion: Cascaded probabilistic neural networks trained on the continuum secondary structure exhibit better accuracy in structurally ambivalent regions of proteins, while sustaining an overall classification accuracy on par with standard, categorical prediction methods.
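To make the idea of a continuum secondary structure concrete, the following is a minimal sketch of one way to turn per-model assignments from an NMR ensemble into per-residue state probabilities. It is an illustration under stated assumptions, not the authors' pipeline: the three-state alphabet (H, E, C), the function names `continuum_from_models` and `categorical`, and the toy ensemble are all hypothetical, and the paper may derive its training targets differently.

```python
from collections import Counter

# Assumed three-state alphabet: helix (H), strand (E), coil (C).
STATES = ("H", "E", "C")


def continuum_from_models(assignments):
    """Return per-residue state probabilities from per-model assignment strings.

    Each string is the secondary structure assigned to one NMR model;
    the probability of a state at a residue is the fraction of models
    in which that residue is assigned that state.
    """
    if not assignments:
        raise ValueError("need at least one model")
    length = len(assignments[0])
    if any(len(a) != length for a in assignments):
        raise ValueError("all models must cover the same residues")
    n_models = len(assignments)
    profile = []
    for column in zip(*assignments):  # one column per residue
        counts = Counter(column)
        profile.append({s: counts.get(s, 0) / n_models for s in STATES})
    return profile


def categorical(profile):
    """Collapse a continuum profile to the most probable state per residue.

    Ties are broken by the order of STATES.
    """
    return "".join(max(STATES, key=lambda s: p[s]) for p in profile)


# Hypothetical ensemble: four models of a four-residue fragment.
models = ["HHHC", "HHCC", "HECC", "HHCC"]
profile = continuum_from_models(models)
labels = categorical(profile)
```

In this toy ensemble the second and third residues are structurally ambivalent: their probability mass is split across two states, which is the kind of information a single categorical label discards.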
Pages: 12