Using Bayesian multinomial classifier to predict whether a given protein sequence is intrinsically disordered

Cited by: 10
Authors
Bulashevska, Alla [1]
Eils, Roland [1, 2]
Affiliations
[1] German Cancer Research Center, Department of Theoretical Bioinformatics, D-69120 Heidelberg, Germany
[2] Heidelberg University, IPMB, Department of Bioinformatics and Functional Genomics, D-6900 Heidelberg, Germany
Keywords
Unfolded proteins; Disorder prediction; Model-based classification; Multinomial model
DOI
10.1016/j.jtbi.2008.05.040
Chinese Library Classification number
Q [Biological sciences]
Discipline classification codes
07; 0710; 09
Abstract
Intrinsically disordered proteins (IDPs) lack a well-defined three-dimensional structure under physiological conditions. Intrinsic disorder is a common phenomenon, particularly in multicellular eukaryotes, and underlies important protein functions, including regulation and signaling. Many disease-related proteins are likely to be intrinsically disordered or to contain disordered regions. In this paper, we introduce a new predictor based on the Bayesian classification methodology that determines, from the primary sequence alone, whether a given protein or protein region is intrinsically disordered or ordered. The method incorporates length-dependent differences in the amino acid composition of disordered regions by including separate statistical representations for short, middle and long disordered regions. The predictor was trained on a purpose-built data set of protein regions with known structural properties. In a jack-knife test, the predictor achieved a sensitivity of 89.2% for disordered regions and 81.4% for ordered regions. Our method outperformed several previously reported predictors when evaluated on the data set published by Prilusky et al. [2005. FoldIndex: a simple tool to predict whether a given protein sequence is intrinsically unfolded. Bioinformatics 21 (16), 3435-3438]. A further strength of our approach is its ease of implementation. (C) 2008 Elsevier Ltd. All rights reserved.
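The abstract describes a Bayesian classifier over amino acid composition in which the disordered class is split into short, middle and long submodels. The following is a minimal Python sketch of that kind of model, not the authors' implementation. Everything concrete in it is an assumption: the length cut-offs `SHORT_MAX`/`MIDDLE_MAX`, Laplace pseudocounts, class priors taken from training-set frequencies, and the decision rule "disordered if any disordered submodel scores highest". The helper names (`train`, `predict`, `jackknife`) and the toy sequences are hypothetical. The jack-knife function reports per-class sensitivity (the fraction of each class predicted correctly when that region is left out of training), which is the measure quoted above.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues; other letters are ignored

# HYPOTHETICAL length cut-offs; the paper's actual values are not given in this record.
SHORT_MAX = 30
MIDDLE_MAX = 100


def length_bin(n):
    """Assign a region length to one of three (hypothetical) length classes."""
    if n <= SHORT_MAX:
        return "short"
    if n <= MIDDLE_MAX:
        return "middle"
    return "long"


def composition_model(seqs, pseudocount=1.0):
    """Multinomial amino acid probabilities with Laplace (add-one) smoothing."""
    counts = Counter(c for s in seqs for c in s if c in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {a: (counts[a] + pseudocount) / total for a in AMINO_ACIDS}


def train(ordered, disordered):
    """One model for ordered regions, one per length class of disordered regions."""
    groups = {"ordered": list(ordered)}
    for s in disordered:
        groups.setdefault("disordered_" + length_bin(len(s)), []).append(s)
    n_total = sum(len(g) for g in groups.values())
    # ASSUMPTION: class priors are the training-set frequencies of each group;
    # empty groups get no model.
    return {name: (math.log(len(g) / n_total), composition_model(g))
            for name, g in groups.items() if g}


def log_score(seq, log_prior, probs):
    """log P(class) + sum_a n_a * log p_a. The multinomial coefficient is the
    same for every class, so it cancels when classes are compared."""
    counts = Counter(c for c in seq if c in AMINO_ACIDS)
    return log_prior + sum(n * math.log(probs[a]) for a, n in counts.items())


def predict(seq, models):
    """ASSUMPTION: a region is called disordered if any disordered submodel wins."""
    best = max(models, key=lambda name: log_score(seq, *models[name]))
    return "disordered" if best.startswith("disordered") else "ordered"


def jackknife(ordered, disordered):
    """Leave-one-out test; returns (sensitivity_disordered, sensitivity_ordered)."""
    correct_dis = sum(
        predict(s, train(ordered, disordered[:i] + disordered[i + 1:])) == "disordered"
        for i, s in enumerate(disordered))
    correct_ord = sum(
        predict(s, train(ordered[:i] + ordered[i + 1:], disordered)) == "ordered"
        for i, s in enumerate(ordered))
    return correct_dis / len(disordered), correct_ord / len(ordered)


if __name__ == "__main__":
    # Invented toy sequences for illustration only (not data from the paper)
    disordered = ["PESKPEQSEKPSEEKPQS", "SEPKKESPEQPSKESEPK" * 3, "EPSKQ" * 25]
    ordered = ["LIVFAWCLIVGAYLMFIV", "VILFCAWLGIVMAFYLIV", "AILVFWGCLMIVAFLYVI"]
    models = train(ordered, disordered)
    print(predict("KPESEQPSKESPE", models))  # disordered
    print(predict("LIVAFWLIVC", models))     # ordered
    print(jackknife(ordered, disordered))    # (1.0, 1.0)
```

Keeping three separate composition profiles for disordered regions, rather than averaging them into one, is one straightforward way to reflect the length-dependent representation the abstract mentions; real sensitivities would of course depend on the actual training data and cut-offs.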
Pages: 799-803
Number of pages: 5
References (36 in total)
[1] Anfinsen, CB. Principles that govern folding of protein chains. Science, 1973, 181(4096): 223-230.
[2] [Anonymous]. Multivariate analysis. 1979.
[3] Bates, G. Huntingtin aggregation and toxicity in Huntington's disease. Lancet, 2003, 361(9369): 1642-1644.
[4] Bulashevska, Alla; Eils, Roland. Predicting protein subcellular locations using hierarchical ensemble of Bayesian classifiers based on Markov chains. BMC Bioinformatics, 2006, 7(1).
[5] Carl, PL; Temple, BRS; Cohen, PL. Most nuclear systemic autoantigens are extremely disordered proteins: implications for the etiology of systemic autoimmunity. Arthritis Research & Therapy, 2005, 7(6): R1360-R1374.
[6] Cheng, Yugong; LeGall, Tanguy; Oldfield, Christopher J.; Dunker, A. Keith; Uversky, Vladimir N. Abundance of intrinsic disorder in protein associated with cardiovascular disease. Biochemistry, 2006, 45(35): 10448-10460.
[7] Dosztányi, Z; Csizmók, V; Tompa, P; Simon, I. IUPred: web server for the prediction of intrinsically unstructured regions of proteins based on estimated energy content. Bioinformatics, 2005, 21(16): 3433-3434.
[8] Dosztányi, Z; Csizmók, V; Tompa, P; Simon, I. The pairwise energy content estimated from amino acid composition discriminates between folded and intrinsically unstructured proteins. Journal of Molecular Biology, 2005, 347(4): 827-839.
[9] Dunker, AK; Brown, CJ; Lawson, JD; Iakoucheva, LM; Obradovic, Z. Intrinsic disorder and protein function. Biochemistry, 2002, 41(21): 6573-6582.
[10] Durbin, R. Biological sequence analysis: Probabilistic models of proteins and nucleic acids. 1998.