Protein disorder prediction by condensed PSSM considering propensity for order or disorder

Cited by: 83
Authors
Su, Chung-Tsai [1]
Chen, Chien-Yu
Ou, Yu-Yen
Affiliations
[1] Natl Taiwan Univ, Dept Bioind Mechatron Engn, Taipei 106, Taiwan
[2] Natl Taiwan Univ, Dept Comp Sci & Informat Engn, Taipei 106, Taiwan
[3] Yuan Ze Univ, Grad Sch Biotechnol & Bioinformat, Chungli 320, Taiwan
[4] Yuan Ze Univ, Dept Comp Sci & Engn, Chungli 320, Taiwan
DOI
10.1186/1471-2105-7-319
CLC classification
Q5 [Biochemistry]
Discipline codes
071010; 081704
Abstract
Background: A growing number of disordered regions have been discovered in protein sequences, and many of them have been found to be functionally significant. Previous studies show that the disordered regions of a protein can be predicted from its primary structure, the amino acid sequence. One widely accepted observation is that ordered regions are compositionally biased toward hydrophobic amino acids, whereas disordered regions are biased toward charged amino acids. Recent studies further show that employing evolutionary information, such as position specific scoring matrices (PSSMs), improves the accuracy of protein disorder prediction. As more machine learning techniques have been applied to protein disorder detection, extracting more informative features grounded in biological insight has attracted increasing attention.
Results: This paper first studies the effect on prediction accuracy of a condensed position specific scoring matrix with respect to physicochemical properties (PSSMP), which is derived by merging the PSSM columns of the amino acids that share a given property into a single column. Next, we decompose each conventional physicochemical property of amino acids into two disjoint groups, one with a propensity for order and one with a propensity for disorder, and show experimentally that some of the new properties predict protein disorder better than their parent properties. To obtain an effective and compact feature set for this problem, we propose a hybrid feature selection method that combines the efficiency of univariate analysis with the effectiveness of stepwise feature selection, which explores combinations of multiple features. The experimental results show that the selected feature set improves the performance of a classifier built with Radial Basis Function Networks (RBFN), compared with feature sets constructed from PSSMs or from PSSMPs that simply adopt the conventional physicochemical properties.
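To make the PSSMP construction concrete, here is a minimal sketch assuming NumPy, a PSSM held as an (L, 20) array in PSI-BLAST's column order, and column merging by averaging; the abstract does not state how merged columns are combined, so that choice is an assumption. The names `condense_pssm` and `split_by_propensity`, the "hydrophobic" residue set, and the order-prone residue set are hypothetical illustrations, not the groupings used by DisPSSMP.

```python
import numpy as np

# Column order of a PSI-BLAST ASCII PSSM (the 20 standard amino acids).
PSSM_COLUMNS = "ARNDCQEGHILKMFPSTWYV"


def condense_pssm(pssm: np.ndarray, groups: dict[str, str]) -> np.ndarray:
    """Merge the PSSM columns of each amino-acid group into a single column.

    pssm   : array of shape (L, 20), one row per residue position.
    groups : mapping of group name -> string of one-letter amino-acid codes.
    Returns an array of shape (L, len(groups)), one column per group.
    """
    if pssm.ndim != 2 or pssm.shape[1] != len(PSSM_COLUMNS):
        raise ValueError("expected an (L, 20) PSSM")
    condensed = []
    for members in groups.values():
        idx = [PSSM_COLUMNS.index(aa) for aa in members]
        # Assumption: merged columns are averaged; the paper may aggregate differently.
        condensed.append(pssm[:, idx].mean(axis=1))
    return np.stack(condensed, axis=1)


def split_by_propensity(members: str, order_prone: str) -> dict[str, str]:
    """Decompose one property into disjoint order-prone and disorder-prone subsets."""
    order = "".join(aa for aa in members if aa in order_prone)
    disorder = "".join(aa for aa in members if aa not in order_prone)
    return {"order": order, "disorder": disorder}


# Illustrative only: a hypothetical property and order-prone set.
hydrophobic = "AVLIMFWC"
parts = split_by_propensity(hydrophobic, order_prone="WCFIYVLN")
print(parts)  # {'order': 'VLIFWC', 'disorder': 'AM'}

groups = {
    "hydrophobic": hydrophobic,
    "hydrophobic_order": parts["order"],
    "hydrophobic_disorder": parts["disorder"],
}
pssm = np.random.default_rng(0).normal(size=(5, 20))  # stand-in for a real PSSM
print(condense_pssm(pssm, groups).shape)  # (5, 3)
```

Keeping the parent property alongside its two children lets a later feature-selection step decide whether the split is actually more informative than the original property.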
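One plausible reading of the hybrid feature selection is a univariate filter followed by a stepwise search over the survivors. The sketch below assumes that reading: it ranks features by Fisher score, keeps the top `top_k`, then runs greedy forward selection scored by leave-one-out accuracy. To stay dependency-free, a nearest-centroid classifier stands in for the RBFN used in the paper; `fisher_score`, `centroid_accuracy`, and `hybrid_select` are hypothetical names.

```python
import numpy as np


def fisher_score(x: np.ndarray, y: np.ndarray) -> float:
    """Univariate separability of one feature for binary labels y in {0, 1}."""
    a, b = x[y == 0], x[y == 1]
    denom = a.var() + b.var()
    return 0.0 if denom == 0 else float((a.mean() - b.mean()) ** 2 / denom)


def centroid_accuracy(X: np.ndarray, y: np.ndarray) -> float:
    """Leave-one-out accuracy of a nearest-centroid classifier.

    Assumption: this simple classifier stands in for the paper's RBFN.
    Each class needs at least two samples so both centroids stay defined.
    """
    n = len(y)
    correct = 0
    for i in range(n):
        mask = np.arange(n) != i
        Xt, yt = X[mask], y[mask]
        c0 = Xt[yt == 0].mean(axis=0)
        c1 = Xt[yt == 1].mean(axis=0)
        pred = int(np.linalg.norm(X[i] - c1) < np.linalg.norm(X[i] - c0))
        correct += int(pred == y[i])
    return correct / n


def hybrid_select(X: np.ndarray, y: np.ndarray, top_k: int = 5,
                  score=centroid_accuracy) -> tuple[list[int], float]:
    """Univariate filter, then greedy forward stepwise selection."""
    # Stage 1: cheap univariate ranking keeps only the top_k candidates.
    ranked = sorted(range(X.shape[1]),
                    key=lambda j: fisher_score(X[:, j], y), reverse=True)
    candidates = ranked[:top_k]
    # Stage 2: add the candidate that most improves the multivariate score;
    # stop as soon as no remaining candidate improves it.
    selected: list[int] = []
    best = -np.inf
    improved = True
    while improved and candidates:
        improved = False
        trial = {j: score(X[:, selected + [j]], y) for j in candidates}
        j_best = max(trial, key=trial.get)
        if trial[j_best] > best:
            best = trial[j_best]
            selected.append(j_best)
            candidates.remove(j_best)
            improved = True
    return selected, float(best)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = np.repeat([0, 1], 20)
    X = rng.normal(size=(40, 6))
    X[:, 2] += 3.0 * y  # make feature 2 informative on purpose
    print(hybrid_select(X, y, top_k=3))
```

Filtering first bounds the cost of the stepwise stage, which would otherwise retrain a classifier for every remaining feature at every step; `top_k` trades breadth of the combinatorial search for speed.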
Conclusion: Distinguishing disordered regions from ordered regions in protein sequences facilitates the exploration of protein structures and functions. Results on independent test data show that the proposed prediction model, DisPSSMP, performs best among several existing packages for the same task, without either under-predicting or over-predicting disordered regions. Furthermore, the selected properties are shown to be useful for finding discriminating patterns for order/disorder classification.
Pages: 16