Prediction of protein crystallization using collocation of amino acid pairs

Cited by: 98
Authors
Chen, Ke [1]
Kurgan, Lukasz [1]
Rahbari, Mandana [1]
Affiliations
[1] Univ Alberta, Dept Elect & Comp Engn, Edmonton, AB, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords
protein crystallization; X-ray crystallography; collocated amino acid pairs; classification; CRYSTALP; Naive Bayes;
DOI
10.1016/j.bbrc.2007.02.040
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
While more than 80% of protein structures in the PDB were determined using X-ray crystallography, in some cases only 42% of soluble purified proteins yield crystals. Since experimental verification of a protein's ability to crystallize is relatively expensive and time-consuming, we propose a new in silico prediction system, called CRYSTALP, which is based on the protein's sequence. CRYSTALP uses a novel feature-based sequence representation and applies a Naive Bayes classifier. It was compared with a recent competing in silico method, SECRET [P. Smialowski, T. Schmidt, J. Cox, A. Kirschner, D. Frishman, Will my protein crystallize? A sequence-based predictor, Proteins 62 (2) (2006) 343-355], and with other state-of-the-art classifiers. Based on experimental tests, CRYSTALP is shown to predict crystallization with 77.5% accuracy, which is better by over 10% than SECRET's accuracy and better than the accuracy of the other considered classifiers. CRYSTALP uses different, and over 50% fewer, features to represent sequences than SECRET. Additionally, the features used by CRYSTALP may help to discover intra-molecular markers that influence protein crystallization. (c) 2007 Elsevier Inc. All rights reserved.
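The abstract names the two ingredients of CRYSTALP (a representation built from collocated amino acid pairs, and a Naive Bayes classifier) but does not spell out either. The sketch below is only an illustration under stated assumptions, not the authors' implementation: a "collocated pair" is assumed to mean two residues separated by 0 to `max_gap` intervening residues, a simple multinomial Naive Bayes with Laplace smoothing stands in for whatever variant the paper used, no feature selection is performed, and the function names and toy sequences are hypothetical.

```python
from collections import Counter
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def collocated_pair_counts(sequence, max_gap=1):
    """Count residue pairs (a, gap, b) where b follows a with `gap` residues between.

    Assumption: this gap-based definition is one plausible reading of
    "collocated amino acid pairs"; the abstract does not define it.
    """
    seq = sequence.upper()
    counts = Counter()
    for gap in range(max_gap + 1):
        step = gap + 1
        for i in range(len(seq) - step):
            a, b = seq[i], seq[i + step]
            if a in AMINO_ACIDS and b in AMINO_ACIDS:
                counts[(a, gap, b)] += 1
    return counts

def train(sequences, labels, max_gap=1, alpha=1.0):
    """Fit a multinomial Naive Bayes model on pooled pair counts per class."""
    n_features = len(AMINO_ACIDS) ** 2 * (max_gap + 1)
    class_sizes = Counter(labels)
    pooled = {}
    for seq, label in zip(sequences, labels):
        pooled.setdefault(label, Counter()).update(
            collocated_pair_counts(seq, max_gap)
        )
    classes = {}
    for label, totals in pooled.items():
        classes[label] = {
            "log_prior": math.log(class_sizes[label] / len(labels)),
            "totals": totals,
            # Laplace-smoothed normaliser over all possible pair features.
            "denom": sum(totals.values()) + alpha * n_features,
        }
    return {"classes": classes, "alpha": alpha, "max_gap": max_gap}

def predict(model, sequence):
    """Return the class label with the highest log-posterior score."""
    counts = collocated_pair_counts(sequence, model["max_gap"])
    best_label, best_score = None, -math.inf
    for label, params in model["classes"].items():
        score = params["log_prior"]
        for feature, n in counts.items():
            p = (params["totals"][feature] + model["alpha"]) / params["denom"]
            score += n * math.log(p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy, made-up sequences: 1 = crystallizable, 0 = not (hypothetical labels).
toy_model = train(["AAAAAA", "AAAAKA", "GGGGGG", "GGGSGG"], [1, 1, 0, 0])
print(predict(toy_model, "AAAA"))
```

A real experiment would replace the toy sequences with a labelled benchmark set and estimate accuracy by cross-validation; the numbers reported in the abstract cannot be reproduced from this sketch.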
Pages: 764-769
Number of pages: 6