Prediction of protein crystallization using collocation of amino acid pairs

Times cited: 98

Authors:

Chen, Ke [1]

Kurgan, Lukasz [1]

Rahbari, Mandana [1]

Affiliation:

[1] Univ Alberta, Dept Elect & Comp Engn, Edmonton, AB, Canada

Source:

BIOCHEMICAL AND BIOPHYSICAL RESEARCH COMMUNICATIONS | 2007 / Vol. 355 / No. 3

Funding:

Natural Sciences and Engineering Research Council of Canada (NSERC);

Keywords:

protein crystallization; X-ray crystallography; collocated amino acid pairs; classification; CRYSTALP; Naive Bayes;

DOI:

10.1016/j.bbrc.2007.02.040

Chinese Library Classification (CLC) codes:

Q5 [Biochemistry]; Q7 [Molecular Biology];

Discipline classification codes:

071010; 081704;

Abstract:

While over 80% of the protein structures in the PDB were determined using X-ray crystallography, in some cases only 42% of soluble purified proteins yield crystals. Since experimentally verifying a protein's ability to crystallize is relatively expensive and time-consuming, we propose a new in silico prediction system, called CRYSTALP, which is based on the protein's sequence. CRYSTALP uses a novel feature-based sequence representation and applies a Naive Bayes classifier. It was compared with a recent competing in silico method, SECRET [P. Smialowski, T. Schmidt, J. Cox, A. Kirschner, D. Frishman, Will my protein crystallize? A sequence-based predictor, Proteins 62 (2) (2006) 343-355], and with other state-of-the-art classifiers. Experimental tests show that CRYSTALP predicts crystallization with 77.5% accuracy, which is over 10% higher than SECRET's accuracy and higher than the accuracies of the other classifiers considered. CRYSTALP represents sequences with a different feature set than SECRET and uses over 50% fewer features. Additionally, the features used by CRYSTALP may help to discover intra-molecular markers that influence protein crystallization. (c) 2007 Elsevier Inc. All rights reserved.
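The title and keywords name collocated amino acid pairs as CRYSTALP's sequence representation, and the abstract names a Naive Bayes classifier. A minimal sketch of both ideas follows. It assumes that a "collocated pair" is two residues separated by k intervening positions, encoded as relative pair frequencies for a few gap values. The gap values (0, 1, 2), the function names `collocated_pair_features`, `train_gaussian_nb` and `predict_gaussian_nb`, the Gaussian likelihood, and the `min_var` smoothing term are all hypothetical illustrations, not CRYSTALP's published settings.

```python
import math
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def collocated_pair_features(seq, gaps=(0, 1, 2)):
    """Relative frequency of every residue pair (a, b) in which b lies
    k + 1 positions after a (k intervening residues), for each gap k.

    The default gap values are an illustrative assumption.
    """
    seq = seq.upper()
    features = {}
    for k in gaps:
        counts = Counter()
        n_pairs = 0
        for i in range(len(seq) - k - 1):
            a, b = seq[i], seq[i + k + 1]
            if a in AMINO_ACIDS and b in AMINO_ACIDS:  # skip X, U, etc.
                counts[a + b] += 1
                n_pairs += 1
        # Emit all 400 pairs per gap so every sequence has the same keys.
        for a, b in product(AMINO_ACIDS, repeat=2):
            key = f"{a}{b}_gap{k}"
            features[key] = counts[a + b] / n_pairs if n_pairs else 0.0
    return features


def train_gaussian_nb(X, y, min_var=1e-4):
    """Fit a class prior plus a per-feature Gaussian for each class label.

    min_var is a hypothetical smoothing term added to every variance so
    that features constant within a class do not cause division by zero.
    """
    model = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        stats = {}
        for name in rows[0]:
            vals = [r[name] for r in rows]
            mean = sum(vals) / len(vals)
            var = sum((v - mean) ** 2 for v in vals) / len(vals) + min_var
            stats[name] = (mean, var)
        model[label] = (len(rows) / len(X), stats)
    return model


def predict_gaussian_nb(model, x):
    """Return the label with the highest log posterior (up to a constant)."""
    best_label, best_score = None, -math.inf
    for label, (prior, stats) in model.items():
        score = math.log(prior)
        for name, (mean, var) in stats.items():
            score -= 0.5 * math.log(2 * math.pi * var)
            score -= (x[name] - mean) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

With real data, `X` would hold one feature dictionary per sequence and `y` the observed outcome (for example 1 = crystallized, 0 = did not). The default gaps yield 1,200 features per sequence, and the sketch omits any feature selection. Gaussian Naive Bayes is used only because it fits in a few lines; it is not necessarily the Naive Bayes variant CRYSTALP applies.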

Pages: 764-769

Number of pages: 6

References (19 in total)

[1] AHA DW, 1991, MACH LEARN, V6, P37, DOI 10.1007/BF00153759

[2] [Anonymous], 2005, Data Mining: Practical Machine Learning Tools and Techniques

[3] The Protein Data Bank [J].

Berman, HM; Westbrook, J; Feng, Z; Gilliland, G; Bhat, TN; Weissig, H; Shindyalov, IN; Bourne, PE.

NUCLEIC ACIDS RESEARCH, 2000, 28 (01): 235-242

[4] Using pseudo-amino acid composition and support vector machine to predict protein structural class [J].

Chen, Chao; Tian, Yuan-Xin; Zou, Xiao-Yong; Cai, Pei-Xiang; Mo, Jin-Yuan.

JOURNAL OF THEORETICAL BIOLOGY, 2006, 243 (03): 444-448

[5] Chen K, 2006, PROCEEDINGS OF THE 2006 IEEE SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE IN BIOINFORMATICS AND COMPUTATIONAL BIOLOGY, P366

[6] HALL M, 1999, THESIS U WAIKATO

[7] John G., 1995, P 11 C UNC ART INT, V1, P338, DOI 10.5555/2074158.2074196

[8] Classifier ensembles for protein structural class prediction with varying homology [J].

Kedarisetti, Kanaka Durga; Kurgan, Lukasz; Dick, Scott.

BIOCHEMICAL AND BIOPHYSICAL RESEARCH COMMUNICATIONS, 2006, 348 (03): 981-988

[9] Improvements to Platt's SMO algorithm for SVM classifier design [J].

Keerthi, SS; Shevade, SK; Bhattacharyya, C; Murthy, KRK.

NEURAL COMPUTATION, 2001, 13 (03): 637-649

[10] Prediction of structural classes for protein sequences and domains - Impact of prediction algorithms, sequence representation and homology, and test procedures on accuracy [J].

Kurgan, Lukasz A.; Homaeian, Leila.

PATTERN RECOGNITION, 2006, 39 (12): 2323-2343
