Physicochemical descriptors to discriminate protein-protein interactions in permanent and transient complexes selected by means of machine learning algorithms

Cited by: 45
Authors
Block, Peter [1]
Paern, Juri [1]
Huellermeier, Eyke [1]
Sanschagrin, Paul [1]
Sotriffer, Christoph A. [1]
Klebe, Gerhard [1]
Affiliation
[1] Univ Marburg, Dept Pharmaceut Chem, D-35032 Marburg, Germany
Keywords
classification; protein interfaces; feature selection; support vector machines; decision trees
DOI
10.1002/prot.21104
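Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010 [Biochemistry and Molecular Biology]; 081704 [Applied Chemistry]
Abstract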
Analyzing protein-protein interactions at the atomic level is critical for understanding the principles that govern protein-protein recognition. For this purpose, descriptors that explain the nature of different protein-protein complexes are desirable. In this work, we introduce Epic Protein Interface Classification, a framework that handles the preparation, processing, and analysis of protein-protein complexes for classification with machine learning algorithms. We applied four machine learning algorithms (Support Vector Machines, C4.5 Decision Trees, K Nearest Neighbors, and Naive Bayes) in combination with three feature selection methods (Filter (Relief F), Wrapper, and Genetic Algorithms) to extract discriminating features from the protein-protein complexes. To compare protein-protein complexes with each other, we represented the physicochemical characteristics of their interfaces in four different ways: two different atomic contact vectors, DrugScore pair potential vectors, and SFCscore descriptor vectors. We classified two datasets: (A) 172 protein-protein complexes, comprising 96 monomers that form contacts enforced by the crystallographic packing environment (crystal contacts) and 76 biologically functional homodimer complexes; (B) 345 protein-protein complexes, comprising 147 permanent complexes and 198 transient complexes. We classified up to 94.8% of the packing-enforced/functional complexes and up to 93.6% of the permanent/transient complexes correctly. Furthermore, we extracted relevant features from the different protein-protein complexes and introduce an approach for scoring the importance of the extracted features.
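As an illustration of what an atomic contact vector could look like, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical four-letter atom typing (`ATOM_TYPES`), a hypothetical distance cutoff of 4.5 Å, and a toy input format of `(atom_type, (x, y, z))` tuples; the record above does not specify any of these. The function counts atom-type pairs that lie within the cutoff across the two chains and returns the counts in a fixed order, which is the kind of fixed-length vector a classifier can consume.

```python
from collections import Counter
from itertools import product
from math import dist

# Hypothetical coarse atom types; the typing scheme used in the paper is not given here.
ATOM_TYPES = ("C", "N", "O", "S")

# Unordered type pairs in a fixed order, so every interface maps to a vector of equal length.
PAIR_KEYS = sorted({tuple(sorted(pair)) for pair in product(ATOM_TYPES, repeat=2)})


def contact_vector(chain_a, chain_b, cutoff=4.5):
    """Count atom-type pairs across two chains that lie within `cutoff` angstroms.

    Each chain is a list of (atom_type, (x, y, z)) tuples. The cutoff value is an
    assumption chosen for illustration only.
    """
    counts = Counter()
    for (type_a, xyz_a), (type_b, xyz_b) in product(chain_a, chain_b):
        if dist(xyz_a, xyz_b) <= cutoff:
            counts[tuple(sorted((type_a, type_b)))] += 1
    return [counts[key] for key in PAIR_KEYS]


# Toy coordinates invented for this example (not real structural data).
toy_chain_a = [("C", (0, 0, 0)), ("N", (1.5, 0, 0)), ("O", (10, 0, 0))]
toy_chain_b = [("O", (0, 3, 0)), ("C", (2, 3, 0)), ("S", (20, 0, 0))]

print(dict(zip(PAIR_KEYS, contact_vector(toy_chain_a, toy_chain_b))))
```

Vectors of this shape, one per complex, could then be passed to any of the classifiers named in the abstract; the sorting of each pair makes the result independent of which chain is listed first.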
Pages: 607-622
Page count: 16