A comparative study of family-specific protein-ligand complex affinity prediction based on random forest approach

Cited by: 40
Authors
Wang, Yu [1]
Guo, Yanzhi [1]
Kuang, Qifan [1]
Pu, Xuemei [1]
Ji, Yue [1]
Zhang, Zhihang [1]
Li, Menglong [1]
Affiliations
[1] Sichuan Univ, Coll Chem, Chengdu 610064, Sichuan, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Protein-ligand binding affinity prediction; Family-specific model; Generic model; Random forest; BINDING-AFFINITY; SCORING FUNCTIONS; HIV-1; PROTEASE; DESCRIPTORS; DOCKING; RECOGNITION; VALIDATION; QSAR; NMR; RNA;
DOI
10.1007/s10822-014-9827-y
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
The assessment of binding affinity between ligands and their target proteins plays an essential role in the drug discovery and design process. As an alternative to widely used scoring approaches, machine learning methods have been proposed for fast prediction of binding affinity, with promising results. Most of them, however, were developed as all-purpose models that disregard the specific functions of different protein families, even though proteins from different functional families have different structures and physicochemical features. In this study, we propose a random forest method to predict protein-ligand binding affinity based on a comprehensive feature set covering protein sequence, binding pocket, ligand structure and intermolecular interaction. Feature processing and compression were carried out separately for each protein family dataset; the results indicate that different features contribute to different models, so an individual representation for each protein family is necessary. Three family-specific models were constructed for three important protein target families: HIV-1 protease, trypsin and carbonic anhydrase. For comparison, two generic models covering diverse protein families were also built. The evaluation results show that models trained on family-specific datasets outperform those trained on the generic datasets. On the test sets, the Pearson correlation coefficients (Rp) are 0.740, 0.874 and 0.735, and the Spearman correlation coefficients (Rs) are 0.697, 0.853 and 0.723, for HIV-1 protease, trypsin and carbonic anhydrase respectively. Comparisons with other methods further demonstrate that individual representation and model construction for each protein family is a more reasonable way to predict the affinity of one particular protein family.
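The abstract reports its results as Pearson (Rp) and Spearman (Rs) correlation coefficients between predicted and measured affinities. As a minimal sketch of how these two metrics can be computed, the following uses only the Python standard library. The function names and the `measured` / `predicted` values are hypothetical illustrations, not data or code from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical affinities (e.g. pKd values), for illustration only.
measured = [4.2, 5.1, 6.3, 7.0, 8.4, 9.1]
predicted = [4.8, 5.0, 6.9, 6.5, 8.0, 8.8]

r_p = pearson(measured, predicted)
r_s = spearman(measured, predicted)
print(f"Rp = {r_p:.3f}, Rs = {r_s:.3f}")
```

Rp measures how well the predictions follow the measured values linearly, while Rs only measures whether the compounds are ranked in the right order. This is why a study of this kind would typically report both.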
Pages: 349-360
Page count: 12
Related papers
45 in total
[11]   Docking and scoring - Theoretically easy, practically impossible? [J].
Coupez, B. ;
Lewis, R. A. .
CURRENT MEDICINAL CHEMISTRY, 2006, 13 (25) :2995-3003
[12]   Predicting protein-ligand binding affinities using novel geometrical descriptors and machine-learning methods [J].
Deng, W ;
Breneman, C ;
Embrechts, MJ .
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES, 2004, 44 (02) :699-703
[13]   Applications of NMR in drug discovery [J].
Diercks, T ;
Coles, M ;
Kessler, H .
CURRENT OPINION IN CHEMICAL BIOLOGY, 2001, 5 (03) :285-291
[14]   MOLECULAR RECOGNITION OF THE INHIBITOR AG-1343 BY HIV-1 PROTEASE - CONFORMATIONALLY FLEXIBLE DOCKING BY EVOLUTIONARY PROGRAMMING [J].
GEHLHAAR, DK ;
VERKHIVKER, GM ;
REJTO, PA ;
SHERMAN, CJ ;
FOGEL, DB ;
FOGEL, LJ ;
FREER, ST .
CHEMISTRY & BIOLOGY, 1995, 2 (05) :317-324
[15]   Knowledge-based scoring function to predict protein-ligand interactions [J].
Gohlke, H ;
Hendlich, M ;
Klebe, G .
JOURNAL OF MOLECULAR BIOLOGY, 2000, 295 (02) :337-356
[16]   The Elements of Statistical Learning: Data Mining, Inference, and Prediction
Hastie T., 2003
[17]   Species-specific recognition of single-stranded RNA via toll-like receptor 7 and 8 [J].
Heil, F ;
Hemmi, H ;
Hochrein, H ;
Ampenberger, F ;
Kirschning, C ;
Akira, S ;
Lipford, G ;
Wagner, H ;
Bauer, S .
SCIENCE, 2004, 303 (5663) :1526-1529
[18]   Three-dimensional distribution function theory for the prediction of protein-ligand binding sites and affinities: Application to the binding of noble gases to hen egg-white lysozyme in aqueous solution [J].
Imai, Takashi ;
Hiraoka, Ryusuke ;
Seto, Tomoyoshi ;
Kovalenko, Andriy ;
Hirata, Fumio .
JOURNAL OF PHYSICAL CHEMISTRY B, 2007, 111 (39) :11585-11591
[19]   Scoring functions for protein-ligand docking [J].
Jain, Ajay N. .
CURRENT PROTEIN & PEPTIDE SCIENCE, 2006, 7 (05) :407-420
[20]   Development and validation of a genetic algorithm for flexible docking [J].
Jones, G ;
Willett, P ;
Glen, RC ;
Leach, AR ;
Taylor, R .
JOURNAL OF MOLECULAR BIOLOGY, 1997, 267 (03) :727-748