A comparative study of family-specific protein-ligand complex affinity prediction based on random forest approach

Cited by: 40
Authors
Wang, Yu [1]
Guo, Yanzhi [1]
Kuang, Qifan [1]
Pu, Xuemei [1]
Ji, Yue [1]
Zhang, Zhihang [1]
Li, Menglong [1]
Affiliations
[1] Sichuan Univ, Coll Chem, Chengdu 610064, Sichuan, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Protein-ligand binding affinity prediction; Family-specific model; Generic model; Random forest; BINDING-AFFINITY; SCORING FUNCTIONS; HIV-1; PROTEASE; DESCRIPTORS; DOCKING; RECOGNITION; VALIDATION; QSAR; NMR; RNA;
DOI
10.1007/s10822-014-9827-y
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
071010; 081704
Abstract
The assessment of binding affinity between ligands and their target proteins plays an essential role in the drug discovery and design process. As an alternative to widely used scoring approaches, machine learning methods have also been proposed for fast prediction of binding affinity with promising results, but most of them were developed as all-purpose models that disregard the specific functions of different protein families, even though proteins from different functional families have different structures and physicochemical features. In this study, we propose a random forest method to predict protein-ligand binding affinity based on a comprehensive feature set covering protein sequence, binding pocket, ligand structure and intermolecular interaction. Feature processing and compression were carried out separately for each protein family dataset; the results indicate that different features contribute to different models, so an individual representation for each protein family is necessary. Three family-specific models were constructed for three important protein target families: HIV-1 protease, trypsin and carbonic anhydrase. For comparison, two generic models covering diverse protein families were also built. The evaluation results show that models trained on family-specific datasets outperform those trained on the generic datasets. On the test sets, the Pearson correlation coefficients (R_p) are 0.740, 0.874 and 0.735, and the Spearman correlation coefficients (R_s) are 0.697, 0.853 and 0.723, for HIV-1 protease, trypsin and carbonic anhydrase, respectively. Comparisons with other methods further demonstrate that individual representation and model construction for each protein family is a more reasonable way to predict the affinity of one particular protein family.
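The two evaluation metrics named in the abstract, the Pearson and Spearman correlation coefficients, can be illustrated with a minimal sketch in plain Python. This is not the authors' code: the function names `pearson`, `average_ranks` and `spearman` are illustrative, and the `measured` and `predicted` affinity values are hypothetical numbers chosen only to show the calculation.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical measured vs. predicted binding affinities (illustration only).
measured = [4.2, 5.1, 6.3, 7.0, 8.4]
predicted = [4.8, 5.0, 6.9, 6.5, 8.1]

print("R_p =", round(pearson(measured, predicted), 3))
print("R_s =", round(spearman(measured, predicted), 3))
```

Pearson's coefficient measures linear agreement between predicted and measured values, while Spearman's measures only how well the ranking of complexes is preserved, which is why the two are often reported together.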
Pages: 349-360
Page count: 12