Machine-learning scoring functions for identifying native poses of ligands docked to known and novel proteins

Cited by: 37
Authors
Ashtawy, Hossam M. [1 ]
Mahapatra, Nihar R. [1 ]
Affiliations
[1] Michigan State Univ, Dept Elect & Comp Engn, E Lansing, MI 48824 USA
Source
BMC BIOINFORMATICS | 2015 / Vol. 16
Funding
U.S. National Science Foundation;
Keywords
BINDING-AFFINITY; FLEXIBLE DOCKING; VALIDATION; RECOGNITION;
DOI
10.1186/1471-2105-16-S6-S3
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification code
071010 ; 081704 ;
Abstract
Background: Molecular docking is a widely employed method in structure-based drug design. An essential component of molecular docking programs is a scoring function (SF) that identifies the most stable binding pose of a ligand bound to a receptor protein from among a large set of candidate poses. Despite intense efforts to develop conventional SFs, which are force-field based, knowledge-based, or empirical, their limited docking power (the ability to identify the correct pose) has been a major impediment to cost-effective drug discovery. In this work, we therefore explore a range of novel SFs that combine different machine-learning (ML) approaches with physicochemical and geometrical features characterizing protein-ligand complexes to predict the native or near-native pose of a ligand docked to a receptor protein's binding site. We assess the docking accuracies of these new ML SFs and of conventional SFs on the 2007 PDBbind benchmark dataset, using both diverse and homogeneous (protein-family-specific) test sets. Further, we systematically analyze the performance of the proposed SFs in identifying native poses of ligands docked to novel protein targets.

Results and conclusion: We find that the best-performing ML SF has a success rate of 80% in identifying poses that are within 1 angstrom root-mean-square deviation of the native poses of 65 different protein families. By comparison, the best conventional SF, ASP, employed in the commercial docking software GOLD, achieves a success rate of only 70%. In addition, the proposed ML SFs perform better on novel proteins on which they were never trained. We also observe steady gains in the performance of these scoring functions as the training set size and the number of features are increased by considering more protein-ligand complexes and/or more computationally generated poses for each complex.
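The success-rate metric quoted in the abstract (the fraction of complexes whose top-scored pose lies within 1 angstrom RMSD of the native pose) can be illustrated with a minimal Python sketch. This is not the authors' code. The function names, the dictionary keys, the toy coordinates, and the convention that a higher score means a better pose are all assumptions made for illustration. The RMSD here is computed directly on matched atom coordinates, with no superposition and no handling of ligand symmetry.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two poses given as lists of (x, y, z).

    Assumes atoms are listed in the same order in both poses and that both
    poses are already in the same coordinate frame (no alignment is done).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("poses must have the same non-zero number of atoms")
    squared = sum(
        (a - b) ** 2
        for atom_a, atom_b in zip(coords_a, coords_b)
        for a, b in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(coords_a))


def docking_success_rate(complexes, cutoff=1.0):
    """Fraction of complexes whose best-scored pose is within `cutoff` of native.

    Each complex is a dict with hypothetical keys:
      "native": native ligand coordinates
      "poses":  list of (score, coordinates) pairs
    Assumption: a higher score means the SF predicts a better pose.
    """
    hits = 0
    for cplx in complexes:
        _, best_coords = max(cplx["poses"], key=lambda pose: pose[0])
        if rmsd(best_coords, cplx["native"]) <= cutoff:
            hits += 1
    return hits / len(complexes)


# Toy data (invented): a two-atom ligand with one near-native and one distant pose.
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
near = [(0.1, 0.0, 0.0), (1.1, 0.0, 0.0)]  # 0.1 from native
far = [(3.0, 0.0, 0.0), (4.0, 0.0, 0.0)]   # 3.0 from native

toy_complexes = [
    {"native": native, "poses": [(0.9, near), (0.2, far)]},  # SF ranks near pose first
    {"native": native, "poses": [(0.3, near), (0.8, far)]},  # SF ranks far pose first
]

success = docking_success_rate(toy_complexes, cutoff=1.0)
print(f"success rate: {success:.0%}")
```

In this toy case the scoring function ranks the near-native pose first for one of the two complexes, so the success rate is one half. A real evaluation would use a symmetry-aware RMSD and the scoring convention of the SF under test.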
Pages: 17