A Comparative Assessment of Ranking Accuracies of Conventional and Machine-Learning-Based Scoring Functions for Protein-Ligand Binding Affinity Prediction

Cited by: 35
Authors
Ashtawy, Hossam M. [1]
Mahapatra, Nihar R. [1]
Affiliations
[1] Michigan State Univ, Dept Elect & Comp Engn, E Lansing, MI 48824 USA
Funding
National Science Foundation (USA)
Keywords
Drug discovery; machine learning; protein-ligand binding affinity; ranking power; scoring function; virtual screening; DOCKING; RECOGNITION; VALIDATION; DATABASE;
DOI
10.1109/TCBB.2012.36
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Accurately predicting the binding affinities of large sets of protein-ligand complexes efficiently is a key challenge in computational biomolecular science, with applications in drug discovery, chemical biology, and structural biology. Since a scoring function (SF) is used to score, rank, and identify drug leads, the fidelity with which it predicts the affinity of a ligand candidate for a protein's binding site has a significant bearing on the accuracy of virtual screening. Despite intense efforts in developing conventional SFs, which are either force-field based, knowledge-based, or empirical, their limited ranking accuracy has been a major roadblock toward cost-effective drug discovery. Therefore, in this work, we explore a range of novel SFs employing different machine-learning (ML) approaches in conjunction with a variety of physicochemical and geometrical features characterizing protein-ligand complexes. We assess the ranking accuracies of these new ML-based SFs as well as those of conventional SFs in the context of the 2007 and 2010 PDBbind benchmark data sets on both diverse and protein-family-specific test sets. We also investigate the influence of the size of the training data set and the type and number of features used on ranking accuracy. Within clusters of protein-ligand complexes with different ligands bound to the same target protein, we find that the best ML-based SF is able to rank the ligands correctly based on their experimentally determined binding affinities 62.5 percent of the time and identify the top binding ligand 78.1 percent of the time. For this SF, the Spearman correlation coefficient between ranks of ligands ordered by predicted and experimentally determined binding affinities is 0.771. 
Given the challenging nature of the ranking problem and that SFs are used to screen millions of ligands, this represents a significant improvement over the best conventional SF we studied, for which the corresponding ranking performance values are 57.8 percent, 73.4 percent, and 0.677.
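The three ranking measures quoted in the abstract (the fraction of clusters ranked entirely correctly, the fraction in which the top binder is identified, and the Spearman correlation between predicted and experimental ranks) can be illustrated with a minimal Python sketch. This is not the authors' evaluation code. The function names, the toy affinity values, and the choice to average Spearman's coefficient over clusters are all assumptions made for illustration, since the abstract does not say how the coefficient is aggregated. The sketch also assumes tie-free affinities and at least two ligands per cluster.

```python
from statistics import mean


def ranks(values):
    """Rank values from highest (rank 1) to lowest; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman(x, y):
    """Spearman's rho for tie-free data: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(x)  # must be at least 2
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


def ranking_metrics(clusters):
    """Return (exact-ranking rate, top-ligand rate, mean per-cluster rho).

    Each cluster is a (measured, predicted) pair of affinity lists for
    ligands bound to the same target protein (hypothetical data layout).
    """
    exact = top = 0
    rhos = []
    for measured, predicted in clusters:
        rm, rp = ranks(measured), ranks(predicted)
        exact += rm == rp                    # whole ordering reproduced
        top += rm.index(1) == rp.index(1)    # strongest binder identified
        rhos.append(spearman(measured, predicted))
    n = len(clusters)
    return exact / n, top / n, mean(rhos)


# Toy clusters with made-up affinities (not data from the paper).
clusters = [
    ([9.1, 6.5, 4.2], [8.0, 6.9, 5.1]),  # ordering fully reproduced
    ([7.8, 5.5, 3.0], [7.0, 4.1, 4.6]),  # top ligand right, lower two swapped
    ([8.3, 6.0, 2.9], [5.9, 6.4, 3.3]),  # top ligand wrong
    ([6.7, 5.2, 4.4], [6.1, 5.8, 4.0]),  # ordering fully reproduced
]

exact_rate, top_rate, mean_rho = ranking_metrics(clusters)
print(exact_rate, top_rate, mean_rho)
```

On this toy input, two of four clusters are ranked entirely correctly and three of four have the right top ligand, which shows why the top-ligand rate is always at least as high as the exact-ranking rate, as in the figures reported above.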
Pages: 1301-1313
Page count: 13