Does a More Precise Chemical Description of Protein-Ligand Complexes Lead to More Accurate Prediction of Binding Affinity?

Cited by: 146
Authors
Ballester, Pedro J. [1]
Schreyer, Adrian [2]
Blundell, Tom L. [2]
Affiliations
[1] European Bioinformat Inst, Hinxton CB10 1SD, England
[2] Univ Cambridge, Dept Biochem, Cambridge CB2 1GA, England
Funding
UK Medical Research Council; Wellcome Trust (UK);
Keywords
EMPIRICAL SCORING FUNCTION; DOCKING; DISCOVERY; LONG; RECOGNITION; VALIDATION; DATABASE; NNSCORE;
DOI
10.1021/ci500091r
Chinese Library Classification number
R914 [Medicinal chemistry];
Discipline classification code
100701;
Abstract
Predicting the binding affinities of large sets of diverse molecules against a range of macromolecular targets is an extremely challenging task. The scoring functions that attempt such computational prediction are essential for exploiting and analyzing the outputs of docking, which is in turn an important tool in problems such as structure-based drug design. Classical scoring functions assume a predetermined theory-inspired functional form for the relationship between the variables that describe an experimentally determined or modeled structure of a protein-ligand complex and its binding affinity. The inherent problem of this approach is in the difficulty of explicitly modeling the various contributions of intermolecular interactions to binding affinity. New scoring functions based on machine-learning regression models, which are able to exploit effectively much larger amounts of experimental data and circumvent the need for a predetermined functional form, have already been shown to outperform a broad range of state-of-the-art scoring functions in a widely used benchmark. Here, we investigate the impact of the chemical description of the complex on the predictive power of the resulting scoring function using a systematic battery of numerical experiments. The latter resulted in the most accurate scoring function to date on the benchmark. Strikingly, we also found that a more precise chemical description of the protein-ligand complex does not generally lead to a more accurate prediction of binding affinity. We discuss four factors that may contribute to this result: modeling assumptions, codependence of representation and regression, data restricted to the bound state, and conformational heterogeneity in data.
Pages: 944-955
Number of pages: 12