A Comparative Assessment of Predictive Accuracies of Conventional and Machine Learning Scoring Functions for Protein-Ligand Binding Affinity Prediction

Cited by: 33

Authors:

Ashtawy, Hossam M. [1]

Mahapatra, Nihar R. [1]

Affiliations:

[1] Michigan State Univ, Dept Elect & Comp Engn, E Lansing, MI 48823 USA

Source:

IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS | 2015 / Vol. 12 / Issue 02

Funding:

U.S. National Science Foundation;

Keywords:

Drug discovery; machine learning; protein-ligand binding affinity; scoring function; scoring power; virtual screening; DOCKING; VALIDATION; DATABASE; COMPLEXES;

DOI:

10.1109/TCBB.2014.2351824

Chinese Library Classification (CLC) number:

Q5 [Biochemistry];

Discipline classification code:

071010 ; 081704 ;

Abstract:

Accurately predicting the binding affinities of large diverse sets of protein-ligand complexes efficiently is a key challenge in computational biomolecular science, with applications in drug discovery, chemical biology, and structural biology. Since a scoring function (SF) is used to score, rank, and identify potential drug leads, the fidelity with which it predicts the affinity of a ligand candidate for a protein's binding site has a significant bearing on the accuracy of virtual screening. Despite intense efforts in developing conventional SFs, which are either force-field based, knowledge-based, or empirical, their limited predictive accuracy has been a major roadblock toward cost-effective drug discovery. Therefore, in this work, we explore a range of novel SFs employing different machine-learning (ML) approaches in conjunction with a variety of physicochemical and geometrical features characterizing protein-ligand complexes. We assess the scoring accuracies of these new ML SFs as well as those of conventional SFs in the context of the 2007 and 2010 PDBbind benchmark datasets on both diverse and protein-family-specific test sets. We also investigate the influence of the size of the training dataset and the type and number of features used on scoring accuracy. We find that the best performing ML SF has a Pearson correlation coefficient of 0.806 between predicted and measured binding affinities compared to 0.644 achieved by a state-of-the-art conventional SF. We also find that ML SFs benefit more than their conventional counterparts from increases in the number of features and the size of training dataset. In addition, they perform better on novel proteins that they were never trained on before.
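The scoring accuracy reported in the abstract is a Pearson correlation coefficient between predicted and measured binding affinities. The following is a minimal sketch of how such a coefficient can be computed; it is not the authors' code. The function name `pearson_r` and the affinity values are hypothetical and chosen only for illustration.

```python
import math


def pearson_r(predicted, measured):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    # Sum of products of deviations (unnormalized covariance)
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    # Sums of squared deviations (unnormalized variances)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    if var_p == 0 or var_m == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_p * var_m)


# Hypothetical affinities (illustrative numbers, not data from the paper)
measured = [4.2, 6.8, 5.1, 9.0, 7.3]
predicted = [4.9, 6.1, 5.6, 8.2, 7.0]

r = pearson_r(predicted, measured)
print(f"Pearson r = {r:.3f}")
```

A value closer to 1 indicates that the predicted affinities track the measured ones more closely, which is the sense in which the abstract compares 0.806 with 0.644.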

Citation

Pages: 335-347

Page count: 13

References: 40 in total

