Task-Specific Scoring Functions for Predicting Ligand Binding Poses and Affinity and for Screening Enrichment

Cited by: 62
Authors
Ashtawy, Hossam M. [1]
Mahapatra, Nihar R. [1]
Affiliations
[1] Michigan State Univ, Dept Elect & Comp Engn, E Lansing, MI 48824 USA
Funding
National Science Foundation (USA);
Keywords
PROTEIN; VALIDATION; DOCKING; TOOLS;
DOI
10.1021/acs.jcim.7b00309
Chinese Library Classification number
R914 [Medicinal Chemistry];
Discipline classification code
100701;
Abstract
Molecular docking, scoring, and virtual screening play an increasingly important role in computer-aided drug discovery. Scoring functions (SFs) are typically employed to predict the binding conformation (docking task), binding affinity (scoring task), and binary activity level (screening task) of ligands against a critical protein target in a disease's pathway. In most molecular docking software packages available today, a generic binding affinity-based (BA-based) SF is invoked for all three tasks to solve three different, but related, prediction problems. The limited predictive accuracies of such SFs in these three tasks have been a major roadblock toward cost-effective drug discovery. Therefore, in this work, we develop BT-Score, an ensemble machine-learning (ML) SF of boosted decision trees and thousands of predictive descriptors to estimate BA. BT-Score reproduced BA of out-of-sample test complexes with a correlation of 0.825. Even with this high accuracy in the scoring task, we demonstrate that the docking and screening performance of BT-Score and other BA-based SFs is far from ideal. This has motivated us to build two task-specific ML SFs for the docking and screening problems. We propose BT-Dock, a boosted-tree ensemble model trained on a large number of native and computer-generated ligand conformations and optimized to predict binding poses explicitly. This model has shown an average improvement of 25% over its BA-based counterparts in different ligand pose prediction scenarios. A similar improvement has also been obtained by our screening-based SF, BT-Screen, which directly models the ligand activity labeling task as a classification problem. BT-Screen is trained on thousands of active and inactive protein-ligand complexes to optimize it for finding real actives from databases of ligands not seen in its training set.
In addition to the three task-specific SFs, we propose a novel multi-task deep neural network (MT-Net) that is trained on data from the three tasks to simultaneously predict binding poses, affinities, and activity levels. We show that the performance of MT-Net is superior to conventional SFs and on a par with or better than models based on single task neural networks.
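The abstract reports scoring-task accuracy as a correlation between predicted and measured binding affinities on out-of-sample complexes. The following is a minimal sketch of how such a figure could be computed, assuming Pearson's correlation coefficient (the abstract does not name the statistic). The function name `pearson_correlation` and the affinity values are hypothetical illustrations, not the authors' code or data.

```python
import math


def pearson_correlation(predicted, measured):
    """Return Pearson's r between predicted and measured binding affinities."""
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    # Covariance and variances, left unnormalized because the 1/n factors cancel in r.
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    if var_p == 0 or var_m == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_p * var_m)


# Hypothetical affinities for five test complexes (assumed values, not from the paper).
measured = [4.2, 6.8, 5.5, 7.9, 3.1]
predicted = [4.9, 6.1, 5.8, 7.2, 4.0]

r = pearson_correlation(predicted, measured)
print(f"Pearson r = {r:.3f}")
```

A value near 1 means the scoring function ranks and scales affinities consistently with experiment. As the abstract notes, a high correlation on the scoring task does not by itself imply good docking or screening performance.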
Pages: 119-133
Page count: 15