Benchmarking Ligand-Based Virtual High-Throughput Screening with the PubChem Database

Cited by: 51
Authors
Butkiewicz, Mariusz [1]
Lowe, Edward W., Jr. [1]
Mueller, Ralf [1]
Mendenhall, Jeffrey L. [1]
Teixeira, Pedro L. [1]
Weaver, C. David [1]
Meiler, Jens [1]
Affiliations
[1] Vanderbilt Univ, Inst Chem Biol, Struct Biol Ctr, Dept Chem Pharmacol & Biomed Informat, Nashville, TN 37232 USA
Funding
U.S. National Science Foundation;
Keywords
virtual screening; machine learning; quantitative structure-activity relations (QSAR); high-throughput screening (HTS); cheminformatics; PubChem; BCL; TYROSYL-DNA PHOSPHODIESTERASE; DISTRIBUTION-FUNCTION DESCRIPTORS; 2D AUTOCORRELATION; NEURAL-NETWORKS; DRUG DISCOVERY; ION CHANNELS; QSAR MODELS; INHIBITION; IDENTIFICATION; CALCIUM;
DOI
10.3390/molecules18010735
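Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification codes
070307 [Chemical Biology]; 071010 [Biochemistry and Molecular Biology];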
Abstract
With the rapidly increasing availability of High-Throughput Screening (HTS) data in the public domain, such as the PubChem database, methods for ligand-based computer-aided drug discovery (LB-CADD) have the potential to accelerate and reduce the cost of probe development and drug discovery efforts in academia. We assemble nine data sets from realistic HTS campaigns representing major families of drug target proteins for benchmarking LB-CADD methods. Each data set is in the public domain through PubChem and carefully collated through confirmation screens validating active compounds. These data sets provide the foundation for benchmarking a new cheminformatics framework, BCL::ChemInfo, which is freely available for non-commercial use. Quantitative structure-activity relationship (QSAR) models are built using Artificial Neural Networks (ANNs), Support Vector Machines (SVMs), Decision Trees (DTs), and Kohonen networks (KNs). Problem-specific descriptor optimization protocols are assessed, including Sequential Feature Forward Selection (SFFS) and various information content measures. Measures of predictive power and confidence are evaluated through cross-validation, and a consensus prediction scheme is tested that combines orthogonal machine learning algorithms into a single predictor. Enrichments ranging from 15 to 101 for a TPR cutoff of 25% are observed.
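The enrichment figure quoted at the end of the abstract can be illustrated with a small sketch. It assumes the common definition of enrichment: the fraction of actives among the top-ranked compounds, divided by the fraction of actives in the whole data set, evaluated at the shortest ranked prefix that recovers the stated true positive rate (TPR). The abstract does not spell out the formula, so this definition, the function name `enrichment_at_tpr`, and the toy data are illustrative assumptions and not the BCL::ChemInfo implementation.

```python
import math


def enrichment_at_tpr(scores, labels, tpr_cutoff=0.25):
    """Enrichment at a TPR cutoff (hypothetical helper, assumed definition).

    scores: predicted activity scores, higher means more likely active.
    labels: 1 for a confirmed active, 0 for an inactive.
    """
    if not 0.0 < tpr_cutoff <= 1.0:
        raise ValueError("tpr_cutoff must be in (0, 1]")
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")

    total = len(labels)
    n_actives = sum(labels)
    if n_actives == 0:
        raise ValueError("need at least one active compound")

    # Rank compounds from highest to lowest predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    # Number of actives that must be recovered to reach the TPR cutoff.
    needed = max(1, math.ceil(tpr_cutoff * n_actives))

    found = 0
    for rank, (_, is_active) in enumerate(ranked, start=1):
        found += is_active
        if found >= needed:
            precision = found / rank          # actives among the top `rank`
            base_rate = n_actives / total     # actives in the whole set
            return precision / base_rate


# Toy data: 20 compounds, 4 actives, the best-scored compound is active.
toy_scores = [1.0 - 0.05 * i for i in range(20)]
toy_labels = [1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
toy_enrichment = enrichment_at_tpr(toy_scores, toy_labels, tpr_cutoff=0.25)
```

Under this definition, an enrichment of 15 would mean that the top of the ranked list contains actives at 15 times the rate expected from random selection.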
Pages: 735-756
Page count: 22