Screening for dihydrofolate reductase inhibitors using MOLPRINT 2D, a fast fragment-based method employing the naive Bayesian classifier: Limitations of the descriptor and the importance of balanced chemistry in training and test sets

Cited by: 25
Authors
Bender, A [1]
Mussa, HY [1]
Glen, RC [1]
Affiliations
[1] Univ Cambridge, Dept Chem, Unilever Ctr Mol Sci Informat, Cambridge CB2 1EW, England
Keywords
virtual screening; molecular similarity; MOLPRINT; DHFR; dihydrofolate reductase
DOI
10.1177/1087057105281048
CLC classification
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
A fragment-based similarity searching method, MOLPRINT 2D, was employed for virtual screening of Escherichia coli dihydrofolate reductase inhibitors. Using the original training set of 50,000 compounds, only marginal enrichment factors (between 1 and 3) could be achieved on the test library. The active structures contained in the training and test libraries represented different types of "chemistry," that is, different substructural features associated with activity. In a second step, the training and test sets were pooled and randomly split into training and test sets of equal size, with the objective of smoothing out the different chemical characteristics of both libraries. In a 10-fold cross-validation study on the new training and test sets, typically 10-fold enrichment could be found in the first 96 positions, 4-fold enrichment in the first 384 positions, and 3-fold enrichment in the first 1536 positions, corresponding to 6, 10, and 28 hits, respectively (out of a total of 307; activity defined as average residual activity of less than 80%). The conclusions are twofold. On one hand, the exact fragment-matching similarity searching method employed here is not capable of finding completely novel hit structures. On the other hand, this study emphasizes the requirement for a comparable distribution of chemical features of the training and test sets. MOLPRINT 2D is freely downloadable from http://www.cheminformatics.org.
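The enrichment figures quoted in the abstract can be checked with a short calculation. The sketch below uses the common definition of the enrichment factor (hit rate among the top-ranked compounds divided by the hit rate of the whole library); the abstract does not state which definition the authors used. The library size of 50,000 is an assumption, taken from the stated training-set size and the equal-size split, and the function and variable names are illustrative only.

```python
def enrichment_factor(hits_found, n_selected, total_actives, library_size):
    """Hit rate in the top-ranked subset divided by the hit rate of the full library."""
    if n_selected <= 0 or total_actives <= 0 or library_size <= 0:
        raise ValueError("counts must be positive")
    hit_rate_selected = hits_found / n_selected
    hit_rate_library = total_actives / library_size
    return hit_rate_selected / hit_rate_library


# Assumption: the screened library holds 50,000 compounds (not stated
# explicitly for the pooled-and-split test set in the abstract).
ASSUMED_LIBRARY_SIZE = 50_000
TOTAL_ACTIVES = 307  # from the abstract

# (positions inspected, hits found) as reported in the abstract
reported = [(96, 6), (384, 10), (1536, 28)]

results = {
    n: enrichment_factor(hits, n, TOTAL_ACTIVES, ASSUMED_LIBRARY_SIZE)
    for n, hits in reported
}

for n, ef in results.items():
    print(f"first {n} positions: enrichment factor = {ef:.1f}")
```

Under this assumed library size, the computed values round to the 10-fold, 4-fold, and 3-fold enrichments given in the abstract, which suggests the reported hit counts and enrichment factors are mutually consistent.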
Pages: 658-666
Page count: 9
Related papers
18 in total
[1]   STRATEGIC CONSIDERATIONS IN DESIGN OF A SCREENING SYSTEM FOR SUBSTRUCTURE SEARCHES OF CHEMICAL STRUCTURE FILES [J].
ADAMSON, GW ;
COWELL, J ;
MCLURE, AHW ;
TOWN, WG ;
YAPP, AM ;
LYNCH, MF .
JOURNAL OF CHEMICAL DOCUMENTATION, 1973, 13 (03) :153-157
[2]   Similarity searching of chemical databases using atom environment descriptors (MOLPRINT 2D): Evaluation of performance [J].
Bender, A ;
Mussa, HY ;
Glen, RC ;
Reiling, S .
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES, 2004, 44 (05) :1708-1718
[3]   Molecular similarity: a key technique in molecular informatics [J].
Bender, A ;
Glen, RC .
ORGANIC & BIOMOLECULAR CHEMISTRY, 2004, 2 (22) :3204-3218
[4]   Molecular similarity searching using atom environments, information-based feature selection, and a naive Bayesian classifier [J].
Bender, A ;
Mussa, HY ;
Glen, RC ;
Reiling, S .
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES, 2004, 44 (01) :170-178
[5]   VALIDATION OF THE GENERAL-PURPOSE TRIPOS 5.2 FORCE-FIELD [J].
CLARK, M ;
CRAMER, RD ;
VANOPDENBOSCH, N .
JOURNAL OF COMPUTATIONAL CHEMISTRY, 1989, 10 (08) :982-1012
[6]   COMPARATIVE MOLECULAR-FIELD ANALYSIS (COMFA) .1. EFFECT OF SHAPE ON BINDING OF STEROIDS TO CARRIER PROTEINS [J].
CRAMER, RD ;
PATTERSON, DE ;
BUNCE, JD .
JOURNAL OF THE AMERICAN CHEMICAL SOCIETY, 1988, 110 (18) :5959-5967
[7]   On the optimality of the simple Bayesian classifier under zero-one loss [J].
Domingos, P ;
Pazzani, M .
MACHINE LEARNING, 1997, 29 (2-3) :103-130
[8]   SIMILARITY SEARCHING AND CLUSTERING OF CHEMICAL-STRUCTURE DATABASES USING MOLECULAR PROPERTY DATA [J].
DOWNS, GM ;
WILLETT, P ;
FISANICK, W .
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES, 1994, 34 (05) :1094-1102
[9]   Recent advances on the role of topological indices in drug discovery research [J].
Estrada, E ;
Uriarte, E .
CURRENT MEDICINAL CHEMISTRY, 2001, 8 (13) :1573-1588
[10]   Comparison of fingerprint-based methods for virtual screening using multiple bioactive reference structures [J].
Hert, J ;
Willett, P ;
Wilton, DJ .
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES, 2004, 44 (03) :1177-1185