Virtual screening using binary kernel discrimination: Effect of noisy training data and the optimization of performance

Cited by: 28
Authors
Chen, BN
Harrison, RF
Pasupa, K
Willett, P [1]
Wilton, DJ
Wood, DJ
Lewell, XQ
Affiliations
[1] Univ Sheffield, Dept Chem, Sheffield S10 2TN, S Yorkshire, England
[2] Univ Sheffield, Dept Automat Control & Syst Engn, Sheffield S10 2TN, S Yorkshire, England
[3] Univ Sheffield, Dept Informat Studies, Sheffield S10 2TN, S Yorkshire, England
[4] GlaxoSmithKline Res & Dev, Stevenage SG1 2NY, Herts, England
DOI
10.1021/ci0505426
Chinese Library Classification number
R914 [Medicinal chemistry]
Discipline classification code
100701
Abstract
Binary kernel discrimination (BKD) uses a training set of compounds, for which structural and qualitative activity data are available, to produce a model that can then be applied to the structures of other compounds in order to predict their likely activity. Experiments with the MDL Drug Data Report database show that the optimal value of the smoothing parameter, and hence the predictive power of BKD, is crucially dependent on the number of false positives in the training set. It is also shown that the best results for BKD are achieved using one particular optimization method for the determination of the smoothing parameter that lies at the heart of the method and using the Jaccard/Tanimoto coefficient in the kernel function that is used to compute the similarity between a test set molecule and the members of the training set.
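The role of the Jaccard/Tanimoto coefficient and the smoothing parameter described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the kernel formula, so the kernel below is an assumed illustrative form, not the one used in the paper; the names `tanimoto`, `kernel`, `bkd_score`, and `lam`, and the representation of fingerprints as sets of on-bit indices, are hypothetical choices made for this example.

```python
from typing import Sequence, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Jaccard/Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    # Two empty fingerprints share nothing; define their similarity as 0.
    return len(a & b) / union if union else 0.0


def kernel(a: Set[int], b: Set[int], lam: float) -> float:
    """Illustrative similarity kernel (an assumption, not the paper's formula).

    With 0.5 < lam < 1 the value rises from (1 - lam) at similarity 0
    to lam at similarity 1, so lam acts as a smoothing parameter.
    """
    s = tanimoto(a, b)
    return (lam ** s) * ((1.0 - lam) ** (1.0 - s))


def bkd_score(
    query: Set[int],
    actives: Sequence[Set[int]],
    inactives: Sequence[Set[int]],
    lam: float = 0.75,
) -> float:
    """Ratio of the mean kernel value over training actives to that over inactives.

    Higher scores suggest the query resembles the actives more than the inactives.
    """
    act = sum(kernel(query, m, lam) for m in actives) / len(actives)
    inact = sum(kernel(query, m, lam) for m in inactives) / len(inactives)
    return act / inact


if __name__ == "__main__":
    query = {1, 2, 3}
    print(tanimoto(query, {2, 3, 4}))
    print(bkd_score(query, actives=[{1, 2, 3}], inactives=[{7, 8}]))
```

In a sketch like this, test-set molecules would be ranked by decreasing score, and `lam` would be tuned on the training set; mislabelled training compounds shift both sums, which is one way to see why the best value of the smoothing parameter could depend on training-set noise.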
Pages: 478-486
Page count: 9
Related papers
24 in total
[1]   MULTIVARIATE BINARY DISCRIMINATION BY KERNEL METHOD [J].
AITCHISON, J ;
AITKEN, CGG .
BIOMETRIKA, 1976, 63 (03) :413-420
[2]   Similarity searching of chemical databases using atom environment descriptors (MOLPRINT 2D): Evaluation of performance [J].
Bender, A ;
Mussa, HY ;
Glen, RC ;
Reiling, S .
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES, 2004, 44 (05) :1708-1718
[3]   Use of structure-activity data to compare structure-based clustering methods and descriptors for use in compound selection [J].
Brown, RD ;
Martin, YC .
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES, 1996, 36 (03) :572-584
[4]   Ligand-based virtual screening using binary kernel discrimination [J].
Chen, BN ;
Harrison, RF ;
Hert, J ;
Mpanhanga, C ;
Willett, P ;
Wilton, DJ .
MOLECULAR SIMULATION, 2005, 31 (08) :597-604
[5]   SLASH: A program for analysing the functional groups in molecules [J].
Cosgrove, DA ;
Willett, P .
JOURNAL OF MOLECULAR GRAPHICS & MODELLING, 1998, 16 (01) :19-32
[6]   SUBSTRUCTURAL ANALYSIS - NOVEL APPROACH TO PROBLEM OF DRUG DESIGN [J].
CRAMER, RD ;
REDL, G ;
BERKOFF, CE .
JOURNAL OF MEDICINAL CHEMISTRY, 1974, 17 (05) :533-535
[7]   Deriving knowledge through data mining high-throughput screening data [J].
Diller, DJ ;
Hobbs, DW .
JOURNAL OF MEDICINAL CHEMISTRY, 2004, 47 (25) :6373-6383
[8]  
Ellis D, 1993, PERSPECTIVES INFORMA, V3, P128, DOI 10.1007/978-1-4471-2099-5_6
[9]   Enrichment of high-throughput screening data with increasing levels of noise using support vector machines, recursive partitioning, and Laplacian-modified naive Bayesian classifiers [J].
Glick, M ;
Jenkins, JL ;
Nettles, JH ;
Hitchings, H ;
Davies, JW .
JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2006, 46 (01) :193-200
[10]   Enrichment of extremely noisy high-throughput screening data using a naive Bayes classifier [J].
Glick, M ;
Klon, AE ;
Acklin, P ;
Davies, JW .
JOURNAL OF BIOMOLECULAR SCREENING, 2004, 9 (01) :32-36