Evaluation of machine-learning methods for ligand-based virtual screening

Cited by: 100
Authors
Chen, Beining
Harrison, Robert F.
Papadatos, George
Willett, Peter
Wood, David J.
Lewell, Xiao Qing
Greenidge, Paulette
Stiefl, Nikolaus
Affiliations
[1] Univ Sheffield, Krebs Inst Biomolec Res, Sheffield S1 4DP, S Yorkshire, England
[2] Univ Sheffield, Dept Informat Studies, Sheffield S1 4DP, S Yorkshire, England
[3] GlaxoSmithKline Res & Dev Ltd, Stevenage SG1 2NY, Herts, England
[4] Novartis Pharma AG, CH-4056 Basel, Switzerland
[5] Univ Sheffield, Krebs Inst Biomolec Res, Sheffield S10 2TN, S Yorkshire, England
[6] Univ Sheffield, Dept Informat Studies, Sheffield S10 2TN, S Yorkshire, England
[7] Univ Sheffield, Dept Automat Control & Syst Engn, Sheffield S1 3JD, S Yorkshire, England
[8] Univ Sheffield, Dept Chem, Sheffield S3 7HF, S Yorkshire, England
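Funding
UK Biotechnology and Biological Sciences Research Council (BBSRC); UK Engineering and Physical Sciences Research Council (EPSRC)
Keywords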
group fusion; kernel discrimination; ligand-based virtual screening; machine learning; naive Bayesian classifier; similarity searching; virtual screening;
DOI
10.1007/s10822-006-9096-5
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular biology]
Discipline classification codes
071010; 081704
Abstract
Machine-learning methods can be used for virtual screening by analysing the structural characteristics of molecules of known (in)activity, and we here discuss the use of kernel discrimination and naive Bayesian classifier (NBC) methods for this purpose. We report a kernel method that allows the processing of molecules represented by binary, integer and real-valued descriptors, and show that it is little different in screening performance from a previously described kernel that had been developed specifically for the analysis of binary fingerprint representations of molecular structure. We then evaluate the performance of an NBC when the training-set contains only a very few active molecules. In such cases, a simpler approach based on group fusion would appear to provide superior screening performance, especially when structurally heterogeneous datasets are to be processed.
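The group-fusion approach mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the details, so the following are assumptions made for illustration only: molecules are represented as sets of "on" fingerprint bits, similarity is the Tanimoto coefficient, and the per-reference scores are combined with the MAX rule, as is common in the similarity-searching literature. The names `tanimoto` and `group_fusion_rank` are hypothetical and are not taken from the paper.

```python
from typing import Dict, FrozenSet, List, Tuple

# Assumption: a fingerprint is the set of bit positions that are switched on.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient between two binary fingerprints."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    common = len(a & b)
    return common / (len(a) + len(b) - common)


def group_fusion_rank(
    actives: List[Fingerprint],
    database: Dict[str, Fingerprint],
) -> List[Tuple[str, float]]:
    """Rank database molecules by their best similarity to any known active.

    Each database molecule is compared with every reference active and the
    scores are fused with the MAX rule (an assumption; other fusion rules
    such as the sum of scores or ranks could be substituted).
    """
    scored = [
        (name, max(tanimoto(fp, ref) for ref in actives))
        for name, fp in database.items()
    ]
    # Highest fused score first; ties are broken by name for reproducibility.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    actives = [frozenset({1, 2, 3}), frozenset({10, 11})]
    database = {
        "mol_a": frozenset({1, 2, 3, 4}),
        "mol_b": frozenset({10, 11}),
        "mol_c": frozenset({20}),
    }
    for name, score in group_fusion_rank(actives, database):
        print(f"{name}\t{score:.2f}")
```

Because the reference actives are only compared, not modelled, this kind of ranking needs no training step, which is one reason it remains usable when only a handful of actives is known.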
Pages: 53-62
Page count: 10