Evaluation of machine-learning methods for ligand-based virtual screening

Cited by: 100
Authors
Chen, Beining
Harrison, Robert F.
Papadatos, George
Willett, Peter
Wood, David J.
Lewell, Xiao Qing
Greenidge, Paulette
Stiefl, Nikolaus
Affiliations
[1] Univ Sheffield, Krebs Inst Biomolec Res, Sheffield S1 4DP, S Yorkshire, England
[2] Univ Sheffield, Dept Informat Studies, Sheffield S1 4DP, S Yorkshire, England
[3] GlaxoSmithKline Res & Dev Ltd, Stevenage SG1 2NY, Herts, England
[4] Novartis Pharma AG, CH-4056 Basel, Switzerland
[5] Univ Sheffield, Krebs Inst Biomolec Res, Sheffield S10 2TN, S Yorkshire, England
[6] Univ Sheffield, Dept Informat Studies, Sheffield S10 2TN, S Yorkshire, England
[7] Univ Sheffield, Dept Automat Control & Syst Engn, Sheffield S1 3JD, S Yorkshire, England
[8] Univ Sheffield, Dept Chem, Sheffield S3 7HF, S Yorkshire, England
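Funding
UK Biotechnology and Biological Sciences Research Council (BBSRC); UK Engineering and Physical Sciences Research Council (EPSRC)
Keywords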
group fusion; kernel discrimination; ligand-based virtual screening; machine learning; naive Bayesian classifier; similarity searching; virtual screening
DOI
10.1007/s10822-006-9096-5
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
Machine-learning methods can be used for virtual screening by analysing the structural characteristics of molecules of known (in)activity, and we here discuss the use of kernel discrimination and naive Bayesian classifier (NBC) methods for this purpose. We report a kernel method that allows the processing of molecules represented by binary, integer and real-valued descriptors, and show that it is little different in screening performance from a previously described kernel that had been developed specifically for the analysis of binary fingerprint representations of molecular structure. We then evaluate the performance of an NBC when the training-set contains only a very few active molecules. In such cases, a simpler approach based on group fusion would appear to provide superior screening performance, especially when structurally heterogeneous datasets are to be processed.
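
As a rough illustration of the group-fusion idea mentioned in the abstract, the sketch below scores each database molecule against several known actives and fuses the per-active similarities into one score. The abstract does not state which similarity coefficient or fusion rule was used, so the Tanimoto coefficient and the MAX fusion rule here are assumptions, and the names `tanimoto`, `group_fusion_rank`, and the toy fingerprints are hypothetical.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple

# A binary fingerprint is modelled as the set of indices of its "on" bits.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient between two binary fingerprints (assumed measure)."""
    common = len(a & b)
    union = len(a) + len(b) - common
    # Two empty fingerprints share nothing; define their similarity as 0.
    return common / union if union else 0.0


def group_fusion_rank(
    actives: Sequence[Fingerprint],
    database: Dict[str, Fingerprint],
) -> List[Tuple[str, float]]:
    """Rank database molecules by their best similarity to any known active.

    The MAX fusion rule is an assumption made for this sketch.
    """
    if not actives:
        raise ValueError("at least one reference active is required")
    fused = {
        name: max(tanimoto(fp, ref) for ref in actives)
        for name, fp in database.items()
    }
    # Highest fused score first, i.e. most likely active at the top.
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Toy data, invented purely for illustration.
actives = [frozenset({1, 2, 3, 4}), frozenset({10, 11, 12})]
database = {
    "mol_a": frozenset({1, 2, 3, 5}),
    "mol_b": frozenset({10, 11, 12, 13}),
    "mol_c": frozenset({20, 21}),
}
ranking = group_fusion_rank(actives, database)
```

Because each molecule only needs to resemble one of the reference actives to score well, a scheme of this kind needs no model fitting, which is one plausible reason it can remain usable when very few actives are available.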
Pages: 53-62
Page count: 10