Support-vector-machine-based ranking significantly improves the effectiveness of similarity searching using 2D fingerprints and multiple reference compounds

Cited by: 42
Authors
Geppert, Hanna [1]
Horvath, Tamas [2,3]
Gaertner, Thomas [2]
Wrobel, Stefan [2,3]
Bajorath, Juergen [1]
Affiliations
[1] Rhein Friedrich Wilhelms Univ Bonn, Dept Life Sci Informat, LIMES Program Unit Chem Biol & Med Chem, B-IT, D-53113 Bonn, Germany
[2] Fraunhofer IAIS, D-53754 St Augustin, Germany
[3] Univ Bonn, Inst Comp Sci 3, D-53117 Bonn, Germany
DOI
10.1021/ci700461s
Chinese Library Classification number
R914 [Medicinal chemistry]
Discipline classification code
100701
Abstract
Similarity searching using molecular fingerprints is computationally efficient and a surprisingly effective virtual screening tool. In this study, we have compared ranking methods for similarity searching using multiple active reference molecules. Different 2D fingerprints were used as search tools and also as descriptors for a support vector machine (SVM) algorithm. In systematic database search calculations, an SVM-based ranking scheme consistently outperformed nearest neighbor and centroid approaches, regardless of the fingerprints that were tested, even if only very small training sets were used for SVM learning. The superiority of SVM-based ranking over conventional fingerprint methods is ascribed to the fact that SVM makes use of information about database molecules, in addition to known active compounds, during the learning phase.
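The three ranking strategies named in the abstract can be contrasted with a small sketch. This is an illustration under stated assumptions, not the authors' implementation: fingerprints are toy sets of on-bit indices, the centroid score uses the continuous form of the Tanimoto coefficient, and the SVM is a linear one trained by a Pegasos-style subgradient method (the kernel and solver used in the paper are not given in the abstract). All function names, the toy data, and the hyperparameters `epochs` and `lam` are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def nearest_neighbor_scores(database, references):
    """Score each molecule by its highest similarity to any reference compound."""
    return [max(tanimoto(mol, ref) for ref in references) for mol in database]


def centroid_scores(database, references, n_bits):
    """Score each molecule against the averaged reference fingerprint
    (continuous Tanimoto; an assumed form of the centroid approach)."""
    centroid = [sum(bit in ref for ref in references) / len(references)
                for bit in range(n_bits)]
    c_norm = sum(c * c for c in centroid)
    scores = []
    for mol in database:
        dot = sum(centroid[bit] for bit in mol)
        denom = len(mol) + c_norm - dot  # len(mol) is the squared norm of a binary vector
        scores.append(dot / denom if denom else 0.0)
    return scores


def train_linear_svm(positives, negatives, n_bits, epochs=100, lam=0.01):
    """Pegasos-style subgradient descent on the regularized hinge loss.
    Negatives are database molecules assumed (not known) to be inactive."""
    data = [(fp, 1.0) for fp in positives] + [(fp, -1.0) for fp in negatives]
    w, b, t = [0.0] * n_bits, 0.0, 0
    for _ in range(epochs):
        for fp, y in data:
            t += 1
            eta = 1.0 / (lam * t)            # decaying step size
            margin = y * (sum(w[bit] for bit in fp) + b)
            shrink = 1.0 - 1.0 / t           # regularization shrinkage (1 - eta * lam)
            w = [shrink * wi for wi in w]
            if margin < 1.0:                 # hinge loss is active for this example
                for bit in fp:
                    w[bit] += eta * y
                b += eta * y
    return w, b


def svm_scores(database, w, b):
    """Score each molecule by its signed value of the learned linear function."""
    return [sum(w[bit] for bit in mol) + b for mol in database]


def rank(scores):
    """Return database indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Toy data (hypothetical): bits 0-3 occur only in actives, bits 4-6 only in
# presumed inactives, bit 7 occurs in every molecule.
N_BITS = 8
references = [{0, 1, 2, 7}, {0, 1, 3, 7}, {1, 2, 3, 7}]
decoys = [{4, 5, 7}, {5, 6, 7}, {4, 6, 7}]       # sampled database molecules
database = [{4, 5, 6, 7}, {0, 2, 3, 7}, {0, 4, 7}, {5, 7}]

nn_rank = rank(nearest_neighbor_scores(database, references))
centroid_rank = rank(centroid_scores(database, references, N_BITS))
w, b = train_linear_svm(references, decoys, N_BITS)
svm_rank = rank(svm_scores(database, w, b))

print("nearest neighbor:", nn_rank)
print("centroid:        ", centroid_rank)
print("SVM:             ", svm_rank)
```

The structural difference is visible in the function signatures: the nearest neighbor and centroid scores depend only on the reference actives, whereas the SVM is also trained on molecules sampled from the database, which is the point the abstract credits for its better rankings.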
Pages: 742-746
Number of pages: 5