SVM-based feature selection for characterization of focused compound collections

Cited by: 50
Authors
Byvatov, E [1]
Schneider, G [1]
Affiliation
[1] Goethe Univ Frankfurt, Inst Organ Chem & Chem Biol, D-60439 Frankfurt, Germany
Source
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES | 2004 / Vol. 44 / Issue 03
DOI
10.1021/ci0342876
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Artificial neural networks, the support vector machine (SVM), and other machine learning methods for the classification of molecules are often considered as a "black box", since the molecular features that are most relevant for a given classifier are usually not presented in a human-interpretable form. We report on an SVM-based algorithm for the selection of relevant molecular features from a trained classifier that might be important for an understanding of ligand-receptor interactions. The original SVM approach was extended to allow for feature selection. The method was applied to characterize focused libraries of enzyme inhibitors. A comparison with classical Kolmogorov-Smirnov (KS)-based feature selection was performed. In most of the applications the SVM method showed sustained classification accuracy, thereby relying on a smaller number of molecular features than KS-based classifiers. In one case both methods produced comparable results. Limiting the calculation of descriptors to only the most relevant ones for a certain biological activity can also be used to speed up high-throughput virtual screening.
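As a rough illustration of the Kolmogorov-Smirnov baseline mentioned in the abstract, the sketch below ranks descriptor columns by the two-sample KS statistic between an "active" and an "inactive" set. It is not the authors' implementation: the function names `ks_statistic` and `rank_features_by_ks` and the toy descriptor values are assumptions made for illustration, and the SVM-based selection step itself is not reproduced because the abstract does not specify it.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    # The empirical CDFs only change at observed values, so checking those suffices.
    for x in set(a_sorted) | set(b_sorted):
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def rank_features_by_ks(actives, inactives):
    """Return descriptor indices ordered from most to least class-separating."""
    n_features = len(actives[0])
    scores = []
    for j in range(n_features):
        col_a = [row[j] for row in actives]
        col_b = [row[j] for row in inactives]
        scores.append((ks_statistic(col_a, col_b), j))
    # Sort by descending KS statistic; ties fall back to the lower index.
    return [j for _, j in sorted(scores, key=lambda s: (-s[0], s[1]))]


# Hypothetical toy data: each row is a molecule, each column a descriptor.
actives = [[1.0, 5.0], [1.2, 3.0], [0.9, 4.0]]
inactives = [[3.0, 4.5], [3.1, 3.5], [2.8, 5.5]]
ranking = rank_features_by_ks(actives, inactives)
```

In this toy data the first descriptor separates the two classes completely while the second overlaps heavily, so the first is ranked higher. A filter of this kind scores each descriptor independently, whereas a classifier-derived selection can take descriptor combinations into account.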
Pages: 993-999
Page count: 7