Active learning with support vector machines in the drug discovery process

Cited by: 263
Authors
Warmuth, MK [1]
Liao, J
Rätsch, G
Mathieson, M
Putta, S
Lemmen, C
Affiliations
[1] Univ Calif Santa Cruz, Dept Comp Sci, Santa Cruz, CA 95064 USA
[2] Australian Natl Univ, RSISE, Canberra, ACT 0200, Australia
[3] Rational Discovery LLC, Palo Alto, CA 94301 USA
[4] BioSolveIT GMBH, D-53757 St Augustin, Germany
Source
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES | 2003 / Vol. 43 / Issue 02
DOI
10.1021/ci025620t
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
We investigate the following data mining problem from computer-aided drug design: from a large collection of compounds, find those that bind to a target molecule in as few iterations of biochemical testing as possible. In each iteration a comparatively small batch of compounds is screened for binding activity toward this target. We employ the so-called "active learning paradigm" from Machine Learning for selecting the successive batches. Our main selection strategy is based on the maximum margin hyperplane generated by "Support Vector Machines". This hyperplane separates the current set of active compounds from the inactive ones and has the largest possible distance from any labeled compound. We perform a thorough comparative study of various other selection strategies on data sets provided by DuPont Pharmaceuticals and show that the strategies based on the maximum margin hyperplane clearly outperform the simpler ones.
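The batch-selection loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact selection rule, so the sketch picks the unlabeled points closest to the current hyperplane; a linear SVM trained with a Pegasos-style subgradient method stands in for a full SVM solver; the data are synthetic 2-D points rather than compound descriptors; and all names (`train_linear_svm`, `select_batch`, `active_learning`, `oracle`) are hypothetical.

```python
import random


def dot(w, x):
    """Inner product of two equal-length vectors."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Approximate a soft-margin linear SVM with Pegasos-style updates.

    Assumption: a stand-in for a real SVM solver. Labels are +1 / -1 and a
    constant feature in X plays the role of the bias term.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)            # decaying step size
            margin = y[i] * dot(w, X[i])     # computed before the update
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:                 # hinge loss is active
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def select_batch(w, X, unlabeled, batch_size):
    """Pick the unlabeled indices closest to the hyperplane w.x = 0."""
    return sorted(unlabeled, key=lambda i: abs(dot(w, X[i])))[:batch_size]


def active_learning(X, oracle, initial, batch_size=5, rounds=4):
    """Alternate between fitting on labeled points and querying a new batch.

    `oracle(i)` is a hypothetical stand-in for the biochemical test and
    returns +1 (active) or -1 (inactive) for compound i.
    """
    labeled = {i: oracle(i) for i in initial}
    for _ in range(rounds):
        unlabeled = [i for i in range(len(X)) if i not in labeled]
        if not unlabeled:
            break
        idx = list(labeled)
        w = train_linear_svm([X[i] for i in idx], [labeled[i] for i in idx])
        for i in select_batch(w, X, unlabeled, batch_size):
            labeled[i] = oracle(i)
    return labeled


# Synthetic stand-in data: 2-D points plus a constant bias feature.
data_rng = random.Random(1)
X = [[data_rng.uniform(-1, 1), data_rng.uniform(-1, 1), 1.0] for _ in range(200)]
truth = [1 if x[0] + x[1] > 0.5 else -1 for x in X]

labeled = active_learning(X, lambda i: truth[i], initial=range(5))
print("tested:", len(labeled), "actives found:", sum(v == 1 for v in labeled.values()))
```

Querying the points nearest the hyperplane targets the compounds the current model is least certain about. When the goal is to find actives quickly, ranking unlabeled points by signed distance on the positive side is a natural variant and only changes the sort key in `select_batch`.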
Pages: 667-673
Number of pages: 7