Comparisons of classification methods for screening potential compounds

Cited by: 5
Authors
An, AJ [1 ]
Wang, YY [1 ]
Affiliation
[1] York Univ, Dept Comp Sci, Toronto, ON M3J 1P3, Canada
Source
2001 IEEE International Conference on Data Mining, Proceedings | 2001
DOI
10.1109/ICDM.2001.989495
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We compare a number of data mining and statistical methods on the drug design problem of modeling molecular structure-activity relationships. These relationships can be used to identify active compounds, based on their chemical structures, from a large inventory of chemical compounds. The data set in this application has a highly skewed class distribution, in which only 2% of the compounds are considered active. We apply a number of classification methods to this extremely imbalanced data set and propose different performance measures to evaluate them. We report our findings on the characteristics of the performance measures, the effect of pruning techniques in this application, and a comparison of local learning methods with global techniques. We also investigate whether reducing the imbalance in the training data by up-sampling or down-sampling improves predictive performance.
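The up-sampling and down-sampling mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure: the function names, the 1:1 target ratio, the synthetic labels, and the choice of precision and recall as the example measures are all assumptions made for illustration. The sketch also shows why plain accuracy is a poor measure when only 2% of compounds are active.

```python
import random


def resample(labels, mode, seed=0):
    """Return indices of a class-balanced training set (hypothetical helper).

    Assumes label 1 (active) is the minority class.
    mode="down": keep every active example plus an equally sized random
                 subset of inactive examples (sampled without replacement).
    mode="up":   keep every inactive example and draw active examples with
                 replacement until both classes have the same size.
    """
    rng = random.Random(seed)
    active = [i for i, y in enumerate(labels) if y == 1]
    inactive = [i for i, y in enumerate(labels) if y == 0]
    if mode == "down":
        return active + rng.sample(inactive, len(active))
    if mode == "up":
        return inactive + rng.choices(active, k=len(inactive))
    raise ValueError("mode must be 'up' or 'down'")


def precision_recall(y_true, y_pred):
    """Precision and recall for the active class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Synthetic labels only: 1000 compounds, 2% active, as in the abstract.
labels = [1] * 20 + [0] * 980

down = resample(labels, "down")  # 20 active + 20 inactive
up = resample(labels, "up")      # 980 inactive + 980 resampled active

# A trivial classifier that predicts "inactive" for every compound.
always_inactive = [0] * len(labels)
accuracy = sum(t == p for t, p in zip(labels, always_inactive)) / len(labels)
precision, recall = precision_recall(labels, always_inactive)
print(len(down), len(up), accuracy, precision, recall)
```

On these synthetic labels the trivial classifier is right on 980 of 1000 compounds yet finds no active compound at all, which is the kind of gap that motivates evaluating with measures other than accuracy.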
Pages: 11-18
Page count: 8