Induction of decision trees via evolutionary programming

Times cited: 41
Authors
DeLisle, RK
Dixon, SL
Affiliations
[1] Pharmacopeia, Dept Mol Modeling, Princeton, NJ 08543 USA
[2] Schrodinger, New York, NY 10036 USA
Source
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES | 2004 / Vol. 44 / No. 3
DOI
10.1021/ci034188s
Chinese Library Classification (CLC) number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Decision trees have been used extensively in cheminformatics to model a range of biochemical endpoints, including receptor-ligand binding, ADME properties, environmental impact, and toxicity. The traditional approach to inducing a decision tree from a training set is recursive partitioning, which selects partitioning variables and their split values greedily so as to optimize a given measure of purity. This methodology has numerous benefits, including classifier interpretability and the ability to model nonlinear relationships. Because induction is greedy, however, it may fail to uncover the underlying relationships between the data and the endpoints. Decision trees induced by evolutionary programming are significantly more accurate than those induced by recursive partitioning. Furthermore, when assessed by 10-fold cross-validation, trees induced by evolutionary programming show significantly higher accuracy on previously unseen data. The method is compared with single-tree and multiple-tree recursive partitioning in two domains (aerobic biodegradability and hepatotoxicity) and is shown to produce less complex classifiers, with average increases in predictive accuracy of 5-10% over the traditional method.
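The abstract contrasts greedy recursive partitioning with an evolutionary search over whole trees. The Python sketch that follows illustrates that contrast on toy data; it is not the authors' implementation. Everything in it is an assumption not taken from the paper: binary 0/1 labels, Gini impurity as the purity measure, a strict-improvement stopping rule for the greedy splitter, and a mutation-only (mu + mu) evolutionary loop whose fitness is training accuracy minus a small per-node penalty. All function names (`grow_greedy`, `evolve`, `mutate`, `random_tree`, and so on) and parameter values are hypothetical.

```python
# Hypothetical sketch, not the authors' method: every operator here is an assumption.
import copy
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence

Row = Sequence[float]


@dataclass
class Node:
    # Internal node tests x[feature] <= threshold; a leaf has label set (0 or 1).
    feature: int = 0
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    label: Optional[int] = None


def predict(node: Node, x: Row) -> int:
    while node.label is None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.label


def accuracy(tree: Node, X: List[Row], y: List[int]) -> float:
    return sum(predict(tree, x) == t for x, t in zip(X, y)) / len(y)


def gini(labels: List[int]) -> float:
    p = sum(labels) / len(labels) if labels else 0.0
    return 2 * p * (1 - p)  # binary Gini impurity


def majority(labels: List[int]) -> int:
    return 1 if 2 * sum(labels) > len(labels) else 0  # ties and empty -> 0


def grow_greedy(X: List[Row], y: List[int], depth: int) -> Node:
    # Recursive partitioning: best split now; stop if none strictly lowers Gini.
    if depth == 0 or gini(y) == 0.0:
        return Node(label=majority(y))
    best = None
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            left = [c for x, c in zip(X, y) if x[f] <= t]
            right = [c for x, c in zip(X, y) if x[f] > t]
            if left and right:
                score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
                if best is None or score < best[0]:
                    best = (score, f, t)
    if best is None or best[0] >= gini(y):
        return Node(label=majority(y))
    _, f, t = best
    li = [i for i, x in enumerate(X) if x[f] <= t]
    ri = [i for i, x in enumerate(X) if x[f] > t]
    return Node(f, t,
                grow_greedy([X[i] for i in li], [y[i] for i in li], depth - 1),
                grow_greedy([X[i] for i in ri], [y[i] for i in ri], depth - 1))


def random_tree(X: List[Row], depth: int, rng: random.Random) -> Node:
    if depth == 0 or rng.random() < 0.3:
        return Node(label=rng.randint(0, 1))
    f = rng.randrange(len(X[0]))
    return Node(f, rng.choice(X)[f],
                random_tree(X, depth - 1, rng), random_tree(X, depth - 1, rng))


def all_nodes(tree: Node) -> List[Node]:
    if tree.label is not None:
        return [tree]
    return [tree] + all_nodes(tree.left) + all_nodes(tree.right)


def mutate(tree: Node, X: List[Row], rng: random.Random) -> Node:
    # Mutation only: move one split threshold or replace one node with a random subtree.
    child = copy.deepcopy(tree)
    target = rng.choice(all_nodes(child))
    if target.label is None and rng.random() < 0.5:
        target.threshold = rng.choice(X)[target.feature]
    else:
        sub = random_tree(X, 2, rng)
        target.feature, target.threshold = sub.feature, sub.threshold
        target.left, target.right, target.label = sub.left, sub.right, sub.label
    return child


def evolve(X, y, pop_size=30, generations=60, size_penalty=0.005, seed=0) -> Node:
    rng = random.Random(seed)
    def fitness(t: Node) -> float:  # training accuracy minus a per-node penalty
        return accuracy(t, X, y) - size_penalty * len(all_nodes(t))

    # Seeding with the majority-class leaf keeps the winner's fitness at or above it.
    pop = [Node(label=majority(y))] + [random_tree(X, 3, rng) for _ in range(pop_size - 1)]
    for _ in range(generations):
        children = [mutate(p, X, rng) for p in pop]  # one offspring per parent
        pop = sorted(pop + children, key=fitness, reverse=True)[:pop_size]  # (mu + mu)
    return pop[0]
```

As a usage example, take the four-point XOR data `X = [(0, 0), (0, 1), (1, 0), (1, 1)]`, `y = [0, 1, 1, 0]`. No single split lowers Gini impurity there, so `grow_greedy(X, y, depth=3)` stops at a one-leaf tree with training accuracy 0.5, while `evolve(X, y)` can mutate its way to two-level trees whose splits only pay off in combination, which mirrors the abstract's point about greedy induction. Because parents stay in the selection pool and the population is seeded with the majority-class leaf, the evolved tree's fitness can never fall below that trivial classifier. A real comparison would also need held-out evaluation such as the 10-fold cross-validation described in the abstract.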
Pages: 862-870
Page count: 9