Contemporary QSAR classifiers compared

Cited by: 119
Authors
Bruce, Craig L.
Melville, James L.
Pickett, Stephen D.
Hirst, Jonathan D.
Affiliations
[1] Univ Nottingham, Sch Chem, Nottingham NG7 2RD, England
[2] GlaxoSmithKline Inc, Stevenage SG1 2NY, Herts, England
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
DOI
10.1021/ci600332j
Chinese Library Classification (CLC) number
R914 [Medicinal Chemistry]
Discipline classification code
100701
Abstract
We present a comparative assessment of several state-of-the-art machine learning tools for mining drug data, including support vector machines (SVMs) and the ensemble decision tree methods boosting, bagging, and random forest, using eight data sets and two sets of descriptors. We demonstrate, by rigorous multiple comparison statistical tests, that these techniques can provide consistent improvements in predictive performance over single decision trees. However, within these methods, there is no clearly best-performing algorithm. This motivates a more in-depth investigation into the properties of random forests. We identify a set of parameters for the random forest that provide optimal performance across all the studied data sets. Additionally, the tree ensemble structure of the forest may provide an interpretable model, a considerable advantage over SVMs. We test this possibility and compare it with standard decision tree models.
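The abstract does not say which multiple comparison tests were used. As an illustration only, the sketch below computes the Friedman rank statistic, one common choice for comparing several classifiers across several data sets. The accuracy values, the classifier labels, and the function names `average_ranks` and `friedman_statistic` are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def average_ranks(row: Sequence[float]) -> List[float]:
    """Rank scores within one data set: rank 1 is the highest score,
    and tied scores share the mean of the ranks they span."""
    order = sorted(range(len(row)), key=lambda i: -row[i])
    ranks = [0.0] * len(row)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next score ties with the current one.
        while end + 1 < len(order) and row[order[end + 1]] == row[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of 1-based positions pos+1 .. end+1
        for i in order[pos:end + 1]:
            ranks[i] = shared
        pos = end + 1
    return ranks


def friedman_statistic(scores: List[Sequence[float]]) -> Tuple[List[float], float]:
    """Return (mean rank per classifier, Friedman chi-square statistic).

    scores[i][j] is the score of classifier j on data set i.
    No tie correction is applied in this sketch.
    """
    n = len(scores)      # number of data sets
    k = len(scores[0])   # number of classifiers
    rank_rows = [average_ranks(row) for row in scores]
    mean_ranks = [sum(r[j] for r in rank_rows) / n for j in range(k)]
    chi2 = 12 * n / (k * (k + 1)) * (
        sum(r * r for r in mean_ranks) - k * (k + 1) ** 2 / 4
    )
    return mean_ranks, chi2


# Hypothetical accuracies (NOT results from the paper): 8 data sets x 5 classifiers.
labels = ["single tree", "bagging", "boosting", "random forest", "SVM"]
accuracies = [
    [0.71, 0.78, 0.79, 0.80, 0.79],
    [0.65, 0.72, 0.70, 0.73, 0.74],
    [0.80, 0.84, 0.85, 0.85, 0.83],
    [0.58, 0.66, 0.67, 0.65, 0.68],
    [0.74, 0.79, 0.78, 0.81, 0.80],
    [0.69, 0.75, 0.77, 0.76, 0.74],
    [0.62, 0.70, 0.69, 0.71, 0.72],
    [0.77, 0.82, 0.83, 0.82, 0.81],
]

mean_ranks, chi2 = friedman_statistic(accuracies)
for name, rank in zip(labels, mean_ranks):
    print(f"{name:>14}: mean rank {rank:.2f}")
print(f"Friedman chi-square statistic: {chi2:.3f}")
```

The statistic is compared against a chi-square distribution with k - 1 degrees of freedom. A large value only indicates that the classifiers do not all perform alike, so a post-hoc pairwise test would still be needed to say which ones differ.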
Pages: 219-227
Page count: 9