On the use of neural network ensembles in QSAR and QSPR

Cited by: 137
Authors
Agrafiotis, DK [1]
Cedeño, W [1]
Lobanov, VS [1]
Affiliation
[1] 3 Dimens Pharmaceut Inc, Exton, PA 19341 USA
Source
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES | 2002 / Vol. 42 / Issue 04
DOI
10.1021/ci0203702
Chinese Library Classification
O6 [Chemistry];
Discipline classification code
0703 ;
Abstract
Despite their growing popularity among neural network practitioners, ensemble methods have not been widely adopted in structure-activity and structure-property correlation. Neural networks are inherently unstable, in that small changes in the training set and/or training parameters can lead to large changes in their generalization performance. Recent research has shown that by capitalizing on the diversity of the individual models, ensemble techniques can minimize uncertainty and produce more stable and accurate predictors. In this work, we present a critical assessment of the most common ensemble technique known as bootstrap aggregation, or bagging, as applied to QSAR and QSPR. Although aggregation does offer definitive advantages, we demonstrate that bagging may not be the best possible choice and that simpler techniques such as retraining with the full sample can often produce superior results. These findings are rationalized using Krogh and Vedelsby's decomposition of the generalization error into a term that measures the average generalization performance of the individual networks and a term that measures the diversity among them. For networks that are designed to resist over-fitting, the benefits of aggregation are clear but not overwhelming.
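The abstract does not write out the decomposition it refers to. As a minimal sketch, assuming the commonly cited form of the Krogh and Vedelsby result for a uniformly weighted ensemble under squared error (ensemble error = average member error minus average ambiguity), the identity can be checked numerically. The function name and the toy numbers below are hypothetical and are not taken from the paper.

```python
def krogh_vedelsby_terms(predictions, targets):
    """Return (ensemble_error, average_member_error, ambiguity).

    predictions: one list of outputs per ensemble member (hypothetical data).
    targets: the observed values for the same points.
    Assumes equal weights for all members and squared-error loss.
    """
    n_models = len(predictions)
    n_points = len(targets)
    ens_err = avg_err = ambiguity = 0.0
    for i, y in enumerate(targets):
        outputs = [model[i] for model in predictions]
        ens = sum(outputs) / n_models  # ensemble output = plain average
        ens_err += (ens - y) ** 2
        # Mean squared error of the individual members at this point
        avg_err += sum((f - y) ** 2 for f in outputs) / n_models
        # Diversity term: spread of the members around the ensemble output
        ambiguity += sum((f - ens) ** 2 for f in outputs) / n_models
    return ens_err / n_points, avg_err / n_points, ambiguity / n_points


# Made-up outputs of three models on three compounds (illustration only)
preds = [[1.0, 2.0, 3.0], [1.5, 1.5, 3.5], [0.5, 2.5, 2.0]]
targets = [1.0, 2.0, 3.0]
ens_err, avg_err, ambiguity = krogh_vedelsby_terms(preds, targets)

# Under these assumptions the identity holds up to rounding error
assert abs(ens_err - (avg_err - ambiguity)) < 1e-12
```

Because the ambiguity term is never negative, the averaged predictor can do no worse than the average member under these assumptions; how much better it does depends on how diverse the members are.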
Pages: 903-911
Page count: 9