How large should ensembles of classifiers be?

Cited by: 44
Authors
Hernandez-Lobato, Daniel [1]
Martinez-Munoz, Gonzalo [1]
Suarez, Alberto [1]
Affiliations
[1] Univ Autonoma Madrid, Escuela Politecn Super, Dept Comp Sci, E-28049 Madrid, Spain
Keywords
Ensemble learning; Bagging; Random forest; Asymptotic ensemble prediction; Ensemble size; Theoretical analysis; Accuracy; Limits
DOI
10.1016/j.patcog.2012.10.021
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We propose to determine the size of a parallel ensemble by estimating the minimum number of classifiers that are required to obtain stable aggregate predictions. Assuming that majority voting is used, a statistical description of the convergence of the ensemble prediction to its asymptotic (infinite-size) limit is given. The analysis of the voting process shows that, for most test instances, the ensemble prediction stabilizes after only a few classifiers are polled. By contrast, a small but non-negligible fraction of these instances require large numbers of classifier queries to reach stable predictions. Specifically, the fraction of instances whose stable predictions require more than T classifiers, for T >> 1, has a universal form and is proportional to T^(-1/2). The ensemble size is determined as the minimum number of classifiers needed to estimate the infinite-ensemble prediction at an average confidence level alpha close to one. This approach differs from previous proposals, which are based on determining the size at which the prediction error (not the predictions themselves) stabilizes. In particular, it does not require estimates of the generalization performance of the ensemble, which can be unreliable. It has general validity because it is based solely on the statistical description of the convergence of majority voting to its asymptotic limit. Extensive experiments using representative parallel ensembles (bagging and random forest) illustrate the application of the proposed framework in a wide range of classification problems. These experiments show that the optimal ensemble size is very sensitive to the particular classification problem considered. (C) 2012 Elsevier Ltd. All rights reserved.
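The T^(-1/2) tail described in the abstract is easy to reproduce numerically. The minimal sketch below is not the authors' code: it assumes, purely for illustration, that each test instance can be summarized by a probability p that a randomly drawn ensemble member votes for class 1 (with p drawn uniformly on (0, 1)), and it measures, for several ensemble sizes T, the fraction of instances whose T-classifier majority vote is not yet guaranteed to match the infinite-ensemble prediction at confidence alpha. All variable names and the choice alpha = 0.99 are assumptions of this sketch.

```python
# Minimal sketch (not the authors' implementation) of majority-vote
# convergence to the infinite-ensemble prediction.
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(0)
n_instances = 100_000
# Illustrative assumption: the per-instance probability that a random
# ensemble member votes for class 1 is uniform on (0, 1).
p = rng.uniform(0.0, 1.0, size=n_instances)

alpha = 0.99  # target confidence that the finite vote matches the limit
for T in (11, 101, 1001, 10_001):  # odd sizes avoid ties
    k = T // 2  # a majority for class 1 needs at least k + 1 of T votes
    # Probability, instance by instance, that the T-classifier majority
    # equals the asymptotic (infinite-ensemble) prediction.
    p_match = np.where(p > 0.5,
                       1.0 - binom.cdf(k, T, p),  # limit predicts class 1
                       binom.cdf(k, T, p))        # limit predicts class 0
    unresolved = np.mean(p_match < alpha)
    print(f"T = {T:6d}: unresolved fraction = {unresolved:.4f}")
```

Under these assumptions the unresolved fraction decays roughly as T^(-1/2): only instances with p near 1/2 remain undecided, and the width of that undecided band shrinks as 1/sqrt(T), in line with the universal form stated in the abstract.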
Pages: 1323-1336
Page count: 14