A noise-detection based AdaBoost algorithm for mislabeled data

Cited by: 112
Authors
Cao, Jingjing [1 ]
Kwong, Sam [1 ]
Wang, Ran [1 ]
Affiliations
[1] City Univ Hong Kong, Dept Comp Sci, Kowloon, Hong Kong, Peoples R China
Keywords
Pattern recognition; Ensemble learning; AdaBoost; k-NN; EM; Constructing ensembles; Boosting algorithms; Classifiers
DOI
10.1016/j.patcog.2012.05.002
CLC number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Noise sensitivity is known to be a key issue of the AdaBoost algorithm. Previous work shows that AdaBoost is prone to overfitting on noisy data sets because it consistently assigns high weights to hard-to-learn instances (mislabeled instances or outliers). In this paper, a new boosting approach, named noise-detection based AdaBoost (ND-AdaBoost), is proposed to combine classifiers by emphasizing misclassified noisy instances and correctly classified non-noisy instances during training. Specifically, the algorithm integrates a noise-detection based loss function into AdaBoost to adjust the weight distribution at each iteration. Two evaluation criteria, based on k-nearest-neighbor (k-NN) and expectation maximization (EM) respectively, are constructed to detect noisy instances. Further, a regeneration condition is presented and analyzed to control the ensemble training error bound of the proposed algorithm, which provides theoretical support. Finally, experiments on selected binary UCI benchmark data sets demonstrate that the proposed algorithm is more robust than standard AdaBoost and other AdaBoost variants on noisy data sets. (C) 2012 Elsevier Ltd. All rights reserved.
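The abstract describes the mechanism only at a high level. The minimal Python sketch below illustrates the general idea: a k-NN disagreement test flags suspected mislabeled instances, and a boosting loop uses a noise-aware 0/1 loss that counts misclassified clean instances and correctly classified noisy ones as errors. The function names, the k=5 majority threshold, the decision-stump base learner, and the early-exit stand-in for the paper's regeneration condition are all illustrative assumptions; this is not the authors' exact ND-AdaBoost, whose loss function and EM-based criterion are defined in the paper itself.

```python
# A minimal sketch of the idea in the abstract, NOT the authors' exact
# ND-AdaBoost: the paper's loss function, EM criterion, and regeneration
# condition are not reproduced here. All names and thresholds are assumed.
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.tree import DecisionTreeClassifier

def knn_noise_flags(X, y, k=5):
    """Flag an instance as potentially mislabeled when the majority of
    its k nearest neighbors carry a different label."""
    y = np.asarray(y)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)                 # idx[:, 0] is the point itself
    neighbor_labels = y[idx[:, 1:]]
    disagree = (neighbor_labels != y[:, None]).mean(axis=1)
    return disagree > 0.5

def nd_adaboost_sketch(X, y, T=50, k=5):
    """Boosting loop with a noise-aware 0/1 loss: a round's weighted error
    counts misclassified clean instances and correctly classified noisy
    ones (for a suspected-noisy instance, agreeing with its given label
    is treated as undesirable)."""
    y = np.asarray(y)
    n = len(y)
    noisy = knn_noise_flags(X, y, k)
    w = np.full(n, 1.0 / n)
    learners, alphas = [], []
    for _ in range(T):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        miss = stump.predict(X) != y
        loss = np.where(noisy, ~miss, miss)   # noise-aware 0/1 loss
        err = np.dot(w, loss)
        if err <= 0 or err >= 0.5:            # crude stand-in for the
            break                             # paper's regeneration condition
        alpha = 0.5 * np.log((1 - err) / err)
        w *= np.exp(alpha * np.where(loss, 1.0, -1.0))
        w /= w.sum()
        learners.append(stump)
        alphas.append(alpha)
    return learners, alphas
```

The design point the sketch tries to capture is the one stated in the abstract: for a mislabeled instance, a "misclassification" with respect to its given label may actually be a correct prediction of its true label, so the weight update should not keep inflating such instances the way standard AdaBoost does.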
Pages: 4451-4465
Number of pages: 15