SMOTEBoost: Improving prediction of the minority class in boosting

Cited by: 1147
Authors
Chawla, NV
Lazarevic, A
Hall, LO
Bowyer, KW
Affiliations
[1] Canadian Imperial Bank Commerce, Business Analyt Solut, Toronto, ON M5J 2S8, Canada
[2] Univ Minnesota, Dept Comp Sci, Minneapolis, MN 55455 USA
[3] Univ S Florida, Dept Comp Sci & Engn, Tampa, FL 33620 USA
[4] Univ Notre Dame, Dept Comp Sci & Engn, Notre Dame, IN 46556 USA
Source
KNOWLEDGE DISCOVERY IN DATABASES: PKDD 2003, PROCEEDINGS | 2003 / Vol. 2838
DOI
10.1007/978-3-540-39804-2_12
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Many real world data mining applications involve learning from imbalanced data sets. Learning from data sets that contain very few instances of the minority (or interesting) class usually produces biased classifiers that have a higher predictive accuracy over the majority class(es), but poorer predictive accuracy over the minority class. SMOTE (Synthetic Minority Over-sampling TEchnique) is specifically designed for learning from imbalanced data sets. This paper presents a novel approach for learning from imbalanced data sets, based on a combination of the SMOTE algorithm and the boosting procedure. Unlike standard boosting where all misclassified examples are given equal weights, SMOTEBoost creates synthetic examples from the rare or minority class, thus indirectly changing the updating weights and compensating for skewed distributions. SMOTEBoost applied to several highly and moderately imbalanced data sets shows improvement in prediction performance on the minority class and overall improved F-values.
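The synthetic-example step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the usual description of SMOTE: a new minority point is placed at a random position on the line segment between a minority example and one of its nearest minority-class neighbours. The function name `smote_like_samples` and its parameters are hypothetical and are not taken from the paper. The boosting loop and its weight update are not shown, since the abstract does not specify them.

```python
import random


def smote_like_samples(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority examples.

    Illustrative assumption: each synthetic point lies on the segment between
    a randomly chosen minority example and one of its k nearest minority
    neighbours (squared Euclidean distance). Requires at least two examples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Candidate neighbours: every other minority example, nearest first.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


# Usage: four minority examples at the corners of the unit square.
minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_like_samples(minority_points, n_new=10, k=2)
```

In a SMOTEBoost-style procedure, points like these would be added to the training data seen by each weak learner, which is how the abstract's "indirectly changing the updating weights" can be read. That reading is an interpretation of the abstract, not a statement of the authors' implementation.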
Pages: 107-119
Number of pages: 13