CLASSIFICATION OF IMBALANCED DATA: A REVIEW

Cited by: 1113
Authors
Sun, Yanmin [1]
Wong, Andrew K. C. [2]
Kamel, Mohamed S. [3]
Affiliations
[1] Pattern Discovery Technol Inc, Waterloo, ON N2L 5Z4, Canada
[2] Univ Waterloo, Syst Design Dept, Waterloo, ON N2L 3G1, Canada
[3] Univ Waterloo, Dept Elect & Comp Engn, Waterloo, ON N2L 3G1, Canada
Keywords
Classification; class imbalance problem; DISCOVERY
DOI
10.1142/S0218001409007326
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
An imbalanced class distribution significantly degrades the performance attainable by most standard classifier learning algorithms, which assume a relatively balanced class distribution and equal misclassification costs. This paper reviews the classification of imbalanced data with regard to: the application domains; the nature of the problem; the learning difficulties with standard classifier learning algorithms; the learning objectives and evaluation measures; the reported research solutions; and the class imbalance problem in the presence of multiple classes.
Pages: 687-719
Page count: 33