CLASSIFICATION OF IMBALANCED DATA: A REVIEW

Cited by: 1113
Authors
Sun, Yanmin [1]
Wong, Andrew K. C. [2]
Kamel, Mohamed S. [3]
Affiliations
[1] Pattern Discovery Technol Inc, Waterloo, ON N2L 5Z4, Canada
[2] Univ Waterloo, Syst Design Dept, Waterloo, ON N2L 3G1, Canada
[3] Univ Waterloo, Dept Elect & Comp Engn, Waterloo, ON N2L 3G1, Canada
Keywords
Classification; class imbalance problem; DISCOVERY
DOI
10.1142/S0218001409007326
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
An imbalanced class distribution significantly limits the performance attainable by most standard classifier learning algorithms, which assume a relatively balanced class distribution and equal misclassification costs. This paper reviews the classification of imbalanced data with respect to: the application domains; the nature of the problem; the learning difficulties with standard classifier learning algorithms; the learning objectives and evaluation measures; the reported research solutions; and the class imbalance problem in the presence of multiple classes.
Pages: 687-719
Number of pages: 33