Inverse random under sampling for class imbalance problem and its application to multi-label classification

Cited by: 366
Authors
Tahir, Muhammad Atif [1, 2]
Kittler, Josef [1 ]
Yan, Fei [1 ]
Affiliations
[1] Northumbria Univ, Sch Comp Engn & Informat Sci, Newcastle Upon Tyne NE1 8ST, Tyne & Wear, England
[2] Univ Surrey, Ctr Vis Speech & Signal Proc, Guildford GU2 7XH, Surrey, England
Keywords
Class imbalance problem; Multi-label classification; Inverse random under sampling
DOI
10.1016/j.patcog.2012.03.014
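Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification code
140502 [Artificial intelligence]
Abstract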
In this paper, a novel inverse random under sampling (IRUS) method is proposed for the class imbalance problem. The main idea is to severely under sample the majority class, thus creating a large number of distinct training sets. For each training set, we then find a decision boundary that separates the minority class from the majority class. By combining the multiple designs through fusion, we construct a composite boundary between the majority class and the minority class. The proposed methodology is applied to 22 UCI data sets, and experimental results indicate a significant increase in performance when compared with many existing class-imbalance learning methods. We also present promising results for multi-label classification, a challenging research problem in many modern applications such as music, text and image categorization. (C) 2012 Elsevier Ltd. All rights reserved.
Pages: 3738-3750
Page count: 13
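
The procedure outlined in the abstract (draw many small majority-class subsets, fit one boundary per subset, then fuse the results) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the nearest-centroid base learner, the score-averaging fusion rule, the default subset size, and all function and parameter names (`irus_like_ensemble`, `n_sets`, `subset_size`) are assumptions chosen to keep the example self-contained.

```python
import random


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_nearest_centroid(minority, majority_subset):
    """Stand-in base learner (an assumption, not taken from the paper).

    Returns a scoring function; a positive score means the point lies
    closer to the minority centroid than to the majority-subset centroid.
    """
    c_min = centroid(minority)
    c_maj = centroid(majority_subset)
    return lambda x: sq_dist(x, c_maj) - sq_dist(x, c_min)


def irus_like_ensemble(minority, majority, n_sets=25, subset_size=None, seed=0):
    """Build an ensemble from many heavily under-sampled training sets."""
    rng = random.Random(seed)
    if subset_size is None:
        # Assumed default: draw fewer majority examples than there are
        # minority examples, so the majority class is severely under-sampled.
        subset_size = max(1, len(minority) // 2)
    subset_size = min(subset_size, len(majority))

    models = []
    for _ in range(n_sets):
        # Each training set keeps every minority example plus a small
        # random subset of the majority class.
        subset = rng.sample(majority, subset_size)
        models.append(train_nearest_centroid(minority, subset))

    def predict(x):
        # Fusion by averaging the individual scores (one of several
        # possible combination rules). Returns 1 for the minority class.
        mean_score = sum(model(x) for model in models) / len(models)
        return 1 if mean_score > 0 else 0

    return predict


# Toy data: 4 minority points near (5, 5), 50 majority points near the origin.
minority = [[5.0, 5.0], [5.5, 4.5], [4.5, 5.5], [5.2, 5.1]]
majority = [[(i % 5) * 0.2, (i // 5) * 0.2] for i in range(50)]

predict = irus_like_ensemble(minority, majority, n_sets=10, seed=1)
print(predict([5.0, 5.0]), predict([0.1, 0.1]))
```

Because every training set retains all minority examples while seeing only a small slice of the majority class, each individual boundary is biased toward the minority class; averaging over many such boundaries is what yields the composite boundary the abstract describes. Any classifier that produces a score could replace the nearest-centroid learner used here.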