Evaluating boosting algorithms to classify rare classes: Comparison and improvements

Cited by: 132
Authors
Joshi, MV [1 ]
Kumar, V [1 ]
Agarwal, RC [1 ]
Affiliation
[1] IBM Corp, Thomas J Watson Res Ctr, Yorktown Hts, NY 10598 USA
Source
2001 IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS | 2001
DOI
10.1109/ICDM.2001.989527
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Classification of rare events has many important data mining applications. Boosting is a promising meta-technique that improves the classification performance of any weak classifier. So far, no systematic study has been conducted to evaluate how boosting performs for the task of mining rare classes. In this paper, we evaluate three existing categories of boosting algorithms from the single viewpoint of how they update the example weights in each iteration, and discuss their possible effect on recall and precision of the rare class. We propose enhanced algorithms in two of the categories, and justify their choice of weight updating parameters theoretically. Using some specially designed synthetic datasets, we compare the capability of all the algorithms from the rare class perspective. The results support our qualitative analysis, and also indicate that our enhancements bring an extra capability for achieving better balance between recall and precision in mining rare classes.
Pages: 257-264
Number of pages: 8
Related papers
10 in total
  • [1] Chan P. K., 1998, Proceedings Fourth International Conference on Knowledge Discovery and Data Mining, P164
  • [2] Cohen W. W., 1995, FAST EFFECTIVE RULE
  • [3] Cohen WW, 1999, SIXTEENTH NATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE (AAAI-99)/ELEVENTH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE (IAAI-99), P335
  • [4] FAN W, 1999, P 6 INT C MACH LEARN
  • [5] JOSHI M, 2001, SIGMOD, P91
  • [6] JOSHI MV, 2001, RC22147 IBM RES DIV
  • [7] Schapire RE, 1999, LECT NOTES ARTIF INT, V1572, P1
  • [8] Schapire RE, Singer Y, 1999, Improved boosting algorithms using confidence-rated predictions, MACHINE LEARNING, V37 (03), P297-336
  • [9] Sebastiani F., 2000, Proceedings of the Ninth International Conference on Information and Knowledge Management. CIKM 2000, P78, DOI 10.1145/354756.354804
  • [10] Ting K. M., 2000, PROC INT C MACHINE L, P983