Cost-sensitive learning by cost-proportionate example weighting

Citations: 363
Authors
Zadrozny, B [1 ]
Langford, J [1 ]
Abe, N [1 ]
Affiliation
[1] IBM Corp, Thomas J Watson Res Ctr, Dept Math Sci, Yorktown Hts, NY 10598 USA
Source
THIRD IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS | 2003
DOI
10.1109/icdm.2003.1250950
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We propose and evaluate a family of methods for converting classifier learning algorithms and classification theory into cost-sensitive algorithms and theory. The proposed conversion is based on cost-proportionate weighting of the training examples, which can be realized either by feeding the weights to the classification algorithm (as often done in boosting), or by careful subsampling. We give some theoretical performance guarantees on the proposed methods, as well as empirical evidence that they are practical alternatives to existing approaches. In particular, we propose costing, a method based on cost-proportionate rejection sampling and ensemble aggregation, which achieves excellent predictive performance on two publicly available datasets, while drastically reducing the computation required by other methods.
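The abstract's idea of cost-proportionate rejection sampling followed by ensemble aggregation can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes each example is kept with probability cost / z, where z is an upper bound on the costs, and that the ensemble aggregates by majority vote. The names `rejection_sample`, `train_majority`, and `costing`, and the trivial base learner, are hypothetical choices made for this illustration.

```python
import random
from collections import Counter


def rejection_sample(examples, costs, z=None, rng=None):
    """Keep each example independently with probability cost / z.

    z defaults to the largest cost, so the costliest example is always kept.
    (Assumption: this is one common reading of cost-proportionate sampling.)
    """
    rng = rng or random.Random(0)
    z = max(costs) if z is None else z
    return [ex for ex, c in zip(examples, costs) if rng.random() < c / z]


def train_majority(sample):
    """Hypothetical stand-in base learner: always predict the most common label."""
    labels = [label for _, label in sample]
    top = Counter(labels).most_common(1)[0][0] if labels else None
    return lambda x: top


def costing(examples, costs, n_rounds=10, learner=train_majority, seed=0):
    """Train one model per rejection sample and aggregate by majority vote."""
    rng = random.Random(seed)
    models = [
        learner(rejection_sample(examples, costs, rng=rng))
        for _ in range(n_rounds)
    ]

    def predict(x):
        votes = Counter(model(x) for model in models)
        return votes.most_common(1)[0][0]

    return predict


if __name__ == "__main__":
    # Each example is a (features, label) pair; costs are per-example.
    data = [((0,), 1), ((1,), 0), ((2,), 0)]
    costs = [10.0, 1.0, 1.0]
    predict = costing(data, costs, n_rounds=5)
    print(predict((1,)))
```

Because each rejection sample is usually much smaller than the full training set, every base learner in such a scheme trains on less data, which is in line with the abstract's remark about reduced computation. Any real base classifier could replace `train_majority`.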
Pages: 435-442
Page count: 8