Building text classifiers using positive and unlabeled examples

Cited by: 414
Authors
Liu, B [1]
Dai, Y [1]
Li, XL [1]
Lee, WS [1]
Yu, PS [1]
Institution
[1] Univ Illinois, Dept Comp Sci, Chicago, IL 60612 USA
Source
THIRD IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS | 2003
Keywords
DOI
10.1109/icdm.2003.1250918
CLC classification
TP18 [Artificial intelligence theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper studies the problem of building text classifiers using positive and unlabeled examples. The key feature of this problem is that no negative examples are available for learning. Recently, a few techniques for solving this problem have been proposed in the literature. These techniques share the same underlying idea: build a classifier in two steps, with each existing technique using a different method for each step. In this paper, we first introduce some new methods for the two steps and perform a comprehensive evaluation of all possible combinations of methods for the two steps. We then propose a more principled approach to the problem based on a biased formulation of SVM, and show experimentally that it is more accurate than the existing techniques.
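The biased-SVM idea described in the abstract can be sketched as follows. This is an illustrative assumption, not the paper's exact formulation: all unlabeled documents are treated as negatives, but errors on labeled positives are penalized much more heavily than errors on unlabeled examples. Here scikit-learn's `LinearSVC` with `class_weight` stands in for the paper's asymmetric penalty parameters, and the synthetic data, weights, and cluster centers are all made up for the demonstration.

```python
# Hedged sketch of a biased SVM for learning from positive and
# unlabeled (PU) data: fit an SVM that labels all unlabeled examples
# as negative (0) but weights mistakes on labeled positives (1) much
# more heavily, so hidden positives in the unlabeled set can still
# end up on the positive side of the boundary.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
# Synthetic 2-D "documents": positives cluster near (+2, +2),
# negatives near (-2, -2).
pos = rng.normal(loc=2.0, scale=1.0, size=(50, 2))
neg = rng.normal(loc=-2.0, scale=1.0, size=(200, 2))
X = np.vstack([pos, neg])

# PU labels: only the first 25 positives are labeled (y = 1);
# the remaining 25 positives are hidden among the unlabeled (y = 0).
y = np.zeros(len(X), dtype=int)
y[:25] = 1

# Biased penalties: an error on a labeled positive costs 10x an error
# on an unlabeled example (illustrative choice of weights).
clf = LinearSVC(class_weight={1: 10.0, 0: 1.0}, C=1.0)
clf.fit(X, y)

# Check how many of the hidden positives (rows 25..49) are recovered.
hidden_pred = clf.predict(X[25:50])
print(hidden_pred.mean())
```

Because the positives are well separated from the negatives and the positive class carries the larger penalty, the learned boundary falls between the two clusters, and most of the hidden positives are classified as positive despite being labeled 0 during training.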
Pages: 179 / 186
Page count: 8
Related papers
35 items in total
[1] Agrawal R, 2000, EDBT 00
[2] [Anonymous], 1997, 1602 AI MIT
[3] [Anonymous], 1994, SIGIR 94
[4] Basu S, 2002, ICML 02
[5] Bennett K, 1998, ADV NEURAL INFORMATI, P11
[6] Blum A, 1998, COLT 98
[7] Bockhorst J, 2002, ICML 02
[8] Buckley C, 1994, SIGIR 94
[9] Dempster A, 1977, J ROYAL STAT SOC B, V39, P1
[10] Denis F, 2002, IPMU