SVM based adaptive learning method for text classification from positive and unlabeled documents

Times cited: 57
Authors
Peng, Tao [1]
Zuo, Wanli [1, 2]
He, Fengling [1, 2]
Affiliations
[1] Jilin Univ, Coll Comp Sci & Technol, Jilin 130012, Peoples R China
[2] Minist Educ, Key Lab Symbol Computat & Knowledge Engn, Changchun 130012, Peoples R China
Keywords
text classification; machine learning; improved 1-DNF algorithm; SVM; PSO; focused web crawling
DOI
10.1007/s10115-007-0107-1
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Automatic text classification is one of the most important tools in information retrieval. This paper presents a novel text classifier that learns from positive and unlabeled examples. Compared with the classical text classification problem, the main challenge here is that the training set contains no labeled negative documents. First, we use an improved 1-DNF algorithm to identify many more reliable negative documents, with a very low error rate. Second, we build a set of classifiers by iteratively applying the SVM algorithm to a training set that is augmented at each iteration. Third, unlike previous PU-oriented text classification work, we construct the final classifier as a weighted vote of all classifiers generated during the iterations, instead of selecting one of them as the final classifier. Finally, we discuss a PSO (Particle Swarm Optimization)-based approach to determining the weights of this vote, which can discover the best combination of weights. In addition, to evaluate our method, we built a focused crawler based on link contexts and guided it with different classifiers. Several comprehensive experiments were conducted using the Reuters data set and thousands of web pages. Experimental results show that our method improves performance (F1-measure) over PEBL, and that a focused web crawler guided by our PSO-based classifier outperforms crawlers guided by several other classifiers in both harvest rate and target recall.
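The first step refines the 1-DNF technique used in PEBL. The abstract does not give the details of the improved variant, so the following is only a minimal Python sketch of the original 1-DNF idea, not the authors' algorithm: a feature is treated as positive if its relative document frequency is higher in the positive set P than in the unlabeled set U, and a document in U is taken as a reliable negative if it contains none of those features. Documents are assumed to be lists of tokens, and the function names are hypothetical.

```python
from collections import Counter


def positive_features(P, U):
    """Features whose relative document frequency in P exceeds that in U."""
    # Document frequency: each feature is counted at most once per document.
    df_p = Counter(f for doc in P for f in set(doc))
    df_u = Counter(f for doc in U for f in set(doc))
    # Counter returns 0 for features never seen in U.
    return {f for f in df_p if df_p[f] / len(P) > df_u[f] / len(U)}


def reliable_negatives(P, U):
    """Original 1-DNF rule: unlabeled documents containing no positive feature."""
    pf = positive_features(P, U)
    return [doc for doc in U if not pf.intersection(doc)]
```

For example, with P = [["wheat", "grain", "export"], ["grain", "harvest"]] and U = [["grain", "price"], ["football", "score"], ["wheat", "tariff"], ["election", "vote"], ["grain", "export"]], every feature of P is positive, and only ["football", "score"] and ["election", "vote"] are returned as reliable negatives.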
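The abstract states that PSO searches for the weights of the vote over all iteration classifiers but does not specify the fitness function or PSO settings. As a hedged illustration only, the Python sketch below assumes each classifier outputs a vote in {-1, +1}, labels a document positive when the weighted sum of votes is greater than zero, and runs a standard global-best PSO (inertia w, acceleration constants c1 and c2, weights clipped to [0, 1]) to maximize F1 on a held-out validation set. The function names, parameter values, and choice of F1 as fitness are assumptions, not the authors' settings.

```python
import random


def weighted_vote(weights, votes):
    """Label each document +1 if the weighted sum of its classifier votes is positive."""
    return [1 if sum(w * v for w, v in zip(weights, doc_votes)) > 0 else -1
            for doc_votes in votes]


def f1(pred, gold):
    """F1 for the positive class (+1); returns 0.0 when there are no true positives."""
    tp = sum(1 for p, g in zip(pred, gold) if p == 1 and g == 1)
    fp = sum(1 for p, g in zip(pred, gold) if p == 1 and g == -1)
    fn = sum(1 for p, g in zip(pred, gold) if p == -1 and g == 1)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def pso_weights(votes, gold, n_particles=20, iters=50, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO over vote weights; returns (best_weights, best_f1)."""
    rng = random.Random(seed)
    k = len(votes[0])  # number of classifiers

    def fitness(x):
        return f1(weighted_vote(x, votes), gold)

    # Particle 0 starts at equal weights, so the result is never worse
    # (on this validation set) than an unweighted vote.
    pos = [[1.0] * k] + [[rng.random() for _ in range(k)] for _ in range(n_particles - 1)]
    vel = [[0.0] * k for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(k):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move the particle and clip each weight to [0, 1].
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit
```

Seeding one particle at equal weights is a design choice of this sketch: because the global best only ever improves, the returned weights score at least as well as an unweighted vote on the validation data, though that says nothing about how well they generalize.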
Pages: 281-301
Page count: 21