Dynamic classifier ensemble for positive unlabeled text stream classification

Cited by: 15
Authors
Pan, Shirui [1 ]
Zhang, Yang [1 ]
Li, Xue [2 ]
Affiliations
[1] NW A&F Univ, Coll Informat Engn, Yangling, Peoples R China
[2] Univ Queensland, Sch Informat Technol & Elect Engn, Brisbane, Qld, Australia
Funding
National Natural Science Foundation of China
Keywords
Positive unlabeled learning; Text streams; Classifier ensemble; Concept drift; Examples
DOI
10.1007/s10115-011-0469-2
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Most studies on streaming data classification assume that the data can be fully labeled. In real-life applications, however, manually labeling an entire stream for training is impractical and time-consuming. In data stream environments it is very common that only a small set of positive examples and a large amount of unlabeled data are available. In this case, applying traditional streaming algorithms to a positive unlabeled stream with only straightforward adaptation may lead to poor performance. In this paper, we propose a Dynamic Classifier Ensemble method for Positive and Unlabeled text stream (DCEPU) classification scenarios. We address the problem of classifying positive and unlabeled text streams under various kinds of concept drift by constructing an appropriate validation set and designing a novel dynamic weighting scheme for the classification phase. Experimental results on the benchmark dataset RCV1-v2 demonstrate that the proposed DCEPU method outperforms the existing LELC (Li et al. 2009b), DVS (with necessary adaptation) (Tsymbal et al. in Inf Fusion 9(1):56-68, 2008), and Stacking-style ensemble-based algorithm (Zhang et al. 2008b).
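The abstract names two ingredients, a validation set and a dynamic weighting scheme applied at classification time, but does not give the formulas. The Python sketch below illustrates only the general idea of dynamic weighting by local accuracy (in the spirit of dynamic classifier selection). It is not the DCEPU algorithm itself. The function names, the cosine similarity measure, the neighbourhood size k, and the toy data are all assumptions made for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Classifier = Callable[[Vector], int]  # returns +1 (positive) or -1 (negative)
Example = Tuple[Vector, int]          # (feature vector, label)


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def dynamic_weights(classifiers: List[Classifier],
                    validation: List[Example],
                    x: Vector, k: int = 3) -> List[float]:
    """Weight each classifier by its accuracy on the k validation
    examples most similar to x (hypothetical weighting rule)."""
    neighbours = sorted(validation,
                        key=lambda ex: cosine(ex[0], x),
                        reverse=True)[:k]
    if not neighbours:
        return [0.0 for _ in classifiers]
    return [sum(1 for feats, label in neighbours if clf(feats) == label)
            / len(neighbours)
            for clf in classifiers]


def ensemble_predict(classifiers: List[Classifier],
                     validation: List[Example],
                     x: Vector, k: int = 3) -> int:
    """Weighted vote; ties and all-zero weights fall back to -1."""
    weights = dynamic_weights(classifiers, validation, x, k)
    score = sum(w * clf(x) for w, clf in zip(weights, classifiers))
    return 1 if score > 0 else -1


# Toy usage: two hand-written "classifiers" over 2-d vectors.
clf_a: Classifier = lambda v: 1 if v[0] > v[1] else -1
clf_b: Classifier = lambda v: 1  # always predicts positive
validation_set: List[Example] = [
    ([1.0, 0.0], 1), ([0.9, 0.1], 1),
    ([0.0, 1.0], -1), ([0.1, 0.9], -1),
]
query = [0.2, 0.8]
weights = dynamic_weights([clf_a, clf_b], validation_set, query, k=2)
label = ensemble_predict([clf_a, clf_b], validation_set, query, k=2)
```

In this toy run the two validation examples nearest the query are both negative, so the classifier that always predicts positive receives zero weight and the ensemble follows the other one. In a positive unlabeled stream the validation labels would themselves have to be estimated, which is the part the paper addresses and this sketch does not.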
Pages: 267-287
Page count: 21
Related papers
43 in total
  • [1] [Anonymous], 2000, ICML, DOI 10.1007/978-3-540-44871-6_130
  • [2] [Anonymous], 2009, SDM
  • [3] [Anonymous], PATTERN RECOGNIT LET
  • [4] [Anonymous], KNOWL INF SYST
  • [5] [Anonymous], P 1 INT WORKSH MULT
  • [6] Building text classifiers using positive and unlabeled examples
    Liu, B
    Dai, Y
    Li, XL
    Lee, WS
    Yu, PS
    [J]. THIRD IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2003, : 179 - 186
  • [7] Learning from positive and unlabeled examples
    Denis, F
    Gilleron, R
    Letouzey, F
    [J]. THEORETICAL COMPUTER SCIENCE, 2005, 348 (01) : 70 - 83
  • [8] A study on the performances of dynamic classifier selection based on local accuracy estimation
    Didaci, L
    Giacinto, G
    Roli, F
    Marcialis, GL
    [J]. PATTERN RECOGNITION, 2005, 38 (11) : 2188 - 2191
  • [9] Domingos P., 2000, Proceedings. KDD-2000. Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, P71, DOI 10.1145/347090.347107
  • [10] Decision tree evolution using limited number of labeled data items from drifting data streams
    Fan, W
    Huang, YA
    Yu, PS
    [J]. FOURTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2004, : 379 - 382