Non-stationary data sequence classification using online class priors estimation

Cited by: 17
Authors
Yang, Chunyu [1]
Zhou, Jie [1]
Affiliations
[1] Tsinghua Univ, State Key Lab Intelligent Technol & Syst, Tsinghua Natl Lab Informat Sci & Technol, Dept Automat, Beijing 100084, Peoples R China
Funding
Specialized Research Fund for the Doctoral Program of Higher Education; National Natural Science Foundation of China
Keywords
concept drift; online classification; EM
DOI
10.1016/j.patcog.2008.01.025
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Online classification is important for real-time classification of data sequences. Its most challenging problem is that the class priors may vary in non-stationary data sequences. Most current online data sequence classification algorithms assume that the class labels of some newly arrived samples are known and retrain the classifier accordingly. Unfortunately, this assumption is often violated in real applications. If we were able to estimate the class priors on the test data sequence accurately, however, we could adjust the classifier without retraining it while preserving reasonable accuracy. Earlier work estimates the class priors for static data sets with the offline iterative EM algorithm, which has proved quite effective for adjusting the classifier. Inspired by that offline algorithm, in this paper we propose an online incremental EM algorithm that estimates the class priors along the data sequence. The classifier is adjusted accordingly to keep pace with the varying distribution. The proposed online algorithm is more computationally efficient because it scans the sequence only once. Experimental results show that the proposed algorithm performs better than the conventional offline iterative EM algorithm when the class priors are non-stationary. © 2008 Elsevier Ltd. All rights reserved.
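The abstract does not give the update rule, so the following is only a minimal sketch of the general idea under stated assumptions: a fixed classifier outputs posteriors under its training priors, each posterior is re-weighted by the current prior estimate (E-step), and the estimate is then moved toward the re-weighted posterior with exponential forgetting (a one-sample M-step). The function names and the `step` forgetting rate are hypothetical and are not taken from the paper.

```python
import numpy as np

def adjust_posteriors(posteriors, train_priors, new_priors):
    """Re-weight classifier posteriors for a changed class prior (Bayes' rule)."""
    weighted = posteriors * (new_priors / train_priors)
    return weighted / weighted.sum()

def online_prior_em(posterior_stream, train_priors, step=0.05):
    """Single-pass estimate of drifting class priors (illustrative only).

    posterior_stream: iterable of posterior vectors from the fixed classifier.
    train_priors:     class priors of the training set.
    step:             assumed forgetting rate; larger values track drift faster.
    Yields (adjusted_posterior, prior_estimate) for each sample.
    """
    train_priors = np.asarray(train_priors, dtype=float)
    priors = train_priors.copy()
    for p in posterior_stream:
        p = np.asarray(p, dtype=float)
        # E-step: posterior of this sample under the current prior estimate.
        adjusted = adjust_posteriors(p, train_priors, priors)
        # Incremental M-step: blend the old estimate with the new evidence.
        priors = (1.0 - step) * priors + step * adjusted
        yield adjusted, priors.copy()
```

Each sample is visited once, which matches the single-scan property described in the abstract; the predicted label would be the argmax of the adjusted posterior.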
Pages: 2656-2664
Page count: 9