Towards incremental learning of nonstationary imbalanced data stream: a multiple selectively recursive approach

Cited by: 94
Authors
Chen, Sheng [2]
He, Haibo [1]
Affiliations
[1] Univ Rhode Isl, Dept Elect Comp & Biomed Engn, Kingston, RI 02881 USA
[2] Stevens Inst Technol, Dept Elect & Comp Engn, Hoboken, NJ 07030 USA
Keywords
Incremental learning; Nonstationary data; Imbalanced learning; Stream data; Ensemble learning; Concept drift
DOI
10.1007/s12530-010-9021-y
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The difficulties of learning from a nonstationary data stream are generally twofold. First, a dynamically structured learning framework is required to keep up with the evolution of unstable class concepts, i.e., concept drift. Second, an imbalanced class distribution over the data stream demands a mechanism to reinforce the underrepresented class concepts for improved overall performance. To address these challenges, we propose the recursive ensemble approach (REA) in this paper. To counter the imbalanced learning problem in the training data chunk S_t received at any timestamp t, REA adaptively adds to S_t a portion of the minority class examples received within [0, t - 1] to balance its skewed class distribution. Hypotheses are then progressively developed over time for all balanced training data chunks and combined into an ensemble classifier in a dynamically weighted manner, which addresses concept drift over time. Theoretical analysis proves that REA can provide less erroneous prediction results than a comparative algorithm. In addition, an empirical study on both synthetic benchmarks and a real-world data set validates the effectiveness of REA against other algorithms in terms of overall prediction accuracy and the ROC curve.
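The two mechanisms the abstract names, balancing the current chunk with earlier minority examples and combining hypotheses by dynamic weights, can be illustrated with a minimal Python sketch. The abstract does not state how past minority examples are selected or how the weights are computed, so both rules below are assumptions made for illustration only: selection by distance to the current minority centroid, and a weight of log(1/error). The function names `balance_chunk`, `weight_from_error`, and `weighted_vote` are hypothetical and do not come from the paper.

```python
import math


def balance_chunk(chunk, past_minority, target_ratio=0.5, minority_label=1):
    """Return the chunk plus past minority examples, approaching target_ratio.

    chunk: list of (features, label) pairs received at the current timestamp.
    past_minority: feature vectors of minority examples seen at earlier timestamps.
    ASSUMPTION: examples closest to the current minority centroid are chosen;
    the abstract only says that "part of" the past minority examples is used.
    """
    if not 0 < target_ratio < 1:
        raise ValueError("target_ratio must lie strictly between 0 and 1")
    minority = [x for x, y in chunk if y == minority_label]
    n_total, n_min = len(chunk), len(minority)
    # Smallest k with (n_min + k) / (n_total + k) >= target_ratio.
    needed = math.ceil((target_ratio * n_total - n_min) / (1 - target_ratio))
    needed = max(0, min(needed, len(past_minority)))
    if needed == 0:
        return list(chunk)
    if minority:
        dim = len(minority[0])
        centroid = [sum(x[i] for x in minority) / n_min for i in range(dim)]
        ranked = sorted(past_minority, key=lambda x: math.dist(x, centroid))
    else:
        # No current minority examples to compare against: keep arrival order.
        ranked = list(past_minority)
    return list(chunk) + [(x, minority_label) for x in ranked[:needed]]


def weight_from_error(error, eps=1e-6):
    """ASSUMPTION: weight = log(1 / error), so lower error means more influence."""
    return math.log(1.0 / max(error, eps))


def weighted_vote(hypotheses, weights, x):
    """Combine binary hypotheses (callables returning 0 or 1) by weighted vote."""
    score = sum(w * (1 if h(x) == 1 else -1) for h, w in zip(hypotheses, weights))
    return 1 if score > 0 else 0
```

In a streaming loop under these assumptions, each incoming chunk would be passed through `balance_chunk`, a new hypothesis trained on the result, and every stored hypothesis re-weighted by its error on the newest chunk before `weighted_vote` is used for prediction.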
Pages: 35-50
Number of pages: 16