Decision tree evolution using limited number of labeled data items from drifting data streams

Cited by: 14
Authors
Fan, W [1]
Huang, YA [1]
Yu, PS [1]
Affiliation
[1] IBM TJ Watson Res, Hawthorne, NY 10532 USA
Source
FOURTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS | 2004
DOI
10.1109/ICDM.2004.10026
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Most previously proposed methods for mining data streams make the unrealistic assumption that a "labelled" data stream is readily available and can be mined at any time. In most real-world problems, however, labelled data streams are rarely available immediately. For this reason, models are reconstructed only when labelled data become available periodically. This passive stream mining model has several drawbacks. We propose a new concept of demand-driven active data mining. In active mining, the loss of the model is either continuously guessed without using any true class labels or estimated, whenever necessary, from a small number of instances whose actual class labels are verified by paying an affordable cost. When the estimated loss exceeds a tolerable threshold, the model evolves by using a small number of instances with verified true class labels. Previous work on active mining concentrates on error guessing and estimation. In this paper we discuss several approaches to decision tree evolution.
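The demand-driven check described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `estimate_loss` and `needs_evolution`, the use of 0-1 loss, and the uniform random sampling are all assumptions made for illustration, and the tree evolution step itself is left out because the abstract does not specify it.

```python
import random


def estimate_loss(predict, unlabeled, label_oracle, sample_size, rng):
    """Estimate 0-1 loss from a small sample whose labels are verified at a cost.

    All names here are hypothetical; `label_oracle` stands in for whatever
    paid process returns the true class label of an instance.
    """
    pool = list(unlabeled)
    if not pool:
        raise ValueError("no instances available to sample")
    # Draw a small uniform sample (an assumption) and pay to verify its labels.
    sample = rng.sample(pool, min(sample_size, len(pool)))
    verified = [(x, label_oracle(x)) for x in sample]
    errors = sum(1 for x, y in verified if predict(x) != y)
    return errors / len(verified), verified


def needs_evolution(estimated_loss, threshold):
    """The model should evolve only when the loss exceeds the tolerable threshold."""
    return estimated_loss > threshold


if __name__ == "__main__":
    # Toy drift: the old model fires for x > 0, the true concept is now x > 5.
    loss, verified = estimate_loss(
        predict=lambda x: x > 0,
        unlabeled=range(1, 11),
        label_oracle=lambda x: x > 5,
        sample_size=10,
        rng=random.Random(0),
    )
    if needs_evolution(loss, threshold=0.2):
        print(f"estimated loss {loss:.2f}: evolve using {len(verified)} verified instances")
```

The verified instances are returned alongside the estimate so that the same small labelled sample could be reused for the evolution step, which matches the abstract's point that both estimation and evolution rely on a small number of verified labels.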
Pages: 379-382
Page count: 4