Noise detection and elimination in data preprocessing: Experiments in medical domains

Cited by: 91
Authors
Gamberger, D
Lavrac, N
Dzeroski, S
Affiliations
[1] Rudjer Boskovic Inst, Zagreb 10000, Croatia
[2] Jozef Stefan Inst, Ljubljana, Slovenia
Keywords
Data compression - Data privacy - Learning algorithms - Learning systems - Random errors
DOI
10.1080/088395100117124
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Compression measures used in inductive learners, such as measures based on the minimum description length principle, can be used as a basis for grading candidate hypotheses. Compression-based induction is suited also for handling noisy data. This paper shows that a simple compression measure can be used to detect noisy training examples, where noise is due to random classification errors. A technique is proposed in which noisy examples are detected and eliminated from the training set, and a hypothesis is then built from the set of remaining examples. This noise elimination method was applied to preprocess data for four machine-learning algorithms, and evaluated on selected medical domains.
Pages: 205-223
Number of pages: 19