Inductive learning models with missing values

Cited by: 17
Authors
Fortes, I.
Mora-Lopez, L.
Morales, R.
Triguero, F.
Affiliations
[1] Univ Malaga, Dept Matemat Aplicada, ETS Ingn Informat, E-29071 Malaga, Spain
[2] Univ Malaga, Dept Leng & C Computac, ETS Ingn Informat, E-29071 Malaga, Spain
Keywords
missing values; decision tree; decision theory; data mining; machine learning
DOI
10.1016/j.mcm.2006.02.013
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
In this paper, a new approach to working with missing attribute values in inductive learning algorithms is introduced. Three fundamental issues are studied: the splitting criterion, the allocation of values to missing attribute values, and the prediction of new observations. The formal definition for the splitting criterion is given. This definition takes into account the missing attribute values and generalizes the classical definition. In relation to the second objective, multiple values are assigned to missing attribute values using a decision theory approach. Each of these multiple values will have an associated confidence and error parameter. The error parameter measures how near or how far the value is from the original value of the attribute. After applying a splitting criterion, a decision tree is obtained (from training sets with or without missing attribute values). This decision tree can be used to predict the class of an observation (with or without missing attribute values). Hence, there are four perspectives. The three perspectives with missing attribute values are studied and experimental results are presented. (c) 2006 Elsevier Ltd. All rights reserved.
Pages: 790-806
Page count: 17