A hybrid method for imputation of missing values using optimized fuzzy c-means with support vector regression and a genetic algorithm

Cited by: 212
Authors
Aydilek, Ibrahim Berkan [1]
Arslan, Ahmet [1]
Affiliations
[1] Selcuk Univ, Dept Comp Engn, Konya, Turkey
Keywords
Missing data; Missing values; Imputation; Support vector regression; Fuzzy c-means; Neural networks; Classification; Approximation
DOI
10.1016/j.ins.2013.01.021
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Missing values in a dataset should be either removed or estimated during the preprocessing stage of data mining, before the data are used for classification, association rule mining, or clustering. In this study, we use a hybrid approach that combines fuzzy c-means clustering with support vector regression and a genetic algorithm. In this method, the fuzzy clustering parameters (cluster size and weighting factor) are optimized, and the missing values are estimated. The proposed hybrid method yields sufficient and sensible imputation performance. The results are compared with those of fuzzy c-means genetic algorithm imputation, support vector regression genetic algorithm imputation, and zero imputation. © 2013 Elsevier Inc. All rights reserved.
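To make the clustering part of the abstract concrete, the following is a minimal sketch of fuzzy c-means imputation in plain Python. It is not the authors' implementation: the function names (`memberships`, `fcm_impute`), the deterministic initialization from the first complete rows, and the handling of missing entries by skipping them in the distance are all assumptions made for illustration. The parameters `c` (number of clusters) and `m` (weighting factor) are fixed by hand here; the genetic algorithm search over these two parameters and the support vector regression step are not shown.

```python
def memberships(row, centroids, m):
    """Fuzzy memberships of one row, using only its observed features."""
    # Squared partial distance to each centroid (missing entries are skipped).
    d2 = [
        sum((x - v) ** 2 for x, v in zip(row, cen) if x is not None)
        for cen in centroids
    ]
    # A row lying exactly on one or more centroids is shared equally among them.
    zeros = [k for k, d in enumerate(d2) if d == 0.0]
    if zeros:
        return [1.0 / len(zeros) if k in zeros else 0.0 for k in range(len(d2))]
    # Standard FCM rule: u_k is proportional to distance^(-2/(m-1)).
    inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
    total = sum(inv)
    return [w / total for w in inv]


def fcm_impute(rows, c=2, m=2.0, iters=50):
    """Fill None entries with the membership-weighted mix of centroid values."""
    complete = [r for r in rows if None not in r]
    if len(complete) < c:
        raise ValueError("need at least c complete rows to initialize centroids")
    # Assumption: start from the first c complete rows (deterministic).
    centroids = [list(r) for r in complete[:c]]
    n_features = len(rows[0])
    for _ in range(iters):
        u = [memberships(r, centroids, m) for r in rows]
        for k in range(c):
            for j in range(n_features):
                num = den = 0.0
                for r, ui in zip(rows, u):
                    if r[j] is not None:  # only observed values move a centroid
                        w = ui[k] ** m
                        num += w * r[j]
                        den += w
                if den > 0.0:
                    centroids[k][j] = num / den
    u = [memberships(r, centroids, m) for r in rows]
    return [
        [x if x is not None else sum(uk * cen[j] for uk, cen in zip(ui, centroids))
         for j, x in enumerate(r)]
        for r, ui in zip(rows, u)
    ]


if __name__ == "__main__":
    data = [[1.0, 1.0], [9.0, 9.0], [1.2, 0.8], [8.8, 9.2], [1.1, None], [None, 9.1]]
    for filled_row in fcm_impute(data, c=2, m=2.0):
        print(filled_row)
```

In this sketch, a larger `m` makes memberships softer, so imputed values drift toward the average of all centroids, while `m` close to 1 makes each row follow its nearest centroid almost exclusively. That sensitivity is one reason to tune `c` and `m` rather than fix them.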
Pages: 25-35
Number of pages: 11