Fuzzy c-means clustering of incomplete data

Cited by: 324
Authors
Hathaway, RJ [1]
Bezdek, JC [2]
Affiliations
[1] Georgia So Univ, Dept Math & Comp Sci, Statesboro, GA 30460 USA
[2] Univ W Florida, Dept Comp Sci, Pensacola, FL 32504 USA
Source
IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART B-CYBERNETICS | 2001 / Vol. 31 / No. 05
Keywords
clustering; fuzzy c-means (FCM); incomplete data; missing data
DOI
10.1109/3477.956035
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
The problem of clustering a real s-dimensional data set $X = \{x_1, \ldots, x_n\} \subset \mathbb{R}^s$ is considered. Usually, each observation (or datum) consists of numerical values for all s features (such as height, length, etc.), but sometimes data sets contain vectors that are missing one or more of the feature values. For example, a particular datum $x_k$ might be incomplete, having the form $x_k = (254.3, ?, 333.2, 47.44, ?)^T$, where the second and fifth feature values are missing. The fuzzy c-means (FCM) algorithm is a useful tool for clustering real s-dimensional data, but it is not directly applicable to incomplete data. Four strategies for FCM clustering of incomplete data sets are given, three of which involve modified versions of the FCM algorithm. Numerical convergence properties of the new algorithms are discussed, and all approaches are tested using real and artificially generated incomplete data sets.
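The abstract does not spell out the four strategies. As an illustration only, the sketch below shows one common way to let FCM cope with a vector such as the example $x_k$: a partial distance, computed over the observed features and rescaled by s divided by the number of observed features, plugged into the standard FCM membership update. This may not match the paper's exact formulation; the function names, the use of `math.nan` as the missing-value marker, the prototype values, and the fuzzifier `m = 2.0` are all assumptions made for this example.

```python
import math

def partial_sq_distance(x, v):
    """Squared Euclidean distance over the features observed in x,
    rescaled by s / (number of observed features).
    Missing values in x are marked with math.nan (an assumption)."""
    present = [j for j, xj in enumerate(x) if not math.isnan(xj)]
    if not present:
        raise ValueError("datum has no observed features")
    total = sum((x[j] - v[j]) ** 2 for j in present)
    return len(x) / len(present) * total

def memberships(x, prototypes, m=2.0):
    """Standard FCM membership update for a single datum,
    with partial distances in place of full distances."""
    d = [partial_sq_distance(x, v) for v in prototypes]
    # If the datum coincides with one or more prototypes,
    # split full membership among those prototypes.
    if any(dist == 0.0 for dist in d):
        hits = [1.0 if dist == 0.0 else 0.0 for dist in d]
        return [h / sum(hits) for h in hits]
    inv = [dist ** (-1.0 / (m - 1.0)) for dist in d]
    return [w / sum(inv) for w in inv]

# The incomplete datum from the abstract; "?" becomes math.nan.
x_k = [254.3, math.nan, 333.2, 47.44, math.nan]

# Hypothetical cluster prototypes, chosen only for illustration.
prototypes = [
    [250.0, 10.0, 330.0, 50.0, 1.0],
    [100.0, 20.0, 200.0, 10.0, 5.0],
]

u = memberships(x_k, prototypes)
print(u)  # memberships of x_k in the two clusters; they sum to 1
```

The rescaling keeps distances for incomplete vectors on roughly the same scale as distances for complete ones, so a datum with few observed features is not automatically treated as closer to every prototype.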
Pages: 735-744
Page count: 10