Robust maximum entropy clustering algorithm with its labeling for outliers

Cited by: 35
Authors
Wang, ST [1]
Chung, KFL
Deng, ZH
Hu, DW
Wu, XS
Affiliations
[1] So Yangtze Univ, Sch Informat Engn, Wuxi, Peoples R China
[2] Hong Kong Polytech Univ, Dept Comp, Hong Kong, Hong Kong, Peoples R China
[3] Natl Def Univ Sci & Technol, Sch Automat, Changsha, Peoples R China
Keywords
entropy; clustering; robustness; outliers; epsilon-insensitive loss function; weight factors
DOI
10.1007/s00500-005-0517-5
CLC number
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
In this paper, a novel robust maximum entropy clustering algorithm, RMEC, is presented as an improved version of the maximum entropy clustering algorithm MEC [2-4]. RMEC overcomes two drawbacks of MEC: its high sensitivity to outliers and the difficulty of labeling them. RMEC incorporates Vapnik's epsilon-insensitive loss function and a new concept, weight factors, into its objective function; its update rules are then derived using Lagrangian optimization theory. Compared with MEC, the main contributions of RMEC lie in its much better robustness to outliers and in its ability to effectively label outliers in the dataset using the obtained weight factors. Our experimental results demonstrate its superior performance both in enhancing robustness and in labeling outliers in the dataset.
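The abstract names the ingredients of RMEC (entropy-regularized memberships, Vapnik's epsilon-insensitive loss, and per-point weight factors) but does not state its objective or update rules. The Python sketch below is only an illustration under stated assumptions, not the authors' algorithm. It assumes 1-D data so that the center step can be solved exactly, replaces the distance term of a standard maximum entropy clustering objective with the epsilon-insensitive loss L_eps(r) = max(0, |r| - eps), and uses a hypothetical weight factor w_j = 1 / (1 + min_i L_eps(x_j - v_i)) with a threshold `tau` to label outliers. All function and parameter names (`robust_mec_1d`, `label_outliers`, `beta`, `eps`, `tau`) are illustrative.

```python
import math


def eps_insensitive(r, eps):
    """Vapnik's epsilon-insensitive loss: 0 inside the tube |r| <= eps, |r| - eps outside."""
    return max(0.0, abs(r) - eps)


def max_entropy_memberships(costs, beta):
    """Minimize sum_i u_i*c_i + (1/beta) * sum_i u_i*ln(u_i) subject to sum_i u_i = 1.

    The closed-form minimizer is a Gibbs (softmax) distribution over clusters.
    """
    m = min(costs)  # shift by the smallest cost so exp() cannot overflow
    e = [math.exp(-beta * (c - m)) for c in costs]
    s = sum(e)
    return [x / s for x in e]


def memberships(xs, centers, beta, eps):
    """One Gibbs distribution over clusters per data point, using eps-insensitive costs."""
    return [max_entropy_memberships([eps_insensitive(x - v, eps) for v in centers], beta)
            for x in xs]


def weighted_eps_center(xs, ws, eps):
    """Exact 1-D minimizer of sum_j w_j * L_eps(x_j - v).

    The objective is convex and piecewise linear in v, so a minimum lies at one
    of the breakpoints x_j - eps or x_j + eps; evaluate them all and keep the best.
    """
    candidates = [x + s * eps for x in xs for s in (-1.0, 1.0)]

    def cost(v):
        return sum(w * eps_insensitive(x - v, eps) for x, w in zip(xs, ws))

    return min(candidates, key=cost)


def robust_mec_1d(xs, centers, beta=2.0, eps=0.5, iters=30):
    """Alternating minimization of an ASSUMED objective (not the paper's RMEC):
    J(U, V) = sum_ij u_ij * L_eps(x_j - v_i) + (1/beta) * sum_ij u_ij * ln(u_ij).
    """
    centers = list(centers)
    for _ in range(iters):
        u = memberships(xs, centers, beta, eps)
        # Each center minimizes its own membership-weighted eps-insensitive cost.
        centers = [weighted_eps_center(xs, [row[i] for row in u], eps)
                   for i in range(len(centers))]
    return centers, memberships(xs, centers, beta, eps)


def label_outliers(xs, centers, eps, tau):
    """HYPOTHETICAL weight factor w_j = 1 / (1 + min_i L_eps(x_j - v_i)).

    Points whose weight falls below tau are labeled as outliers.
    """
    flags = []
    for x in xs:
        w = 1.0 / (1.0 + min(eps_insensitive(x - v, eps) for v in centers))
        flags.append(w < tau)
    return flags


if __name__ == "__main__":
    data = [0.0, 0.2, 0.4, 5.0, 5.2, 5.4, 30.0]  # two tight groups plus one far point
    v, _ = robust_mec_1d(data, [1.0, 4.0])
    print("centers:", v)
    print("outlier flags:", label_outliers(data, v, eps=0.5, tau=0.5))
```

Each step exactly minimizes the assumed J over one block of variables (memberships given centers, then centers given memberships), so, up to floating-point rounding, J does not increase across iterations. Because the epsilon-insensitive loss grows only linearly outside the tube, a distant point such as 30.0 exerts a bounded pull on its center, whereas with squared distances and mean updates that pull grows in proportion to the point's distance.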
Pages: 555-563
Page count: 9
References (23 in total)
[1] Barnett V., 1984, Outliers in Statistical Data, 2nd ed.
[2] Bezdek J., 1982, PATTERN RECOGNITION
[3] Deng Z. H., J SO YANGTZE U
[4] Deng Z. H., 2003, J SO YANGTZE U, p. 75
[5] Gill P. E., 1981, PRACTICAL OPTIMIZATI
[6] Huber P. J., 1981, ROBUST STAT
[7] Jiang M. F., Tseng S. S., Su C. M., 2001, Two-phase clustering process for outliers detection, PATTERN RECOGNITION LETTERS, Vol. 22, No. 6-7, pp. 691-700
[8] Kailing K., 2004, LECT NOTES ARTIF INT, Vol. 3056, p. 394
[9] Karayiannis N. B., 1994, PROCEEDINGS OF THE THIRD IEEE CONFERENCE ON FUZZY SYSTEMS - IEEE WORLD CONGRESS ON COMPUTATIONAL INTELLIGENCE, Vols. I-III, p. 630, DOI 10.1109/FUZZY.1994.343658
[10] Kollios G., Gunopulos D., Koudas N., Berchtold S., 2003, Efficient biased sampling for approximate clustering and outlier detection in large data sets, IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, Vol. 15, No. 5, pp. 1170-1187