An association-based dissimilarity measure for categorical data

Times cited: 67
Authors
Le, SQ [1]
Ho, TB [1]
Affiliation
[1] Japan Adv Inst Sci & Technol, Sch Knowledge Sci, Tatsunokuchi, Ishikawa 9231292, Japan
Keywords
dissimilarity measures; categorical data; conditional probability distribution; hypothesis testing; nearest neighbor
DOI
10.1016/j.patrec.2005.06.002
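Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 [Pattern recognition and intelligent systems]; 0812 [Computer science and technology]; 0835 [Software engineering]; 1405 [Intelligence science and technology]
Abstract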
In this paper, we propose a novel method to measure the dissimilarity of categorical data. The key idea is to consider the dissimilarity between two categorical values of an attribute as a combination of dissimilarities between the conditional probability distributions of other attributes given these two values. Experiments with real data show that our dissimilarity estimation method improves the accuracy of the popular nearest neighbor classifier. (c) 2005 Elsevier B.V. All rights reserved.
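The abstract's key idea (judge two values of one attribute by how differently the other attributes are distributed given each value) can be sketched in Python as below. This is a minimal illustration under assumptions not stated in this record: total variation distance stands in for whatever distribution dissimilarity the paper actually uses, the per-attribute distances are combined by an unweighted sum, the hypothesis-testing step named in the keywords is not reproduced, and object-level dissimilarity is the sum of per-attribute value dissimilarities. All function names and the fallback for unseen value pairs are hypothetical.

```python
from collections import Counter
from itertools import combinations


def conditional_dist(rows, i, value, j):
    """Empirical P(A_j | A_i = value), returned as {category: probability}."""
    counts = Counter(r[j] for r in rows if r[i] == value)
    total = sum(counts.values())
    return {v: c / total for v, c in counts.items()}


def total_variation(p, q):
    """Total variation distance between two discrete distributions (in [0, 1]).
    Assumption: used as a stand-in for the paper's distribution dissimilarity."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def value_dissimilarity(rows, i, x, y):
    """Dissimilarity of values x and y of attribute i: unweighted sum over every
    other attribute j of the distance between P(A_j | A_i=x) and P(A_j | A_i=y)."""
    if x == y:
        return 0.0
    return sum(
        total_variation(conditional_dist(rows, i, x, j),
                        conditional_dist(rows, i, y, j))
        for j in range(len(rows[0]))
        if j != i
    )


def build_tables(rows):
    """Precompute d_i(x, y) for every attribute i and every pair of observed values."""
    tables = []
    for i in range(len(rows[0])):
        values = sorted({r[i] for r in rows})
        table = {}
        for x, y in combinations(values, 2):
            table[(x, y)] = table[(y, x)] = value_dissimilarity(rows, i, x, y)
        tables.append(table)
    return tables


def object_dissimilarity(a, b, tables):
    """Sum of per-attribute value dissimilarities between objects a and b."""
    # Hypothetical fallback: a value pair never seen in training gets the
    # largest possible per-attribute score, (number of attributes - 1).
    fallback = float(len(tables) - 1)
    return sum(
        0.0 if u == v else tables[i].get((u, v), fallback)
        for i, (u, v) in enumerate(zip(a, b))
    )


def nn_classify(query, train_rows, labels, tables):
    """1-nearest-neighbor prediction under the learned dissimilarity."""
    best = min(range(len(train_rows)),
               key=lambda k: object_dissimilarity(query, train_rows[k], tables))
    return labels[best]


if __name__ == "__main__":
    # Toy data (invented for illustration): (color, size, shape) with class labels.
    rows = [("red", "small", "round"), ("red", "small", "round"),
            ("blue", "large", "square"), ("blue", "large", "round"),
            ("green", "small", "round")]
    labels = ["A", "A", "B", "B", "A"]
    tables = build_tables(rows)
    print(tables[0][("red", "green")])  # 0.0
    print(nn_classify(("green", "large", "square"), rows, labels, tables))  # B
```

Precomputing the tables once keeps classification cheap, since a categorical attribute usually has few distinct values. In the toy data, `red` and `green` score 0 on the color attribute because both co-occur only with small, round objects, a relationship that plain overlap (Hamming) matching cannot express.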
Pages: 2549-2557
Number of pages: 9