New Relations Between Similarity Measures for Vectors Based on Vector Norms

Cited by: 20
Authors
Egghe, Leo [1 ,2 ]
Affiliations
[1] Univ Hasselt, B-3590 Diepenbeek, Belgium
[2] Univ Antwerp, IBW, B-2000 Antwerp, Belgium
Source
JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY | 2009 / Vol. 60 / No. 02
Keywords
JACCARD INDEX; ORDERED SETS; DOCUMENTS; PRECISION; FALLOUT; RECALL; COSINE; MISS;
DOI
10.1002/asi.20949
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
The well-known similarity measures Jaccard, Salton's cosine, Dice, and several related overlap measures for vectors are compared. While general relations are not possible to prove, we study these measures on the "trajectories" of the form $\|\vec{X}\| = a\|\vec{Y}\|$, where $a > 0$ is a constant and $\|\cdot\|$ denotes the Euclidean norm of a vector. In this case, direct functional relations between these measures are proved. For Jaccard, we prove that it is a convexly increasing function of Salton's cosine measure, but always smaller than or equal to the latter, thereby explaining a curve found experimentally by Leydesdorff. All the other measures have a linear relation with Salton's cosine, reducing even to equality in the case $a = 1$. Hence, for equally normed vectors (e.g., for normalized vectors) we essentially have only Jaccard's measure and Salton's cosine measure, since all the other measures are equal to the latter.
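The relations summarized in the abstract can be checked numerically. The following is a minimal sketch, not the paper's own derivation: it assumes the commonly used vector forms Jaccard = X·Y / (‖X‖² + ‖Y‖² − X·Y), Dice = 2 X·Y / (‖X‖² + ‖Y‖²), and cosine = X·Y / (‖X‖‖Y‖), which the abstract does not spell out. Substituting ‖X‖ = a‖Y‖ into those assumed definitions gives J = a·Cos / (a² + 1 − a·Cos) and Dice = 2a·Cos / (a² + 1). All function names and the sample vectors are illustrative.

```python
import math

def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(xi * yi for xi, yi in zip(x, y))

def norm(x):
    """Euclidean norm."""
    return math.sqrt(dot(x, x))

def cosine(x, y):
    """Salton's cosine measure."""
    return dot(x, y) / (norm(x) * norm(y))

def jaccard(x, y):
    """Jaccard measure for vectors (assumed standard vector form)."""
    d = dot(x, y)
    return d / (dot(x, x) + dot(y, y) - d)

def dice(x, y):
    """Dice measure for vectors (assumed standard vector form)."""
    return 2 * dot(x, y) / (dot(x, x) + dot(y, y))

def jaccard_from_cosine(c, a):
    """Jaccard as a function of cosine c on the trajectory ||X|| = a ||Y||."""
    return a * c / (a * a + 1 - a * c)

def dice_from_cosine(c, a):
    """Dice as a linear function of cosine c on the same trajectory."""
    return 2 * a * c / (a * a + 1)

# Illustrative vectors: ||x|| = 5 and ||y|| = 10, so a = 0.5.
x = [3.0, 4.0, 0.0]
y = [0.0, 6.0, 8.0]
a = norm(x) / norm(y)
c = cosine(x, y)

print(jaccard(x, y), jaccard_from_cosine(c, a))  # the two values agree
print(dice(x, y), dice_from_cosine(c, a))        # the two values agree
```

Under these assumed definitions, setting a = 1 makes `dice_from_cosine(c, 1)` return `c` itself, matching the abstract's statement that the linear relations reduce to equality for equally normed vectors, while `jaccard_from_cosine(c, 1)` equals c / (2 − c), which is never larger than c.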
Pages: 232-239
Page count: 8
Related papers
17 in total
[1]   Requirements for a cocitation similarity measure, with special reference to Pearson's correlation coefficient [J].
Ahlgren, P ;
Jarneving, B ;
Rousseau, R .
JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 2003, 54 (06) :550-560
[2]  
[Anonymous], MEASUREMENT INFORM S
[3]  
Dominich S., 2001, Mathematical foundations of information retrieval
[4]   The measures precision, recall, fallout and miss as a function of the number of retrieved documents and their mutual interrelations [J].
Egghe, L. .
INFORMATION PROCESSING & MANAGEMENT, 2008, 44 (02) :856-876
[5]   Existence theorem of the quadruple (P, R, F, M): Precision, recall, fallout and miss [J].
Egghe, L. .
INFORMATION PROCESSING & MANAGEMENT, 2007, 43 (01) :265-272
[6]   Construction of weak and strong similarity measures for ordered sets of documents using fuzzy set techniques [J].
Egghe, L ;
Michel, C .
INFORMATION PROCESSING & MANAGEMENT, 2003, 39 (05) :771-807
[7]   Strong similarity measures for ordered sets of documents in information retrieval [J].
Egghe, L ;
Michel, C .
INFORMATION PROCESSING & MANAGEMENT, 2002, 38 (06) :823-848
[8]  
Egghe Leo, 1990, INTRO INFORM QUANTIT
[9]  
Grossman David., 1998, Information retrieval algorithms and heuristics
[10]   SIMILARITY MEASURES IN SCIENTOMETRIC RESEARCH - THE JACCARD INDEX VERSUS SALTON COSINE FORMULA [J].
HAMERS, L ;
HEMERYCK, Y ;
HERWEYERS, G ;
JANSSEN, M ;
KETERS, H ;
ROUSSEAU, R ;
VANHOUTTE, A .
INFORMATION PROCESSING & MANAGEMENT, 1989, 25 (03) :315-318