Construction of weak and strong similarity measures for ordered sets of documents using fuzzy set techniques

Cited by: 24
Authors
Egghe, L
Michel, C
Affiliations
[1] LUC, B-3590 Diepenbeek, Belgium
[2] UIA, B-2610 Antwerp, Wilrijk, Belgium
[3] DU Bordeaux 3, MSHA, CEM GRESIC, F-33607 Pessac, France
Keywords
similarity measure; ordered set; fuzzy;
DOI
10.1016/S0306-4573(02)00027-4
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Ordered sets of documents are encountered more and more in information distribution systems, such as information retrieval systems. Classical similarity measures for ordinary sets of documents hence need to be extended to these ordered sets. This is done in this paper using fuzzy set techniques. First, a general similarity measure is developed which contains the classical strong similarity measures, such as Jaccard, Dice and Cosine, as well as the classical weak similarity measures, such as Recall and Precision. Then these measures are extended to the comparison of fuzzy sets of documents. Measuring the similarity of ordered sets of documents is a special case of this, where the higher the rank of a document, the lower its weight in the fuzzy set. Concrete forms of these similarity measures are presented. All these measures are new, and the ones for the weak similarity measures are the first of their kind (other strong similarity measures have been given in a previous paper by Egghe and Michel). Some of these measures are then tested in the IR system Profil-Doc. The engine SPIRIT(C) extracts ranked document sets in three different contexts, each for 600 requests. The practical usability of the OS-measures is then discussed on the basis of these experiments. (C) 2003 Elsevier Ltd. All rights reserved.
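As a rough illustration of the idea in the abstract, the sketch below computes the classical measures on ordinary sets and then on fuzzy sets derived from rankings. It is not the paper's construction: the abstract does not give the formulas, so the fuzzy versions here assume the standard min/max operators for intersection and union and the sum of memberships as cardinality. The weight function 1/rank and all function names are hypothetical choices made for this example.

```python
from math import sqrt


def crisp_measures(a, b):
    """Classical measures on ordinary, non-empty sets a and b.

    Recall and Precision treat a as the reference set and b as the
    retrieved set, so they are not symmetric in a and b.
    """
    inter = len(a & b)
    return {
        "jaccard": inter / len(a | b),
        "dice": 2 * inter / (len(a) + len(b)),
        "cosine": inter / sqrt(len(a) * len(b)),
        "recall": inter / len(a),
        "precision": inter / len(b),
    }


def rank_weights(ranked, weight=lambda r: 1.0 / r):
    """Turn a ranked list into a fuzzy set {doc: membership}.

    Assumption: rank 1 is the top document and the membership
    decreases with the rank number; 1/rank is only one possible choice.
    """
    return {doc: weight(r) for r, doc in enumerate(ranked, start=1)}


def fuzzy_measures(fa, fb):
    """Same measures on fuzzy sets, assuming min for intersection,
    max for union and the sum of memberships as cardinality."""
    docs = set(fa) | set(fb)
    inter = sum(min(fa.get(d, 0.0), fb.get(d, 0.0)) for d in docs)
    union = sum(max(fa.get(d, 0.0), fb.get(d, 0.0)) for d in docs)
    card_a = sum(fa.values())
    card_b = sum(fb.values())
    return {
        "jaccard": inter / union,
        "dice": 2 * inter / (card_a + card_b),
        "cosine": inter / sqrt(card_a * card_b),
        "recall": inter / card_a,
        "precision": inter / card_b,
    }


if __name__ == "__main__":
    # Two rankings with the same documents in a different order.
    fa = rank_weights(["d1", "d2", "d3"])
    fb = rank_weights(["d3", "d2", "d1"])
    print(crisp_measures(set(fa), set(fb)))
    print(fuzzy_measures(fa, fb))
```

Under these assumptions, two rankings that contain the same documents in a different order are identical as ordinary sets but score below 1 as fuzzy sets, which is the kind of distinction an ordered-set measure is meant to capture.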
Pages: 771-807
Number of pages: 37
Related papers
15 in total
[1]  
[Anonymous], 1975, SOLIDS, DOI DOI 10.1016/B978-0
[2]  
BOYCE BR, 1995, MEASUREMENT INFORMAT
[3]  
BUELL DA, 1981, P AM SOC INFORM SCI, V18, P298
[4]  
BUELL DA, 1981, P ACM SIGIR 81, P56
[5]   Strong similarity measures for ordered sets of documents in information retrieval [J].
Egghe, L ;
Michel, C .
INFORMATION PROCESSING & MANAGEMENT, 2002, 38 (06) :823-848
[6]   A THEORY OF CONTINUOUS RATES AND APPLICATIONS TO THE THEORY OF GROWTH AND OBSOLESCENCE RATES [J].
EGGHE, L .
INFORMATION PROCESSING & MANAGEMENT, 1994, 30 (02) :279-292
[7]  
Egghe L., 1990, Introduction to Informetrics: Quantitative Methods in Library, Documentation and Information Science
[8]  
FLUHR C, 1997, P INET 97 7 ANN C IN
[9]  
GROSSMAN D, 1998, INFORMATION RETRIEVA
[10]   Improving information retrieval by combining user profile and document segmentation [J].
LaineCruzel, S ;
Lafouge, T ;
Lardy, JP ;
BenAbdallah, N .
INFORMATION PROCESSING & MANAGEMENT, 1996, 32 (03) :305-315