Taxonomy and clustering in collaborative systems: The case of the on-line encyclopedia Wikipedia

Cited by: 23
Authors
Capocci, A. [1]
Rao, F. [2]
Caldarelli, G. [2, 3, 4]
Affiliations
[1] Univ Roma La Sapienza, Dipartimento Informat & Sistemist, I-00185 Rome, Italy
[2] Ctr Ric & Museo Fis E Fermi, I-00185 Rome, Italy
[3] Univ Roma La Sapienza, Dipartimento Fis, INFM, CNR,SMC Ctr, I-00185 Rome, Italy
[4] Ctr Study Complex Networks, Linkalab, I-09100 Cagliari, Italy
Keywords
DOI
10.1209/0295-5075/81/28006
Chinese Library Classification (CLC) number
O4 [Physics]
Discipline classification code
0702
Abstract
In this paper we investigate the nature and structure of the relation between imposed classifications and real clustering in a particular case of a scale-free network given by the on-line encyclopedia Wikipedia. We find a statistical similarity in the distributions of community sizes both by using the top-down approach of the categories division present in the archive and in the bottom-up procedure of community detection given by an algorithm based on the spectral properties of the graph. Regardless of the statistically similar behaviour, the two methods provide a rather different division of the articles, thereby signaling that the nature and presence of power laws is a general feature for these systems and cannot be used as a benchmark to evaluate the suitability of a clustering method. Copyright (C) EPLA, 2008.
Pages: 5