Evolution of document networks

Times cited: 40
Author
Menczer, F [1]
Affiliation
[1] Indiana Univ, Sch Informat, Bloomington, IN 47408 USA
DOI
10.1073/pnas.0307554100
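Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract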
How does a network of documents grow without centralized control? This question is becoming crucial as we try to explain the emergent scale-free topology of the World Wide Web and use link analysis to identify important information resources. Existing models of growing information networks have focused on the structure of links but neglected the content of nodes. Here I show that the current models fail to reproduce a critical characteristic of information networks, namely the distribution of textual similarity among linked documents. I propose a more realistic model that generates links by using both popularity and content. This model yields remarkably accurate predictions of both degree and similarity distributions in networks of web pages and scientific literature.
Pages: 5261-5265
Number of pages: 5