Growing and navigating the small world Web by local content

被引:54
作者
Menczer, F [1 ]
机构
[1] Univ Iowa, Dept Management Sci, Iowa City, IA 52242 USA
关键词
D O I
10.1073/pnas.212348399
中图分类号
O [数理科学和化学]; P [天文学、地球科学]; Q [生物科学]; N [自然科学总论];
学科分类号
07 ; 0710 ; 09 ;
摘要
Can we model the scale-free distribution of Web hypertext degree under realistic assumptions about the behavior of page authors? Can a Web crawler efficiently locate an unknown relevant page? These questions are receiving much attention due to their potential impact for understanding the structure of the Web and for building better search engines. Here I investigate the connection between the linkage and content topology of Web pages. The relationship between a text-induced distance metric and a link-based neighborhood probability distribution displays a phase transition between a region where linkage is not determined by content and one where linkage decays according to a power law. This relationship is used to propose a Web growth model that is shown to accurately predict the distribution of Web page degree, based on textual content and assuming only local knowledge of degree for existing pages. A qualitatively similar phase transition is found between linkage and semantic distance, with an exponential decay tail. Both relationships suggest that efficient paths can be discovered by decentralized Web navigation algorithms based on textual and/or categorical cues.
引用
收藏
页码:14014 / 14019
页数:6
相关论文
共 34 条
[1]  
Achlioptas D, 2001, ANN IEEE SYMP FOUND, P500
[2]   Search in power-law networks [J].
Adamic, L.A. ;
Lukose, R.M. ;
Puniyani, A.R. ;
Huberman, B.A. .
Physical Review E - Statistical, Nonlinear, and Soft Matter Physics, 2001, 64 (4 II) :461351-461358
[3]   Power-Law distribution of the World Wide Web [J].
Adamic, LA ;
Huberman, BA ;
Barabási, AL ;
Albert, R ;
Jeong, H ;
Bianconi, G .
SCIENCE, 2000, 287 (5461)
[4]   Internet -: Diameter of the World-Wide Web [J].
Albert, R ;
Jeong, H ;
Barabási, AL .
NATURE, 1999, 401 (6749) :130-131
[5]   Emergence of scaling in random networks [J].
Barabási, AL ;
Albert, R .
SCIENCE, 1999, 286 (5439) :509-512
[6]   Bose-Einstein condensation in complex networks [J].
Bianconi, G ;
Barabási, AL .
PHYSICAL REVIEW LETTERS, 2001, 86 (24) :5632-5635
[7]  
Bollobas B., 2001, Random Graphs, V21
[8]   Graph structure in the Web [J].
Broder, A ;
Kumar, R ;
Maghoul, F ;
Raghavan, P ;
Rajagopalan, S ;
Stata, R ;
Tomkins, A ;
Wiener, J .
COMPUTER NETWORKS-THE INTERNATIONAL JOURNAL OF COMPUTER AND TELECOMMUNICATIONS NETWORKING, 2000, 33 (1-6) :309-320
[9]   Focused crawling: a new approach to topic-specific Web resource discovery [J].
Chakrabarti, S ;
van den Berg, M ;
Dom, B .
COMPUTER NETWORKS-THE INTERNATIONAL JOURNAL OF COMPUTER AND TELECOMMUNICATIONS NETWORKING, 1999, 31 (11-16) :1623-1640
[10]  
Chakrabarti Soumen, 2002, P 11 INT C WORLD WID, P148, DOI DOI 10.1145/511446.511466