Modelling the characteristics of Web page outlinks

Cited by: 8
Authors
Ajiferuke, I [1]
Wolfram, D
Affiliations
[1] Univ Western Ontario, Fac Informat & Media Studies, London, ON N6A 5B7, Canada
[2] Univ Wisconsin, Sch Informat Studies, Milwaukee, WI 53201 USA
DOI
10.1023/B:SCIE.0000013298.22207.2b
Chinese Library Classification number
TP39 [Computer applications];
Subject classification codes
081203; 0835;
Abstract
Using data sampled from top-level Web pages across five high-level domains and from sample pages within individual websites, the authors investigate the frequency distribution of outlinks in Web pages. The observed distributions were fitted to different theoretical distributions to determine the best-fitting model for representing outlink frequency across Web pages. Theoretical models tested include the modified power law (MPL), Mandelbrot (MDB), generalized Waring (GW), generalized inverse Gaussian-Poisson (GIGP), and generalized negative binomial (GNB) distributions. The GIGP and GNB provided good fits for data sets for top-level pages across the high-level domains tested, with the GIGP performing slightly better. The lumpiness and bimodal nature of two of the observed outlink distributions from Web pages within a given website resulted in poor fits of the theoretical models. The GIGP was able to provide better fits to these data sets after the top components were truncated. The ability to effectively model Web page attributes, such as the distribution of the number of outlinks per page, paves the way for simulation models of Web page structural content, and makes it possible to estimate the number of outlinks that may be encountered within Web pages of a specific domain or within individual websites.
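The general idea of fitting a theoretical distribution to observed outlink counts and then checking the fit can be illustrated with a minimal sketch. It does not reproduce the authors' method or any of the five models named in the abstract: as a stand-in it fits a plain truncated discrete power law by grid-search maximum likelihood and reports a Pearson chi-square statistic. The counts in `observed` are hypothetical, pages with zero outlinks are left out, and all function names are illustrative assumptions.

```python
import math


def truncated_power_law_pmf(alpha, k_max):
    """Return P(k) proportional to k**(-alpha) for k = 1..k_max."""
    weights = [k ** (-alpha) for k in range(1, k_max + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def log_likelihood(alpha, freq, k_max):
    """Log-likelihood of the frequency table under the truncated power law."""
    pmf = truncated_power_law_pmf(alpha, k_max)
    return sum(n * math.log(pmf[k - 1]) for k, n in freq.items())


def fit_alpha(freq, k_max, lo=0.5, hi=4.0, steps=351):
    """Grid-search maximum-likelihood estimate of the exponent alpha."""
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda a: log_likelihood(a, freq, k_max))


def chi_square(freq, pmf):
    """Pearson chi-square statistic between observed and expected counts."""
    total = sum(freq.values())
    return sum(
        (freq.get(k, 0) - total * p) ** 2 / (total * p)
        for k, p in enumerate(pmf, start=1)
    )


# Hypothetical data: key = outlinks per page, value = number of such pages.
observed = {1: 480, 2: 170, 3: 90, 4: 55, 5: 38,
            6: 27, 7: 20, 8: 16, 9: 12, 10: 10}

k_max = max(observed)
alpha_hat = fit_alpha(observed, k_max)
fitted = truncated_power_law_pmf(alpha_hat, k_max)
chi2 = chi_square(observed, fitted)

print(f"alpha = {alpha_hat:.2f}, chi-square = {chi2:.1f}")
```

A smaller chi-square statistic indicates a closer match between the fitted and observed frequencies; comparing such statistics across candidate models is one common way to choose among them, though the abstract does not state which goodness-of-fit criterion the authors used.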
Pages: 43-62
Number of pages: 20