Ranking pages by topology and popularity within web sites

Cited by: 9
Authors
Borges, Jose
Levene, Mark
Affiliations
[1] Univ London Birkbeck Coll, Sch Comp Sci & Informat Syst, London WC1E 7HX, England
[2] Univ Porto, Sch Engn, P-4200 Oporto, Portugal
Source
WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS | 2006 / Vol. 9 / Issue 03
Keywords
web data mining; web usage mining; Page Rank; Popularity Rank; Site Rank;
DOI
10.1007/s11280-006-8558-y
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
We compare two link analysis ranking methods of web pages in a site. The first, called Site Rank, is an adaptation of PageRank to the granularity of a web site and the second, called Popularity Rank, is based on the frequencies of user clicks on the outlinks in a page that are captured by navigation sessions of users through the web site. We ran experiments on artificially created web sites of different sizes and on two real data sets, employing the relative entropy to compare the distributions of the two ranking methods. For the real data sets we also employ a nonparametric measure, called Spearman's footrule, which we use to compare the top-ten web pages ranked by the two methods. Our main result is that the distributions of the Popularity Rank and Site Rank are surprisingly close to each other, implying that the topology of a web site is very instrumental in guiding users through the site. Thus, in practice, the Site Rank provides a reasonable first order approximation of the aggregate behaviour of users within a web site given by the Popularity Rank.
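The comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that both ranks can be approximated as PageRank-style stationary distributions over a link graph, with uniform link weights standing in for Site Rank and click counts standing in for Popularity Rank. The toy site, the click counts, the damping factor 0.85, and all function names are hypothetical. The footrule variant shown places items missing from a top-k list at position k + 1, which is one common convention for top-k lists and may differ from the one used in the paper.

```python
import math


def stationary_rank(weights, damping=0.85, iters=200):
    """Power iteration over {page: {target: weight}}; returns {page: score}.

    Assumption: every link target also appears as a key of `weights`.
    """
    pages = sorted(weights)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            total = sum(weights[p].values())
            for q in pages:
                # Pages without outlinks spread their score uniformly.
                prob = weights[p].get(q, 0) / total if total else 1.0 / n
                new[q] += damping * rank[p] * prob
        rank = new
    return rank


def relative_entropy(p, q):
    """Relative entropy D(p || q) in bits; assumes q[x] > 0 wherever p[x] > 0."""
    return sum(p[x] * math.log2(p[x] / q[x]) for x in p if p[x] > 0)


def top_k(rank, k):
    """The k highest-scoring pages, best first."""
    return sorted(rank, key=rank.get, reverse=True)[:k]


def footrule_top_k(a, b):
    """Spearman's footrule between two top-k lists of equal length k.

    Assumption: an item absent from one list is given position k + 1 there.
    """
    k = len(a)
    pos_a = {x: i + 1 for i, x in enumerate(a)}
    pos_b = {x: i + 1 for i, x in enumerate(b)}
    return sum(
        abs(pos_a.get(x, k + 1) - pos_b.get(x, k + 1))
        for x in set(a) | set(b)
    )


# Hypothetical three-page site: topology only (every link has weight 1).
site_links = {
    "home": {"a": 1, "b": 1},
    "a": {"home": 1, "b": 1},
    "b": {"home": 1},
}
# Hypothetical click counts on the same links, as if taken from sessions.
click_counts = {
    "home": {"a": 30, "b": 10},
    "a": {"home": 5, "b": 15},
    "b": {"home": 20},
}

site_rank = stationary_rank(site_links)
popularity_rank = stationary_rank(click_counts)

print("relative entropy:", relative_entropy(popularity_rank, site_rank))
print("footrule (top 2):", footrule_top_k(top_k(site_rank, 2),
                                          top_k(popularity_rank, 2)))
```

A relative entropy near zero would indicate that the topology-only distribution is close to the click-based one, which is the kind of closeness the abstract reports.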
Pages: 301-316
Number of pages: 16