A framework to compute page importance based on user behaviors

被引:29
作者
Liu, Yuting [1 ]
Liu, Tie-Yan [2 ]
Gao, Bin [2 ]
Ma, Zhiming [3 ]
Li, Hang [2 ]
机构
[1] Beijing Jiaotong Univ, Sch Sci, Beijing, Peoples R China
[2] Microsoft Res Asia, Beijing, Peoples R China
[3] Chinese Acad Sci, Acad Math & Syst Sci, Beijing, Peoples R China
来源
INFORMATION RETRIEVAL | 2010年 / 13卷 / 01期
关键词
User browsing process; Continuous-time time-homogeneous Markov process; Staying time; BrowseRank;
D O I
10.1007/s10791-009-9098-8
中图分类号
TP [自动化技术、计算机技术];
学科分类号
080201 [机械制造及其自动化];
摘要
This paper is concerned with a framework to compute the importance of webpages by using real browsing behaviors of Web users. In contrast, many previous approaches like PageRank compute page importance through the use of the hyperlink graph of the Web. Recently, people have realized that the hyperlink graph is incomplete and inaccurate as a data source for determining page importance, and proposed using the real behaviors of Web users instead. In this paper, we propose a formal framework to compute page importance from user behavior data (which covers some previous works as special cases). First, we use a stochastic process to model the browsing behaviors of Web users. According to the analysis on hundreds of millions of real records of user behaviors, we justify that the process is actually a continuous-time time-homogeneous Markov process, and its stationary probability distribution can be used as the measure of page importance. Second, we propose a number of ways to estimate parameters of the stochastic process from real data, which result in a group of algorithms for page importance computation (all referred to as BrowseRank). Our experimental results have shown that the proposed algorithms can outperform the baseline methods such as PageRank and TrustRank in several tasks, demonstrating the advantage of using our proposed framework.
引用
收藏
页码:22 / 45
页数:24
相关论文
共 32 条
[1]
Amento B., 2000, SIGIR Forum, V34, P296, DOI 10.1145/345508.345603
[2]
Anderson W., 1991, CONTINUOUS TIME MARK, DOI 10.1007/978-1-4612-3038-0
[3]
[Anonymous], 2003, CONVEX OPTIMIZATION
[4]
[Anonymous], 2003, 200320 STANF INFOLAB
[5]
[Anonymous], 1999, TECH REPORT STANFORD
[6]
[Anonymous], 1998, Proceedings of the Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, DOI DOI 10.5555/314613.315045
[7]
[Anonymous], 2002, Proceedings of the 11th international conference on World Wide Web, DOI DOI 10.1145/511446.511513
[8]
[Anonymous], 1994, Introduction to the Numerical Solutions of Markov Chains
[9]
[Anonymous], 2004, INTERNET MATH, DOI DOI 10.1080/15427951.2004.10129091
[10]
[Anonymous], 1999, 199931 STANF INFOLAB