A unified probabilistic framework for web page scoring systems

Cited by: 41
Authors
Diligenti, M [1]
Gori, M [1]
Maggini, M [1]
Affiliation
[1] Univ Siena, Dipartimento Ingn Informaz, I-53100 Siena, Italy
Keywords
Web page scoring systems; random walks; HITS; PageRank; focused PageRank
DOI
10.1109/TKDE.2004.1264818
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The definition of efficient page ranking algorithms is becoming an important issue in the design of the query interface of Web search engines. Information flooding is a common experience, especially when broad-topic queries are issued. Queries containing only one or two keywords usually match a huge number of documents, while users can only afford to visit the first positions of the returned list, which do not necessarily contain the most appropriate answers. Some successful approaches to page ranking in a hyperlinked environment such as the Web are based on link analysis. In this paper, we propose a general probabilistic framework for Web Page Scoring Systems (WPSS), which incorporates and extends many of the relevant models proposed in the literature. In particular, we introduce scoring systems for both generic (horizontal) and focused (vertical) search engines. Whereas horizontal scoring algorithms are based only on the topology of the Web graph, vertical ranking also takes the page contents into account and is the basis for focused and user-adapted search interfaces. Experimental results illustrate the properties of some of the proposed scoring systems, with special emphasis on vertical search.
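The abstract contrasts horizontal scoring, which depends only on the link graph, with vertical scoring, which also uses page content. As a rough illustration of the random-walk view named in the keywords, and not the paper's unified model, the sketch below computes the stationary distribution of a damped random surfer. The function name `random_walk_scores`, the damping factor value, the per-page `relevance` vector, and the choice to bias both link-following and random jumps by relevance are all assumptions made for this example.

```python
def random_walk_scores(links, d=0.85, relevance=None, tol=1e-10, max_iter=10_000):
    """Stationary distribution of a damped random surfer (illustrative sketch).

    links[i] lists the pages that page i links to (pages are numbered 0..n-1).
    relevance: hypothetical non-negative per-page scores. If None, every page
    is equally relevant and the result is a PageRank-style, topology-only
    ("horizontal") score. Otherwise link choices and random jumps are biased
    toward relevant pages (an assumed "vertical" variant).
    """
    n = len(links)
    r = [1.0] * n if relevance is None else [float(v) for v in relevance]
    total = sum(r)
    if total <= 0:
        raise ValueError("relevance must have a positive sum")
    jump = [v / total for v in r]  # distribution used for random jumps
    x = [1.0 / n] * n              # start from the uniform distribution
    for _ in range(max_iter):
        # with probability 1 - d the surfer jumps according to `jump`
        nxt = [(1 - d) * j for j in jump]
        for i, targets in enumerate(links):
            weight = sum(r[t] for t in targets)
            if weight > 0:
                # follow a link, choosing target t with probability r[t] / weight
                for t in targets:
                    nxt[t] += d * x[i] * r[t] / weight
            else:
                # dangling page (or only zero-relevance targets): jump instead
                for k in range(n):
                    nxt[k] += d * x[i] * jump[k]
        if sum(abs(a - b) for a, b in zip(nxt, x)) < tol:
            return nxt
        x = nxt
    return x


# Toy graph (hypothetical): 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0
links = [[1, 2], [2], [0]]
horizontal = random_walk_scores(links)
vertical = random_walk_scores(links, relevance=[1.0, 0.1, 1.0])
print(horizontal)
print(vertical)
```

In this toy graph, marking page 1 as less relevant lowers its vertical score relative to its horizontal score, since the surfer rarely chooses links into it or jumps to it. Both score vectors are probability distributions over pages, which is what lets horizontal and vertical scores be treated within one probabilistic framework.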
Pages: 4-16
Number of pages: 13