Semantic ranking of web pages based on formal concept analysis

Times cited: 30
Authors
Du, Yajun [1]
Hai, YuFeng [1]
Affiliation
[1] Xihua Univ, Sch Math & Comp Sci, Chengdu 610039, Sichuan, Peoples R China
Keywords
Web crawler; Crawling direction; Search engine; Formal concept analysis; Concept similarity; Context graph; Information; Crawler; Performance; Knowledge; Systems; Models
DOI
10.1016/j.jss.2012.07.040
CLC number
TP31 [Computer software]
Subject classification codes
081202; 0835
Abstract
A web crawler is an important component of a search engine. This paper proposes a new method for measuring the similarity of formal concept analysis (FCA) concepts and a new notion of web page rank, both built on an information content approach that draws on users' web logs. First, an extension similarity and an intension similarity are proposed that analyze a user's browsing pattern and the associated hyperlinks. Second, the information content similarity between two nouns is computed automatically from their ISA and Part-Of hierarchies and a user's web log. A method is then proposed for computing the semantic similarity between two concepts in two different concept lattices (the base concept lattice and the current concept lattice) and for deriving the semantic ranking of web pages. Finally, experiments demonstrate that the proposed crawler is better suited to crawling focused web pages, and show that the semantic ranking of web pages is useful and efficient for deciding which web page the crawler should process next. (C) 2012 Elsevier Inc. All rights reserved.
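The abstract names three ingredients: an extension (page-set) similarity, an intension (noun-set) similarity that relies on information content computed over a noun hierarchy with frequencies taken from a web log, and a page ranking built on concept similarity. The sketch below illustrates these ideas under stated assumptions. It is not the authors' formulas.

Every name in the sketch is hypothetical, and so are these modelling choices:

- the toy formal context (PARENT, FREQ, CONTEXT);
- the ISA-only hierarchy;
- the Jaccard extent overlap;
- the best-match intent average;
- Lin's normalized information content measure;
- the weight w = 0.5.

The sketch also compares concepts drawn from one context rather than from a base lattice and a current lattice.

```python
import math

# Hypothetical ISA hierarchy (child -> parent); the paper also uses Part-Of links.
PARENT = {
    "artifact": None,
    "software": "artifact",
    "structure": "artifact",
    "crawler": "software",
    "search_engine": "software",
    "lattice": "structure",
}

# Hypothetical noun counts, standing in for frequencies taken from a web log.
FREQ = {"crawler": 4, "search_engine": 3, "lattice": 2,
        "software": 1, "structure": 1, "artifact": 1}

# Hypothetical formal context: web page -> nouns it contains.
CONTEXT = {
    "p1": {"crawler", "software"},
    "p2": {"crawler", "search_engine", "software"},
    "p3": {"lattice", "structure"},
}


def ancestors(noun):
    """Return the noun followed by all of its ISA ancestors up to the root."""
    chain = []
    while noun is not None:
        chain.append(noun)
        noun = PARENT[noun]
    return chain


def ic(noun):
    """IC(c) = log(1 / p(c)), where p(c) counts c and all of its descendants."""
    total = sum(FREQ.values())
    covered = sum(f for n, f in FREQ.items() if noun in ancestors(n))
    return math.log(total / covered)


def lin_sim(a, b):
    """Lin similarity: 2 * IC(most informative common ancestor) / (IC(a) + IC(b))."""
    if a == b:
        return 1.0
    shared = max(ic(c) for c in set(ancestors(a)) & set(ancestors(b)))
    denom = ic(a) + ic(b)
    return 2 * shared / denom if denom > 0 else 0.0  # degenerate: no information


def intent_of(pages):
    """Derivation operator A': nouns shared by every page in A."""
    if not pages:
        return set().union(*CONTEXT.values())
    return set.intersection(*(CONTEXT[p] for p in pages))


def extent_of(nouns):
    """Derivation operator B': pages containing every noun in B."""
    return {p for p, attrs in CONTEXT.items() if nouns <= attrs}


def concept_from_pages(pages):
    """Close a page set into a formal concept (A'', A')."""
    intent = intent_of(pages)
    return extent_of(intent), intent


def concept_sim(c1, c2, w=0.5):
    """Weighted mix of extent overlap (Jaccard) and best-match intent similarity."""
    (e1, i1), (e2, i2) = c1, c2
    ext = len(e1 & e2) / len(e1 | e2) if e1 | e2 else 1.0
    if not i1 or not i2:
        itn = 1.0 if i1 == i2 else 0.0
    else:
        small, large = sorted((i1, i2), key=len)
        itn = sum(max(lin_sim(a, b) for b in large) for a in small) / len(large)
    return w * ext + (1 - w) * itn


def rank_pages(base_concept, pages):
    """Order candidate pages by the similarity of their concept to a base concept."""
    scored = [(concept_sim(base_concept, concept_from_pages({p})), p) for p in pages]
    return [p for _, p in sorted(scored, reverse=True)]
```

**Example.** Closing {"p1"} gives the concept ({p1, p2}, {crawler, software}). With that concept as the base, `rank_pages(base, ["p3", "p2", "p1"])` orders the candidates p1, p2, p3:

- p2 shares both base nouns and one page with the base concept, so it scores 7/12.
- p3 shares no nouns and no pages with it, so it scores 0.

Lin's measure is used instead of raw Resnik scores because it keeps the intent term in [0, 1]. That lets a single weight mix it with the Jaccard term.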
Pages: 187-197
Number of pages: 11