Combining text and link analysis for focused crawling - An application for vertical search engines

Cited by: 50
Authors
Almpanidis, G. [1]
Kotropoulos, C. [1]
Pitas, I. [1]
Affiliations
[1] Aristotle Univ Thessaloniki, Dept Informat, GR-54124 Thessaloniki, Greece
Keywords
focused crawling; information retrieval; latent semantic indexing; text categorisation; vertical search engines; WEB; ALGORITHM; MODEL
DOI
10.1016/j.is.2006.09.004
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The number of vertical search engines and portals has rapidly increased over the last years, making the importance of a topic-driven (focused) crawler self-evident. In this paper, we develop a latent semantic indexing classifier that combines link analysis with text content in order to retrieve and index domain-specific web documents. Our implementation presents a different approach to focused crawling and aims to overcome the limitations imposed by the need to provide initial data for training, while maintaining a high recall/precision ratio. We compare its efficiency with other well-known web information retrieval techniques. (c) 2006 Elsevier B.V. All rights reserved.
Pages: 886-908
Number of pages: 23