Optimization of the Robot Search Algorithm in Search Engines

Cited by: 19

Authors:

Song Juping (宋聚平)

Wang Yongcheng (王永成)

Teng Wei (滕伟)

Xu Huanqing (许欢庆)

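Affiliations:

[1] School of Electronic Information, Shanghai Jiao Tong University

[2] School of Electronic Information, Shanghai Jiao Tong University, Shanghai

[3] Shanghai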

Source:

Journal of the China Society for Scientific and Technical Information (情报学报) | 2002, Issue 02

Keywords:

search engine; hyperlink; Robot; PageRank

DOI:

Not available

Chinese Library Classification (CLC) number:

TP393.09

Subject classification code:

080402

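Abstract:

Current search engines increasingly show their shortcomings. When a user enters a specific keyword, the query often returns thousands or even millions of results, many of them duplicate or junk information, so the user still needs a long time to pick out the pages of interest. In other cases, important pages that clearly exist on the Web are never discovered by the search engine's Robot. To address these problems, this paper focuses on the search strategy used in search engines and improves the search algorithm so that, already in the search phase, the Robot can fully process the URL list with which it interacts frequently. The PageRank of each page is computed from the page's content, its HTML structure, and the hyperlink information it contains, so that the URL list can be reordered by importance. Preliminary experimental results show that the optimized algorithm can considerably improve the overall performance of the search engine.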
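The idea of reordering the Robot's URL list by importance can be illustrated with a minimal sketch. The abstract does not give the paper's scoring formula, which also draws on page content and HTML structure, so the code below substitutes the standard link-only PageRank power iteration. The function names `pagerank` and `order_frontier`, the damping factor 0.85, and the iteration count are assumptions made for illustration, not details taken from the paper.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Standard power-iteration PageRank (assumed stand-in for the paper's score).

    links maps each URL to the list of URLs it links to.
    """
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        # Rank held by pages without outlinks is spread evenly over all pages.
        dangling = sum(rank[u] for u in nodes if not links.get(u))
        new_rank = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        for u, targets in links.items():
            if targets:
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new_rank[v] += share
        rank = new_rank
    return rank


def order_frontier(frontier, links):
    """Return the pending URLs sorted from highest to lowest PageRank.

    URLs missing from the link graph get score 0; ties are broken by URL.
    """
    rank = pagerank(links)
    return sorted(frontier, key=lambda u: (-rank.get(u, 0.0), u))
```

A crawler built this way would call `order_frontier` on its pending URL list whenever the known link graph has grown, and fetch from the front of the result, so that pages judged more important are visited first.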


Pages: 130-133

Page count: 4

