A Dead-Link Detection Method for Web Sites

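Cited by: 2
Authors
Yao Zhuo (姚卓)
Cai Wandong (蔡皖东)
Yao Ye (姚烨)
Affiliation
[1] School of Computer Science, Northwestern Polytechnical University (西北工业大学计算机学院)
Keywords
web site; dead-link detection; HTTP protocol; Web links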
DOI
10.19304/j.cnki.issn1000-7180.2012.12.024
Chinese Library Classification (CLC) number
TP393.092
Discipline classification code
080402
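Abstract
As large-scale collections of information, web sites contain a great many Web links. Over time, some of these links fail or become erroneous for various reasons, forming dead links. This paper proposes a dead-link detection method for web sites. Based on the scheduling process of Web links, the method automatically acquires a site's link information; based on the structural characteristics of Web links and on page retrieval operations, it analyzes and detects dead links; and to address mutual references among links and the relationship between user experience and page depth, it preprocesses the collected data. Experimental results show that the method effectively improves both the coverage and the processing efficiency of dead-link detection.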
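The abstract names three ingredients: automatic link collection, HTTP-based dead-link checks, and preprocessing that handles mutually referencing links and page depth. This record does not give the paper's actual algorithm, so the following is only a minimal Python sketch under stated assumptions: a breadth-first crawl, a visited set to keep mutually referencing pages from being checked twice, a depth cap standing in for the page-depth consideration, and "no response or HTTP status 400 and above" as the working definition of a dead link. The names `find_dead_links`, `http_fetch`, `LinkExtractor`, and `max_depth` are hypothetical and do not come from the paper.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag, urlparse
import urllib.error
import urllib.request


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def http_fetch(url, timeout=10):
    """Return (status, html). Status 0 means no HTTP response was received."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        return err.code, ""
    except (urllib.error.URLError, OSError):
        return 0, ""


def find_dead_links(start_url, fetch=http_fetch, max_depth=3):
    """Breadth-first crawl; returns a list of (dead_url, referrer, status)."""
    site = urlparse(start_url).netloc
    seen = {start_url}                      # each URL is checked only once
    queue = deque([(start_url, 0, None)])
    dead = []
    while queue:
        url, depth, referrer = queue.popleft()
        status, html = fetch(url)
        if status == 0 or status >= 400:    # assumed definition of "dead"
            dead.append((url, referrer, status))
            continue
        # Do not expand pages beyond the depth cap or outside the start site.
        if depth >= max_depth or urlparse(url).netloc != site:
            continue
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.hrefs:
            link = urldefrag(urljoin(url, href)).url   # absolute, no #fragment
            if urlparse(link).scheme in ("http", "https") and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1, url))
    return dead
```

Passing the fetch function as a parameter lets the crawl logic be exercised against an in-memory page table without network access; with the default `http_fetch`, calling `find_dead_links("http://example.test/")` would issue real HTTP requests.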
Pages: 103-107, 111
Page count: 6
Related papers
9 in total
[1] Evaluating methods to rediscover missing web pages from the web infrastructure. Klein M, Nelson M L. Proc of JCDL. 2010
[2] Research on Prototype Framework of a Multi-threading Web Crawler for E-commerce. Wenqing Yuan. Management and Service Science (MASS). 2009
[3] DSNotify - detecting and fixing broken links in linked data sets. Haslhofer B, Popitsch N. DEXA09.20th International Colocated with DEXA. 2009
[4] DSNotify: handling broken links in the web of data. Popitsch N, Haslhofer B. Proc of WWW. 2010
[5] DSNotify - A solution for event detection and link maintenance in dynamic datasets. Niko Popitsch, Bernhard Haslhofer. J Web Sem. 2011
[6] Just-in-time recovery of missing web pages. Harrison T J, Nelson M L. Proc of HYPERTEXT. 2006
[7] K-Divided Bloom Filter Algorithm and Its Analysis. Xiao-Guang Liu, Jun Lee, Gang Wang, Guang-jun Xie, Jing Liu. Proceedings of Future Generation Communication and Networking (FGCN2007). 2007
[8] Preserving linked data on the semantic web by the application of link integrity techniques from hypermedia. Vesse R, Hall W, Carr L. LDOW2010, Colocated with WWW'10. 2010
[9] Bringing your dead links back to life: a comprehensive approach and lessons learned. Morishima A, Nakamizo A, Iida T, et al. Proc of HT. 2009