Research on a Concept-Based Web Page Similarity Processing Algorithm

Cited by: 8
Authors
Guo Chenjuan (郭晨娟)
Li Zhanhuai (李战怀)
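Affiliation
[1] School of Computer Science, Northwestern Polytechnical University
Keywords
similar web pages; concept extraction; cluster analysis; de-duplication
DOI
Not available
CLC number
TP391.1 [Text information processing]
Discipline classification codes
081203; 0835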
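Abstract
To handle massive volumes of web page information, this paper proposes a web page similarity processing algorithm suited to search engines. The algorithm works on concepts abstracted from each web page and builds a similarity processing model on top of an inverted index. The model narrows the set of web documents for which similarity actually has to be computed. This saves substantial time and space and provides a sound basis for further optimizing the similarity computation.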
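The abstract does not specify how concepts are extracted or which similarity measure is used. The sketch below is therefore only an illustration of the general idea, under assumed choices. Each page is assumed to be already reduced to a dictionary of concept weights. An inverted index maps each concept to the pages that contain it, and only pages sharing at least one concept are compared. Cosine similarity is a standard measure swapped in here for illustration, not necessarily the authors' choice. All function names, the sample data, and the `threshold` parameter are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


# Assumption: each page is a dict {concept: weight}; concept extraction
# itself is out of scope here and not described in the abstract.
def build_inverted_index(pages):
    """Map each concept to the set of page ids that contain it."""
    index = defaultdict(set)
    for page_id, concepts in pages.items():
        for concept in concepts:
            index[concept].add(page_id)
    return index


def candidate_pairs(index):
    """Collect page pairs that share at least one concept.

    Pairs with no common concept are never generated, which is what
    shrinks the set of documents that need a similarity computation.
    """
    pairs = set()
    for page_ids in index.values():
        for a, b in combinations(sorted(page_ids), 2):
            pairs.add((a, b))
    return pairs


def cosine(u, v):
    """Cosine similarity between two sparse concept-weight vectors."""
    dot = sum(w * v[c] for c, w in u.items() if c in v)
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def find_similar_pairs(pages, threshold=0.8):
    """Return (page_a, page_b, similarity) for candidate pairs at or above threshold."""
    index = build_inverted_index(pages)
    result = []
    for a, b in sorted(candidate_pairs(index)):
        sim = cosine(pages[a], pages[b])
        if sim >= threshold:
            result.append((a, b, round(sim, 3)))
    return result


if __name__ == "__main__":
    # Hypothetical sample pages: p1 and p2 are near-duplicates, p3 is unrelated.
    pages = {
        "p1": {"search engine": 2.0, "similarity": 1.0},
        "p2": {"search engine": 2.0, "similarity": 1.0, "index": 0.1},
        "p3": {"weather": 1.0},
    }
    # Only 1 of the 3 possible pairs is a candidate; p3 is never compared.
    print(len(candidate_pairs(build_inverted_index(pages))))  # 1
    print(find_similar_pairs(pages, threshold=0.9))  # [('p1', 'p2', 0.999)]
```

In this example, page p3 shares no concept with the others, so it is pruned before any similarity is computed. This is the kind of saving the abstract attributes to the inverted-index model. A concept that occurs on very many pages would still generate a large number of candidate pairs. A practical system would probably need to down-weight or filter such concepts, which this sketch does not attempt.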
Pages: 3030-3032
Page count: 3