Research on a Concept-Based Web Page Similarity Processing Algorithm

Cited by: 8

Authors:

Guo Chenjuan (郭晨娟)

Li Zhanhuai (李战怀)

Affiliation:

[1] School of Computer Science, Northwestern Polytechnical University

Source:

Journal of Computer Applications (计算机应用) | 2006, Issue 12

Keywords:

similar web pages; concept extraction; cluster analysis; duplicate removal

DOI:

Not available

Chinese Library Classification number:

TP391.1 [Text information processing]

Discipline classification codes:

081203; 0835

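Abstract:

To handle massive volumes of web page information, this paper proposes a web page similarity processing algorithm suited to search engines. Based on concepts abstracted from web pages, the algorithm builds a similarity processing model on top of an inverted index. The model narrows the set of web documents for which similarity must be computed, which saves substantial time and space and lays a good foundation for optimizing similarity computation.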
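The idea in the abstract, using an inverted index over page concepts to limit which pages are compared at all, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how concepts are extracted or which similarity measure is used, so the sketch assumes each page already has a set of concepts and, purely for illustration, uses Jaccard similarity. All function names, the sample `pages` data, and the `threshold` parameter are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_inverted_index(doc_concepts):
    """Map each concept to the set of page IDs whose concept set contains it."""
    index = defaultdict(set)
    for doc_id, concepts in doc_concepts.items():
        for concept in concepts:
            index[concept].add(doc_id)
    return index


def candidate_pairs(index):
    """Collect only the page pairs that share at least one concept."""
    pairs = set()
    for doc_ids in index.values():
        for a, b in combinations(sorted(doc_ids), 2):
            pairs.add((a, b))
    return pairs


def jaccard(a, b):
    """Jaccard similarity of two concept sets (an assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_pages(doc_concepts, threshold=0.5):
    """Score only candidate pairs and keep those at or above the threshold."""
    index = build_inverted_index(doc_concepts)
    result = []
    for a, b in sorted(candidate_pairs(index)):
        score = jaccard(doc_concepts[a], doc_concepts[b])
        if score >= threshold:
            result.append((a, b, score))
    return result


# Hypothetical concept sets; in the paper these would come from concept extraction.
pages = {
    "p1": {"search", "engine", "index"},
    "p2": {"search", "engine", "ranking"},
    "p3": {"cooking", "recipe"},
}

if __name__ == "__main__":
    print(similar_pages(pages))
```

In this sample, `p3` shares no concept with the other pages, so it never enters a similarity computation. Only the pair `p1`/`p2` is scored, which is the kind of saving the abstract describes.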

Pages: 3030-3032

Page count: 3
