Research on Feature Selection in Web Page Recognition

Cited by: 29

Authors:

Zhu Ming (朱明)

Wang Jun (王军)

Wang Junpu (王俊普)

Affiliation:

[1] Department of Automation, University of Science and Technology of China, Hefei

Source:

Computer Engineering (计算机工程) | 2000, Issue 08

Funding:

Natural Science Foundation of Anhui Province

Keywords:

feature selection; web page classification; decision tree; machine learning

DOI:

Not available

Chinese Library Classification (CLC) number:

TP393 [Computer Networks]

Discipline classification codes:

081201; 1201

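Abstract:

This paper examines in some depth two important problems concerning feature selection in Web page recognition. It proposes a new method for selecting descriptive features and compares it experimentally with three existing descriptive feature selection methods, confirming its effectiveness. It also evaluates and compares how five discriminative feature selection methods that are representative in text categorization perform when applied to Web page recognition, and finds that information gain and the statistical method select discriminative features most effectively.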

Citation

Pages: 35-37

Page count: 3

4 references in total

[1] A Comparison of Two Learning Algorithms for Text Categorization. Lewis D D, Ringuette M. The Third Annual Symposium on Document Analysis and Information Retrieval, Las Vegas, NV. 1994
[2] Learning to Extract Symbolic Knowledge from the World Wide Web. Craven M, DiPasquo D, Freitag D, et al. Technical Report CMU-CS-98-122, School of Computer Science, Carnegie Mellon University. 1998
[3] A Sequential Algorithm for Training Text Classifiers. Lewis D D, Gale W A. SIGIR 94: Proceedings of the 17th Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval, Springer-Verlag, London. 1994
[4] A Comparative Study on Feature Selection in Text Categorization. Yang Y, Pedersen J O. Proceedings of the Fourteenth International Conference on Machine Learning. 1997
