Web page classification based on a support vector machine using a weighted vote schema

Cited by: 95
Authors
Chen, Rung-Ching [1]
Hsieh, Chung-Hsun [1]
Affiliations
[1] Chaoyang Univ Technol, Dept Informat Management, Taichung, Taiwan
Keywords
latent semantic analysis; support vector machine; web page classification; feature extraction;
DOI
10.1016/j.eswa.2005.09.079
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Traditional information retrieval methods use keywords occurring in documents to determine the class of the documents, but they often retrieve unrelated web pages. To classify web pages effectively and address the synonymous keyword problem, we propose a web page classification method based on a support vector machine (SVM) using a weighted vote schema for various features. The system uses both latent semantic analysis and web page feature selection, with training and recognition performed by the SVM model. Latent semantic analysis is used to find the semantic relations between keywords and between documents. The latent semantic analysis method projects terms and documents into a vector space to find latent information in the documents. At the same time, we also extract text features from web page content. Through these text features, web pages are classified into a suitable category. The two kinds of features are sent to the SVM for training and testing, respectively. Based on the output of the SVM, a voting schema is used to determine the category of the web page. Experimental results indicate that our method is more effective than traditional methods. (C) 2005 Elsevier Ltd. All rights reserved.
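
The voting step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the voting formula, the weights, or the form of the SVM outputs, so everything below is an assumption made for illustration: the function name `weighted_vote`, the per-category scores, the category labels, and the weights 0.6 and 0.4 are all hypothetical, and a plain weighted sum stands in for whatever schema the paper actually uses.

```python
from typing import Dict, Tuple


def weighted_vote(
    scores_by_feature: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
) -> Tuple[str, Dict[str, float]]:
    """Combine per-category scores from several classifiers by a weighted sum.

    scores_by_feature maps a feature name (e.g. "lsa", "text") to that
    classifier's score for each category. weights maps the same feature
    names to their vote weights. Returns the winning category and the
    combined totals. This is an assumed scheme, not the paper's formula.
    """
    totals: Dict[str, float] = {}
    for feature, scores in scores_by_feature.items():
        w = weights[feature]
        for category, score in scores.items():
            totals[category] = totals.get(category, 0.0) + w * score
    # On a tie, max() keeps the first category that reached the top total.
    winner = max(totals, key=totals.get)
    return winner, totals


# Hypothetical SVM outputs for one web page, one dict per feature type.
lsa_scores = {"sports": 0.2, "finance": 0.7, "travel": 0.1}
text_scores = {"sports": 0.6, "finance": 0.3, "travel": 0.1}

# Hypothetical weights; in practice they would be tuned on held-out data.
weights = {"lsa": 0.6, "text": 0.4}

winner, totals = weighted_vote({"lsa": lsa_scores, "text": text_scores}, weights)
print(winner, totals)
```

With these made-up numbers the two classifiers disagree (the LSA-based scores favor "finance", the text-feature scores favor "sports"), and the weights decide which prediction prevails.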
Pages: 427-435
Number of pages: 9