Classification of web documents using graph matching

Cited by: 49
Authors
Schenker, A
Last, M
Bunke, H
Kandel, A
Affiliations
[1] Univ S Florida, Dept Comp Sci & Engn, Tampa, FL 33620 USA
[2] Univ Bern, Dept Comp Sci, Inst Informat & Angew Math, CH-3012 Bern, Switzerland
[3] Ben Gurion Univ Negev, Dept Informat Syst Engn, IL-84105 Beer Sheva, Israel
Keywords
graph representation; graph matching; document classification; k-nearest neighbors algorithm
DOI
10.1142/S0218001404003241
Chinese Library Classification (CLC) code
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper we describe a classification method that allows the use of graph-based representations of data instead of traditional vector-based representations. We compare the vector approach combined with the k-Nearest Neighbor (k-NN) algorithm to the graph-matching approach when classifying three different web document collections, using the leave-one-out approach for measuring classification accuracy. We also compare the performance of different graph distance measures as well as various document representations that utilize graphs. The results show the graph-based approach can outperform traditional vector-based methods in terms of accuracy, dimensionality and execution time.
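The pipeline the abstract describes (graph representations, a graph distance, k-NN, leave-one-out evaluation) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the toy documents, the helper names (`make_graph`, `mcs_distance`, `knn_predict`, `leave_one_out_accuracy`), the choice of adjacent-word edges, and the graph size measure (nodes plus edges). The distance is the maximum-common-subgraph form 1 - |mcs(G1, G2)| / max(|G1|, |G2|); because each term is assumed to appear as exactly one node, the common subgraph reduces to set intersections.

```python
from collections import Counter


def make_graph(tokens):
    """Toy document graph (assumed form): unique terms as nodes,
    directed edges between adjacent terms."""
    nodes = set(tokens)
    edges = {(a, b) for a, b in zip(tokens, tokens[1:]) if a != b}
    return nodes, edges


def size(graph):
    """Graph size, assumed here to be node count plus edge count."""
    nodes, edges = graph
    return len(nodes) + len(edges)


def mcs_distance(g1, g2):
    """1 - |mcs| / max(|g1|, |g2|). With unique node labels the maximum
    common subgraph is the intersection of the node and edge sets."""
    n1, e1 = g1
    n2, e2 = g2
    common = len(n1 & n2) + len(e1 & e2)
    largest = max(size(g1), size(g2))
    return 1.0 - common / largest if largest else 0.0


def knn_predict(query, training, k=1):
    """Majority vote among the k training graphs closest to the query."""
    ranked = sorted(training, key=lambda item: mcs_distance(query, item[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(dataset, k=1):
    """Classify each graph using all the others as training data."""
    correct = 0
    for i, (graph, label) in enumerate(dataset):
        rest = dataset[:i] + dataset[i + 1:]
        if knn_predict(graph, rest, k) == label:
            correct += 1
    return correct / len(dataset)


# Hypothetical four-document collection with two classes.
docs = [
    ("graph matching distance".split(), "cs"),
    ("graph edit distance".split(), "cs"),
    ("nursing theory models".split(), "health"),
    ("nursing teaching models".split(), "health"),
]
dataset = [(make_graph(tokens), label) for tokens, label in docs]
accuracy = leave_one_out_accuracy(dataset, k=1)
print(accuracy)
```

Swapping `mcs_distance` for another graph distance leaves the k-NN and leave-one-out code unchanged, which is the kind of comparison across distance measures the abstract mentions.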
Pages: 475-496
Page count: 22