A document classification method by using field association words

Cited by: 49
Authors
Fuketa, M [1]
Lee, S [1]
Tsuji, T [1]
Okada, M [1]
Aoe, J [1]
Affiliations
[1] Univ Tokushima, Dept Informat Sci & Intelligent Syst, Tokushima 7708506, Japan
DOI
10.1016/S0020-0255(00)00042-6
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Although much research on text classification relies on vector-space models built from word information across the whole text, humans can generally recognize the field of a document by spotting a few specific words. In this paper, such words, which can be directly related to a field classification, are called field association (FA) words. This paper presents an early field understanding method using FA words. Five criteria for FA words are defined for hierarchical fields, and a method of extracting shorter FA words from compound words is proposed. The presented approach is evaluated through simulation results on text files from 140 fields. (C) 2000 Elsevier Science Inc. All rights reserved.
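The idea of recognizing a field early from a few FA words can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not spell out the five criteria or the hierarchical field structure, so the sketch assumes a flat, hand-made dictionary (`FA_WORDS`), a hypothetical function name (`classify_early`), and an arbitrary stopping rule (`threshold` hits for one field). Compound FA words are matched before shorter ones.

```python
import re
from collections import Counter

# Hypothetical toy dictionary mapping FA words to fields (an assumption,
# not data from the paper). Compound FA words are space-separated.
FA_WORDS = {
    "home run": "sports/baseball",
    "pitcher": "sports/baseball",
    "penalty kick": "sports/soccer",
    "goalkeeper": "sports/soccer",
    "election": "politics",
}


def classify_early(text, fa_words=FA_WORDS, threshold=2):
    """Scan the text left to right and stop as soon as one field has
    collected `threshold` FA-word hits.

    Returns (field, tokens_read). If no field reaches the threshold, the
    field with the most hits (or None) is returned with the full length.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    max_len = max(len(key.split()) for key in fa_words)
    counts = Counter()
    i = 0
    while i < len(tokens):
        matched, step = None, 1
        # Try the longest candidate first so compound FA words take priority.
        for n in range(max_len, 0, -1):
            if i + n > len(tokens):
                continue
            field = fa_words.get(" ".join(tokens[i:i + n]))
            if field is not None:
                matched, step = field, n
                break
        i += step
        if matched is not None:
            counts[matched] += 1
            if counts[matched] >= threshold:
                return matched, i  # early decision: rest of text is unread
    best = counts.most_common(1)
    return (best[0][0] if best else None), len(tokens)


sample = ("The pitcher threw well, and a late home run decided the game. "
          "The election was not discussed.")
print(classify_early(sample))
```

In this toy run the decision is made after the second baseball-related FA word, before the remainder of the text is read, which is the sense of "early" assumed here.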
Pages: 57-70
Page count: 14