Automatic building of new Field Association word candidates using search engine

Cited by: 28
Authors
Atlam, ES [1]
Elmarhomy, G
Morita, K
Fuketa, M
Aoe, JI
Affiliations
[1] Univ Tokushima, Dept Informat Sci & Intelligent Syst, Tokushima 7708506, Japan
[2] Tanta Univ, Dept Comp Sci & Stat, Tanta, Egypt
Funding
Japan Society for the Promotion of Science
Keywords
Field Association words; WWW search engine; FA word dictionary; concentration ratio
DOI
10.1016/j.ipm.2005.08.006
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
With the increasing popularity of the Internet and the tremendous amount of on-line text, automatic document classification is important for organizing huge amounts of data. Readers can identify the subject of many document fields by reading only some specific Field Association (FA) words. Document fields can be decided efficiently if there are many FA words and if their frequency rate is high. This paper proposes a method for automatically building new FA words. A WWW search engine is used to extract FA word candidates from document corpora. New FA word candidates in each field are automatically compared with previously determined FA words. Then new FA words are appended to an FA word dictionary. According to the experimental results, the new system can automatically append around 44% of new FA words to the existing FA word dictionary. Moreover, a concentration ratio of 0.9 is also effective for extracting the relevant FA words that are needed for the system design to build FA words automatically. (c) 2005 Elsevier Ltd. All rights reserved.
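The candidate-filtering step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not define the concentration ratio, so the sketch assumes it is the share of a word's total occurrences that fall in a single field. The function names, the toy counts, and the dictionary layout are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def concentration_ratio(word, field, field_counts):
    """Assumed definition: occurrences of `word` in `field` divided by
    its occurrences across all fields."""
    total = sum(counts[word] for counts in field_counts.values())
    return field_counts[field][word] / total if total else 0.0


def append_new_fa_words(field_counts, fa_dictionary, threshold=0.9):
    """Append candidates that meet the threshold and are not already listed.

    Returns the words newly added to each field.
    """
    added = {}
    for field, counts in field_counts.items():
        known = fa_dictionary.setdefault(field, set())
        new_words = {
            w for w in counts
            if w not in known
            and concentration_ratio(w, field, field_counts) >= threshold
        }
        known |= new_words
        added[field] = new_words
    return added


# Toy per-field word counts standing in for search-engine results (hypothetical).
field_counts = {
    "sports": Counter({"goalkeeper": 19, "election": 1, "team": 10}),
    "politics": Counter({"goalkeeper": 1, "election": 30, "team": 10}),
}
fa_dictionary = {"sports": {"referee"}, "politics": set()}
added = append_new_fa_words(field_counts, fa_dictionary)
```

In this toy data, "team" is spread evenly across both fields and is rejected, while "goalkeeper" and "election" are concentrated enough in one field to be appended.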
Pages: 951-962
Number of pages: 12