Improvement of building field association term dictionary using passage retrieval

Cited by: 9
Authors
Sharif, Uddin Md. [1]
Ghada, Elmarhomy [1]
Atlam, Elsayed [1]
Fuketa, Masao [1]
Morita, Kazuhiro [1]
Aoe, Jun-Ichi [1]
Affiliations
[1] Univ Tokushima, Dept Informat Sci & Intelligent Syst, Tokushima 7708506, Japan
Funding
Japan Society for the Promotion of Science;
Keywords
field association terms; passage retrieval; WWW search engine; FA terms dictionary; recall; precision;
DOI
10.1016/j.ipm.2006.12.006
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Field Association (FA) terms are a limited set of discriminating terms that can specify document fields. A document's field can be decided efficiently if the document contains many relevant FA terms. An earlier approach built an FA term dictionary using a WWW search engine, but that dictionary contained irrelevant FA terms because the approach extracted FA terms from whole documents. This paper proposes a new approach that extracts FA terms from passages (portions of a document's text) rather than from whole documents. This approach extracts FA terms more accurately than the earlier approach. The proposed approach is evaluated on 38,372 articles from a large tagged corpus. According to the experimental results, with the new approach about 24% more relevant FA terms are appended to the earlier FA term dictionary and around 32% irrelevant FA terms are deleted. Moreover, the new approach achieves a precision of 98% and a recall of 94%. (C) 2007 Elsevier Ltd. All rights reserved.
Pages: 1793-1807
Page count: 15