PAT-tree-based adaptive keyphrase extraction for intelligent Chinese information retrieval

Cited by: 28
Author
Chien, LF [1]
Affiliation
[1] Acad Sinica, Inst Informat Sci, Taipei 115, Taiwan
DOI
10.1016/S0306-4573(98)00054-5
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Considering the urgent need for keyphrase extraction techniques in intelligent information retrieval, in this paper we present a PAT-tree-based adaptive approach, which is critical and fundamental for Chinese and other oriental languages. Compared with conventional dictionary-based approaches, the proposed approach can reduce the reliance on rigid lexicon and sophisticated word segmentation, and compared with conventional statistics-based approaches, it can handle phrases composed of high-frequency words regardless of phrase length. Furthermore, the approach has been designed carefully with Internet utilization in mind. For instance, it can be easily integrated into text retrieval systems to provide automatic term suggestion and is adaptable to changes of the database content. The proposed approach has been successfully used in several information retrieval applications, such as automatic term suggestion, domain-specific lexicon construction, book indexing and document classification. Many Chinese and oriental language processing applications are, therefore, able to move ahead from the character level to the word or phrase level. (C) 1999 Elsevier Science Ltd. All rights reserved.
Pages: 501-521
Number of pages: 21