Enriching Web taxonomies through subject categorization of query terms from search engine logs

Cited by: 22
Authors
Chuang, SL [1]
Chien, LF [1]
Affiliations
[1] Acad Sinica, Inst Informat Sci, Taipei 115, Taiwan
Keywords
information retrieval; query categorization; log analysis
DOI
10.1016/S0167-9236(02)00099-4
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose a query-categorization approach to facilitate the engineering process of constructing Web taxonomies. One primary step in taxonomy construction is to acquire the domain-specific terminology and the mapping between the subjects and these terms. We introduce a technique for categorizing Web query terms from the logs of on-line search services into a predefined subject taxonomy based on their supposed popular search interests. The experimental results show our technique's effectiveness in reducing the workload of human indexers in constructing Web taxonomies and also show its usefulness in various Web information retrieval applications. © 2002 Elsevier Science B.V. All rights reserved.
Pages: 113-127
Page count: 15