Subject categorization of query terms for exploring Web users' search interests

Times cited: 48
Authors
Pu, HT [1]
Chuang, SL
Yang, C
Affiliations
[1] Natl Chiao Tung Univ, Inst Informat Management, Hsinchu 300, Taiwan
[2] Acad Sinica, Inst Informat Sci, Taipei 115, Taiwan
Source
JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY | 2002 / Vol. 53 / No. 8
DOI
10.1002/asi.10071
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Subject content analysis of Web query terms is essential to understanding Web searching interests. Such analysis includes exploring search topics and observing how their frequency distributions change over time. To provide a basis for in-depth analysis of users' search interests on a larger scale, this article presents a query categorization approach that automatically classifies Web query terms into broad subject categories. Because a query is short and simple in structure, the subject(s) it is intended to search for are difficult to determine. Our approach therefore combines the search processes of real-world search engines to obtain highly ranked Web documents for each unknown query term. These documents are used to extract co-occurring terms and to create a feature set. An effective ranking function has also been developed to find the most appropriate categories. Three search engine logs from Taiwan were collected and tested; together they contained over 5 million queries from different periods of time. The achieved performance is quite encouraging compared with that of human categorization. The experimental results demonstrate that the approach is efficient in dealing with large numbers of queries and adaptable to the dynamic Web environment. Through good integration of human and machine efforts, the frequency distributions of subject categories in response to changes in users' search interests can be systematically observed in real time. The approach has also shown potential for use in various information retrieval applications and provides a basis for further Web searching studies.
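The abstract describes the pipeline only at a high level: retrieve top-ranked documents for a query, extract co-occurring terms as features, then rank candidate categories. The sketch below illustrates that general idea under stated assumptions; it is not the authors' method. Plain cosine similarity stands in for the paper's ranking function (which the abstract does not specify), the category profiles are hypothetical term-frequency vectors, and whitespace tokenization is a simplification (Chinese queries such as those in the Taiwanese logs would need word segmentation first).

```python
from collections import Counter
from math import sqrt


def extract_features(query, documents):
    """Count terms co-occurring with the query in retrieved documents.

    Assumption: simple lowercase whitespace tokenization; the query
    term itself is excluded from the feature set.
    """
    q = query.lower()
    features = Counter()
    for doc in documents:
        for token in doc.lower().split():
            if token != q:
                features[token] += 1
    return features


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_categories(query, documents, category_profiles, top_k=3):
    """Score every category profile against the query's features.

    Cosine similarity is a stand-in, not the paper's ranking function.
    Returns (category, score) pairs, best first.
    """
    features = extract_features(query, documents)
    scores = [(name, cosine(features, profile))
              for name, profile in category_profiles.items()]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


# Hypothetical category profiles and "retrieved" snippets for illustration.
profiles = {
    "Computers": Counter({"software": 3, "download": 2, "windows": 2}),
    "Travel": Counter({"hotel": 3, "flight": 2, "ticket": 1}),
}
docs = ["free software download for windows", "download the latest software"]
print(rank_categories("winzip", docs, profiles, top_k=1))  # top category: Computers
```

In a real system the profiles would be built from documents already assigned to each category, and the snippets would come from live search engine results for each unknown query term.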
Pages: 617-630
Page count: 14
References (45 total)
[1] [Anonymous], 1994, Cataloging and Classification: an Introduction
[2] BAEZAYATES RA, 1999, MODERN INFORMATION R
[3] Carlyle A., 1989, Cataloging & Classification Quarterly, V10, P37, DOI 10.1300/J104v10n01_04
[4] CHIEN LF, 1996, COMPUTATIONAL LINGUI, V1, P205
[5] CHUANG SI, 2000, ACM SIGIR 2000
[6] Drabenstott KM, 1996, J AM SOC INFORM SCI, V47, P519, DOI 10.1002/(SICI)1097-4571(199607)47:7<519::AID-ASI5>3.0.CO;2-X
[7] DRABENSTOTT KM, 1994, USING SUBJECT HEADIN
[8] GOLLER C, 2000, IEEE INTELL SYST APP, V14, P75
[9] *GVU CTR COLL COMP, 1998, GVU WWW US SURV