Enhancing Cluster Labeling Using Wikipedia

Cited by: 74
Authors
Carmel, David [1]
Roitman, Haggai [1]
Zwerdling, Naama [1]
Affiliation
[1] IBM Res Lab, IL-31905 Haifa, Israel
Source
PROCEEDINGS 32ND ANNUAL INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL | 2009
Keywords
Cluster labeling; Wikipedia;
DOI
10.1145/1571941.1571967
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
This work investigates enhancing cluster labeling by using Wikipedia, the free online encyclopedia. We describe a general framework for cluster labeling that extracts candidate labels from Wikipedia in addition to important terms extracted directly from the text. The "labeling quality" of each candidate is then evaluated by several independent judges, and the top-rated candidates are recommended for labeling. Our experimental results reveal that Wikipedia labels agree with the labels humans manually assigned to a cluster much more than significant terms extracted directly from the text do. We show that in most cases, even when the human-assigned label appears in the text, pure statistical methods have difficulty identifying it as a good descriptor. Furthermore, our experiments show that for more than 85% of the clusters in our test collection, the manual label (or an inflection or synonym of it) appears among the top five labels recommended by our system.
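The candidate-then-judge flow described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not say how judge scores are combined, so summation is used as a stand-in, and the names `recommend_labels`, `Judge`, and the toy score tables are hypothetical rather than taken from the paper.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List

# Hypothetical type: a judge maps a candidate label to a quality score.
Judge = Callable[[str], float]


def recommend_labels(candidates: Iterable[str], judges: List[Judge], k: int = 5) -> List[str]:
    """Rank candidates by the sum of their judge scores and return the top k.

    Summing the scores is an assumption of this sketch; the abstract only
    says that several independent judges evaluate each candidate.
    """
    totals: Dict[str, float] = defaultdict(float)
    for label in set(candidates):
        for judge in judges:
            totals[label] += judge(label)
    # Highest total first; ties are broken alphabetically for determinism.
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [label for label, _ in ranked[:k]]


# Toy data (invented): candidates from the cluster text and from Wikipedia.
text_terms = ["pitcher", "inning", "game"]
wikipedia_candidates = ["Baseball", "Major League Baseball"]

# Two toy judges backed by made-up score tables.
scores_a = {"Baseball": 0.9, "pitcher": 0.6, "game": 0.2}
scores_b = {"Baseball": 0.8, "Major League Baseball": 0.7, "pitcher": 0.3}
judges = [
    lambda label: scores_a.get(label, 0.0),
    lambda label: scores_b.get(label, 0.0),
]

top = recommend_labels(text_terms + wikipedia_candidates, judges, k=3)
print(top)
```

Pooling both candidate sources before scoring keeps the judges unaware of where a label came from, which is one simple way to compare Wikipedia labels and text terms on equal footing.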
Pages: 139-146
Page count: 8