A short text modeling method combining semantic and statistical information

Cited by: 59
Authors
Liu, Wenyin [1]
Quan, Xiaojun [1]
Feng, Min [1]
Qiu, Bite [1]
Affiliations
[1] City Univ Hong Kong, Dept Comp Sci, Kowloon Tong, Hong Kong, Peoples R China
Keywords
Text similarity; Short text similarity; Information retrieval; Query expansion; Text mining; Question answering; Similarity; Extraction
DOI
10.1016/j.ins.2010.06.021
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
A novel modeling method for a collection of short text snippets is presented in this paper to measure the similarity between pairs of snippets. The method takes account of both the semantic and statistical information within the short text snippets, and consists of three steps. Given a set of raw short text snippets, it first establishes the initial similarity between words by using a lexical database. The method then iteratively calculates both word similarity and short text similarity. Finally, a proximity matrix is constructed based on word similarity and used to convert the raw text snippets into vectors. Word similarity and text clustering experiments show that the proposed short text modeling method improves the performance of existing text-related information retrieval (IR) techniques. (C) 2010 Elsevier Inc. All rights reserved.
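The three steps named in the abstract can be illustrated with a minimal Python sketch. The abstract gives no formulas, so everything concrete here is an assumption for illustration and not the authors' published procedure: the `seed_similarity` dictionary stands in for lookups against a lexical database, and the update rule, the blending weight `alpha`, and the iteration count are hypothetical choices.

```python
from math import sqrt


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def normalize_rows(m):
    """Scale each row to unit Euclidean length (all-zero rows are left as-is)."""
    out = []
    for row in m:
        norm = sqrt(sum(x * x for x in row)) or 1.0
        out.append([x / norm for x in row])
    return out


def cosine(u, v):
    nu = sqrt(sum(x * x for x in u))
    nv = sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)


def model_snippets(snippets, seed_similarity, iterations=3, alpha=0.5):
    """Return (vocab, word_sim, vectors) for a list of tokenized snippets.

    seed_similarity maps (word, word) pairs to a score in [0, 1]; it is a
    hypothetical stand-in for similarities read from a lexical database.
    """
    vocab = sorted({w for s in snippets for w in s})
    index = {w: i for i, w in enumerate(vocab)}
    n = len(vocab)

    # Step 1: initial word similarity (identity plus the seeded pairs).
    seed = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for (a, b), value in seed_similarity.items():
        if a in index and b in index:
            seed[index[a]][index[b]] = value
            seed[index[b]][index[a]] = value

    counts = [[float(s.count(w)) for w in vocab] for s in snippets]
    by_snippet = normalize_rows(counts)          # snippets x words
    by_word = normalize_rows(transpose(counts))  # words x snippets

    # Step 2: alternate between text similarity and word similarity.
    # This update rule is an assumed form of mutual reinforcement.
    word_sim = seed
    for _ in range(iterations):
        projected = normalize_rows(matmul(by_snippet, word_sim))
        text_sim = matmul(projected, transpose(projected))
        profiles = normalize_rows(matmul(by_word, text_sim))
        statistical = matmul(profiles, transpose(profiles))
        word_sim = [
            [alpha * seed[i][j] + (1.0 - alpha) * statistical[i][j] for j in range(n)]
            for i in range(n)
        ]

    # Step 3: use word_sim as a proximity matrix to turn snippets into vectors.
    vectors = normalize_rows(matmul(by_snippet, word_sim))
    return vocab, word_sim, vectors
```

In this sketch, two snippets that share no words, such as `["buy", "car"]` and `["purchase", "automobile"]`, can still receive a high cosine similarity once the seeded word pairs are propagated through the proximity matrix, which a plain bag-of-words comparison would score as zero.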
Pages: 4031-4041
Page count: 11