SyMSS: A syntax-based measure for short-text semantic similarity

Cited by: 90
Authors
Oliva, Jesus [1]
Ignacio Serrano, Jose [1]
Dolores del Castillo, Maria [1]
Iglesias, Angel [1]
Affiliation
[1] CSIC, Bioengineering Group, Madrid 28500, Spain
Keywords
Linguistic tools for IS modeling; Text DBs; Natural language processing (NLP); Semantic similarity; Sentence similarity; Context
DOI
10.1016/j.datak.2011.01.002
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Sentence and short-text semantic similarity measures are becoming an important part of many natural language processing tasks, such as text summarization and conversational agents. This paper presents SyMSS, a new method for computing short-text and sentence semantic similarity. The method is based on the notion that the meaning of a sentence is made up not only of the meanings of its individual words, but also of the structural way the words are combined. Thus, SyMSS captures and combines syntactic and semantic information to compute the semantic similarity of two sentences. Semantic information is obtained from a lexical database. Syntactic information is obtained through a deep parsing process that finds the phrases in each sentence. With this information, the proposed method measures the semantic similarity between concepts that play the same syntactic role. Psychological plausibility is added to the method by using previous findings about how humans weight different syntactic roles when computing semantic similarity. The results show that SyMSS outperforms state-of-the-art methods in terms of rank correlation with human intuition, thus proving the importance of syntactic information in sentence semantic similarity computation. (C) 2011 Elsevier B.V. All rights reserved.
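The central idea in the abstract, comparing two sentences only between concepts that play the same syntactic role and weighting those roles differently, can be illustrated with a small sketch. Everything below is an assumption for illustration, not the authors' implementation: sentences arrive already parsed into hypothetical role labels (`subject`, `verb`, `object`) mapped to word lists, so no parser is involved; a tiny hand-made `TOY_LEXICON` stands in for the lexical database; the `ROLE_WEIGHTS` values are invented rather than taken from the psychological findings the paper relies on; the symmetric best-match phrase score is a common choice swapped in here; and a role found in only one sentence is simply scored as zero.

```python
# Minimal sketch of role-aligned sentence similarity in the spirit of SyMSS.
# All weights, role names and lexicon entries are hypothetical illustrations.

# Hypothetical per-role weights (the paper derives its weighting from
# psychological findings; these numbers are invented for the example).
ROLE_WEIGHTS = {"subject": 1.0, "verb": 1.0, "object": 0.8}
DEFAULT_WEIGHT = 0.5  # assumed weight for any role not listed above

# Toy stand-in for a lexical database such as WordNet.
TOY_LEXICON = {
    frozenset({"cat", "kitten"}): 0.9,
    frozenset({"eat", "devour"}): 0.85,
    frozenset({"fish", "tuna"}): 0.8,
}


def word_similarity(a: str, b: str) -> float:
    """Return 1.0 for identical words, else the toy lexicon score (0.0 if unknown)."""
    if a == b:
        return 1.0
    return TOY_LEXICON.get(frozenset({a, b}), 0.0)


def phrase_similarity(p1: list[str], p2: list[str]) -> float:
    """Symmetric best-match average between the words of two phrases."""
    if not p1 or not p2:
        return 0.0

    def directed(xs: list[str], ys: list[str]) -> float:
        # For each word in xs, take its best match in ys, then average.
        return sum(max(word_similarity(x, y) for y in ys) for x in xs) / len(xs)

    return (directed(p1, p2) + directed(p2, p1)) / 2


def sentence_similarity(s1: dict[str, list[str]], s2: dict[str, list[str]]) -> float:
    """Weighted average of phrase similarities, compared only within the same role.

    Roles present in just one sentence add their weight to the denominator
    but nothing to the numerator (an assumed penalty for unmatched roles).
    """
    total_weight = 0.0
    weighted_sum = 0.0
    for role in sorted(set(s1) | set(s2)):
        weight = ROLE_WEIGHTS.get(role, DEFAULT_WEIGHT)
        total_weight += weight
        if role in s1 and role in s2:
            weighted_sum += weight * phrase_similarity(s1[role], s2[role])
    return weighted_sum / total_weight if total_weight > 0 else 0.0


if __name__ == "__main__":
    a = {"subject": ["cat"], "verb": ["eat"], "object": ["fish"]}
    b = {"subject": ["kitten"], "verb": ["devour"], "object": ["tuna"]}
    print(round(sentence_similarity(a, b), 3))
```

With these toy values, the two example sentences score (0.9 + 0.85 + 0.8 × 0.8) / 2.8 ≈ 0.85, and removing the object from one sentence lowers the score because that role becomes unmatched. A fuller version would replace `word_similarity` with a lexical-database measure and obtain the roles from a parser, as the abstract describes.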
Pages: 390-405
Page count: 16