SAMAR: Subjectivity and sentiment analysis for Arabic social media

Cited by: 134
Authors
Abdul-Mageed, Muhammad [1 ,2 ]
Diab, Mona [3 ]
Kuebler, Sandra [1 ]
Affiliations
[1] Indiana Univ, Dept Linguist, Bloomington, IN 47405 USA
[2] Sch Lib & Informat Sci, Bloomington, IN 47405 USA
[3] George Washington Univ, Sch Engn & Appl Sci, Dept Comp Sci, Washington, DC 20052 USA
Keywords
Subjectivity and sentiment analysis; Morphologically rich language; Arabic; Social media data;
DOI
10.1016/j.csl.2013.03.001
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
SAMAR is a system for subjectivity and sentiment analysis (SSA) for Arabic social media genres. Arabic is a morphologically rich language, which presents significant complexities for standard approaches to building SSA systems designed for the English language. Apart from the difficulties presented by the social media genres processing, the Arabic language inherently has a high number of variable word forms leading to data sparsity. In this context, we address the following 4 pertinent issues: how to best represent lexical information; whether standard features used for English are useful for Arabic; how to handle Arabic dialects; and, whether genre specific features have a measurable impact on performance. Our results show that using either lemma or lexeme information is helpful, as well as using the two part of speech tagsets (RTS and ERTS). However, the results show that we need individualized solutions for each genre and task, but that lemmatization and the ERTS POS tagset are present in a majority of the settings. © 2013 Elsevier Ltd. All rights reserved.
Pages: 20-37
Page count: 18