A machine learning approach to sentiment analysis in multilingual Web texts

Cited by: 277
Authors
Boiy, Erik [1]
Moens, Marie-Francine [1]
Affiliation
[1] Katholieke Univ Leuven, Dept Comp Sci, Louvain, Belgium
Source
INFORMATION RETRIEVAL | 2009 / Vol. 12 / No. 5
Keywords
Opinion mining; Information tracking; Cross-language learning; Active learning
DOI
10.1007/s10791-008-9070-z
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Sentiment analysis, also called opinion mining, is a form of information extraction from text that is of growing research and commercial interest. In this paper we present our machine learning experiments with regard to sentiment analysis in blog, review and forum texts found on the World Wide Web and written in English, Dutch and French. We train from a set of example sentences or statements that are manually annotated as positive, negative or neutral with regard to a certain entity. We are interested in the feelings that people express with regard to certain consumer products. We learn and evaluate several classification models that can be configured in a cascaded pipeline. We have to deal with several problems: the noisy character of the input texts, the attribution of the sentiment to a particular entity, and the small size of the training set. We succeed in identifying positive, negative and neutral feelings toward the entity under consideration with ca. 83% accuracy for English texts, using unigram features augmented with linguistic features. The accuracies for the Dutch and French texts are ca. 70% and 68%, respectively, owing to the larger variety of linguistic expressions, which more often diverge from standard language and thus demand more training patterns. In addition, our experiments give us insights into the portability of the learned models across domains and languages. A substantial part of the article investigates the role of active learning techniques in reducing the number of examples to be manually annotated.
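The abstract names a cascaded pipeline and active learning but does not spell out either. The sketch below is a minimal illustration under loudly stated assumptions, not the authors' method: a two-stage cascade (neutral vs. polar, then positive vs. negative), multinomial naive Bayes over lowercased unigrams only (no linguistic features), and margin-based uncertainty sampling to choose the next sentence to annotate. All class and function names (`NaiveBayes`, `Cascade`, `most_uncertain`) and the toy sentences are hypothetical.

```python
import math
import re
from collections import Counter


def unigrams(text):
    # Lowercased word unigrams; \w is Unicode-aware, so accented Dutch/French words survive.
    return re.findall(r"\w+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes over unigram counts with add-one smoothing."""

    def fit(self, texts, labels):
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.doc_counts}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(unigrams(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.totals = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def posteriors(self, text):
        # Log-space scores, then a numerically stable normalization to P(class | text).
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        scores = {}
        for c, n_c in self.doc_counts.items():
            s = math.log(n_c / n_docs)
            for w in unigrams(text):
                if w in self.vocab:  # words unseen in training are skipped
                    s += math.log((self.word_counts[c][w] + 1) / (self.totals[c] + v))
            scores[c] = s
        top = max(scores.values())
        exp = {c: math.exp(s - top) for c, s in scores.items()}
        z = sum(exp.values())
        return {c: e / z for c, e in exp.items()}

    def predict(self, text):
        post = self.posteriors(text)
        return max(post, key=post.get)


class Cascade:
    """Hypothetical two-stage cascade: neutral vs. polar, then positive vs. negative."""

    def fit(self, texts, labels):
        stage1_labels = ["neutral" if y == "neutral" else "polar" for y in labels]
        self.stage1 = NaiveBayes().fit(texts, stage1_labels)
        polar = [(t, y) for t, y in zip(texts, labels) if y != "neutral"]
        self.stage2 = NaiveBayes().fit([t for t, _ in polar], [y for _, y in polar])
        return self

    def predict(self, text):
        # Only sentences judged polar by stage 1 reach the positive/negative stage.
        if self.stage1.predict(text) == "neutral":
            return "neutral"
        return self.stage2.predict(text)


def most_uncertain(model, pool, k=1):
    # Margin-based uncertainty sampling: smallest gap between the top two posteriors first.
    def margin(text):
        probs = sorted(model.posteriors(text).values(), reverse=True)
        return probs[0] - probs[1]
    return sorted(pool, key=margin)[:k]


# Toy, made-up training sentences (not data from the paper).
TRAIN = [
    ("I love this camera", "positive"),
    ("great battery and great screen", "positive"),
    ("this phone is terrible", "negative"),
    ("awful screen, I hate it", "negative"),
    ("the camera was released in May", "neutral"),
    ("the phone has a screen", "neutral"),
]
texts, labels = zip(*TRAIN)
model = Cascade().fit(texts, labels)
print(model.predict("I love this great camera"))  # → positive

# Ask the annotator about the pool sentence the stage-1 model is least sure of.
pool = ["the camera has a great screen", "I hate this phone", "the phone was released in May"]
print(most_uncertain(model.stage1, pool, k=1))
```

Splitting neutral-vs-polar from positive-vs-negative lets each stage use its own training subset; in an active-learning loop, the sentence returned by `most_uncertain` would be labeled manually, appended to the training set, and the cascade refit.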
Pages: 526-558
Page count: 33
References: 69 in total