Sentiment classification of Internet restaurant reviews written in Cantonese

Cited by: 124
Authors
Zhang, Ziqiong [1]
Ye, Qiang [1]
Zhang, Zili [1]
Li, Yijun [1]
Affiliations
[1] Harbin Institute of Technology, Department of Management Science and Engineering, Harbin 150001, People's Republic of China
Funding
National Science Foundation (USA)
Keywords
Sentiment classification; Online review; Cantonese; Restaurant; Machine learning; TALK;
DOI
10.1016/j.eswa.2010.12.147
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Cantonese is an important dialect in parts of Southern China, and local online users often express their opinions and experiences on the web in written Cantonese. Although the information in these reviews is valuable to potential consumers and sellers, the sheer volume of web reviews makes it difficult to form an unbiased evaluation of a product, and Cantonese reviews are unintelligible to Mandarin Chinese speakers. In this paper, two standard machine learning techniques, naive Bayes and SVM, are applied to online restaurant reviews written in Cantonese to automatically classify user reviews as positive or negative. The effects of feature representations and feature sizes on classification performance are discussed. We find that accuracy is influenced by the interaction between the classification models and the feature options. The naive Bayes classifier achieves accuracy as good as or better than that of SVM. Character-based bigrams prove to be better features than unigrams and trigrams for capturing Cantonese sentiment orientation. (C) 2010 Elsevier Ltd. All rights reserved.
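
The combination the abstract describes, character-based bigrams fed to a naive Bayes classifier, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and not taken from the paper: the multinomial variant with add-one smoothing, the names `char_ngrams` and `NaiveBayes`, and the four toy reviews, which are invented. The paper's own preprocessing, feature selection, and SVM experiments are not reproduced.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=2):
    """Return overlapping character n-grams, skipping whitespace."""
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]


class NaiveBayes:
    """Multinomial naive Bayes over character n-grams (illustrative only)."""

    def __init__(self, n=2, alpha=1.0):
        self.n = n          # n-gram size; 2 gives character bigrams
        self.alpha = alpha  # additive (Laplace) smoothing constant

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.feature_counts[label].update(grams)
            self.vocab.update(grams)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        # n-grams never seen in training are ignored for every class.
        grams = [g for g in char_ngrams(text, self.n) if g in self.vocab]
        best_label, best_score = None, float("-inf")
        for label, doc_count in self.class_counts.items():
            counts = self.feature_counts[label]
            total = sum(counts.values())
            score = math.log(doc_count / total_docs)  # log prior
            for g in grams:
                # Smoothed log likelihood of the n-gram given the class.
                score += math.log(
                    (counts[g] + self.alpha)
                    / (total + self.alpha * vocab_size)
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy reviews, not data from the paper.
train_texts = ["呢間餐廳好好食", "啲嘢食好正", "呢間餐廳好難食", "服務好差"]
train_labels = ["pos", "pos", "neg", "neg"]

model = NaiveBayes(n=2).fit(train_texts, train_labels)
print(model.predict("嘢食好好食"))
print(model.predict("好難食,服務差"))
```

Working at the character level avoids the need for a Cantonese word segmenter, which is one plausible reason character n-grams are attractive for this kind of text; changing `n` to 1 or 3 gives the unigram and trigram features the abstract compares against.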
Pages: 7674-7682
Page count: 9