Exploiting effective features for Chinese sentiment classification

Cited by: 65
Authors
Zhai, Zhongwu [1]
Xu, Hua [1]
Kang, Bada [2]
Jia, Peifa [1]
Affiliations
[1] Tsinghua Univ, State Key Lab Intelligent Technol & Syst, Tsinghua Natl Lab Informat Sci & Technol, Dept Comp Sci & Technol, Beijing 100084, Peoples R China
[2] Univ So Calif, Viterbi Sch Engn, Los Angeles, CA 90089 USA
Funding
National Natural Science Foundation of China
Keywords
Sentiment classification; Substring features; Substring-group; Suffix tree
DOI
10.1016/j.eswa.2011.01.047
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Features play a fundamental role in sentiment classification. This paper focuses on how to select different types of features effectively to improve sentiment classification performance. N-gram features are commonly employed in text classification tasks; in this paper, sentiment-words, substrings, substring-groups, and key-substring-groups, which had never before been considered in the sentiment classification area, are also extracted as features. The extracted features are then compared and analyzed. To demonstrate the generality of the results, we conduct our experiments on two authoritative Chinese data sets from different domains. Our statistical analysis of the experimental results indicates the following: (1) different types of features possess different discriminative capabilities in Chinese sentiment classification; (2) character bigram features perform the best among the N-gram features; (3) substring-group features have greater potential to improve sentiment classification performance by combining substrings of different lengths; (4) sentiment words or phrases extracted from existing sentiment lexicons are not effective for sentiment classification; (5) effective features usually have varying lengths rather than fixed lengths. © 2011 Elsevier Ltd. All rights reserved.
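The abstract reports that character bigrams perform best among N-gram features and that effective features tend to have varying rather than fixed lengths. The Python sketch below illustrates both ideas under stated assumptions: it reads character n-grams directly off unsegmented Chinese text, then keeps variable-length substrings that recur across documents. It is not the authors' method. The function names and the document-frequency threshold `min_df` are hypothetical, the substring enumeration is brute force where the paper's keywords point to a suffix tree, and substring-group and key-substring-group features are not implemented.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> list[str]:
    """Return every contiguous n-character window of `text`.

    Chinese text is written without spaces between words, so character
    n-grams can be read straight off the raw string with no word segmenter.
    """
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def substring_doc_freq(docs: list[str], min_len: int, max_len: int) -> Counter:
    """Count how many documents contain each substring of length min_len..max_len.

    Brute-force enumeration (quadratic in document length); a suffix tree,
    as named in the paper's keywords, finds repeated substrings more efficiently.
    """
    df: Counter = Counter()
    for doc in docs:
        seen: set[str] = set()
        for n in range(min_len, max_len + 1):
            seen.update(char_ngrams(doc, n))
        df.update(seen)  # each document adds at most 1 per distinct substring
    return df


def select_substrings(docs: list[str], min_len: int = 1, max_len: int = 4,
                      min_df: int = 2) -> list[str]:
    """Keep variable-length substrings found in at least `min_df` documents.

    `min_df` is a hypothetical selection threshold, for illustration only.
    """
    df = substring_doc_freq(docs, min_len, max_len)
    return sorted(s for s, count in df.items() if count >= min_df)


def featurize(doc: str, vocabulary: list[str]) -> list[int]:
    """Map a document to overlapping occurrence counts of each vocabulary string."""
    counts: Counter = Counter()
    for n in {len(s) for s in vocabulary}:
        counts.update(char_ngrams(doc, n))
    return [counts[s] for s in vocabulary]  # strings never seen count as 0


if __name__ == "__main__":
    docs = ["这部电影很好看", "这本书很好", "服务很差", "质量很差劲"]
    vocab = select_substrings(docs, min_len=2, max_len=2, min_df=2)
    print(vocab)
    print(featurize("很好很好看", vocab))
```

Restricting `select_substrings` to `min_len=max_len=2` yields plain character bigram features, while a wider length range yields mixed-length substring features; the resulting count vectors can be fed to any standard classifier. Counting via `char_ngrams` rather than `str.count` keeps overlapping occurrences, so `"好好好"` contains `"好好"` twice.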
Pages: 9139-9146
Page count: 8