Thumbs up? Sentiment classification using machine learning techniques

Cited by: 3650
Authors
Pang, B [1]
Lee, L [1]
Vaithyanathan, S [1]
Affiliations
[1] Cornell Univ, Dept Comp Sci, Ithaca, NY 14853 USA
Source
PROCEEDINGS OF THE 2002 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING | 2002
Keywords
DOI
10.3115/1118693.1118704
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
We consider the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative. Using movie reviews as data, we find that standard machine learning techniques definitively outperform human-produced baselines. However, the three machine learning methods we employed (Naive Bayes, maximum entropy classification, and support vector machines) do not perform as well on sentiment classification as on traditional topic-based categorization. We conclude by examining factors that make the sentiment classification problem more challenging.
Pages: 79-86
Page count: 8
Related papers
31 in total
[1]  
[Anonymous], P 8 AS PAC FIN ASS A
[2]  
ARGAMONENGELSON S, 1998, P AAAI WORKSH LEARN, P1
[3]  
Berger AL, 1996, COMPUT LINGUIST, V22, P39
[4]  
Biber D., 1988, Variation across speech and writing, DOI 10.1017/CBO9780511621024
[5]   A survey of smoothing techniques for ME models [J].
Chen, SF ;
Rosenfeld, R .
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2000, 8 (01) :37-50
[6]   Inducing features of random fields [J].
DellaPietra, S ;
DellaPietra, V ;
Lafferty, J .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 1997, 19 (04) :380-393
[7]   On the optimality of the simple Bayesian classifier under zero-one loss [J].
Domingos, P ;
Pazzani, M .
MACHINE LEARNING, 1997, 29 (2-3) :103-130
[8]  
Finn A, 2002, LECT NOTES COMPUT SC, V2291, P353
[9]  
HATZIVASSILOGLO.V, 2000, P COLING
[10]   Predicting the semantic orientation of adjectives [J].
Hatzivassiloglou, V ;
McKeown, KR .
35TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 8TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, PROCEEDINGS OF THE CONFERENCE, 1997, :174-181