Sentiment Polarity Detection for Software Development

Cited by: 157
Authors
Calefato, Fabio [1]
Lanubile, Filippo [2]
Maiorano, Federico [2]
Novielli, Nicole [2]
Affiliations
[1] Univ Bari A Moro, Dipartimento Jon, Via Duomo 259, I-74123 Taranto, Italy
[2] Univ Bari A Moro, Dipartimento Informat, Via E Orabona 4, I-70125 Bari, Italy
Keywords
Sentiment Analysis; Communication Channels; Stack Overflow; Word Embedding; Social Software Engineering; REPRESENTATION;
DOI
10.1007/s10664-017-9546-9
Chinese Library Classification
TP31 [Computer Software];
Discipline classification codes
081202; 0835;
Abstract
The role of sentiment analysis is increasingly emerging to study software developers' emotions by mining crowd-generated content within social software engineering tools. However, off-the-shelf sentiment analysis tools have been trained on non-technical domains and general-purpose social media, thus resulting in misclassifications of technical jargon and problem reports. Here, we present Senti4SD, a classifier specifically trained to support sentiment analysis in developers' communication channels. Senti4SD is trained and validated using a gold standard of Stack Overflow questions, answers, and comments manually annotated for sentiment polarity. It exploits a suite of both lexicon- and keyword-based features, as well as semantic features based on word embedding. With respect to a mainstream off-the-shelf tool, which we use as a baseline, Senti4SD reduces the misclassifications of neutral and positive posts as emotionally negative. To encourage replications, we release a lab package including the classifier, the word embedding space, and the gold standard with annotation guidelines.
Pages: 1352-1382
Page count: 31