Sentiment analysis methods have become popular for investigating human communication, including discussions related to software projects. Since general-purpose sentiment analysis tools do not fit well with the information exchanged by software developers, new tools specific to software engineering (SE) have been developed. We investigate to what extent SE-specific sentiment analysis tools mitigate the threats to conclusion validity in empirical software engineering studies that previous research has highlighted. First, we replicate two studies addressing the role of sentiment in security discussions on GitHub and in question-writing on Stack Overflow. Then, we extend the previous studies by assessing to what extent the tools agree with each other and with the manual annotation on a gold standard of 600 documents. We find that different SE-specific sentiment analysis tools might lead to contradictory results at a fine-grained level when used 'off-the-shelf'. Therefore, platform-specific tuning or retraining might be needed to take into account differences in platform conventions, jargon, or document lengths.
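As an illustration of what assessing agreement between tools and a manual gold standard can look like, the sketch below computes Cohen's kappa for pairs of label sequences. This is an assumption made for illustration: the abstract does not state which agreement metric the authors use, and the names `cohen_kappa`, `gold`, `tool_x`, and `tool_y` as well as the toy labels are hypothetical, not data from the study.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of documents given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: computed from each rater's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one and the same label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical toy labels for six documents (not taken from the study).
raters = {
    "gold": ["positive", "negative", "neutral", "neutral", "negative", "positive"],
    "tool_x": ["positive", "negative", "neutral", "positive", "negative", "neutral"],
    "tool_y": ["positive", "neutral", "neutral", "neutral", "negative", "positive"],
}

# Pairwise agreement: each tool against the gold standard and against the other tool.
kappas = {
    (a, b): cohen_kappa(raters[a], raters[b])
    for a, b in combinations(raters, 2)
}

for (a, b), k in kappas.items():
    print(f"{a} vs {b}: kappa = {k:.2f}")
```

In this toy setup, two tools can each agree moderately well with the gold standard while agreeing much less with each other, which is the kind of fine-grained disagreement the abstract warns about for off-the-shelf use.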