Emotions are central to politics, and analyzing their role in political communication has a long tradition. As research increasingly leverages audio-visual materials to analyze emotions, the emergence of multimodal generative Artificial Intelligence (AI) promises great advances. However, we lack evidence about the effectiveness of multimodal AI in analyzing emotions in political communication. This paper addresses this gap by evaluating current multimodal large language models (mLLMs) in the video-based analysis of emotional arousal, using two complementary datasets of human-labeled video recordings. It finds that under ideal circumstances, mLLMs' emotional arousal ratings are highly reliable and exhibit little to no demographic bias. In recordings of real-world parliamentary debates, however, mLLMs' arousal ratings fail to deliver on this promise, with potentially negative consequences for downstream statistical inferences. This study therefore underscores the need for continued, thorough evaluation of emerging generative AI methods in multimodal political analysis and contributes a replicable evaluation framework to that end.