Is Cross-Attention Preferable to Self-Attention for Multi-Modal Emotion Recognition?

Humans express their emotions via facial expressions, voice intonation and word choices. To infer the nature of the underlying emotion, recognition models may use a single modality, such as vision, audio or text, or a combination of modalities. Generally, models that fuse complementary information from multiple modalities outperform their uni-modal counterparts. However, a successful fusion model requires components that can effectively aggregate task-relevant information from each modality. As cross-modal attention is seen as an effective mechanism for multi-modal fusion, in this paper we quantify the gain that such a mechanism brings compared to the corresponding self-attention mechanism. To this end, we implement and compare a cross-attention model and a self-attention model. In addition to attention, each model uses convolutional layers for local feature extraction and recurrent layers for global sequential modelling. We compare the models on a 7-class emotion classification task using different modality combinations of the IEMOCAP dataset. Experimental results indicate that, although both models improve upon the state of the art in weighted and unweighted accuracy for tri- and bi-modal configurations, their performance is generally statistically comparable. The code to replicate the experiments is available at https://github.com/smartcameras/SelfCrossAttn
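The abstract contrasts the two attention mechanisms without stating their equations. As a rough illustration only, the sketch below assumes standard scaled dot-product attention, softmax(QKᵀ/√d)V, and shows the one structural difference: in self-attention the queries, keys and values all come from the same modality, while in cross-attention the queries come from one modality and the keys and values from another. It deliberately omits the learned query/key/value projections, multiple heads, and the convolutional and recurrent layers the abstract mentions, and it is not the authors' implementation (that is in the linked repository). The function names, the toy feature values and the mean-pool-and-concatenate fusion step are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows are time steps (frames/tokens), columns are features


def softmax(row: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(keys[0])
    out = []
    for q in queries:
        if len(q) != d:
            raise ValueError("queries and keys must share the feature dimension")
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Each output row is a convex combination of the value rows.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def self_attention(x: Matrix) -> Matrix:
    # Queries, keys and values all come from the same modality.
    return attention(x, x, x)


def cross_attention(target: Matrix, source: Matrix) -> Matrix:
    # Queries come from the target modality, keys and values from the source,
    # so the output has one row per target time step.
    return attention(target, source, source)


def mean_pool(x: Matrix) -> List[float]:
    # Hypothetical pooling: average over time steps.
    return [sum(col) / len(x) for col in zip(*x)]


if __name__ == "__main__":
    audio = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 toy audio frames, 2 features
    text = [[0.5, 0.5], [1.0, -1.0]]              # 2 toy text tokens, 2 features

    a_self = self_attention(audio)
    t_from_a = cross_attention(text, audio)  # text queries attend over audio frames
    print(len(a_self), len(a_self[0]))       # prints: 3 2
    print(len(t_from_a), len(t_from_a[0]))   # prints: 2 2

    # Hypothetical bi-modal fusion: pool each cross-attended stream and concatenate.
    fused = mean_pool(cross_attention(text, audio)) + mean_pool(cross_attention(audio, text))
    print(len(fused))                        # prints: 4
```

Keeping the attention function shared and varying only where Q, K and V come from mirrors the comparison the abstract describes: the two models differ in the source of the keys and values, not in the attention arithmetic itself.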

