This technical report presents our emotion recognition pipeline for the high-dimensional emotion task (A-VB High) in the ACII Affective Vocal Bursts (A-VB) 2022 Workshop \& Competition. Our proposed method contains three stages. First, we extract latent features from the raw audio signal and its Mel-spectrogram using self-supervised learning methods. Then, the features from the raw signal are fed to the self-relation attention and temporal awareness (SA-TA) module to learn the valuable information among these latent features. Finally, we concatenate all the features and use a fully-connected layer to predict the score of each emotion. In our experiments, the proposed method achieves a mean concordance correlation coefficient (CCC) of 0.7295 on the test set, compared to 0.5686 for the baseline model. The code of our method is available at https://github.com/linhtd812/A-VB2022.
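For readers unfamiliar with the evaluation metric, the following is a minimal sketch of how a mean CCC could be computed, assuming the standard definition of the concordance correlation coefficient with population (biased) variances. It is not taken from the linked repository: the function names `ccc` and `mean_ccc`, the row-per-sample input layout, and the choice to return 0.0 when the denominator vanishes are all illustrative assumptions.

```python
from typing import Sequence


def ccc(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Concordance correlation coefficient of two equal-length sequences.

    CCC = 2 * cov(t, p) / (var(t) + var(p) + (mean(t) - mean(p)) ** 2)
    """
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("inputs must have the same length (at least 2)")
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    # Population (divide-by-n) variances and covariance.
    var_t = sum((t - mean_t) ** 2 for t in y_true) / n
    var_p = sum((p - mean_p) ** 2 for p in y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred)) / n
    denom = var_t + var_p + (mean_t - mean_p) ** 2
    # Assumption: return 0.0 for the degenerate case of two identical constants.
    return 2.0 * cov / denom if denom > 0 else 0.0


def mean_ccc(
    targets: Sequence[Sequence[float]], preds: Sequence[Sequence[float]]
) -> float:
    """Average CCC over emotion dimensions (rows = samples, columns = emotions)."""
    n_dims = len(targets[0])
    scores = [
        ccc([row[d] for row in targets], [row[d] for row in preds])
        for d in range(n_dims)
    ]
    return sum(scores) / n_dims
```

Unlike the Pearson correlation, CCC also penalizes a shift in mean or scale between predictions and labels, so a model whose scores are perfectly correlated with the targets but systematically offset receives a value below 1.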