We present a novel generalized zero-shot algorithm to recognize perceived emotions from gestures. Our task is to map gestures to novel emotion categories not encountered during training. We introduce an adversarial, autoencoder-based representation learning method that correlates 3D motion-captured gesture sequences with vectorized representations of the natural-language perceived-emotion terms obtained from word2vec embeddings. The language-semantic embedding provides a representation of the emotion label space, and we leverage this underlying distribution to map gesture sequences to the appropriate categorical emotion labels. We train our method using a combination of gestures annotated with known emotion terms and gestures without any emotion annotations. We evaluate our method on the MPI Emotional Body Expressions Database (EBEDB) and obtain an accuracy of $58.43\%$. This improves on the performance of current state-of-the-art algorithms for generalized zero-shot learning by $25$--$27\%$ absolute.