CTAL: Pre-training Cross-modal Transformer for Audio-and-Language Representations

Existing task-specific predictive approaches for audio-language problems focus on building complicated late-fusion mechanisms. However, these models tend to overfit when labels are limited and generalize poorly. In this paper, we present a Cross-modal Transformer for Audio-and-Language, i.e., CTAL, which aims to learn the intra-modality and inter-modality connections between audio and language through two proxy tasks on a large number of audio-and-language pairs: masked language modeling and masked cross-modal acoustic modeling. After fine-tuning our pre-trained model on multiple downstream audio-and-language tasks, we observe significant improvements across tasks such as emotion classification, sentiment analysis, and speaker verification. On this basis, we further propose a specially designed fusion mechanism for use in the fine-tuning phase, which allows our pre-trained model to achieve better performance. Lastly, we present detailed ablation studies showing that both our novel cross-modality fusion component and our audio-language pre-training methods contribute significantly to the promising results.
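The two proxy tasks named in the abstract both hide part of the input and ask the model to recover it. The sketch below illustrates only the masking side of that idea with NumPy; it is not the authors' implementation. The function names, the masking probability, the choice to zero out masked audio frames, and the L1 reconstruction loss are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np

def mask_tokens(token_ids, mask_id, mask_prob, rng):
    """Masked language modeling input: replace a random subset of
    token ids with mask_id (hypothetical helper, assumed scheme)."""
    token_ids = np.asarray(token_ids)
    is_masked = rng.random(token_ids.shape) < mask_prob
    masked_ids = np.where(is_masked, mask_id, token_ids)
    return masked_ids, is_masked

def mask_frames(frames, mask_prob, rng):
    """Masked acoustic modeling input: zero out a random subset of
    audio frames, one frame per row (assumed scheme)."""
    frames = np.asarray(frames, dtype=float)
    is_masked = rng.random(frames.shape[0]) < mask_prob
    masked_frames = frames.copy()
    masked_frames[is_masked] = 0.0
    return masked_frames, is_masked

def masked_l1(pred, target, is_masked):
    """Mean absolute error computed over masked frames only
    (assumed reconstruction loss)."""
    if not is_masked.any():
        return 0.0
    return float(np.abs(pred[is_masked] - target[is_masked]).mean())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ids, token_mask = mask_tokens(np.arange(10, 30), mask_id=0,
                                  mask_prob=0.15, rng=rng)
    feats, frame_mask = mask_frames(np.ones((8, 4)), mask_prob=0.15, rng=rng)
    print(ids, token_mask.sum(), frame_mask.sum())
```

In a cross-modal setup, a model would see the masked tokens and masked frames together and predict the hidden parts of each from both modalities; that model is omitted here.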
