Improving the Cross-Lingual Generalisation in Visual Question Answering

While multilingual vision-language pretrained models have brought several benefits, recent benchmarks across various tasks and languages show poor cross-lingual generalisation when these models are applied to non-English data, with a large gap between (supervised) English performance and (zero-shot) cross-lingual transfer. In this work, we explore the poor performance of these models on a zero-shot cross-lingual visual question answering (VQA) task, where models are fine-tuned on English visual-question data and evaluated on 7 typologically diverse languages. We improve cross-lingual transfer with three strategies: (1) we introduce a linguistic prior objective that augments the cross-entropy loss with a similarity-based loss to guide the model during training, (2) we learn a task-specific subnetwork that improves cross-lingual generalisation and reduces variance without modifying the model, and (3) we augment training examples using synthetic code-mixing to promote alignment of embeddings between source and target languages. Our experiments on xGQA using the pretrained multilingual multimodal transformers UC2 and M3P demonstrate the consistent effectiveness of the proposed fine-tuning strategy across all 7 languages, outperforming existing transfer methods with sparse models. Code and data to reproduce our findings are publicly available.
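The three fine-tuning strategies named in the abstract can be illustrated with a minimal, dependency-free Python sketch. This is not the authors' released code, and several details are assumptions: the similarity prior is modelled as a softmax over cosine similarities between answer embeddings, added to cross-entropy as a KL term; subnetwork learning is replaced by plain magnitude pruning (a lottery-ticket-style stand-in); and code-mixing is modelled as random word-level dictionary substitution. All function and parameter names (`prior_augmented_loss`, `magnitude_mask`, `code_mix`, `lam`, `tau`, `keep_ratio`, `ratio`) are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# (1) Cross-entropy augmented with a similarity-based "linguistic prior".
# ASSUMPTION: the prior is softmax(cos(gold, candidate) / tau) over all answer
# candidates, and the extra term is KL(prior || model), weighted by `lam`.
def prior_augmented_loss(logits: Sequence[float], gold: int,
                         answer_emb: Sequence[Sequence[float]],
                         lam: float = 0.5, tau: float = 0.1) -> float:
    probs = softmax(logits)
    ce = -math.log(probs[gold])
    prior = softmax([cosine(answer_emb[gold], e) / tau for e in answer_emb])
    kl = sum(q * math.log(q / p) for q, p in zip(prior, probs) if q > 0)
    return ce + lam * kl


# (2) A task-specific subnetwork as a binary mask over fine-tuned weights.
# ASSUMPTION: magnitude pruning keeps the largest `keep_ratio` fraction of
# weights (ties at the threshold are all kept); the paper's rule may differ.
def magnitude_mask(weights: Sequence[float], keep_ratio: float) -> List[int]:
    k = max(1, int(round(keep_ratio * len(weights))))
    threshold = sorted((abs(w) for w in weights), reverse=True)[k - 1]
    return [1 if abs(w) >= threshold else 0 for w in weights]


# (3) Synthetic code-mixing: swap source tokens for dictionary translations.
# ASSUMPTION: hypothetical word-level bilingual dictionaries keyed by language,
# and a per-token replacement probability `ratio`.
def code_mix(tokens: Sequence[str], dictionaries: Dict[str, Dict[str, str]],
             ratio: float, rng: random.Random) -> List[str]:
    mixed = []
    for tok in tokens:
        langs = [lang for lang, d in dictionaries.items() if tok.lower() in d]
        if langs and rng.random() < ratio:
            lang = rng.choice(langs)  # pick one target language per token
            mixed.append(dictionaries[lang][tok.lower()])
        else:
            mixed.append(tok)
    return mixed


if __name__ == "__main__":
    emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]  # toy answer embeddings
    print(prior_augmented_loss([2.0, 1.0, -1.0], gold=0, answer_emb=emb))
    print(magnitude_mask([0.5, -0.1, 0.05, -0.9], keep_ratio=0.5))  # [1, 0, 0, 1]
    dicts = {"de": {"red": "rot", "car": "Auto"}, "zh": {"car": "汽车"}}
    print(code_mix("What color is the car".split(), dicts, 1.0, random.Random(0)))
```

In this sketch the KL term is non-negative, so the prior can only add to the plain cross-entropy; it penalises the model less when it spreads probability onto answers that are semantically close to the gold one. The mask would be applied by elementwise multiplication with the weights, and `ratio` controls how much of each English question is swapped into target-language tokens.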
