Recent work has shown that visual context improves cross-lingual sense disambiguation for nouns. We extend this line of work to the more challenging task of cross-lingual verb sense disambiguation, introducing the MultiSense dataset of 9,504 images annotated with English, German, and Spanish verbs. Each image in MultiSense is annotated with an English verb and its translation in German or Spanish. We show that cross-lingual verb sense disambiguation models benefit from visual context, compared to unimodal baselines. We also show that the verb sense predicted by our best disambiguation model can improve the results of a text-only machine translation system when used for a multimodal translation task.
翻译:最近的工作表明,视觉环境可以改善对名词的跨语种感知脱节。我们把这项工作扩大到跨语种动词感知脱节这一更具挑战性的任务上,引入了以英文、德文和西班牙文动词附加注释的9,504张多语种图像数据集。多语种中的每一张图像都配有英文动词及其德文或西班牙文译文。我们显示,跨语种动词感识脱节模型从视觉环境中受益,而与单一模式基线相比。我们还表明,我们的最佳脱节模型所预测的动词感知在用于多式翻译任务时可以改善只使用文本的机器翻译系统的结果。