We share the findings of the first shared task on improving the robustness of Machine Translation (MT). The task provides a testbed representing challenges facing MT models deployed in the real world, and facilitates new approaches to improve models' robustness to noisy input and domain mismatch. We focus on two language pairs (English-French and English-Japanese), and the submitted systems are evaluated on a blind test set consisting of noisy comments from Reddit paired with professionally sourced translations. As a new task, it attracted 23 submissions from 11 participating teams from universities, companies, and national labs, among others. All submitted systems achieved large improvements over the baselines, with the largest improvement reaching +22.33 BLEU. We evaluated submissions with both human judgment and automatic evaluation (BLEU), which showed high correlation (Pearson's r = 0.94 and 0.95). Furthermore, we conducted a qualitative analysis of the submitted systems using compare-mt, which revealed salient differences in how they handle the challenges of this task. Such analysis provides additional insight where human judgment and BLEU occasionally disagree, e.g., systems better at producing colloquial expressions received higher scores from human judges.
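As a side note on the metric-agreement claim above, the following is a minimal sketch of how such a system-level correlation between BLEU and human judgment can be computed; the score lists here are hypothetical placeholders, not the shared task's actual submission scores.

    # Hypothetical per-system scores; the real shared-task numbers are not reproduced here.
    from scipy.stats import pearsonr

    bleu_scores  = [28.1, 30.5, 25.9, 33.2, 29.7]  # automatic metric (BLEU) per system
    human_scores = [3.1, 3.4, 2.8, 3.7, 3.3]       # average human judgment per system

    # pearsonr returns the correlation coefficient and its two-sided p-value
    r, p_value = pearsonr(bleu_scores, human_scores)
    print(f"Pearson's r = {r:.2f} (p = {p_value:.3f})")

For the qualitative comparison, compare-mt (https://github.com/neulab/compare-mt) is a command-line tool; per its documentation it can be invoked on a reference file and two system outputs as compare-mt ref.txt sys1.txt sys2.txt to produce a contrastive analysis report.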