This paper describes NTT's submission to the WMT19 robustness task. The task mainly focuses on translating noisy text (e.g., posts on Twitter), which poses difficulties different from those of typical translation tasks such as news translation. Our submission combines several techniques, including the use of a synthetic corpus, domain adaptation, and a placeholder mechanism, and significantly improves over the previous baseline. Experimental results show that the placeholder mechanism, which temporarily replaces non-standard tokens such as emojis and emoticons with special placeholder tokens during translation, improves translation accuracy even on noisy text.
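To make the placeholder mechanism concrete, the following is a minimal sketch of the mask, translate, and restore cycle described in the abstract. It is not the submission's implementation. The regular expression, the indexed `<PH0>`, `<PH1>`, ... token format, and the function names `mask`, `unmask`, and `translate_with_placeholders` are all assumptions made for illustration, and the `translate` argument stands in for any translation system that copies placeholder tokens to its output unchanged.

```python
import re
from typing import Callable, List, Tuple

# Assumption: a deliberately small pattern covering a few emoji code-point
# ranges and some simple ASCII emoticons. A real system would need a much
# broader inventory of non-standard tokens.
NON_STANDARD = re.compile(
    "[\U0001F300-\U0001FAFF\u2600-\u27BF]"  # selected emoji and symbol ranges
    r"|[:;=][-']?[)(DPp]"                   # emoticons such as :) ;-) =D :P
)

# Assumption: indexed placeholder tokens of the form <PH0>, <PH1>, ...
PLACEHOLDER = re.compile(r"<PH(\d+)>")


def mask(text: str) -> Tuple[str, List[str]]:
    """Replace each non-standard token with an indexed placeholder."""
    originals: List[str] = []

    def _to_placeholder(match: "re.Match[str]") -> str:
        originals.append(match.group(0))
        return f"<PH{len(originals) - 1}>"

    return NON_STANDARD.sub(_to_placeholder, text), originals


def unmask(text: str, originals: List[str]) -> str:
    """Put the stored tokens back in place of their placeholders."""

    def _to_original(match: "re.Match[str]") -> str:
        index = int(match.group(1))
        # Leave unknown placeholders untouched rather than raising.
        return originals[index] if index < len(originals) else match.group(0)

    return PLACEHOLDER.sub(_to_original, text)


def translate_with_placeholders(
    text: str, translate: Callable[[str], str]
) -> str:
    """Mask non-standard tokens, translate, then restore the tokens."""
    masked, originals = mask(text)
    return unmask(translate(masked), originals)


if __name__ == "__main__":
    # Stand-in for a real translation model (hypothetical).
    def fake_translate(source: str) -> str:
        return source.replace("I love this", "J'adore ça")

    print(translate_with_placeholders("I love this 😂 :)", fake_translate))
```

Indexing the placeholders lets each token be restored even if the translation reorders them. A single shared placeholder token would instead require some alignment between source and target positions to decide which original token goes where.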