Improving Robustness of Machine Translation with Synthetic Noise

Modern Machine Translation (MT) systems perform consistently well on clean, in-domain text. However, most human-generated text, particularly on social media, is full of typos, slang, dialect, idiolect, and other noise that can have a disastrous impact on translation accuracy. In this paper, we leverage the Machine Translation of Noisy Text (MTNT) dataset to enhance the robustness of MT systems by emulating naturally occurring noise in otherwise clean data. By synthesizing noise in this manner, we are ultimately able to make a vanilla MT system resilient to naturally occurring noise and partially mitigate the resulting loss in accuracy.
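The idea of emulating noise in otherwise clean data can be illustrated with a minimal sketch. This is not the paper's method: the abstract does not specify how noise is synthesized beyond drawing on MTNT, so the function name `add_synthetic_noise`, the `noise_prob` and `seed` parameters, and the three typo operations (swap, drop, repeat) are all assumptions chosen only for illustration.

```python
import random


def add_synthetic_noise(sentence, noise_prob=0.1, seed=None):
    """Perturb some words with simple typo-like edits.

    Hypothetical illustration only: the operations below are assumed,
    not taken from the paper or derived from the MTNT dataset.
    """
    rng = random.Random(seed)
    noisy_words = []
    for word in sentence.split():
        # Leave short words alone; perturb longer ones with probability noise_prob.
        if len(word) > 3 and rng.random() < noise_prob:
            op = rng.choice(["swap", "drop", "repeat"])
            # Pick an interior position so the first and last characters survive.
            i = rng.randrange(1, len(word) - 2)
            if op == "swap":
                # Transpose two adjacent characters.
                word = word[:i] + word[i + 1] + word[i] + word[i + 2:]
            elif op == "drop":
                # Delete one character.
                word = word[:i] + word[i + 1:]
            else:
                # Duplicate one character.
                word = word[:i] + word[i] + word[i:]
        noisy_words.append(word)
    return " ".join(noisy_words)


clean = "this sentence is perfectly clean"
noisy = add_synthetic_noise(clean, noise_prob=1.0, seed=0)
```

Under these assumptions, the noised source sentences would be paired with their original clean target translations and added to the training data, so that the model sees both clean and noisy inputs for the same output.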

