We consider the problem of making machine translation more robust to character-level variation on the source side, such as typos. Existing methods achieve greater coverage by applying subword models such as byte-pair encoding (BPE) and character-level encoders, but these methods are highly sensitive to spelling mistakes. We show how training on a mild amount of random synthetic noise can dramatically improve robustness to these variations, without diminishing performance on clean text. We focus on translation performance on natural noise, as captured by frequent corrections in Wikipedia edit logs, and show that robustness to such noise can be achieved using a balanced diet of simple synthetic noises at training time, without access to the natural noise data or distribution.
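As a minimal sketch of what "simple synthetic noises" might look like, the snippet below perturbs source-side words with random character swaps and deletions. The specific noise types, probabilities, and function names here are illustrative assumptions, not the paper's exact recipe.

```python
import random

def swap_noise(word, rng):
    # Swap two adjacent characters at a random position (a common typo model).
    if len(word) < 2:
        return word
    i = rng.randrange(len(word) - 1)
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]

def delete_noise(word, rng):
    # Drop one random character; 1-char words are left untouched.
    if len(word) < 2:
        return word
    i = rng.randrange(len(word))
    return word[:i] + word[i + 1:]

def noisy_sentence(sentence, noise_prob=0.1, seed=0):
    # Corrupt each word with probability noise_prob using one randomly
    # chosen noise type; a "balanced diet" would mix several such types.
    rng = random.Random(seed)
    noise_fns = [swap_noise, delete_noise]
    out = []
    for word in sentence.split():
        if rng.random() < noise_prob:
            word = rng.choice(noise_fns)(word, rng)
        out.append(word)
    return " ".join(out)

print(noisy_sentence("the quick brown fox jumps over the lazy dog",
                     noise_prob=0.5))
```

In a training pipeline, such a function would be applied on the fly to source sentences before subword segmentation, so the model sees a fresh noisy variant of each sentence per epoch while targets stay clean.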