As a special machine translation task, dialect translation has two main characteristics: 1) a lack of parallel training corpora; and 2) high grammatical similarity between the two sides of the translation. In this paper, we investigate how to exploit the commonality and diversity between dialects so as to build unsupervised translation models that access only monolingual data. Specifically, we leverage pivot-private embeddings, layer coordination, and parameter sharing to sufficiently model the commonality and diversity between source and target, ranging from the lexical, through the syntactic, to the semantic level. To examine the effectiveness of the proposed models, we collect 20 million monolingual sentences for each of Mandarin and Cantonese, which are the official language and the most widely used dialect in China, respectively. Experimental results reveal that our methods outperform both rule-based Simplified/Traditional Chinese conversion and conventional unsupervised translation models by over 12 BLEU points.
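To make the pivot-private embedding idea concrete, the following is a minimal, hypothetical sketch (not the authors' actual implementation): each token representation is the sum of a pivot component shared across the two dialects, which captures their commonality, and a dialect-private component, which captures their diversity. All class and parameter names here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

class PivotPrivateEmbedding:
    """Illustrative sketch of a pivot-private embedding table.

    token vector = pivot[token]            (shared across dialects)
                 + private[dialect][token] (specific to one dialect)
    """

    def __init__(self, vocab_size, dim, dialects):
        # Shared (pivot) parameters, tied across all dialects.
        self.pivot = rng.normal(size=(vocab_size, dim))
        # One private table per dialect.
        self.private = {d: rng.normal(size=(vocab_size, dim)) for d in dialects}

    def lookup(self, token_ids, dialect):
        ids = np.asarray(token_ids)
        return self.pivot[ids] + self.private[dialect][ids]

# Usage: the same token ids yield correlated but distinct vectors per dialect,
# because the pivot part is shared while the private part differs.
emb = PivotPrivateEmbedding(vocab_size=100, dim=8,
                            dialects=["mandarin", "cantonese"])
v_mandarin = emb.lookup([3, 7], "mandarin")
v_cantonese = emb.lookup([3, 7], "cantonese")
```

Under this sketch, unsupervised training would update the pivot table from monolingual data in both dialects, while each private table is updated only by its own dialect's data.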