Unsupervised Word Translation Pairing using Refinement based Point Set Registration

Cross-lingual alignment of word embeddings plays an important role in knowledge transfer across languages, improving machine translation and other multilingual applications. Current unsupervised approaches rely on similarities in the geometric structure of word embedding spaces across languages to learn structure-preserving linear transformations using adversarial networks and refinement strategies. In practice, however, such techniques tend to suffer from instability and convergence issues, and they require tedious fine-tuning to find precise parameter settings. This paper proposes BioSpere, a novel framework for unsupervised mapping of bilingual word embeddings onto a shared vector space, which combines adversarial initialization and a refinement procedure with a point set registration algorithm used in image processing. We show that our framework alleviates the shortcomings of existing methodologies and is relatively insensitive to variations in adversarial learning performance, demonstrating robustness to parameter choices and training losses. Experimental evaluation on the parallel dictionary induction task demonstrates state-of-the-art results for our framework on diverse language pairs.
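The abstract does not say which registration algorithm BioSpere uses, so the following is only a rough illustration of the general pattern it describes: start from an initial alignment, then alternate nearest-neighbour matching with a closed-form orthogonal refit. It is a minimal 2-D sketch of Iterative Closest Point (ICP), a classical point set registration algorithm, and is not the paper's method. The toy 2-D points stand in for word embeddings, the starting angle `theta0` stands in for an adversarial initialization, and all function names are hypothetical.

```python
import math

Vec = tuple[float, float]


def rotate(p: Vec, theta: float) -> Vec:
    """Rotate a 2-D point counter-clockwise by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1])


def best_rotation(src: list[Vec], dst: list[Vec]) -> float:
    """Closed-form 2-D orthogonal Procrustes restricted to rotations:
    the angle minimising sum_i ||R(theta) src_i - dst_i||^2 over paired points."""
    num = sum(x[0] * y[1] - x[1] * y[0] for x, y in zip(src, dst))
    den = sum(x[0] * y[0] + x[1] * y[1] for x, y in zip(src, dst))
    return math.atan2(num, den)


def nearest(p: Vec, targets: list[Vec]) -> int:
    """Index of the target point closest to p (Euclidean distance)."""
    return min(range(len(targets)), key=lambda j: math.dist(p, targets[j]))


def icp_align(src: list[Vec], tgt: list[Vec], theta0: float = 0.0, iters: int = 50):
    """Hypothetical ICP loop: match each mapped source point to its nearest
    target, refit the rotation on those pairs, and repeat until stable."""
    theta = theta0
    for _ in range(iters):
        pairs = [nearest(rotate(p, theta), tgt) for p in src]
        new_theta = best_rotation(src, [tgt[j] for j in pairs])
        converged = abs(new_theta - theta) < 1e-12
        theta = new_theta
        if converged:
            break
    # Final induced "dictionary": source index -> target index.
    pairs = [nearest(rotate(p, theta), tgt) for p in src]
    return theta, pairs


# Toy usage: the target set is the source rotated by 0.5 rad, then shuffled.
src = [(1.0, 0.0), (0.0, 2.0), (-3.0, 0.5), (0.5, -1.5)]
rotated = [rotate(p, 0.5) for p in src]
tgt = [rotated[2], rotated[0], rotated[3], rotated[1]]
theta, pairs = icp_align(src, tgt, theta0=0.4)
print(round(theta, 6), pairs)  # 0.5 [1, 3, 0, 2]
```

ICP only converges locally, so a poor `theta0` can lock in wrong pairs; this is the kind of dependence on initialization that the abstract discusses for adversarially initialized mappings.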

Related content

Word Vector Representation

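Distributed representations encode language as dense, low-dimensional, continuous vectors. Early on, researchers found that learned word embeddings exhibit analogical relationships, for example apple − apples ≈ car − cars and man − woman ≈ king − queen. These methods can be trained directly on large-scale unlabeled corpora. The quality of word embeddings also depends heavily on the choice of context window size: embeddings learned with a large context window tend to reflect topical information, whereas those learned with a small context window tend to reflect a word's functional role and local contextual semantics.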
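The vector-offset analogies above can be made concrete with the common "3CosAdd" recipe: to answer "a is to b as c is to ?", compute vec(b) − vec(a) + vec(c) and return the nearest remaining word by cosine similarity. This is a minimal sketch; the 6-D vectors in `EMB` are made up purely for illustration, whereas real embeddings are learned from corpora and have hundreds of dimensions.

```python
import math

# Hypothetical toy embeddings. Dimensions loosely mean:
# (person, female, royal, fruit, vehicle, plural)
EMB = {
    "man":    (1.0, 0.0, 0.0, 0.0, 0.0, 0.0),
    "woman":  (1.0, 1.0, 0.0, 0.0, 0.0, 0.0),
    "king":   (1.0, 0.0, 1.0, 0.0, 0.0, 0.0),
    "queen":  (1.0, 1.0, 1.0, 0.0, 0.0, 0.0),
    "apple":  (0.0, 0.0, 0.0, 1.0, 0.0, 0.0),
    "apples": (0.0, 0.0, 0.0, 1.0, 0.0, 1.0),
    "car":    (0.0, 0.0, 0.0, 0.0, 1.0, 0.0),
    "cars":   (0.0, 0.0, 0.0, 0.0, 1.0, 1.0),
}


def cosine(u, v) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(a: str, b: str, c: str, emb: dict) -> str:
    """3CosAdd: the word (other than a, b, c) whose vector is most
    cosine-similar to vec(b) - vec(a) + vec(c)."""
    target = [vb - va + vc for va, vb, vc in zip(emb[a], emb[b], emb[c])]
    candidates = (w for w in emb if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(emb[w], target))


print(analogy("man", "woman", "king", EMB))    # queen
print(analogy("apple", "apples", "car", EMB))  # cars
```

Excluding the three query words is a standard convention, since the raw offset vector is often closest to one of the inputs themselves.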
