Cross-lingual alignment of word embeddings plays an important role in knowledge transfer across languages, improving machine translation and other multilingual applications. Current unsupervised approaches rely on similarities in the geometric structure of word embedding spaces across languages to learn structure-preserving linear transformations using adversarial networks and refinement strategies. In practice, however, such techniques tend to suffer from instability and convergence issues, and require tedious fine-tuning of parameter settings. This paper proposes BioSpere, a novel framework for unsupervised mapping of bilingual word embeddings onto a shared vector space, which combines adversarial initialization and a refinement procedure with a point set registration algorithm used in image processing. We show that our framework alleviates the shortcomings of existing methodologies and is relatively invariant to variable adversarial learning performance, showing robustness to parameter choices and training losses. Experimental evaluation on the parallel dictionary induction task yields state-of-the-art results for our framework on diverse language pairs.
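
To make "structure-preserving linear transformation" concrete, here is a minimal sketch of orthogonal Procrustes, a standard closed-form solver often used in refinement steps of embedding alignment. It is an illustration only and is not taken from the paper: the function name, the synthetic data, and the dimensions are all assumptions, and BioSpere's adversarial initialization and point set registration components are not shown.

```python
import numpy as np


def orthogonal_procrustes(X, Y):
    """Return the orthogonal matrix W minimizing ||X @ W - Y||_F.

    X, Y: arrays of shape (n_pairs, dim) whose rows are assumed to be
    embeddings of matched word pairs (hypothetical toy data below).
    """
    # The closed-form solution is W = U @ Vt, where U, Vt come from the SVD of X^T Y.
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt


# Toy stand-in for two embedding spaces related by a hidden rotation/reflection.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))                  # "source language" vectors
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))   # hidden orthogonal map
Y = X @ Q                                          # "target language" vectors

W = orthogonal_procrustes(X, Y)
print("recovered hidden map:", np.allclose(W, Q))
```

Because W is constrained to be orthogonal, distances and angles between word vectors are preserved after mapping, which is the sense in which such a transformation preserves geometric structure.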