A Simple Geometric Method for Cross-Lingual Linguistic Transformations with Pre-trained Autoencoders

Powerful sentence encoders trained for multiple languages are on the rise. These systems are capable of embedding a wide range of linguistic properties into vector representations. While explicit probing tasks can be used to verify the presence of specific linguistic properties, it is unclear whether the vector representations can be manipulated to indirectly steer such properties. For efficient learning, we investigate the use of a geometric mapping in embedding space to transform linguistic properties, without any tuning of the pre-trained sentence encoder or decoder. We validate our approach on three linguistic properties using a pre-trained multilingual autoencoder and analyze the results in both monolingual and cross-lingual settings.
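The abstract does not say which geometric mapping is used, so the following is only a minimal sketch of one simple candidate: a mean-difference offset applied to fixed embeddings. It assumes sentence embeddings are already available as NumPy arrays; the pre-trained encoder and decoder are left out, the data is synthetic, and the names `fit_offset` and `apply_offset` are hypothetical rather than taken from the paper.

```python
import numpy as np


def fit_offset(src_emb: np.ndarray, tgt_emb: np.ndarray) -> np.ndarray:
    """Estimate a translation vector as mean(target) - mean(source)."""
    return tgt_emb.mean(axis=0) - src_emb.mean(axis=0)


def apply_offset(emb: np.ndarray, offset: np.ndarray) -> np.ndarray:
    """Shift embeddings by the offset; the encoder and decoder stay untouched."""
    return emb + offset


# Synthetic stand-ins for embeddings of sentences without / with a property.
rng = np.random.default_rng(0)
dim = 8
true_shift = np.zeros(dim)
true_shift[0] = 2.0  # toy assumption: the property lives along one axis

src = rng.normal(size=(200, dim))
tgt = src + true_shift

offset = fit_offset(src, tgt)
moved = apply_offset(src, offset)
```

In a real pipeline, `moved` would be passed to the frozen decoder to generate the transformed sentence; only the offset is learned, which is what makes this kind of approach cheap.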

Autoencoder

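An autoencoder is an artificial neural network used to learn efficient data encodings in an unsupervised manner. Its aim is to learn a representation (encoding) of a set of data, typically for dimensionality reduction, by training the network to ignore signal "noise". Alongside the reduction side, a reconstruction side is learned, in which the autoencoder tries to generate, from the reduced encoding, a representation as close as possible to its original input, hence the name. Several variants of the basic model exist, each designed to force the learned representation of the input to take on useful properties. Autoencoders are applied effectively to many problems, from facial recognition to capturing the semantic meaning of words.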
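The encode-then-reconstruct idea can be illustrated with a minimal sketch. It assumes the simplest possible case: a linear autoencoder with tied weights, fitted in closed form with an SVD instead of being trained by gradient descent as a neural autoencoder would be. The helper names and the toy data are hypothetical.

```python
import numpy as np


def fit_linear_autoencoder(X: np.ndarray, code_dim: int):
    """Return the data mean and a (features, code_dim) projection matrix."""
    mean = X.mean(axis=0)
    # Rows of vt are the principal directions of the centered data.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:code_dim].T


def encode(X: np.ndarray, mean: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Compress inputs to the low-dimensional code."""
    return (X - mean) @ W


def decode(Z: np.ndarray, mean: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Reconstruct inputs from the code using the tied (transposed) weights."""
    return Z @ W.T + mean


# Toy data: 5 observed features driven by 2 latent factors plus small noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(100, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.01 * rng.normal(size=(100, 5))

mean, W = fit_linear_autoencoder(X, code_dim=2)
codes = encode(X, mean, W)
X_hat = decode(codes, mean, W)
reconstruction_error = float(np.mean((X - X_hat) ** 2))
```

Because the toy data has only two underlying factors, a two-dimensional code is enough to reconstruct it closely; the small additive noise is what the bottleneck discards.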
