Learning Semantic Sentence Embeddings using Pair-wise Discriminator

While obtaining word-level embeddings is a well-studied problem, in this paper we propose a novel method for obtaining sentence-level embeddings. The embeddings are obtained by a simple method in the context of the paraphrase generation task. If we use a sequential encoder-decoder model to generate paraphrases, we would like each generated paraphrase to be semantically close to the original sentence. One way to ensure this is to add constraints that keep true paraphrase embeddings close together and unrelated candidate sentence embeddings far apart. This is enforced by a sequential pair-wise discriminator that shares weights with the encoder and is trained with a suitable loss function. Our loss function penalizes large distances between paraphrase sentence embeddings. This loss is used in combination with a sequential encoder-decoder network. We also validate our method by evaluating the obtained embeddings on a sentiment analysis task. The proposed method yields semantic embeddings and outperforms the state of the art on the paraphrase generation and sentiment analysis tasks on standard datasets. These results are also shown to be statistically significant.
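The constraint described in the abstract (true paraphrase embeddings close, unrelated sentence embeddings far) can be illustrated with a generic margin-based pairwise loss. The sketch below is an assumption for illustration only, not the paper's exact loss: the function name `pairwise_margin_loss`, the Euclidean distance, and the `margin` parameter are all hypothetical choices.

```python
import math

def pairwise_margin_loss(anchors, positives, negatives, margin=1.0):
    """Generic hinge-style pairwise loss (illustrative, not the paper's formula).

    anchors:   embeddings of the original sentences
    positives: embeddings of their true paraphrases
    negatives: embeddings of unrelated candidate sentences
    The loss is zero once each paraphrase is closer to its anchor than the
    unrelated sentence is, by at least `margin`.
    """
    losses = []
    for a, p, n in zip(anchors, positives, negatives):
        d_pos = math.dist(a, p)  # distance to the true paraphrase
        d_neg = math.dist(a, n)  # distance to the unrelated sentence
        losses.append(max(0.0, d_pos - d_neg + margin))
    return sum(losses) / len(losses)

# Toy 2-D "embeddings", assumed values for demonstration.
sentence = [[0.0, 0.0]]
paraphrase = [[0.0, 0.0]]
unrelated = [[3.0, 4.0]]

good = pairwise_margin_loss(sentence, paraphrase, unrelated)  # paraphrase is near
bad = pairwise_margin_loss(sentence, unrelated, paraphrase)   # roles swapped
```

In a setup like the one the abstract describes, such a term would be added to the encoder-decoder's generation loss, with the embeddings produced by the shared encoder.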


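Related content

Loss function (machine learning)

In AI, a loss function is also called a distance function or a metric function. "Distance" here is abstract: it stands for the error between the true data and the predicted data. A loss function measures how much a model's prediction f(x) disagrees with the true value Y. It is a non-negative real-valued function, usually written L(Y, f(x)); the smaller the loss, the more closely the model's predictions match the data. The loss function is the core of the empirical risk function and an important component of the structural risk function.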
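As a small worked illustration of the definition above, the sketch below computes a loss L(Y, f(x)) per example and averages it into an empirical risk. The squared loss is only one common choice, and the names `squared_loss` and `empirical_risk` are hypothetical, introduced here for illustration.

```python
def squared_loss(y, y_pred):
    """L(Y, f(x)) = (Y - f(x))^2, a non-negative real-valued loss."""
    return (y - y_pred) ** 2

def empirical_risk(loss, ys, preds):
    """Average loss over a sample: the empirical risk of the predictions."""
    return sum(loss(y, p) for y, p in zip(ys, preds)) / len(ys)

# Toy data, assumed for demonstration: the second prediction is off by 2.
risk = empirical_risk(squared_loss, [1.0, 2.0], [1.0, 4.0])
```

A structural risk would add a regularization term on model complexity to this average.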
